Machine Learning Curriculum
Machine Learning is a branch of Artificial Intelligence dedicated to making
machines learn from observational data without being explicitly programmed.
Machine learning and AI are not the same. Machine learning is an instrument in
the AI symphony: a component of AI. So what is Machine Learning (ML),
exactly? It's the ability of an algorithm to learn from prior data in order
to produce a behavior. ML is teaching machines to make decisions in situations
they have never seen.
This curriculum is made to guide you in learning machine learning, recommend tools, and help you embrace the ML lifestyle by suggesting media to follow.
I update it regularly to keep it fresh and get rid of outdated content and deprecated tools.
Machine Learning in General
Study this section to understand fundamental concepts and develop intuitions before going any deeper.
"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E." (Tom Mitchell's classic definition of machine learning)
The goal of reinforcement learning is to build a machine that senses its
environment and then chooses the best action (according to its policy) at any
given state to maximize its expected long-term scalar reward.
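To make that loop concrete, here is a minimal tabular Q-learning sketch (one classic reinforcement learning algorithm) on a made-up 4-state corridor environment; the states, actions, and rewards are all invented for the example, not taken from any library.

```python
import random

# Toy environment: states 0..3 on a line, actions 0 (left) and 1 (right);
# reaching state 3 ends the episode with reward 1, everything else gives 0.
N_STATES, ACTIONS = 4, (0, 1)
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

def step(state, action):
    """Hypothetical environment dynamics for the corridor."""
    nxt = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

random.seed(0)
Q = [[0.0, 0.0] for _ in range(N_STATES)]   # Q[state][action]
for _ in range(500):                        # episodes
    s, done = 0, False
    while not done:
        # epsilon-greedy: mostly exploit the current policy, sometimes explore
        if random.random() < EPSILON:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[s][act])
        s2, r, done = step(s, a)
        # Q-learning update: move Q toward reward + discounted best future value
        Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
        s = s2

print(Q)   # "go right" ends up with the higher value in every non-goal state
```

After training, the learned values encode the policy: picking the argmax action in each state walks straight to the goal.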
Deep learning is a branch of machine learning where deep artificial neural
networks (DNNs), algorithms inspired by the way neurons work in the brain, find
patterns in raw data by combining multiple layers of artificial neurons. As the
layers increase, so does the neural network's ability to learn increasingly
abstract patterns.
The simplest kind of DNN is
a Multilayer Perceptron (MLP)
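As a small illustration, here is an MLP sketch in plain Python with hand-set (not learned) weights: one hidden layer of two ReLU units is enough to compute XOR, a function a single-layer perceptron cannot represent.

```python
# A minimal multilayer perceptron: one hidden layer of two ReLU units
# followed by a linear output. The weights below are hand-picked for
# illustration; in practice they would be learned by gradient descent.

def relu(x):
    return max(0.0, x)

def mlp_xor(x1, x2):
    # hidden layer: each unit computes relu(w . x + b)
    h1 = relu(1.0 * x1 + 1.0 * x2 + 0.0)   # fires when any input is active
    h2 = relu(1.0 * x1 + 1.0 * x2 - 1.0)   # fires only when both are active
    # output layer: linear combination of the hidden activations
    return 1.0 * h1 - 2.0 * h2

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", mlp_xor(a, b))   # reproduces the XOR truth table
```

Stacking more such layers is exactly what lets deeper networks learn more abstract patterns.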
Full Stack Deep Learning Learn Production-Level Deep Learning from Top Practitioners
DeepLearning.ai a bunch of courses taught by Andrew Ng at Coursera; it's the sequel to the Machine Learning course at Coursera.
Advanced Machine Learning Specialization consists of 7 courses on Coursera
- A friendly introduction to Deep Learning and Neural Networks
A Neural Network Playground Tinker with a simple neural network designed to help you visualize the learning process
Deep Learning Demystified - Youtube explains the inspiration for deep learning, from real neurons to artificial neural networks
Learn TensorFlow and deep learning, without a Ph.D. This 3-hour course (video + slides) offers developers a quick introduction to deep-learning fundamentals, with some TensorFlow thrown into the bargain.
A Guide to Deep Learning by YN^2 a curated maths guide to Deep Learning
Practical Deep Learning For Coders Course at Fast.ai taught by Jeremy Howard (Kaggle's #1 competitor 2 years running, and founder of Enlitic)
Deep learning - Udacity recommended for visual learners who know some ML; this course provides high-level ideas of deep learning, with dense intuitive details packed into a short amount of time; you will use TensorFlow inside the course
- Deep Learning Resources (Papers, Online Courses, Books) - deeplearning4j.org
- Introduction to Deep Neural Networks - deeplearning4j.org
NVIDIA Deep Learning Institute because GPUs are efficient at training neural networks, NVIDIA has noticed this market!
Deep Learning Book recommended for math
nerds who want to understand the theoretical side, the book is crafted by our
deep learning wizards (Goodfellow, Bengio and Courville)
- CS224d: Deep Learning for Natural Language Processing
- Deep Learning Summer School, Montreal 2015
- Neural networks class - YouTube Playlist
http://neuralnetworksanddeeplearning.com/index.html a hands-on online book for deep learning maths intuition. I can say that after you finish this, you will be able to explain deep learning in fine detail.
https://www.kadenze.com/courses/creative-applications-of-deep-learning-with-tensorflow-i You will implement a lot of things inside TensorFlow such as Autoencoders, Convolutional neural net, Feedforward neural nets, Generative models (Generative Adversarial Networks, Recurrent networks), visualizing the network, etc. You will have lots of assignments to finish. The course director (Parag) is also approachable and active.
6.S094: Deep Learning for Self-Driving Cars a course at MIT
The Neural Network Zoo a bunch of neural network models that you should know about (I know about half of them, so don't worry if you don't know many, because most of them are not popular or useful at present)
6.S191: Introduction to Deep Learning a course for 2017
The GAN Zoo a list of GAN papers which have their own name
https://deeplearning.mit.edu/ MIT Deep Learning taught by Lex Fridman
Intro to TensorFlow for Deep Learning taught at Udacity
Advancing AI theory with a first-principles understanding of deep neural networks we have used deep learning successfully for quite a long time, but we don't know exactly why it works or how to really improve it from the ground up. This blog post links to a paper that attempts a first-principles understanding of neural networks so that we can advance the deep learning field further.
Convolutional Neural Networks
DNNs that work better than ordinary DNNs on grid data such as sound waveforms,
images, and videos. They are based on the assumption that nearby input units
are more related than distant ones. They also exploit translation
invariance. For example, given an image, it might be useful to detect the same
kind of edge everywhere on the image.
They are sometimes called convnets or CNNs.
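A toy 1-D convolution in plain Python shows the core idea: the same small filter slides across the whole input (weight sharing), so a pattern gets the same response wherever it appears. The signals and filter below are invented for the example.

```python
# Slide a small kernel across a 1-D signal; one shared set of weights is
# reused at every position, which is what makes convolutions detect the
# same pattern anywhere in the input.

def conv1d(signal, kernel):
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

edge_filter = [-1.0, 1.0]          # responds to a step up in the signal

a = [0, 0, 1, 1, 1, 0, 0, 0]       # step up at position 2
b = [0, 0, 0, 0, 0, 1, 1, 1]       # the same step, shifted to position 5

print(conv1d(a, edge_filter))      # peak response where the step occurs
print(conv1d(b, edge_filter))      # same peak value, just shifted
```

In 2-D (images), the idea is identical: a small filter is swept over rows and columns instead of a single axis.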
Recurrent Neural Networks
DNNs that have state. They can also handle sequences of varying length.
They are sometimes called RNNs.
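A bare-bones recurrent cell sketch makes the "state" idea concrete: the same weights are applied at every time step, and the hidden state carries information forward, so the cell can consume sequences of any length. The weights here are hand-picked for illustration, not learned.

```python
import math

# One simple recurrent cell: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b).
# The same (w_x, w_h, b) are reused at every step; only the state h changes.

def rnn(sequence, w_x=1.0, w_h=0.5, b=0.0):
    h = 0.0                                    # initial hidden state
    for x in sequence:
        h = math.tanh(w_x * x + w_h * h + b)   # one recurrent step
    return h                                   # final state summarizes the input

print(rnn([1.0]))                      # a short sequence
print(rnn([1.0, 0.0, 1.0, 0.0, 1.0])) # a longer sequence, same cell
```

LSTMs and GRUs replace this plain tanh update with gated updates that preserve information over longer spans.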
Unsupervised Domain Adaptation
Unsupervised Domain Adaptation is a type of Transfer Learning that applies a model trained on a source dataset to perform well on a target dataset without any labels on the target dataset. It's one of the techniques that is practically useful in the real world when the cost of labeling the target dataset is high. One example is to train a model on labeled synthetic data and then use it on unlabeled real data.
Libraries and frameworks that are useful for practical machine learning
Machine learning building blocks
scikit-learn (Python) general machine learning library, high level abstraction, geared towards beginners
TensorFlow (Python); Awesome TensorFlow; computation graph framework built by Google, has nice visualization board, probably the most popular framework nowadays for doing Deep Learning
- Keras: Deep Learning library for Theano and TensorFlow (Python)
PyTorch (Python) PyTorch is a deep learning framework that puts Python first.
Apache MXNet (incubating) for Deep Learning Apache MXNet is a deep learning framework designed for both efficiency and flexibility. It allows you to mix symbolic and imperative programming to maximize efficiency and productivity.
Chainer (Python) A flexible framework of neural networks for deep learning
DeepLearning4j (Java) model import/deployment framework for retraining models (PyTorch, TensorFlow, Keras) and deploying them in JVM microservice environments, mobile devices, IoT, and Apache Spark
Vowpal Wabbit is a machine learning system which pushes the frontier of machine learning with techniques such as online, hashing, allreduce, reductions, learning2search, active, and interactive learning. There is a specific focus on reinforcement learning, with several contextual bandit algorithms implemented and the online nature lending itself well to the problem.
H2O is an in-memory platform for distributed, scalable machine learning.
spektral Graph Neural Networks with Keras and Tensorflow 2.
Lobe a drag-and-drop tool for machine learning
Ludwig Ludwig is a toolbox that allows users to train and test deep learning models without the need to write code. It is built on top of TensorFlow.
Models that are used heavily in competitions because of their outstanding generalization performance.
https://awesomeopensource.com/project/dmlc/xgboost eXtreme Gradient Boosting; not actually AutoML, but quite popular among Kaggle competitors
https://awesomeopensource.com/project/microsoft/LightGBM lightweight alternative compared to xgboost
https://awesomeopensource.com/project/catboost/catboost A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.
https://awesomeopensource.com/project/tensorflow/decision-forests TensorFlow Decision Forests (TF-DF) is a collection of state-of-the-art algorithms for the training, serving and interpretation of Decision Forest models.
PyTorch/TensorFlow implementation of TabNet paper. Further read: TabNet balances explainability and model performance on tabular data, but can it dethrone boosted tree models?
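To show the shared idea behind XGBoost, LightGBM, and CatBoost, here is a from-scratch gradient boosting sketch with 1-D decision stumps on a made-up dataset. Each weak learner is fit to the residual errors of the ensemble so far; the real libraries add regularization, second-order gradients, and far more efficient tree building.

```python
# Gradient boosting for squared loss: the residuals (y - prediction) are the
# negative gradient, so each round fits a small tree (here: a stump) to them.

def fit_stump(xs, residuals):
    """Pick the 1-D threshold split that best reduces squared error."""
    best = None
    for t in xs:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv = sum(left) / len(left) if left else 0.0
        rv = sum(right) / len(right) if right else 0.0
        err = sum((r - (lv if x <= t else rv)) ** 2
                  for x, r in zip(xs, residuals))
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    _, t, lv, rv = best
    return lambda x, t=t, lv=lv, rv=rv: lv if x <= t else rv

def boost(xs, ys, rounds=100, lr=0.3):
    stumps, pred = [], [0.0] * len(xs)
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, pred)]   # negative gradient
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        pred = [p + lr * stump(x) for p, x in zip(pred, xs)]
    return lambda x: sum(lr * s(x) for s in stumps)

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.0, 0.0, 1.0, 1.0, 3.0, 3.0]     # a small step-shaped target
model = boost(xs, ys)
print([round(model(x), 2) for x in xs])  # close to ys after boosting
```

The learning rate (shrinkage) slows each round's contribution, which is the same knob exposed as `learning_rate`/`eta` in the real libraries.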
Time Series Inference
Time series data require a unique feature-extraction process to be usable in most machine learning models, because most models require data in a tabular format.
Alternatively, you can use special model architectures that target time series directly, e.g. LSTM, TCN, etc.
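The most common such feature-extraction step is the sliding window: each row holds the last few observations as features and the next value as the target, turning the series into an ordinary tabular dataset. A minimal sketch on an invented series:

```python
# Turn a 1-D series into (features, target) rows with a sliding window so
# that any tabular model (gradient boosting, linear models, ...) can use it.

def make_supervised(series, window=3):
    rows = []
    for i in range(len(series) - window):
        features = series[i:i + window]   # the last `window` lagged values
        target = series[i + window]       # the value to predict
        rows.append((features, target))
    return rows

series = [1, 2, 3, 4, 5, 6, 7]
for features, target in make_supervised(series):
    print(features, "->", target)
```

Real pipelines also add calendar features, rolling statistics, and care with train/test splits (never let the window leak future values).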
Libraries that help you develop/debug/deploy the model in production (MLOps). There is more to ML than training the model.
https://awesomeopensource.com/project/allegroai/clearml Auto-Magical Suite of tools to streamline your ML workflow. Experiment Manager, ML-Ops and Data-Management
https://awesomeopensource.com/project/quantumblacklabs/kedro A Python framework for creating reproducible, maintainable and modular data science code.
https://awesomeopensource.com/project/determined-ai/determined Determined is an open-source deep learning training platform that makes building models fast and easy. I use it mainly for tuning hyperparameters.
https://awesomeopensource.com/project/iterative/cml Continuous Machine Learning (CML) is an open-source library for implementing continuous integration & delivery (CI/CD) in machine learning projects. Use it to automate parts of your development workflow, including model training and evaluation, comparing ML experiments across your project history, and monitoring changing datasets.
https://awesomeopensource.com/project/creme-ml/creme Python library for online machine learning. All the tools in the library can be updated with a single observation at a time, and can therefore be used to learn from streaming data.
https://awesomeopensource.com/project/aimhubio/aim A super-easy way to record, search and compare 1000s of ML training runs
https://awesomeopensource.com/project/Netflix/metaflow Metaflow is a human-friendly Python library that helps scientists and engineers build and manage real-life data science projects. Metaflow was originally developed at Netflix.
MLflow MLflow (currently in beta) is an open source platform to manage the ML lifecycle, including experimentation, reproducibility and deployment. It currently offers three components: MLflow Tracking, MLflow Projects, MLflow Models.
FloydHub a Heroku for Deep Learning (You focus on the model, they'll deploy)
comet.ml Comet enables data scientists and teams to track, compare, explain and optimize experiments and models across the model's entire lifecycle. From training to production
https://neptune.ai/ Manage all your model building metadata in a single place
https://wandb.ai/site Build better models faster with experiment tracking, dataset versioning, and model management
https://awesomeopensource.com/project/fastai/nbdev Create delightful python projects using Jupyter Notebooks
https://rapids.ai/ data science on GPUs
Data cleaning and data augmentation
Before you begin, please read this blog post to understand the motivation of searching in general: https://www.determined.ai/blog/stop-doing-iterative-model-development
Open your eyes to search-driven development. It will change you. The main benefit is that there are no setbacks: only progress and improvement are allowed. Imagine working and progressing every day, instead of regressing because your new solution doesn't work. This guaranteed progress is what search-driven development gives you. Apply it to everything in optimization, not just machine learning.
My top opinionated preferences are determined, ray tune, and optuna because of parallelization (distributed tuning on many machines), flexibility (they can optimize arbitrary objectives and allow dataset parameters to be tuned), libraries of SOTA tuning algorithms (e.g. HyperBand, BOHB, TPE, PBT, ASHA, etc.), result visualization/analysis tools, and extensive documentation/tutorials.
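The core loop these tools automate can be sketched in a few lines of plain Python (this is not the API of any of the libraries above, just the idea): sample hyperparameters, evaluate an objective, and keep only the best result, so a trial can improve the incumbent but never regress it.

```python
import random

# Bare-bones random search. The quadratic objective is a hypothetical
# stand-in for "train a model with these hyperparameters and return the
# validation loss"; "lr" and "reg" are made-up parameter names.

def objective(lr, reg):
    return (lr - 0.1) ** 2 + (reg - 0.01) ** 2

random.seed(42)
best = (float("inf"), None)
for trial in range(200):
    params = {"lr": random.uniform(0.0, 1.0),
              "reg": random.uniform(0.0, 0.1)}
    loss = objective(**params)
    if loss < best[0]:            # only improvements replace the incumbent
        best = (loss, params)

print(best)
```

The real tools replace blind random sampling with smarter strategies (TPE, HyperBand, PBT, ...), run trials in parallel, and persist/visualize every result, but the "never regress" property comes from this same keep-the-best structure.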
Make machines learn without the tedious work of feature engineering, model selection, and hyperparameter tuning
that you would otherwise have to do yourself. Let the machines perform machine learning for you!
Personally, if I have a tabular dataset, I would try FLAML and mljar first, especially if you want to get something working fast.
If you want to try gradient boosting frameworks such as XGBoost, LightGBM, CatBoost, etc. but don't know which one works best,
I suggest trying AutoML first, because internally it will try the gradient boosting frameworks mentioned previously.
Architectures that are state-of-the-art in its field.
Interesting Techniques & Applications
Nice Blogs & Vlogs to Follow
Geoffrey Hinton, he has been called
the godfather of deep learning
for introducing 2 revolutionary techniques (ReLU and Dropout) with his students.
These techniques solve the vanishing gradient and generalization problems of
deep neural networks. He also taught
a Neural Networks course at Coursera
Yann LeCun, he invented CNNs
(Convolutional neural networks), the kind of network that is really popular
among computer vision developers today
Yoshua Bengio another
leading professor of deep learning, you can
watch his TEDx talk here (2017)
Andrew Ng he discovered that GPUs make deep learning faster.
He taught 2 famous online courses, Machine Learning and Deep Learning specialization at Coursera.
Juergen Schmidhuber co-invented LSTM (a
particular type of RNN) with Sepp Hochreiter
Jeff Dean, a
Google Brain engineer, watch his TEDx Talk
Ian Goodfellow, he invented
GANs (Generative Adversarial Networks), is an OpenAI engineer
David Silver this is
the guy behind AlphaGo and Atari reinforcement learning game agents at DeepMind
Demis Hassabis CEO of
DeepMind, has given a lot of talks about AlphaGo and Reinforcement Learning
achievements they have
Andrej Karpathy he teaches convnet
classes, wrote ConvNetJS, and produces a lot of content for the DL community, he
also writes a blog (see the Nice Blogs & Vlogs to Follow section)
Pedro Domingos he wrote the book
The Master Algorithm: How the Quest for the Ultimate Learning Machine Will
Remake Our World, watch his TEDx talk here
Cutting-Edge Research Publishers
Steal the most recent techniques introduced by smart computer scientists (could be you).
Thoughtful Insights for Future Research
Other Big Lists
I am confused, too many links, where do I start?
If you are a beginner and want to get started with my suggestions, please read this issue:
From now on, this list is going to be compact and opinionated, reflecting my own real-world ML journey; I will include only content that I think is truly beneficial for me and most people.
All the materials and tools that are not good enough (in any aspect) will be gradually removed, including:
- too difficult materials without much intuition; impractical content
- too much theory without real-world practice
- unstructured materials
- courses that I don't consider to enroll myself
- information overload content, unclear direction
- tools that are too specialized, so that not many people can use them in their work, e.g. deepdream (because you can Google it if you want to use it in your work)
- tools that are beaten by other tools; no longer state-of-the-art
- commercial tools that look like they could die any time soon
NOTE: There is no particular rank for each link. The order in which they
appear does not convey any meaning and should not be treated differently.
How to contribute to this list
- Fork this repository, then apply your change.
- Make a pull request and tag me if you want.
- That's it. If your edit is useful, I'll merge it.
Or you can just submit a new issue
containing the resource you want me to include if you don't have time to send a pull request.
The resource you want to include should be free to study.