Have you heard about the amazing results achieved by DeepMind with AlphaGo Zero and by OpenAI in Dota 2? It's all about deep neural networks and reinforcement learning. Do you want to know more about it?
This is the right opportunity for you to finally learn Deep RL and use it on new and exciting projects and applications.
Here you'll find an in-depth introduction to these algorithms. You'll learn Q-learning, deep Q-learning, PPO, and actor-critic, and implement them using Python and PyTorch.
The ultimate aim is to use these general-purpose technologies and apply them to all sorts of important real world problems. - Demis Hassabis
This repository contains:
Lectures (& other content) primarily from the DeepMind and Berkeley YouTube channels.
Algorithms (like DQN, A2C, and PPO) implemented in PyTorch and tested on OpenAI Gym: RoboSchool & Atari.
To learn Deep Learning, Computer Vision or Natural Language Processing, check out my 1-Year-ML-Journey.
To learn Reinforcement Learning and Deep RL more in depth, check out my book Reinforcement Learning Algorithms with Python!!
Those who cannot remember the past are condemned to repeat it - George Santayana
This week, we will learn about the basic blocks of reinforcement learning, starting from the definition of the problem all the way through the estimation and optimization of the functions that are used to express the quality of a policy or state.
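As a reference for the "functions that express the quality of a policy or state", here is a sketch of the usual definitions. The notation (policy \pi, discount factor \gamma, reward r, next state s') is an assumption borrowed from standard textbook conventions, not something fixed by this repository.

```latex
\begin{align}
V^{\pi}(s)    &= \mathbb{E}_{\pi}\left[ \sum_{t=0}^{\infty} \gamma^{t} r_{t+1} \,\middle|\, s_0 = s \right] \\
Q^{\pi}(s, a) &= \mathbb{E}_{\pi}\left[ \sum_{t=0}^{\infty} \gamma^{t} r_{t+1} \,\middle|\, s_0 = s,\ a_0 = a \right] \\
Q^{*}(s, a)   &= \mathbb{E}\left[ r + \gamma \max_{a'} Q^{*}(s', a') \,\middle|\, s, a \right]
\end{align}
```

The first two lines measure how good a state (or a state-action pair) is under a given policy; the third is the Bellman optimality equation that Q-learning tries to satisfy.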
Q-learning applied to FrozenLake - As an exercise, you can solve the game using SARSA or implement Q-learning by yourself. In the former case, only a few changes are needed.
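To see why switching from Q-learning to SARSA takes only a few changes, here is a minimal sketch of the two tabular update rules. It is not the code from the repository: the function names, the list-of-lists Q-table and the default `alpha` and `gamma` are assumptions made for illustration.

```python
def q_learning_update(Q, s, a, r, s_next, done, alpha=0.1, gamma=0.99):
    """Off-policy update: bootstrap from the best action in the next state."""
    target = r if done else r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])


def sarsa_update(Q, s, a, r, s_next, a_next, done, alpha=0.1, gamma=0.99):
    """On-policy update: bootstrap from the action actually taken next."""
    target = r if done else r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])


# Toy Q-table with 2 states and 2 actions (hypothetical values).
Q = [[0.0, 0.0], [1.0, 2.0]]
q_learning_update(Q, s=0, a=0, r=0.0, s_next=1, done=False, alpha=0.5, gamma=0.9)
sarsa_update(Q, s=0, a=1, r=0.0, s_next=1, a_next=0, done=False, alpha=0.5, gamma=0.9)
print(Q[0])
```

The only difference is the bootstrap term: `max(Q[s_next])` for Q-learning versus `Q[s_next][a_next]` for SARSA, which is why SARSA also needs the next action to be chosen before the update.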
This week we'll learn more advanced concepts and apply deep neural networks to Q-learning algorithms.
DQN and some variants applied to Pong - This week the goal is to develop a DQN algorithm to play an Atari game. To make it more interesting I developed four extensions of DQN: Double Q-learning, Multi-step learning, Dueling networks and Noisy Nets. Play with them, and if you feel confident, you can implement Prioritized replay or Distributional RL. To learn more about these improvements, read the papers!
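As an illustration of one of these extensions, the sketch below contrasts the standard DQN target with the Double Q-learning target for a single transition. It uses plain Python lists in place of network outputs; the function names and arguments are assumptions for this example and do not mirror the repository's PyTorch code.

```python
def dqn_target(reward, next_q_target, done, gamma=0.99):
    """Standard DQN: the target network both selects and evaluates the action."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward, next_q_online, next_q_target, done, gamma=0.99):
    """Double DQN: the online network selects, the target network evaluates."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


# Hypothetical Q-values for the next state from the two networks.
next_q_online = [1.0, 3.0, 2.0]
next_q_target = [5.0, 0.5, 4.0]
y_dqn = dqn_target(1.0, next_q_target, done=False, gamma=0.9)
y_double = double_dqn_target(1.0, next_q_online, next_q_target, done=False, gamma=0.9)
print(y_dqn, y_double)
```

Decoupling action selection from action evaluation is what reduces the overestimation of Q-values that the plain `max` tends to produce.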
Week 4 introduces Policy Gradient methods, a class of algorithms that optimize the policy directly. You'll also learn about Actor-Critic algorithms, which combine a policy gradient (the actor) with a value function (the critic).
Vanilla PG and A2C applied to CartPole - This week's exercise is to implement a policy gradient method or a more sophisticated actor-critic. In the repository you can find an implemented version of PG and A2C. Bug Alert! Be aware that A2C gives me strange results. If you find the implementation of PG and A2C easy, you can try the asynchronous version of A2C (A3C).
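Both vanilla PG and A2C weight the log-probability of each action by a measure of how good that action turned out to be. A minimal sketch of those weights, assuming a single finished episode and hypothetical helper names (this is not the repository's implementation):

```python
def discounted_returns(rewards, gamma=0.99):
    """Return-to-go G_t = r_t + gamma * G_{t+1}, computed backwards in time."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def advantages(returns, values):
    """Actor-critic weight: how much better the return was than the critic's estimate."""
    return [g - v for g, v in zip(returns, values)]


rewards = [1.0, 1.0, 1.0]        # rewards collected in one toy episode
critic_values = [1.0, 1.0, 1.0]  # hypothetical critic predictions V(s_t)
returns = discounted_returns(rewards, gamma=0.5)
adv = advantages(returns, critic_values)
print(returns, adv)
```

Vanilla PG would use `returns` directly as the weights, while an actor-critic such as A2C would use `adv`, which typically lowers the variance of the gradient estimate.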
This week is about advanced policy gradient methods that improve the stability and convergence of "vanilla" policy gradient methods. You'll learn and implement PPO, an RL algorithm developed by OpenAI and adopted in OpenAI Five.
PPO applied to BipedalWalker - This week, you have to implement PPO or TRPO. I suggest PPO given its simplicity (compared to TRPO). In the project folder Week5 you can find an implementation of PPO that learns to play BipedalWalker. The folder also contains other resources that will help you in the development of the project. Have fun!
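Much of PPO's simplicity comes from its clipped surrogate objective. Here is a minimal sketch in plain Python, assuming per-sample log-probabilities and advantages are already available; the function name and the `clip_eps=0.2` default are illustrative assumptions, not taken from the Week5 folder.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (to be maximized)."""
    total = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# Two toy samples: one with ratio 1.5 and positive advantage,
# one with ratio 0.5 and negative advantage.
objective = ppo_clip_objective(
    new_logp=[math.log(1.5), math.log(0.5)],
    old_logp=[0.0, 0.0],
    advantages=[1.0, -1.0],
)
print(objective)
```

In both toy samples the ratio falls outside `[1 - clip_eps, 1 + clip_eps]`, so the clipped term is the one that counts; this is what keeps the new policy from drifting too far from the old one in a single update.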
In the last year, Evolution Strategies (ES) and Genetic Algorithms (GA) have been shown to achieve results comparable to RL methods. They are derivative-free black-box algorithms that require more data than RL to learn but are able to scale up across thousands of CPUs. This week we'll look at these black-box algorithms.
Evolution Strategies applied to LunarLander - This week the project is to implement an ES or GA. In the Week6 folder you can find a basic implementation of the paper Evolution Strategies as a Scalable Alternative to Reinforcement Learning to solve LunarLanderContinuous. You can modify it to play more difficult environments or add your own ideas.
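To get a feel for how a derivative-free update works before opening the Week6 folder, here is a simplified sketch of an evolution strategy on a toy problem. It is an assumption-laden illustration, not the repository's code: the fitness function is a hypothetical quadratic standing in for an episode return, and the hyperparameters are arbitrary.

```python
import random
import statistics


def fitness(theta, target=(3.0, -1.0)):
    """Toy black-box objective (higher is better); stands in for an episode return."""
    return -sum((t - g) ** 2 for t, g in zip(theta, target))


def evolution_strategy(theta, iterations=200, pop_size=50, sigma=0.1, lr=0.01, seed=0):
    rng = random.Random(seed)
    theta = list(theta)
    for _ in range(iterations):
        # Sample Gaussian perturbations and score each perturbed parameter vector.
        noises = [[rng.gauss(0.0, 1.0) for _ in theta] for _ in range(pop_size)]
        scores = [fitness([t + sigma * e for t, e in zip(theta, eps)]) for eps in noises]
        # Standardize the scores so the step size does not depend on their scale.
        mean, std = statistics.fmean(scores), statistics.pstdev(scores)
        weights = [(s - mean) / (std + 1e-8) for s in scores]
        # Move theta along the score-weighted average of the perturbations.
        for i in range(len(theta)):
            grad_i = sum(w * eps[i] for w, eps in zip(weights, noises)) / (pop_size * sigma)
            theta[i] += lr * grad_i
    return theta


best = evolution_strategy([0.0, 0.0])
print(best, fitness(best))
```

No gradient of `fitness` is ever computed: the update only needs the scores of the perturbed candidates, which is why each candidate can be evaluated on a separate CPU.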
The algorithms studied up to now are model-free, meaning that they only choose the best action given a state. These algorithms achieve very good performance but require a lot of training data. Model-based algorithms, instead, learn a model of the environment and plan the next actions according to the learned model. These methods are more sample-efficient than model-free ones but overall achieve worse performance. This week you'll learn the theory behind these methods and implement one of the latest algorithms.
MB-MF applied to RoboschoolAnt - This week I chose to implement the model-based algorithm described in this paper. You can find my implementation here. NB: Instead of implementing it on MuJoCo as in the paper, I used RoboSchool, an open-source simulator for robots, integrated with OpenAI Gym.
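The idea of "planning with a learned model" can be sketched with random shooting: sample candidate action sequences, roll each one out through the model, and execute the first action of the best sequence. This is a generic illustration and not necessarily the procedure used in the paper or in my implementation; `model` and `reward` are hypothetical stand-ins for a learned dynamics network and a task reward.

```python
import random


def model(state, action):
    """Stand-in for a learned dynamics model: predicts the next state."""
    return state + action


def reward(state, goal=5.0):
    """Hypothetical task reward: the closer to the goal, the better."""
    return -abs(state - goal)


def plan_action(state, horizon=5, n_candidates=200, seed=0):
    """Random-shooting planner: returns (first action of best sequence, its return)."""
    rng = random.Random(seed)
    best_return, best_first_action = float("-inf"), 0.0
    for _ in range(n_candidates):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        s, total = state, 0.0
        for a in actions:  # imagined rollout: no real environment steps
            s = model(s, a)
            total += reward(s)
        if total > best_return:
            best_return, best_first_action = total, actions[0]
    return best_first_action, best_return


action, predicted_return = plan_action(state=0.0)
print(action, predicted_return)
```

All the rollouts happen inside the model, so the real environment is only queried once per decision; this is where the sample efficiency of model-based methods comes from.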
This last week is about advanced RL concepts and a project of your choice.
Here you can find some project ideas.
Congratulations on completing the 60 Days RL Challenge!! Let me know if you enjoyed it and share it!
📚 Deep Reinforcement Learning Hands-On - by Maxim Lapan
📚 Deep Learning - Ian Goodfellow
📺 Reinforcement Learning course - by David Silver, DeepMind. Great introductory lectures by Silver, a lead researcher on AlphaGo. They follow the book Reinforcement Learning by Sutton & Barto.
📚 Awesome Reinforcement Learning. A curated list of resources dedicated to reinforcement learning
📚 GroundAI on RL. Papers on reinforcement learning
Any contribution is highly appreciated! Cheers!