There are many RL tutorials, courses, and papers on the internet. This one summarizes those RL tutorials and courses, along with some of the important RL papers, and includes sample code for RL algorithms. It will continue to be updated over time.
Keywords: Dynamic Programming (Policy and Value Iteration), Monte Carlo, Temporal Difference (SARSA, Q-Learning), Function Approximation, Policy Gradient, DQN, Imitation Learning, Meta-Learning, RL papers, RL courses, etc.
NOTE: This tutorial is for educational purposes only. It is not an academic study/paper. All related references are listed at the end of the file.
Machine learning mainly consists of three methods: Supervised Learning, Unsupervised Learning, and Reinforcement Learning. Supervised Learning learns a mapping from inputs to outputs using a labelled dataset. Some supervised learning methods: Linear Regression, Support Vector Machines, Neural Networks, etc. Unsupervised Learning provides grouping and clustering functionality on unlabelled data. Some unsupervised learning methods: K-Means, DBSCAN, etc. Reinforcement Learning is different from both: instead of learning from labels or structure in static data, the agent learns behaviour by interacting with an environment.
"A reinforcement learning algorithm, or agent, learns by interacting with its environment. The agent receives rewards by performing correctly and penalties for performing incorrectly. The agent learns without intervention from a human by maximizing its reward and minimizing its penalty" *. RL agents are used in different applications: robotics, self-driving cars, playing Atari games, managing investment portfolios, and control problems. I believe, as many AI laboratories do, that reinforcement learning combined with deep learning will be a core technology in the future.
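The agent-environment interaction loop described above can be sketched in a few lines. The corridor environment and the random (untrained) agent below are hypothetical, made up for illustration; real environments usually follow the same `step(action) -> (state, reward, done)` pattern.

```python
import random

class CorridorEnv:
    """A hypothetical 1-D corridor: start at 0, +1 reward at state 4, -1 at state -4."""
    def __init__(self):
        self.state = 0

    def step(self, action):  # action: -1 (move left) or +1 (move right)
        self.state += action
        if self.state == 4:
            return self.state, 1.0, True    # reached the goal
        if self.state == -4:
            return self.state, -1.0, True   # fell off the wrong end
        return self.state, 0.0, False       # episode continues

env = CorridorEnv()
done, total_reward = False, 0.0
while not done:
    action = random.choice([-1, 1])         # a random agent: no learning yet
    state, reward, done = env.step(action)
    total_reward += reward
print(total_reward)  # either 1.0 or -1.0, depending on where the walk ended
```

A learning agent would replace `random.choice` with a policy improved from the observed rewards.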
[Sutton & Barto Book: RL: An Introduction]
[David Silver Lecture Notes]
A state St is Markov if and only if P[St+1 | St] = P[St+1 | S1, ..., St]; that is, the current state captures all relevant information from the history, so the future is independent of the past given the present.
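The Markov property can be checked empirically on a small example. The two-state weather chain below is hypothetical, made up for this sketch: sampling a long trajectory shows that conditioning on extra history (St-1 in addition to St) does not change the distribution of St+1.

```python
import random
random.seed(0)

# A hypothetical two-state Markov chain (0 = "sunny", 1 = "rainy").
# The next state is drawn using only the current state -- the Markov property:
# P[St+1 | St] = P[St+1 | S1, ..., St]
P = [[0.8, 0.2],   # from sunny: 80% stay sunny
     [0.4, 0.6]]   # from rainy: 40% become sunny

def step(s):
    return 0 if random.random() < P[s][0] else 1

# Sample a long trajectory.
traj = [0]
for _ in range(200000):
    traj.append(step(traj[-1]))

def p_next_sunny(history_filter):
    """Empirical P[St+1 = sunny] over time steps selected by history_filter."""
    hits = [traj[t + 1] for t in range(1, len(traj) - 1) if history_filter(t)]
    return hits.count(0) / len(hits)

p_short = p_next_sunny(lambda t: traj[t] == 0)                     # condition on St only
p_long = p_next_sunny(lambda t: traj[t] == 0 and traj[t - 1] == 1) # also condition on St-1
print(round(p_short, 2), round(p_long, 2))  # both close to 0.8: extra history adds nothing
```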
A policy is the agent's behaviour: a map from state to action. A deterministic policy gives a = π(s); a stochastic policy gives a distribution over actions, π(a|s) = P[At = a | St = s].
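Both kinds of policy can be written as small functions over a table. The three-state example and the Q-values below are made up for illustration; the epsilon-greedy rule shown is a standard way to turn value estimates into a stochastic policy.

```python
import random
random.seed(1)

# A hypothetical 3-state problem with actions "left"/"right".
# Deterministic policy: a direct state -> action map, a = pi(s).
deterministic_pi = {0: "right", 1: "right", 2: "left"}

# Stochastic policy: epsilon-greedy over illustrative Q-values
# (the numbers are invented for this sketch, not learned).
Q = {0: {"left": 0.1, "right": 0.9},
     1: {"left": 0.2, "right": 0.7},
     2: {"left": 0.8, "right": 0.3}}

def epsilon_greedy(state, epsilon=0.1):
    if random.random() < epsilon:               # explore: random action
        return random.choice(["left", "right"])
    return max(Q[state], key=Q[state].get)      # exploit: best-known action

print(deterministic_pi[0], epsilon_greedy(0))
```

With epsilon = 0 the stochastic policy collapses to a deterministic greedy one.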
- V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, et al. "Playing Atari with Deep Reinforcement Learning" (2013)
Paper: https://www.cs.cmu.edu/~sross1/publications/Ross-AIStats11-NoRegret.pdf