Reinforcement Learning An Introduction

Python Implementation of Reinforcement Learning: An Introduction

Reinforcement Learning: An Introduction

@@ I am looking for self-motivated students interested in RL at different levels! @@
@@ Visit for more details. @@

Python replication for Sutton & Barto's book Reinforcement Learning: An Introduction (2nd Edition)

If anything in the code is unclear or you want to report a bug, please open an issue instead of emailing me directly. Unfortunately, I do not have exercise answers for the book.


Chapter 1

  1. Tic-Tac-Toe

Chapter 2

  1. Figure 2.1: An example bandit problem from the 10-armed testbed
  2. Figure 2.2: Average performance of epsilon-greedy action-value methods on the 10-armed testbed
  3. Figure 2.3: Optimistic initial action-value estimates
  4. Figure 2.4: Average performance of UCB action selection on the 10-armed testbed
  5. Figure 2.5: Average performance of the gradient bandit algorithm
  6. Figure 2.6: A parameter study of the various bandit algorithms
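
For readers who want a feel for what the 10-armed testbed experiments involve, here is a minimal sketch of a single epsilon-greedy run with sample-average value estimates. It is an illustration written for this README, not the code from the repository; the function name `run_bandit` and its parameters are assumptions made for the example.

```python
import numpy as np

def run_bandit(epsilon, steps=1000, k=10, seed=0):
    """One run of sample-average epsilon-greedy on a k-armed Gaussian testbed.

    Hypothetical helper for illustration; not the repository's implementation.
    """
    rng = np.random.default_rng(seed)
    q_true = rng.normal(0.0, 1.0, k)   # true action values q*(a)
    q_est = np.zeros(k)                # current estimates Q(a)
    counts = np.zeros(k, dtype=int)    # number of times each action was taken
    rewards = np.empty(steps)
    for t in range(steps):
        if rng.random() < epsilon:
            # Explore: pick an action uniformly at random.
            action = int(rng.integers(k))
        else:
            # Exploit: pick a greedy action, breaking ties at random.
            best = np.flatnonzero(q_est == q_est.max())
            action = int(rng.choice(best))
        reward = rng.normal(q_true[action], 1.0)
        counts[action] += 1
        # Incremental sample-average update of the estimate.
        q_est[action] += (reward - q_est[action]) / counts[action]
        rewards[t] = reward
    return rewards, counts, q_true
```

Averaging `rewards` over many independent runs (different seeds) for several values of `epsilon` gives learning curves of the kind shown in Figure 2.2.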

Chapter 3

  1. Figure 3.2: Gridworld example with random policy
  2. Figure 3.5: Optimal solutions to the gridworld example

Chapter 4

  1. Figure 4.1: Convergence of iterative policy evaluation on a small gridworld
  2. Figure 4.2: Jack’s car rental problem
  3. Figure 4.3: The solution to the gambler’s problem
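
As a rough illustration of the first item, the sketch below runs iterative policy evaluation for the equiprobable random policy on a 4x4 gridworld, assuming terminal states in two opposite corners, a reward of -1 per step, and no discounting. The function name `evaluate_random_policy` and the stopping threshold are assumptions for this example, not taken from the repository.

```python
import numpy as np

def evaluate_random_policy(size=4, theta=1e-4):
    """Iterative policy evaluation for the equiprobable random policy.

    Hypothetical helper for illustration; not the repository's implementation.
    """
    terminals = {(0, 0), (size - 1, size - 1)}
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
    values = np.zeros((size, size))
    while True:
        new_values = np.zeros_like(values)
        for i in range(size):
            for j in range(size):
                if (i, j) in terminals:
                    continue  # terminal states keep value 0
                for di, dj in moves:
                    ni, nj = i + di, j + dj
                    # Moves that would leave the grid keep the agent in place.
                    if not (0 <= ni < size and 0 <= nj < size):
                        ni, nj = i, j
                    # Each action has probability 0.25 and reward -1.
                    new_values[i, j] += 0.25 * (-1.0 + values[ni, nj])
        delta = np.abs(new_values - values).max()
        values = new_values
        if delta < theta:
            return values

print(np.round(evaluate_random_policy(), 1))
```

This version sweeps synchronously, building each new value table from the previous one. An in-place variant that overwrites values during the sweep is also common and typically converges in fewer sweeps.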

Chapter 5

  1. Figure 5.1: Approximate state-value functions for the blackjack policy
  2. Figure 5.2: The optimal policy and state-value function for blackjack found by Monte Carlo ES
  3. Figure 5.3: Weighted importance sampling
  4. Figure 5.4: Ordinary importance sampling with surprisingly unstable estimates

Chapter 6

  1. Example 6.2: Random walk
  2. Figure 6.2: Batch updating
  3. Figure 6.3: Sarsa applied to windy gridworld
  4. Figure 6.4: The cliff-walking task
  5. Figure 6.6: Interim and asymptotic performance of TD control methods
  6. Figure 6.7: Comparison of Q-learning and Double Q-learning
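
To give a sense of the random-walk example, here is a minimal sketch of tabular TD(0) prediction on a five-state random walk. It assumes episodes start in the middle state, moves go left or right with equal probability, and the only nonzero reward is +1 on terminating at the right end. The name `td0_random_walk` and the default step size are assumptions for this example, not the repository's code.

```python
import numpy as np

def td0_random_walk(episodes=100, alpha=0.1, seed=0):
    """Tabular TD(0) value prediction on a 5-state random walk.

    Hypothetical helper for illustration; not the repository's implementation.
    """
    rng = np.random.default_rng(seed)
    # Indices 0 and 6 are terminal; 1..5 are the non-terminal states.
    values = np.full(7, 0.5)
    values[0] = values[6] = 0.0
    for _ in range(episodes):
        state = 3  # start in the middle state
        while state not in (0, 6):
            next_state = state + (1 if rng.random() < 0.5 else -1)
            reward = 1.0 if next_state == 6 else 0.0
            # TD(0) update with no discounting (gamma = 1).
            values[state] += alpha * (reward + values[next_state] - values[state])
            state = next_state
    return values[1:6]

print(td0_random_walk(episodes=100))
```

Under these assumptions the true state values are 1/6, 2/6, ..., 5/6, so the printed estimates can be compared against them. With a constant step size the estimates keep fluctuating around the true values rather than settling exactly.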

Chapter 7

  1. Figure 7.2: Performance of n-step TD methods on 19-state random walk

Chapter 8

  1. Figure 8.2: Average learning curves for Dyna-Q agents varying in their number of planning steps
  2. Figure 8.4: Average performance of Dyna agents on a blocking task
  3. Figure 8.5: Average performance of Dyna agents on a shortcut task
  4. Example 8.4: Prioritized sweeping significantly shortens learning time on the Dyna maze task
  5. Figure 8.7: Comparison of efficiency of expected and sample updates
  6. Figure 8.8: Relative efficiency of different update distributions

Chapter 9

  1. Figure 9.1: Gradient Monte Carlo algorithm on the 1000-state random walk task
  2. Figure 9.2: Semi-gradient n-step TD algorithm on the 1000-state random walk task
  3. Figure 9.5: Fourier basis vs polynomials on the 1000-state random walk task
  4. Figure 9.8: Example of feature width’s effect on initial generalization and asymptotic accuracy
  5. Figure 9.10: Single tiling and multiple tilings on the 1000-state random walk task

Chapter 10

  1. Figure 10.1: The cost-to-go function for Mountain Car task in one run
  2. Figure 10.2: Learning curves for semi-gradient Sarsa on Mountain Car task
  3. Figure 10.3: One-step vs multi-step performance of semi-gradient Sarsa on the Mountain Car task
  4. Figure 10.4: Effect of α and n on early performance of n-step semi-gradient Sarsa
  5. Figure 10.5: Differential semi-gradient Sarsa on the access-control queuing task

Chapter 11

  1. Figure 11.2: Baird's Counterexample
  2. Figure 11.6: The behavior of the TDC algorithm on Baird’s counterexample
  3. Figure 11.7: The behavior of the ETD algorithm in expectation on Baird’s counterexample

Chapter 12

  1. Figure 12.3: Off-line λ-return algorithm on 19-state random walk
  2. Figure 12.6: TD(λ) algorithm on 19-state random walk
  3. Figure 12.8: True online TD(λ) algorithm on 19-state random walk
  4. Figure 12.10: Sarsa(λ) with replacing traces on Mountain Car
  5. Figure 12.11: Summary comparison of Sarsa(λ) algorithms on Mountain Car

Chapter 13

  1. Example 13.1: Short corridor with switched actions
  2. Figure 13.1: REINFORCE on the short-corridor gridworld
  3. Figure 13.2: REINFORCE with baseline on the short-corridor gridworld



All files are self-contained



If you want to contribute some missing examples or fix some bugs, feel free to open an issue or make a pull request.
