Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
---|---|---|---|---|---|---|---|---|---|---|
Baselines | 13,548 | 37 | 2 | 2 months ago | 6 | February 26, 2018 | 490 | mit | Python | |
OpenAI Baselines: high-quality implementations of reinforcement learning algorithms | ||||||||||
Reinforcement Learning With Tensorflow | 7,469 | 8 months ago | 58 | mit | Python | |||||
Simple reinforcement learning tutorials, 莫烦Python Chinese-language AI tutorials | ||||||||||
Easy Rl | 6,137 | 16 hours ago | 41 | other | Jupyter Notebook | |||||
Chinese-language reinforcement learning tutorial (the "Mushroom Book"); read online at https://datawhalechina.github.io/easy-rl/ | ||||||||||
Tianshou | 5,996 | 4 | 4 days ago | 29 | July 04, 2022 | 42 | mit | Python | ||
An elegant PyTorch deep reinforcement learning library. | ||||||||||
Deep Reinforcement Learning | 4,419 | 21 days ago | 2 | mit | Jupyter Notebook | |||||
Repo for the Deep Reinforcement Learning Nanodegree program | ||||||||||
Reinforcement Learning | 3,637 | 3 years ago | 2 | mit | Jupyter Notebook | |||||
Learn Deep Reinforcement Learning in 60 days! Lectures & Code in Python. Reinforcement Learning + Deep Learning | ||||||||||
Deeprl | 2,834 | 5 months ago | 5 | mit | Python | |||||
Modularized Implementation of Deep RL Algorithms in PyTorch | ||||||||||
Deep Reinforcement Learning With Pytorch | 2,741 | 5 days ago | 26 | mit | Python | |||||
PyTorch implementation of DQN, AC, ACER, A2C, A3C, PG, DDPG, TRPO, PPO, SAC, TD3 and .... | ||||||||||
Elegantrl | 2,715 | 1 | 10 days ago | 3 | January 08, 2022 | 87 | other | Python | ||
Cloud-native Deep Reinforcement Learning. 🔥 | ||||||||||
Cleanrl | 2,360 | 6 hours ago | 53 | other | Python | |||||
High-quality single file implementation of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG) |
CleanRL is a Deep Reinforcement Learning library that provides high-quality single-file implementations with research-friendly features. The implementation is clean and simple, yet we can scale it to run thousands of experiments using AWS Batch. A highlight of CleanRL is its single-file design: for example, `ppo_atari.py` has only 340 lines of code but contains all implementation details on how PPO works with Atari games, so it is a great reference implementation for readers who do not wish to read an entire modular library.
You can read more about CleanRL in our JMLR paper and documentation.
CleanRL only contains implementations of online deep reinforcement learning algorithms. If you are looking for offline algorithms, please check out tinkoff-ai/CORL, which shares a similar design philosophy with CleanRL.
ℹ️ Support for Gymnasium: Farama-Foundation/Gymnasium is the next generation of `openai/gym` that will continue to be maintained and introduce new features. Please see their announcement for further detail. We are migrating to `gymnasium`, and the progress can be tracked in vwxyzjn/cleanrl#277.
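One practical consequence of such a migration is the changed environment API. To our understanding, `gymnasium` has `step()` return five values `(obs, reward, terminated, truncated, info)`, whereas older `gym` releases returned four values with a single `done` flag. The helper below is a hypothetical sketch (the name `normalize_step` is not part of either library, and the use of the `TimeLimit.truncated` info key is an assumption about older `gym` wrappers) showing how a script could accept both shapes:

```python
from typing import Any, Tuple


def normalize_step(result: Tuple[Any, ...]) -> Tuple[Any, float, bool, bool, dict]:
    """Convert a 4-tuple (old gym) or 5-tuple (gymnasium) step result to 5-tuple form."""
    if len(result) == 5:
        obs, reward, terminated, truncated, info = result
    elif len(result) == 4:
        obs, reward, done, info = result
        # Assumption: older gym time-limit wrappers marked truncation in `info`.
        truncated = bool(info.get("TimeLimit.truncated", False))
        # An episode that ended without being truncated is treated as terminated.
        terminated = bool(done) and not truncated
    else:
        raise ValueError(f"expected 4 or 5 values from step(), got {len(result)}")
    return obs, float(reward), bool(terminated), bool(truncated), info


if __name__ == "__main__":
    # Old-style result: the episode ended because of a time limit.
    print(normalize_step(("obs", 1, True, {"TimeLimit.truncated": True}))[2:4])
    # New-style result is passed through unchanged.
    print(normalize_step(("obs", 0.5, False, True, {}))[2:4])
```

Keeping `terminated` and `truncated` separate matters for value bootstrapping: a truncated episode has not reached a terminal state, so its final value estimate should not be zeroed out.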
⚠️ NOTE: CleanRL is not a modular library and therefore it is not meant to be imported. At the cost of duplicate code, we make all implementation details of a DRL algorithm variant easy to understand, so CleanRL comes with its own pros and cons. You should consider using CleanRL if you want to 1) understand all implementation details of an algorithm's variant or 2) prototype advanced features that other modular DRL libraries do not support (CleanRL has minimal lines of code, so it gives you a great debugging experience and you don't have to do a lot of subclassing, as is sometimes needed in modular DRL libraries).
Prerequisites:
To run experiments locally, give the following a try:
git clone https://github.com/vwxyzjn/cleanrl.git && cd cleanrl
poetry install
# alternatively, you could use `poetry shell` and do
# `python cleanrl/ppo.py`
poetry run python cleanrl/ppo.py \
--seed 1 \
--env-id CartPole-v0 \
--total-timesteps 50000
# open another terminal and enter `cd cleanrl/cleanrl`
tensorboard --logdir runs
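The flags used above are ordinary command-line arguments. As a minimal sketch (this is not CleanRL's actual parser; the defaults and help strings are illustrative assumptions), a single-file script could declare them with `argparse` like this:

```python
import argparse


def parse_args(argv=None):
    """Parse the three flags used in the example command."""
    parser = argparse.ArgumentParser(description="Hypothetical single-file training script")
    parser.add_argument("--seed", type=int, default=1, help="random seed")
    parser.add_argument("--env-id", type=str, default="CartPole-v1", help="environment id")
    parser.add_argument("--total-timesteps", type=int, default=500000,
                        help="total number of environment steps")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args(["--seed", "1", "--env-id", "CartPole-v0", "--total-timesteps", "50000"])
    # argparse turns hyphens into underscores for attribute names.
    print(args.seed, args.env_id, args.total_timesteps)  # prints: 1 CartPole-v0 50000
```

Because everything lives in one file, the full set of tunable hyperparameters can be read off the parser without following imports.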
To use experiment tracking with wandb, run
wandb login # only required for the first time
poetry run python cleanrl/ppo.py \
--seed 1 \
--env-id CartPole-v0 \
--total-timesteps 50000 \
--track \
--wandb-project-name cleanrltest
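Note that `--track` is a switch rather than a value-taking option. The sketch below shows one way such a flag could gate experiment tracking. It is an illustration under stated assumptions, not CleanRL's source: the helper name `tracking_kwargs`, the defaults, and the run-name format are hypothetical, while `project`, `name`, `config`, and `sync_tensorboard` are keyword arguments accepted by `wandb.init`.

```python
import argparse
import time


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--seed", type=int, default=1)
    parser.add_argument("--env-id", type=str, default="CartPole-v1")
    # store_true: the flag is False unless it appears on the command line.
    parser.add_argument("--track", action="store_true", help="enable wandb tracking")
    parser.add_argument("--wandb-project-name", type=str, default="cleanRL")
    return parser.parse_args(argv)


def tracking_kwargs(args, now=None):
    """Build keyword arguments that could be passed to wandb.init (hypothetical helper)."""
    timestamp = int(time.time() if now is None else now)
    run_name = f"{args.env_id}__{args.seed}__{timestamp}"  # assumed naming scheme
    return {
        "project": args.wandb_project_name,
        "name": run_name,
        "config": vars(args),      # log every hyperparameter with the run
        "sync_tensorboard": True,  # mirror TensorBoard scalars to wandb
    }


if __name__ == "__main__":
    args = parse_args(["--env-id", "CartPole-v0", "--track",
                       "--wandb-project-name", "cleanrltest"])
    if args.track:
        kwargs = tracking_kwargs(args)
        # In a real script: import wandb; wandb.init(**kwargs)
        print(kwargs["project"], kwargs["name"])
```

Syncing TensorBoard scalars means the same logging calls serve both local inspection and the hosted dashboard.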
To run training scripts in other games:
poetry shell
# classic control
python cleanrl/dqn.py --env-id CartPole-v1
python cleanrl/ppo.py --env-id CartPole-v1
python cleanrl/c51.py --env-id CartPole-v1
# atari
poetry install --with atari
python cleanrl/dqn_atari.py --env-id BreakoutNoFrameskip-v4
python cleanrl/c51_atari.py --env-id BreakoutNoFrameskip-v4
python cleanrl/ppo_atari.py --env-id BreakoutNoFrameskip-v4
python cleanrl/sac_atari.py --env-id BreakoutNoFrameskip-v4
# NEW: 3-4x speed-up, free of side effects, with envpool's Atari (Linux only)
poetry install --with envpool
python cleanrl/ppo_atari_envpool.py --env-id BreakoutNoFrameskip-v4
# Learn Pong-v5 in ~5-10 mins
# Side effects such as lower sample efficiency might occur
poetry run python cleanrl/ppo_atari_envpool.py --clip-coef=0.2 --num-envs=16 --num-minibatches=8 --num-steps=128 --update-epochs=3
# pybullet
poetry install --with pybullet
python cleanrl/td3_continuous_action.py --env-id MinitaurBulletDuckEnv-v0
python cleanrl/ddpg_continuous_action.py --env-id MinitaurBulletDuckEnv-v0
python cleanrl/sac_continuous_action.py --env-id MinitaurBulletDuckEnv-v0
# procgen
poetry install --with procgen
python cleanrl/ppo_procgen.py --env-id starpilot
python cleanrl/ppg_procgen.py --env-id starpilot
# ppo + lstm
python cleanrl/ppo_atari_lstm.py --env-id BreakoutNoFrameskip-v4
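The PPO flags in the envpool command above interact arithmetically. Assuming the usual PPO bookkeeping (each rollout collects `num_envs * num_steps` transitions, which are split into `num_minibatches` minibatches and reused for `update_epochs` passes), the derived sizes can be worked out as follows. The function name `ppo_batch_plan` is hypothetical and only serves this illustration:

```python
def ppo_batch_plan(num_envs: int, num_steps: int, num_minibatches: int, update_epochs: int) -> dict:
    """Derive per-rollout batch sizes from the PPO command-line flags."""
    batch_size = num_envs * num_steps  # transitions collected per rollout
    if batch_size % num_minibatches != 0:
        raise ValueError("batch size must be divisible by num_minibatches")
    minibatch_size = batch_size // num_minibatches
    # Every epoch visits each minibatch once.
    gradient_updates = update_epochs * num_minibatches
    return {
        "batch_size": batch_size,
        "minibatch_size": minibatch_size,
        "gradient_updates_per_rollout": gradient_updates,
    }


if __name__ == "__main__":
    # Values from: --num-envs=16 --num-steps=128 --num-minibatches=8 --update-epochs=3
    print(ppo_batch_plan(16, 128, 8, 3))
    # {'batch_size': 2048, 'minibatch_size': 256, 'gradient_updates_per_rollout': 24}
```

With these settings each rollout of 2,048 transitions yields 24 gradient updates on minibatches of 256.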
You may also use a prebuilt development environment hosted on Gitpod.
To make our experimental data transparent, CleanRL participates in a related project called Open RL Benchmark, which contains tracked experiments from popular DRL libraries such as ours, Stable-baselines3, openai/baselines, jaxrl, and others.
Check out https://benchmark.cleanrl.dev/ for a collection of Weights and Biases reports showcasing tracked DRL experiments. The reports are interactive, and researchers can easily query information such as GPU utilization and videos of an agent's gameplay that are normally hard to acquire in other RL benchmarks. In the future, Open RL Benchmark will likely provide a dataset API for researchers to easily access the data (see repo).
We have a Discord community for support; feel free to ask questions. Posts in GitHub Issues and PRs are also welcome. Our past video recordings are available on YouTube.
If you use CleanRL in your work, please cite our technical paper:
@article{huang2022cleanrl,
author = {Shengyi Huang and Rousslan Fernand Julien Dossa and Chang Ye and Jeff Braga and Dipam Chakraborty and Kinal Mehta and João G.M. Araújo},
title = {CleanRL: High-quality Single-file Implementations of Deep Reinforcement Learning Algorithms},
journal = {Journal of Machine Learning Research},
year = {2022},
volume = {23},
number = {274},
pages = {1--18},
url = {http://jmlr.org/papers/v23/21-1342.html}
}