Implemented in Pytorch:
Implemented in Tensorflow 1.x (was removed in this version):
Explore RL Games quickly and easily in Colab notebooks:
For maximum training performance a preliminary installation of Pytorch 1.9+ with CUDA 11.1+ is highly recommended:
conda install pytorch torchvision cudatoolkit=11.3 -c pytorch -c nvidia
or:
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
Then:
pip install rl-games
To run CPU-based environments, either Ray or envpool is required: pip install envpool or pip install ray
To run Mujoco, Atari or Box2d based environments, they need to be installed additionally with pip install gym[mujoco], pip install gym[atari] or pip install gym[box2d] respectively.
To run Atari, pip install opencv-python is also required. In addition, installing envpool is highly recommended for maximum simulation and training performance of Mujoco and Atari environments: pip install envpool
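As a quick sanity check after installation, the optional dependencies named above can be probed from Python without importing them. This is a minimal sketch using only the standard library; the helper name `missing_packages` and the mapping from pip package name to import name are assumptions of this example, not part of rl-games.

```python
import importlib.util

# Import names for the optional packages mentioned above. The mapping from
# pip package name to import name is an assumption of this sketch.
OPTIONAL_PACKAGES = {
    "torch": "torch",
    "envpool": "envpool",
    "ray": "ray",
    "opencv-python": "cv2",
    "gym": "gym",
}


def missing_packages(packages=OPTIONAL_PACKAGES):
    """Return the pip names of packages whose import name cannot be found."""
    return sorted(
        pip_name
        for pip_name, import_name in packages.items()
        if importlib.util.find_spec(import_name) is None
    )


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All optional packages found.")
```

Only the packages needed for the environments you plan to run have to be present, so a non-empty result is not necessarily an error.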
If you use rl-games in your research please use the following citation:
@misc{rl-games2021,
title = {rl-games: A High-performance Framework for Reinforcement Learning},
author = {Makoviichuk, Denys and Makoviychuk, Viktor},
month = {May},
year = {2021},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/Denys88/rl_games}},
}
poetry install
# install cuda related dependencies
poetry run pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
NVIDIA Isaac Gym
Download and follow the installation instructions of Isaac Gym: https://developer.nvidia.com/isaac-gym
And IsaacGymEnvs: NVIDIA-Omniverse/IsaacGymEnvs
Ant
python train.py task=Ant headless=True
python train.py task=Ant test=True checkpoint=nn/Ant.pth num_envs=100
Humanoid
python train.py task=Humanoid headless=True
python train.py task=Humanoid test=True checkpoint=nn/Humanoid.pth num_envs=100
Shadow Hand block orientation task
python train.py task=ShadowHand headless=True
python train.py task=ShadowHand test=True checkpoint=nn/ShadowHand.pth num_envs=100
Other
Atari Pong
poetry install -E atari
poetry run python runner.py --train --file rl_games/configs/atari/ppo_pong.yaml
poetry run python runner.py --play --file rl_games/configs/atari/ppo_pong.yaml --checkpoint nn/PongNoFrameskip.pth
Brax Ant
poetry install -E brax
poetry run pip install --upgrade "jax[cuda]==0.3.13" -f https://storage.googleapis.com/jax-releases/jax_releases.html
poetry run python runner.py --train --file rl_games/configs/brax/ppo_ant.yaml
poetry run python runner.py --play --file rl_games/configs/brax/ppo_ant.yaml --checkpoint runs/Ant_brax/nn/Ant_brax.pth
rl_games supports experiment tracking with Weights and Biases.
poetry install -E atari
poetry run python runner.py --train --file rl_games/configs/atari/ppo_breakout_torch.yaml --track
WANDB_API_KEY=xxxx poetry run python runner.py --train --file rl_games/configs/atari/ppo_breakout_torch.yaml --track
poetry run python runner.py --train --file rl_games/configs/atari/ppo_breakout_torch.yaml --wandb-project-name rl-games-special-test --track
poetry run python runner.py --train --file rl_games/configs/atari/ppo_breakout_torch.yaml --wandb-project-name rl-games-special-test --wandb-entity openrlbenchmark --track
We use torchrun to orchestrate any multi-GPU runs.
torchrun --standalone --nnodes=1 --nproc_per_node=2 runner.py --train --file rl_games/configs/ppo_cartpole.yaml
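For orientation, torchrun starts one process per GPU and passes each process its position through the RANK, LOCAL_RANK and WORLD_SIZE environment variables. The sketch below shows how a script could read them; the helper names `distributed_context` and `device_for` are hypothetical, and how rl_games consumes these variables internally is not shown in this document.

```python
import os


def distributed_context(env=None):
    """Read the rank variables that torchrun exports to each worker process.

    Falls back to single-process defaults when the script is started
    without torchrun.
    """
    env = os.environ if env is None else env
    return {
        "rank": int(env.get("RANK", 0)),
        "local_rank": int(env.get("LOCAL_RANK", 0)),
        "world_size": int(env.get("WORLD_SIZE", 1)),
    }


def device_for(ctx):
    """Map a process to one GPU on its node by local rank (a common convention)."""
    return f"cuda:{ctx['local_rank']}"


if __name__ == "__main__":
    ctx = distributed_context()
    print(ctx, device_for(ctx))
```

With --nproc_per_node=2 as in the command above, the two processes would see local ranks 0 and 1 and a world size of 2.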
Field | Example Value | Default | Description |
---|---|---|---|
seed | 8 | None | Seed for pytorch, numpy etc. |
algo | | | Algorithm block. |
name | a2c_continuous | None | Algorithm name. Possible values are: sac, a2c_discrete, a2c_continuous |
model | | | Model block. |
name | continuous_a2c_logstd | None | Possible values: continuous_a2c (expects sigma to be in (0, +inf)), continuous_a2c_logstd (expects sigma to be in (-inf, +inf)), a2c_discrete, a2c_multi_discrete |
network | | | Network description. |
name | actor_critic | | Possible values: actor_critic or soft_actor_critic. |
separate | False | | Whether to use a separate network with the same architecture for the critic. In almost all cases, if you normalize the value, it is better to keep this False. |
space | | | Network space. |
continuous | | | continuous or discrete |
mu_activation | None | | Activation for mu. In almost all cases None works best, but tanh may be worth trying. |
sigma_activation | None | | Activation for sigma. Will be treated as log(sigma) or sigma depending on the model. |
mu_init | | | Initializer for mu. |
name | default | | |
sigma_init | | | Initializer for sigma. If you are using the logstd model, a good value is 0. |
name | const_initializer | | |
val | 0 | | |
fixed_sigma | True | | If true, the sigma vector doesn't depend on the input. |
cnn | | | Convolution block. |
type | conv2d | | Type: two types are currently supported, conv2d or conv1d. |
activation | elu | | Activation between conv layers. |
initializer | | | Initializer. Some names are taken from TensorFlow. |
name | glorot_normal_initializer | | Initializer name. |
gain | 1.4142 | | Additional parameter. |
convs | | | Convolution layers. Same parameters as in torch. |
filters | 32 | | Number of filters. |
kernel_size | 8 | | Kernel size. |
strides | 4 | | Strides. |
padding | 0 | | Padding. |
filters | 64 | | Next convolution layer info. |
kernel_size | 4 | | |
strides | 2 | | |
padding | 0 | | |
filters | 64 | | |
kernel_size | 3 | | |
strides | 1 | | |
padding | 0 | | |
mlp | | | MLP block. Convolution is supported too. See other config examples. |
units | | | Array of sizes of the MLP layers, for example: [512, 256, 128] |
d2rl | False | | Use d2rl architecture from https://arxiv.org/abs/2010.09163. |
activation | elu | | Activations between dense layers. |
initializer | | | Initializer. |
name | default | | Initializer name. |
rnn | | | RNN block. |
name | lstm | | RNN layer name. lstm and gru are supported. |
units | 256 | | Number of units. |
layers | 1 | | Number of layers. |
before_mlp | False | False | Apply rnn before mlp block or not. |
config | | | RL config block. |
reward_shaper | | | Reward shaper. Can apply simple transformations. |
min_val | -1 | | You can apply min_val, max_val, scale and shift. |
scale_value | 0.1 | 1 | |
normalize_advantage | True | True | Normalize advantage. |
gamma | 0.995 | | Reward discount. |
tau | 0.95 | | Lambda for GAE. Called tau by mistake a long time ago because lambda is a keyword in Python :( |
learning_rate | 3e-4 | | Learning rate. |
name | walker | | Name which will be used in tensorboard. |
save_best_after | 10 | | How many epochs to wait before starting to save the checkpoint with the best score. |
score_to_win | 300 | | Training stops once the score is >= this value. |
grad_norm | 1.5 | | Grad norm. Applied if truncate_grads is True. A good value is in (1.0, 10.0). |
entropy_coef | 0 | | Entropy coefficient. A good value for continuous space is 0; for discrete it is 0.02. |
truncate_grads | True | | Whether to truncate grads. It stabilizes training. |
env_name | BipedalWalker-v3 | | Environment name. |
e_clip | 0.2 | | Clip parameter for the PPO loss. |
clip_value | False | | Apply clip to the value loss. If you are using normalize_value you don't need it. |
num_actors | 16 | | Number of running actors/environments. |
horizon_length | 4096 | | Horizon length per actor. Total number of steps will be num_actors * horizon_length * num_agents (if env is not MA, num_agents == 1). |
minibatch_size | 8192 | | Minibatch size. The total number of steps must be divisible by the minibatch size. |
minibatch_size_per_env | 8 | | Minibatch size per env. If specified, overrides the default minibatch size with minibatch_size_per_env * num_envs. |
mini_epochs | 4 | | Number of miniepochs. A good value is in [1, 10]. |
critic_coef | 2 | | Critic coef. By default critic_loss = critic_coef * 1/2 * MSE. |
lr_schedule | adaptive | None | Scheduler type. Can be None, linear or adaptive. Adaptive is the best for continuous control tasks. The learning rate is changed every miniepoch. |
kl_threshold | 0.008 | | KL threshold for the adaptive schedule. If KL < kl_threshold/2 then lr = lr * 1.5, and the opposite. |
normalize_input | True | | Apply running mean std for input. |
bounds_loss_coef | 0.0 | | Coefficient of the auxiliary loss for continuous space. |
max_epochs | 10000 | | Maximum number of epochs to run. |
max_frames | 5000000 | | Maximum number of frames (env steps) to run. |
normalize_value | True | | Use value running mean std normalization. |
use_diagnostics | True | | Adds more information into the tensorboard. |
value_bootstrap | True | | Bootstrap the value when an episode is finished. Very useful for different locomotion envs. |
bound_loss_type | regularisation | None | Adds aux loss for the continuous case. 'regularisation' is the sum of squared actions. 'bound' is the sum of actions higher than 1.1. |
bounds_loss_coef | 0.0005 | 0 | Regularisation coefficient. |
use_smooth_clamp | False | | Use smooth clamp instead of regular clamp for clipping. |
zero_rnn_on_done | False | True | If False, the RNN internal state is not reset (set to 0) when an environment is reset. Could improve training in some cases, for example when domain randomization is on. |
player | | | Player configuration block. |
render | True | False | Render environment. |
deterministic | True | True | Use deterministic policy (argmax or mu) or stochastic. |
use_vecenv | True | False | Use vecenv to create environment for player. |
games_num | 200 | | Number of games to run in the player mode. |
env_config | | | Env configuration block. It goes directly to the environment. This example was taken from my atari wrapper. |
skip | 4 | | Number of frames to skip. |
name | BreakoutNoFrameskip-v4 | | The exact name of an (atari) gym env. This is an example; depending on the training env these parameters can be different. |
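Two rules from the table are easy to get wrong by hand: the divisibility constraint between the total number of steps and minibatch_size, and the adaptive learning-rate rule tied to kl_threshold. The sketch below works through both with the example values from the table. The function names are hypothetical. The table only states the increase rule (KL < kl_threshold/2), so the symmetric decrease at KL > 2 * kl_threshold and the min_lr / max_lr bounds are assumptions of this sketch, not confirmed behaviour of rl_games.

```python
def total_batch_size(num_actors, horizon_length, num_agents=1):
    """Total steps per epoch: num_actors * horizon_length * num_agents."""
    return num_actors * horizon_length * num_agents


def num_minibatches(batch_size, minibatch_size):
    """Number of minibatches; the batch must split evenly."""
    if batch_size % minibatch_size != 0:
        raise ValueError(
            f"batch size {batch_size} is not divisible by "
            f"minibatch_size {minibatch_size}"
        )
    return batch_size // minibatch_size


def adaptive_lr(lr, kl, kl_threshold, factor=1.5, min_lr=1e-6, max_lr=1e-2):
    """Adjust the learning rate from the measured KL divergence.

    From the table: if KL < kl_threshold / 2, lr is multiplied by 1.5.
    Assumption: "the opposite" means dividing by 1.5 when KL > 2 * kl_threshold.
    Assumption: the result is kept inside [min_lr, max_lr].
    """
    if kl < kl_threshold / 2:
        return min(lr * factor, max_lr)
    if kl > 2 * kl_threshold:
        return max(lr / factor, min_lr)
    return lr


if __name__ == "__main__":
    # Example values taken from the table above.
    batch = total_batch_size(num_actors=16, horizon_length=4096)
    print(batch, num_minibatches(batch, minibatch_size=8192))  # 65536 8
    print(adaptive_lr(3e-4, kl=0.001, kl_threshold=0.008))  # KL is low, lr grows
```

With the example values, 16 actors and a horizon of 4096 give 65536 steps per epoch, which split into 8 minibatches of 8192.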
simple test network
This network takes a dictionary observation.
To register it, you can add the following code to your __init__.py:
from rl_games.envs.test_network import TestNetBuilder
from rl_games.algos_torch import model_builder
model_builder.register_network('testnet', TestNetBuilder)
simple test environment
example environment
Additional environment supported properties and functions
Field | Default Value | Description |
---|---|---|
use_central_value | False | If true, then the returned obs is expected to be a dict with 'obs' and 'state' |
value_size | 1 | Shape of the returned rewards. The network will support multihead value automatically. |
concat_infos | False | Whether the default vecenv should convert a list of dicts to a dict of lists. Very useful if you want to use value bootstrapping. In this case you always need to return 'time_outs': True or False from the env. |
get_number_of_agents(self) | 1 | Returns number of agents in the environment |
has_action_mask(self) | False | Returns True if environment has invalid actions mask. |
get_action_mask(self) | None | Returns action masks if has_action_mask is true. Good example is SMAC Env |
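To make the table concrete, here is a minimal sketch of an environment that exposes these properties and functions, together with the list-of-dicts to dict-of-lists conversion that concat_infos describes. The class `TinyMaskedEnv`, its three-step episode and the standalone `concat_infos` helper are hypothetical and written only for illustration. Exposing the properties as class attributes is also an assumption; the sketch deliberately does not subclass anything from rl_games.

```python
class TinyMaskedEnv:
    """Hypothetical two-action environment exposing the hooks from the table."""

    # Properties from the table (placement as class attributes is an assumption).
    use_central_value = False
    value_size = 1
    concat_infos = True

    def __init__(self, max_steps=3):
        self.max_steps = max_steps
        self.steps = 0

    def reset(self):
        self.steps = 0
        return [0.0]

    def step(self, action):
        self.steps += 1
        done = self.steps >= self.max_steps
        # Always report 'time_outs' so value bootstrapping can use it.
        info = {"time_outs": done}
        return [float(self.steps)], 1.0, done, info

    def get_number_of_agents(self):
        return 1

    def has_action_mask(self):
        return True

    def get_action_mask(self):
        # Action 1 becomes invalid near the end of the episode
        # (an arbitrary rule chosen for this sketch).
        return [True, self.steps < self.max_steps - 1]


def concat_infos(infos):
    """Convert a list of info dicts into a dict of lists, keyed as the first dict."""
    keys = infos[0].keys() if infos else []
    return {key: [info[key] for info in infos] for key in keys}


if __name__ == "__main__":
    env = TinyMaskedEnv()
    env.reset()
    infos = [env.step(0)[3] for _ in range(3)]
    print(concat_infos(infos))
```

Here the episode ends only because of the step limit, so 'time_outs' mirrors done; an environment with real terminal states would report False for those.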
1.6.1 (Unreleased)
1.6.0
zero_rnn_on_done.
1.5.2
1.5.1
1.5.0
horovod in favor of torch.distributed (#171).
1.4.0
1.3.2
1.3.1
1.3.0
1.2.0
1.1.4
1.1.3
clip_actions for switching off internal action clipping and rescaling.
1.1.0
pip install rl-games
steps_num should be changed to horizon_length and lr_threshold to kl_threshold.