| Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language |
|---|---|---|---|---|---|---|---|---|---|---|
| Tianshou<br>An elegant PyTorch deep reinforcement learning library. | 5,959 | 4 | | | 3 days ago | 29 | July 04, 2022 | 43 | mit | Python |
| Deep Reinforcement Learning With Pytorch<br>PyTorch implementation of DQN, AC, ACER, A2C, A3C, PG, DDPG, TRPO, PPO, SAC, TD3 and .... | 2,504 | | | | 4 months ago | | | 25 | mit | Python |
| Rl Baselines Zoo<br>A collection of 100+ pre-trained RL agents using Stable Baselines, training and hyperparameter optimization included. | 1,025 | | | | 5 months ago | | | 5 | mit | Python |
| Pytorch Rl<br>PyTorch implementation of Deep Reinforcement Learning: Policy Gradient methods (TRPO, PPO, A2C) and Generative Adversarial Imitation Learning (GAIL). Fast Fisher vector product TRPO. | 638 | | | | 2 years ago | | | 6 | mit | Python |
| Hands On Reinforcement Learning With Python<br>Master Reinforcement and Deep Reinforcement Learning using OpenAI Gym and TensorFlow | 596 | | | | 2 years ago | | | 2 | | Jupyter Notebook |
| Modular_rl<br>Implementation of TRPO and related algorithms | 523 | | | | 5 years ago | | | 10 | mit | Python |
| Reinforcement Learning Algorithms<br>PyTorch implementations of classic deep reinforcement learning algorithms, including DQN, DDQN, Dueling Network, DDPG, SAC, A2C, PPO, and TRPO (more algorithms are still in progress). | 407 | | | | 2 years ago | | | 4 | | Python |
| Reinforcement Implementation<br>Implementation of benchmark RL algorithms | 380 | | | | 8 months ago | | | 1 | | Python |
| Deep_rl<br>PyTorch implementations of deep reinforcement learning algorithms | 372 | | | | 2 years ago | | | 1 | mit | Python |
| Machine Learning Is All You Need<br>🔥🌟《Machine Learning 格物志》: ML + DL + RL basic codes and notes by sklearn, PyTorch, TensorFlow, Keras and, most importantly, from scratch!💪 This repository is ALL You Need! | 253 | | | | a year ago | | | | | Python |
Status: Active (under active development, breaking changes may occur)
This repository implements classic and state-of-the-art deep reinforcement learning algorithms. Its aim is to provide clear PyTorch code for people learning deep reinforcement learning.
In the future, more state-of-the-art algorithms will be added, and the existing code will continue to be maintained.
Note that TensorFlow 1.12 does not support Python 3.7.

```shell
pip install -r requirements.txt
```

If that fails, install the dependencies individually:

```shell
pip install gym
pip install tensorboardX
pip install tensorflow==1.12
```

For PyTorch, please go to the official website to install it: https://pytorch.org/

We recommend using an Anaconda virtual environment to manage your packages.
To verify the installation:

```shell
cd Char10\ TD3/
python TD3_BipedalWalker-v2.py --mode test
```

If the installation succeeded, you should see a BipedalWalker render window.
BipedalWalker:
```shell
# clone the openai baselines
git clone https://github.com/openai/baselines.git
cd baselines
pip install -e .
```
Here I uploaded two DQN models, trained on CartPole-v0 and MountainCar-v0.
This is a sparse binary reward task: a non-zero reward is given only when the car reaches the top of the mountain. In general, a stochastic policy may take on the order of 1e5 steps to solve it. You can add a shaping term to the reward, for example one positively related to the car's current position. A more advanced approach is inverse reinforcement learning.
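The position-based shaping idea above can be sketched as a thin wrapper around the environment. This is a minimal illustration, not the repo's code: the wrapper class, the `coef` value, and the assumption that `obs[0]` is the car's x-position (as in the classic Gym MountainCar observation, range [-1.2, 0.6]) are all mine.

```python
class PositionShapedEnv:
    """Hypothetical wrapper: adds a bonus proportional to the car's position,
    giving a denser learning signal than the sparse goal reward."""

    def __init__(self, env, coef=0.1):
        self.env = env
        self.coef = coef  # shaping strength (a tuning assumption)

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        # obs[0] is the car's x-position in MountainCar-v0, in [-1.2, 0.6];
        # shift it by 1.2 so the bonus is always non-negative.
        shaped = reward + self.coef * (obs[0] + 1.2)
        return obs, shaped, done, info
```

Keep the shaping term small relative to the goal reward, or the agent may learn to chase the bonus instead of the goal.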
This is the value loss for DQN. We can see that the loss increased to around 1e13, yet the network still works well. As training progresses, target_net and act_net become very different, so the computed TD error accumulates to large values. The loss was small earlier because the sparse reward produced only small updates to the two networks.
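The interaction between `act_net` and `target_net` described above is the standard one-step TD loss. The sketch below is illustrative PyTorch, not the repo's exact code; the function name and batch layout are my assumptions.

```python
import torch
import torch.nn.functional as F

def dqn_loss(act_net, target_net, batch, gamma=0.99):
    """One-step TD loss for DQN. The target uses the frozen target_net, so
    when the two networks diverge during training, the loss can grow large."""
    s, a, r, s2, done = batch
    q = act_net(s).gather(1, a.unsqueeze(1)).squeeze(1)   # Q(s_t, a_t)
    with torch.no_grad():                                 # no gradient through target
        q_next = target_net(s2).max(dim=1).values         # max_a' Q_target(s', a')
        target = r + gamma * (1.0 - done) * q_next        # bootstrap unless terminal
    return F.mse_loss(q, target)
```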
Use the following command to run a saved model:

```shell
python Run_Model.py
```

Use the following command to train a model:

```shell
python pytorch_MountainCar-v0.py
```
`policyNet.pkl` is a model that I have trained.
Actor-Critic is an algorithmic framework; the classic REINFORCE method is stored under the Actor-Critic directory.
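For reference, the classic REINFORCE objective can be sketched as follows. This is an illustrative PyTorch sketch, not the repo's exact code; the return normalization is a common variance-reduction convention, and the function name is mine.

```python
import torch

def reinforce_loss(log_probs, rewards, gamma=0.99):
    """REINFORCE: maximize sum_t log pi(a_t|s_t) * G_t,
    written here as a loss to minimize."""
    returns, G = [], 0.0
    for r in reversed(rewards):        # discounted returns, computed backwards
        G = r + gamma * G
        returns.append(G)
    returns.reverse()
    returns = torch.tensor(returns)
    # normalize returns as a simple baseline (variance reduction)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)
    return -(torch.stack(log_probs) * returns).sum()
```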
Episode reward in Pendulum-v0:
Advantage Policy Gradient: a 2017 paper pointed out that the performance difference between A2C and A3C is not significant.
The Asynchronous Advantage Actor-Critic method (A3C) has been very influential since the paper was published. The algorithm combines a few key ideas, including asynchronous parallel actor-learners and an advantage-weighted policy gradient.
Original paper: https://arxiv.org/abs/1602.01783
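The advantage idea behind A2C/A3C can be sketched as follows, using the same discounted-return convention as REINFORCE but subtracting the critic's value estimate. This is illustrative code under my own naming, not the repo's implementation.

```python
import torch

def a2c_losses(log_probs, values, rewards, gamma=0.99):
    """Actor loss weighted by the advantage A_t = G_t - V(s_t),
    plus a squared-error critic loss on the same returns."""
    returns, G = [], 0.0
    for r in reversed(rewards):               # discounted returns, backwards
        G = r + gamma * G
        returns.append(G)
    returns.reverse()
    returns = torch.tensor(returns)
    values = torch.stack(values)
    advantage = returns - values
    # detach the advantage so the policy gradient does not flow into the critic
    policy_loss = -(torch.stack(log_probs) * advantage.detach()).sum()
    value_loss = advantage.pow(2).sum()       # critic regresses toward G_t
    return policy_loss, value_loss
```

Replacing the raw return with the advantage lowers gradient variance, which is the main practical gain over plain REINFORCE.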
Note: this is not the original authors' implementation.
Episode reward in Pendulum-v0:
Note: this is not the original authors' implementation.
Episode reward in Pendulum-v0:
Episode reward in BipedalWalker-v2:
To test your trained model:

```shell
python TD3_BipedalWalker-v2.py --mode test
```
[01] A Brief Survey of Deep Reinforcement Learning
[02] The Beta Policy for Continuous Control Reinforcement Learning
[03] Playing Atari with Deep Reinforcement Learning
[04] Deep Reinforcement Learning with Double Q-learning
[05] Dueling Network Architectures for Deep Reinforcement Learning
[06] Continuous control with deep reinforcement learning
[07] Continuous Deep Q-Learning with Model-based Acceleration
[08] Asynchronous Methods for Deep Reinforcement Learning
[09] Trust Region Policy Optimization
[10] Proximal Policy Optimization Algorithms
[11] Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation
[12] High-Dimensional Continuous Control Using Generalized Advantage Estimation
[13] Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
[14] Addressing Function Approximation Error in Actor-Critic Methods