**Note:** At the moment, only running the code from the Docker container (below) is supported. Docker provides a single environment that is more likely to work on all systems: all packages except Docker itself are installed and configured for you, so you just run the code in a tested environment.

To install Docker, I recommend a web search for "installing docker on &lt;your os here&gt;". To run the code on a GPU, you also need to install nvidia-docker, which exposes the host's GPUs inside Docker containers. Once you have Docker (and nvidia-docker, if using a GPU) installed, follow the three steps below.

- Clone this repo:

`git clone --depth 1 https://github.com/mimoralea/gdrl.git && cd gdrl`

- Pull the gdrl image with:

`docker pull mimoralea/gdrl:v0.14`

- Spin up a container:
- On Mac or Linux:

`docker run -it --rm -p 8888:8888 -v "$PWD"/notebooks/:/mnt/notebooks/ mimoralea/gdrl:v0.14`

- On Windows:

`docker run -it --rm -p 8888:8888 -v %CD%/notebooks/:/mnt/notebooks/ mimoralea/gdrl:v0.14`

- NOTE: Use `nvidia-docker` instead of `docker` in the command above if you are using a GPU.

- Open a browser and go to the URL shown in the terminal (likely http://localhost:8888). The password is: `gdrl`

Book: [Grokking Deep Reinforcement Learning](https://www.manning.com/books/grokking-deep-reinforcement-learning)

1. Introduction to deep reinforcement learning
2. Mathematical foundations of reinforcement learning
3. Balancing immediate and long-term goals
4. Balancing the gathering and utilization of information
5. Evaluating agents' behaviors
6. Improving agents' behaviors
7. Achieving goals more effectively and efficiently
8. Introduction to value-based deep reinforcement learning
9. More stable value-based methods
10. Sample-efficient value-based methods
11. Policy-gradient and actor-critic methods
12. Advanced actor-critic methods
13. Towards artificial general intelligence

- Chapter 1: Introduction to deep reinforcement learning (Livebook | No Notebook)

- Chapter 2: Mathematical foundations of reinforcement learning (Livebook | Notebook)
  - Implementations of several MDPs:
    - Bandit Walk
    - Bandit Slippery Walk
    - Slippery Walk Three
    - Random Walk
    - Russell and Norvig's Gridworld from AIMA
    - FrozenLake
    - FrozenLake8x8
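As a taste of what these tabular environments look like, here is a sketch of my own (not the book's code) of the Bandit Walk expressed in the `P` dictionary convention that Gym uses for its toy-text environments, `P[state][action] -> [(prob, next_state, reward, done), ...]`:

```python
# Bandit Walk MDP as a Gym-style transition dictionary:
# P[state][action] -> list of (probability, next_state, reward, done).
# State 1 is the only non-terminal state; 0 and 2 are terminal.
# Transitions are deterministic; only entering state 2 pays +1.
P = {
    0: {0: [(1.0, 0, 0.0, True)], 1: [(1.0, 0, 0.0, True)]},
    1: {0: [(1.0, 0, 0.0, True)],   # "left" ends the episode, no reward
        1: [(1.0, 2, 1.0, True)]},  # "right" ends the episode with +1
    2: {0: [(1.0, 2, 0.0, True)], 1: [(1.0, 2, 0.0, True)]},
}
```

The slippery variants follow the same layout but list several `(prob, ...)` tuples per action.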

- Chapter 3: Balancing immediate and long-term goals (Livebook | Notebook)
  - Implementations of methods for finding optimal policies:
    - Policy Evaluation
    - Policy Improvement
    - Policy Iteration
    - Value Iteration
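To make the last item concrete, here is a minimal value-iteration sketch of my own (not the book's implementation), assuming the Gym-style `P[s][a] -> [(prob, next_state, reward, done), ...]` MDP convention:

```python
def value_iteration(P, gamma=0.99, theta=1e-10):
    # P[s][a] -> list of (prob, next_state, reward, done) tuples.
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            # One-step lookahead: expected return of each action.
            q = [sum(p * (r + gamma * V[ns] * (not done))
                     for p, ns, r, done in P[s][a]) for a in P[s]]
            best = max(q)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            break
    # Extract the greedy policy from the converged value function.
    pi = {s: max(P[s], key=lambda a: sum(
        p * (r + gamma * V[ns] * (not d)) for p, ns, r, d in P[s][a]))
        for s in P}
    return V, pi

# Tiny example: the Bandit Walk (state 1 non-terminal, 0 and 2 terminal).
P = {
    0: {0: [(1.0, 0, 0.0, True)], 1: [(1.0, 0, 0.0, True)]},
    1: {0: [(1.0, 0, 0.0, True)], 1: [(1.0, 2, 1.0, True)]},
    2: {0: [(1.0, 2, 0.0, True)], 1: [(1.0, 2, 0.0, True)]},
}
V, pi = value_iteration(P)  # pi[1] == 1: the right action is optimal
```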

- Chapter 4: Balancing the gathering and utilization of information (Livebook | Notebook)
  - Implementations of exploration strategies for bandit problems:
    - Random
    - Greedy
    - Epsilon-greedy
    - Epsilon-greedy with linearly decaying epsilon
    - Epsilon-greedy with exponentially decaying epsilon
    - Optimistic initialization
    - SoftMax
    - Upper Confidence Bound (UCB)
    - Bayesian
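For a flavor of these strategies, here is a minimal epsilon-greedy sketch of my own (not the book's code) on a two-armed Bernoulli bandit:

```python
import random

def e_greedy(arm_probs, epsilon=0.1, episodes=10000, seed=0):
    # Epsilon-greedy on a Bernoulli bandit: explore with probability
    # epsilon, otherwise pull the arm with the highest estimated value.
    rng = random.Random(seed)
    n = [0] * len(arm_probs)             # pull counts
    Q = [0.0] * len(arm_probs)           # estimated value of each arm
    for _ in range(episodes):
        if rng.random() < epsilon:
            a = rng.randrange(len(arm_probs))                  # explore
        else:
            a = max(range(len(arm_probs)), key=Q.__getitem__)  # exploit
        r = 1.0 if rng.random() < arm_probs[a] else 0.0
        n[a] += 1
        Q[a] += (r - Q[a]) / n[a]        # incremental sample average
    return Q

Q = e_greedy([0.3, 0.8])  # estimates should approach the true 0.3 and 0.8
```

The decaying-epsilon variants only change how `epsilon` is computed per episode.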

- Chapter 5: Evaluating agents' behaviors (Livebook | Notebook)
  - Implementations of algorithms that solve the prediction problem (policy estimation):
    - On-policy first-visit Monte-Carlo prediction
    - On-policy every-visit Monte-Carlo prediction
    - Temporal-Difference prediction (TD)
    - n-step Temporal-Difference prediction (n-step TD)
    - TD(λ)
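As an illustration of prediction, here is a TD(0) sketch of my own (not the book's code) evaluating the uniform-random policy on the classic Random Walk:

```python
import random

def td0(episodes=20000, alpha=0.02, gamma=1.0, seed=0):
    # Classic Random Walk: states 1..5 non-terminal, 0 and 6 terminal,
    # +1 reward only when the right terminal (state 6) is entered.
    # The evaluated policy moves left or right uniformly at random.
    rng = random.Random(seed)
    V = [0.0] * 7                # V[0] and V[6] stay 0.0 (terminal states)
    for _ in range(episodes):
        s = 3                    # every episode starts in the middle
        while s not in (0, 6):
            ns = s + rng.choice((-1, 1))
            r = 1.0 if ns == 6 else 0.0
            # TD(0): nudge V[s] toward the bootstrapped target r + gamma*V[ns].
            V[s] += alpha * (r + gamma * V[ns] - V[s])
            s = ns
    return V

V = td0()  # true values for states 1..5 are 1/6, 2/6, 3/6, 4/6, 5/6
```

Monte-Carlo prediction replaces the bootstrapped target with the full episode return; n-step TD and TD(λ) interpolate between the two.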

- Chapter 6: Improving agents' behaviors (Livebook | Notebook)
  - Implementations of algorithms that solve the control problem (policy improvement):
    - On-policy first-visit Monte-Carlo control
    - On-policy every-visit Monte-Carlo control
    - On-policy TD control: SARSA
    - Off-policy TD control: Q-Learning
    - Double Q-Learning
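To show the control side, here is a tabular Q-learning sketch of my own (not the book's code) on a simple deterministic walk:

```python
import random

def q_learning(episodes=3000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    # Deterministic 7-state walk: states 0 and 6 are terminal,
    # +1 reward only for entering state 6. Actions: 0 = left, 1 = right.
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(7)]
    for _ in range(episodes):
        s = rng.randrange(1, 6)          # exploring starts over states 1..5
        while s not in (0, 6):
            # epsilon-greedy behavior policy
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = 0 if Q[s][0] >= Q[s][1] else 1
            ns = s - 1 if a == 0 else s + 1
            r = 1.0 if ns == 6 else 0.0
            # Off-policy target: greedy (max) value of the next state.
            target = r if ns in (0, 6) else r + gamma * max(Q[ns])
            Q[s][a] += alpha * (target - Q[s][a])
            s = ns
    return Q

Q = q_learning()  # the greedy policy should prefer "right" everywhere
```

SARSA differs only in the target: it bootstraps from the action actually taken next rather than the max.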

- Chapter 7: Achieving goals more effectively and efficiently (Livebook | Notebook)
  - Implementations of more effective and efficient reinforcement learning algorithms:
    - SARSA(λ) with replacing traces
    - SARSA(λ) with accumulating traces
    - Q(λ) with replacing traces
    - Q(λ) with accumulating traces
    - Dyna-Q
    - Trajectory Sampling
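To illustrate the model-based item, here is a Dyna-Q sketch of my own (not the book's code) on a deterministic 7-state walk (states 0 and 6 terminal, +1 only for entering state 6):

```python
import random

def dyna_q(episodes=500, planning_steps=10, alpha=0.1, gamma=0.9,
           epsilon=0.3, seed=0):
    # Dyna-Q = Q-learning + a learned model replayed for extra updates.
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(7)]
    model = {}                           # (s, a) -> (reward, next_state)

    def update(s, a, r, ns):
        target = r if ns in (0, 6) else r + gamma * max(Q[ns])
        Q[s][a] += alpha * (target - Q[s][a])

    for _ in range(episodes):
        s = rng.randrange(1, 6)
        while s not in (0, 6):
            a = rng.randrange(2) if rng.random() < epsilon \
                else (0 if Q[s][0] >= Q[s][1] else 1)
            ns = s - 1 if a == 0 else s + 1
            r = 1.0 if ns == 6 else 0.0
            update(s, a, r, ns)          # direct RL (plain Q-learning step)
            model[(s, a)] = (r, ns)      # learn the (deterministic) model
            for _ in range(planning_steps):
                # Planning: replay remembered transitions from the model.
                ps, pa = rng.choice(list(model))
                pr, pns = model[(ps, pa)]
                update(ps, pa, pr, pns)
            s = ns
    return Q

Q = dyna_q()  # the greedy policy should again prefer action 1 (right)
```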

- Chapter 8: Introduction to value-based deep reinforcement learning (Livebook | Notebook)
  - Implementation of a value-based deep reinforcement learning baseline:
    - Neural Fitted Q-iteration (NFQ)

- Chapter 9: More stable value-based methods (Livebook | Notebook)
  - Implementation of "classic" value-based deep reinforcement learning methods:
    - Deep Q-Networks (DQN)
    - Double Deep Q-Networks (DDQN)
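DQN's stability rests on two ideas: a target network and an experience replay buffer. The replay part needs no deep-learning library; here is a minimal sketch of my own (not the book's implementation):

```python
import random
from collections import deque

class ReplayBuffer:
    # Uniform experience replay: store transitions as they happen, then
    # sample decorrelated mini-batches for training. The bounded deque
    # evicts the oldest transitions once capacity is reached.
    def __init__(self, capacity=10000, seed=0):
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def store(self, s, a, r, ns, done):
        self.buffer.append((s, a, r, ns, done))

    def sample(self, batch_size):
        # Uniform sampling (without replacement) over stored transitions.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```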

- Chapter 10: Sample-efficient value-based methods (Livebook | Notebook)
  - Implementation of main improvements for value-based deep reinforcement learning methods:
    - Dueling Deep Q-Networks (Dueling DQN)
    - Prioritized Experience Replay (PER)
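The sampling math at the heart of PER also fits in a few lines. A sketch of my own (not the book's code; real implementations use a sum-tree for O(log n) draws):

```python
import random

def per_sample(priorities, batch_size, alpha=0.6, seed=0):
    # Proportional prioritization: transition i is drawn with probability
    # p_i**alpha / sum_k p_k**alpha.
    rng = random.Random(seed)
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    probs = [x / total for x in scaled]
    idxs = rng.choices(range(len(priorities)), weights=probs, k=batch_size)
    # Importance-sampling weights correct for the biased draw
    # (the annealing exponent beta is fixed at 1 here, for brevity).
    n = len(priorities)
    weights = [1.0 / (n * probs[i]) for i in idxs]
    w_max = max(weights)
    return idxs, [w / w_max for w in weights]
```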

- Chapter 11: Policy-gradient and actor-critic methods (Livebook | Notebook)
  - Implementation of classic policy-based and actor-critic deep reinforcement learning methods:
    - Policy Gradients without a value function, using Monte-Carlo returns (REINFORCE)
    - Policy Gradients with a value-function baseline trained with Monte-Carlo returns (VPG)
    - Asynchronous Advantage Actor-Critic (A3C)
    - Generalized Advantage Estimation (GAE)
    - [Synchronous] Advantage Actor-Critic (A2C)
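The policy-gradient idea can be shown without a neural network. Here is a REINFORCE sketch of my own (not the book's code) on a one-step bandit "MDP", with the softmax score function written out by hand in place of autograd:

```python
import math
import random

def reinforce_bandit(arm_probs, episodes=5000, lr=0.1, seed=0):
    # Softmax policy over per-arm preferences h; the manual gradient
    # d log pi(a) / d h[k] = 1{k == a} - pi(k) stands in for autograd.
    rng = random.Random(seed)
    h = [0.0] * len(arm_probs)           # action preferences
    baseline = 0.0                       # running-average reward baseline
    for t in range(1, episodes + 1):
        e = [math.exp(x) for x in h]
        z = sum(e)
        pi = [x / z for x in e]          # softmax policy
        a = rng.choices(range(len(h)), weights=pi)[0]
        r = 1.0 if rng.random() < arm_probs[a] else 0.0
        baseline += (r - baseline) / t
        for k in range(len(h)):          # gradient-ascent policy update
            grad = (1.0 if k == a else 0.0) - pi[k]
            h[k] += lr * (r - baseline) * grad
    e = [math.exp(x) for x in h]
    z = sum(e)
    return [x / z for x in e]

pi = reinforce_bandit([0.2, 0.8])  # probability mass should shift to arm 1
```

VPG replaces the running-average baseline with a learned value function; A3C/A2C and GAE refine how the advantage `r - baseline` is estimated.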

- Chapter 12: Advanced actor-critic methods (Livebook | Notebook)
  - Implementation of advanced actor-critic methods:
    - Deep Deterministic Policy Gradient (DDPG)
    - Twin Delayed Deep Deterministic Policy Gradient (TD3)
    - Soft Actor-Critic (SAC)
    - Proximal Policy Optimization (PPO)
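The objective that defines the last method fits in one function. A sketch of PPO's clipped surrogate for a single sample (my own illustration, not the book's code):

```python
def ppo_clip_objective(ratio, advantage, eps=0.2):
    # PPO's clipped surrogate for one sample:
    #   L = min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A),
    # where ratio = pi_new(a|s) / pi_old(a|s). Clipping removes the
    # incentive to push the policy ratio outside [1 - eps, 1 + eps].
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)
```

In training, this objective is averaged over a batch and maximized with gradient ascent.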

- Chapter 13: Towards artificial general intelligence (Livebook | No Notebook)
