Deep Reinforcement Learning for Robotic Grasping from Octrees

The focus of this project is to apply Deep Reinforcement Learning to acquire a robust policy that allows robots to grasp diverse objects from compact 3D observations in the form of octrees. It is part of my Master's Thesis conducted at Aalborg University, Denmark.

Below are some animations of learned policies being employed on novel scenes with Panda and UR5 robots.

Evaluation of a trained policy on novel scenes for Panda robot

Evaluation of a trained policy on novel scenes for UR5 robot

An example of Sim2Real transfer on UR5 can be seen below (trained inside simulation, with no re-training in the real world).

Sim2Real evaluation of a trained policy on a real UR5 robot


Local Installation

If you just want to try this project without a lengthy installation, consider using Docker instead.


  • OS: Ubuntu 20.04 (Focal)
    • Others might work, but they were not tested.
  • GPU: CUDA is required to process octree observations on GPU.
    • Everything else should function normally on CPU, i.e. environments with other observation types.


These dependencies are required to use the entirety of this project. If no "(tested with version)" is specified, the latest release from a relevant distribution is expected to function properly.

Several other dependencies can be installed via pip with this one-liner.

pip3 install numpy scipy optuna seaborn stable-baselines3[extra] sb3-contrib open3d trimesh pcg-gazebo
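After running the one-liner, a quick check can confirm that the packages are importable. The following is only a minimal sketch: the import names in `ASSUMED_MODULES` are assumptions derived from the pip package names above (for example, `stable_baselines3` for `stable-baselines3`), and `missing_modules` is a hypothetical helper that is not part of this repository.

```python
import importlib.util

# Import names ASSUMED to correspond to the pip packages listed above.
ASSUMED_MODULES = [
    "numpy", "scipy", "optuna", "seaborn",
    "stable_baselines3", "sb3_contrib", "open3d", "trimesh", "pcg_gazebo",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(ASSUMED_MODULES)
    print("Missing:", ", ".join(missing) if missing else "none")
```

An empty result only means that the modules can be located; it does not verify their versions.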

All other dependencies are pulled from git and built together with this repository; see drl_grasping.repos for more details.

In case you run into any problems with dependencies along the way, check the Dockerfile, which includes the full instructions.


Clone this repository and import VCS dependencies. Then build with colcon.

# Create workspace for the project
mkdir -p drl_grasping/src && cd drl_grasping/src
# Clone this repository
git clone
# Import and install dependencies
vcs import < drl_grasping/drl_grasping.repos && cd ..
rosdep install -r --from-paths src -i -y --rosdistro ${ROS_DISTRO}
# Build with colcon
colcon build --merge-install --symlink-install --cmake-args "-DCMAKE_BUILD_TYPE=Release"

Use git clone --recursive if you wish to use one of the pre-trained agents.

Docker


  • OS: Any system that supports Docker should work (Linux, Windows, macOS).
    • Only Ubuntu 20.04 was tested.
  • GPU: CUDA is required to process octree observations on GPU. Therefore, only Docker images with CUDA support are currently available. However, it should be possible to use the pre-built image even on systems without a dedicated GPU.


Before starting, make sure your system is set up for Nvidia Docker, e.g.:

# Docker
curl | sh \
  && sudo systemctl --now enable docker
# Nvidia Docker
distribution=$(. /etc/os-release; echo $ID$VERSION_ID) \
  && curl -s -L | sudo apt-key add - \
  && curl -s -L$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update && sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker

Pre-built Docker Image

The easiest way to try out this project is by using a pre-built Docker image that can be pulled from Docker Hub. Currently, there is only a development image available (large, but allows editing and recompiling), which also contains the default testing datasets for ease of use. You can pull the latest tag with the following command (~7.5 GB with all parent images).

docker pull andrejorsula/drl_grasping:latest

To run the container, please use the docker/run.bash script included with this repo. It significantly simplifies the setup of volumes and allows the use of graphical interfaces for the Ignition Gazebo GUI client and RViZ.

<drl_grasping dir>/docker/run.bash andrejorsula/drl_grasping:latest /bin/bash

If desired, you can also run examples and scripts directly with this setup, e.g. to enjoy the pre-trained agents discussed below.

<drl_grasping dir>/docker/run.bash andrejorsula/drl_grasping:latest ros2 run drl_grasping ex_enjoy_pretrained_agent.bash

If you are struggling to get CUDA working on your system with an Nvidia GPU (no nvidia-smi output), you might need to use a different version of the CUDA base image that supports the version of your driver.

Building a New Image

A Dockerfile is included with this repo, but all source code is pulled from GitHub when building an image. There is nothing special about it, so just build it like any other Dockerfile (docker build . -t ...) and adjust the arguments or the recipe itself if needed.

Sourcing of the Workspace Overlay


Before running any commands, remember to source the ROS 2 workspace overlay. You can skip this step when using Docker, as it is done automatically inside the entrypoint.

source <drl_grasping dir>/install/local_setup.bash

This enables:

  • Use of drl_grasping Python module
  • Execution of scripts and examples via ros2 run drl_grasping <executable>
  • Launching of setup scripts via ros2 launch drl_grasping <launch_script>

Using Pre-trained Agents

Enjoy Pre-trained Agents

The pretrained_agents submodule contains a small selection of agents that are already trained and ready to be enjoyed (remember to git clone --recursive/git submodule update --init if you wish to use these). To use them, run ex_enjoy_pretrained_agent.bash. You should see RViZ 2 and the Ignition Gazebo GUI client with an agent trying to grasp one of four objects in a fully randomised novel environment, while the performance of the agent is logged in your terminal.

ros2 run drl_grasping ex_enjoy_pretrained_agent.bash

The default agent is for the Grasp-OctreeWithColor-Gazebo-v0 environment with the Panda robot and TQC. You can change these to any of the other pre-trained agents directly in the example script, according to the support matrix from AndrejOrsula/drl_grasping_pretrained_agents.

Under the hood, all examples launch a setup ROS 2 script for interfacing MoveIt 2 and Ignition, and a corresponding Python script for enjoying or training. All examples print these commands out if you are interested in running the commands separately.

Training New Agents

Training of Agent

To train your own agent, you can start with the ex_train.bash example. You can customise this example script, the configuration of the environment and all hyperparameters to your needs (see below). By default, headless mode is used during training to reduce computational load. If you want to see what is going on, use ign gazebo -g or ROS_DOMAIN_ID=69 rviz2 and visualise the point cloud of the scene.

ros2 run drl_grasping ex_train.bash

Depending on your hardware and hyperparameter configuration, the training can be a very lengthy process. It takes nearly three days to train an agent for 500k steps on a 130W laptop with a dedicated GPU.
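The figure above can be turned into a rough budget for other step counts. This is only back-of-the-envelope arithmetic: it assumes that wall-clock time scales linearly with the number of steps, which is not guaranteed, and `estimate_training_hours` together with its `speedup` parameter is a hypothetical helper rather than anything provided by this project.

```python
# Reference point taken from the text above: roughly 500k steps in three days.
REFERENCE_STEPS = 500_000
REFERENCE_SECONDS = 3 * 24 * 60 * 60


def estimate_training_hours(steps, speedup=1.0):
    """Estimate wall-clock hours, ASSUMING time scales linearly with steps.

    `speedup` is a hypothetical factor describing how much faster (or slower)
    your hardware is than the reference laptop.
    """
    steps_per_second = REFERENCE_STEPS / REFERENCE_SECONDS * speedup
    return steps / steps_per_second / 3600.0


if __name__ == "__main__":
    print(f"{estimate_training_hours(100_000):.1f} h for 100k steps")
```

The reference throughput works out to a little under two environment steps per second.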

Enjoying of Trained Agents

To enjoy an agent that you have trained yourself, look into the ex_enjoy.bash example. Similar to training, change the environment ID, algorithm and robot model. Furthermore, select the specific checkpoint that you want to run. RViZ 2 and the Ignition Gazebo GUI client are enabled by default.

ros2 run drl_grasping ex_enjoy.bash


This repository contains environments for robotic manipulation that are compatible with OpenAI Gym. All of these make use of the Ignition Gazebo robotic simulator, which is interfaced via Gym-Ignition.
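Being Gym-compatible means the environments can be driven by the usual reset/step loop. The sketch below illustrates that loop with the classic four-value `step` return. `StandInEnv` is a hypothetical placeholder so that the example runs without the simulator, and `run_episode` is likewise an illustrative helper; neither is part of this repository.

```python
class StandInEnv:
    """Hypothetical placeholder that mimics the classic Gym reset/step interface."""

    def __init__(self, horizon=5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return 0.0  # dummy observation

    def step(self, action):
        self.t += 1
        reward = 1.0 if action > 0 else 0.0
        done = self.t >= self.horizon
        return float(self.t), reward, done, {}


def run_episode(env, policy, max_steps=100):
    """Roll out one episode and return the accumulated reward."""
    obs = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        obs, reward, done, _info = env.step(policy(obs))
        total_reward += reward
        if done:
            break
    return total_reward


if __name__ == "__main__":
    print(run_episode(StandInEnv(), policy=lambda obs: 1))
```

With the real environments, the stand-in would presumably be replaced by an environment created through Gym from one of the registered IDs, with the simulator setup already running; that part is not shown here.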

Currently, the following environments are included inside this repository. Take a look at their gym environment registration and source code if you are interested in configuring them. There are a lot of parameters for trying different RL approaches and techniques, so it is currently a bit messy (might get cleaned up if I have some free time for it).

  • Grasp task (the focus of this project)
    • Observation variants
      • GraspOctree, with and without color features
      • GraspColorImage (RGB image) and GraspRgbdImage (RGB-D image) are implemented on image_obs branch. However, their implementation is currently only for testing and comparative purposes.
    • Curriculum Learning: Task includes GraspCurriculum, which can be used to progressively increase difficulty of the task by automatically adjusting the following environment parameters based on the current success rate.
      • Workspace size
      • Number of objects
      • Termination state (task is divided into hierarchical sub-tasks with the aim of further guiding the agent).
        • This part does not bring any improvements based on experimental results, so do not bother using it.
    • Demonstrations: Task contains a simple scripted policy that can be applied to collect demonstrations, which can then be used to pre-load a replay buffer for training with off-policy RL algorithms.
      • It provides a slight boost to early learning. However, experiments indicate that it degrades the final success rate (probably due to the introduction of bias early on). Therefore, do not use demonstrations if possible, at least not with this environment.
  • Reach task (a simplistic environment for testing stuff)
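The curriculum idea from the list above (raising difficulty based on the current success rate) can be sketched as follows. This is not the GraspCurriculum implementation: the class name, the stage parameters, the promotion threshold and the window length are all assumptions chosen for illustration.

```python
from collections import deque


class SuccessRateCurriculum:
    """Advance to a harder stage once the recent success rate is high enough."""

    def __init__(self, stages, promote_at=0.8, window=10):
        self.stages = stages              # list of parameter dicts, easy to hard
        self.promote_at = promote_at      # ASSUMED promotion threshold
        self.results = deque(maxlen=window)
        self.index = 0

    @property
    def params(self):
        return self.stages[self.index]

    def record(self, success):
        """Store one episode outcome and return the parameters for the next episode."""
        self.results.append(bool(success))
        window_full = len(self.results) == self.results.maxlen
        rate = sum(self.results) / len(self.results)
        is_last_stage = self.index == len(self.stages) - 1
        if window_full and rate >= self.promote_at and not is_last_stage:
            self.index += 1
            self.results.clear()  # measure the new stage from scratch
        return self.params


if __name__ == "__main__":
    # Hypothetical stages: workspace size and number of objects grow together.
    stages = [
        {"workspace_size": 0.2, "object_count": 1},
        {"workspace_size": 0.4, "object_count": 4},
    ]
    curriculum = SuccessRateCurriculum(stages, window=5)
    for _ in range(5):
        params = curriculum.record(success=True)
    print(params)
```

Clearing the window after each promotion avoids carrying over successes from the easier stage.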

Domain Randomization

These environments can be wrapped by a randomizer in order to introduce domain randomization and improve generalization of the trained policies, which is especially beneficial for Sim2Real transfer.

Examples of domain randomization for the Grasp task

The included ManipulationGazeboEnvRandomizer allows randomization of the following properties at each reset of the environment.

  • Object model - primitive geometry
    • Random type (box, sphere and cylinder are currently supported)
    • Random color, scale, mass, friction
  • Object model - mesh geometry
  • Object pose
  • Ground plane texture
  • Initial robot configuration
  • Camera pose
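As an illustration of per-reset randomization for primitive objects, the sketch below samples a new object description from a seeded random generator. It is not the ManipulationGazeboEnvRandomizer code: `PrimitiveSpec`, `sample_primitive` and all numeric ranges are assumptions made for the example.

```python
import random
from dataclasses import dataclass

SHAPES = ("box", "sphere", "cylinder")


@dataclass
class PrimitiveSpec:
    shape: str
    color: tuple      # RGB, each channel in [0, 1)
    scale: float
    mass: float       # kg
    friction: float


def sample_primitive(rng):
    """Draw one randomized primitive; all ranges below are ASSUMED values."""
    return PrimitiveSpec(
        shape=rng.choice(SHAPES),
        color=tuple(rng.random() for _ in range(3)),
        scale=rng.uniform(0.5, 1.5),
        mass=rng.uniform(0.05, 0.5),
        friction=rng.uniform(0.75, 1.5),
    )


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed makes the sequence reproducible
    for _ in range(3):      # e.g. one draw per environment reset
        print(sample_primitive(rng))
```

Passing the generator explicitly keeps the randomization reproducible across runs, which helps when comparing agents.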

Dataset of Object Models

For the dataset of objects with mesh geometry and material texture, this project utilizes the Google Scanned Objects collection from Ignition Fuel. You can also try to use a different Fuel collection or just a couple of models stored locally (although some tweaks might be required to support certain models).

All models are automatically configured in several ways before their insertion into the world:

  • Inertial properties are automatically estimated (uniform density is assumed)
  • Collision geometry is decimated in order to improve performance
  • Models can be filtered and automatically blacklisted based on several aspects, e.g. too much geometry or disconnected components
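To show what the uniform-density assumption implies, here is the closed-form case of a solid box, where mass and principal moments of inertia follow directly from the dimensions. The project estimates these properties for arbitrary meshes, which requires a geometry library; this sketch only covers the textbook box formula, and `box_inertia` is a hypothetical helper.

```python
def box_inertia(size_xyz, density):
    """Mass and principal moments of inertia of a solid box with uniform density.

    size_xyz: edge lengths (x, y, z) in metres; density in kg/m^3.
    Moments are taken about the centre of mass along the box axes.
    """
    x, y, z = size_xyz
    mass = density * x * y * z
    ixx = mass / 12.0 * (y**2 + z**2)
    iyy = mass / 12.0 * (x**2 + z**2)
    izz = mass / 12.0 * (x**2 + y**2)
    return mass, (ixx, iyy, izz)


if __name__ == "__main__":
    # A 5 cm cube with an ASSUMED density of 500 kg/m^3.
    print(box_inertia((0.05, 0.05, 0.05), density=500.0))
```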

This repository includes a few scripts that can be used to simplify interaction with the dataset and its splitting into training/testing subsets. By default, these subsets include 80 training and 20 testing models.
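A reproducible 80/20 split of model names can be sketched as follows. This is not one of the repository's scripts: `split_models`, the seed value and the model names are hypothetical.

```python
import random


def split_models(names, test_fraction=0.2, seed=42):
    """Deterministically split model names into (training, testing) lists."""
    ordered = sorted(names)                  # make the result independent of input order
    random.Random(seed).shuffle(ordered)     # seeded shuffle for reproducibility
    n_test = round(len(ordered) * test_fraction)
    return ordered[n_test:], ordered[:n_test]


if __name__ == "__main__":
    models = [f"model_{i:03d}" for i in range(100)]  # placeholder names
    train, test = split_models(models)
    print(len(train), len(test))
```

Sorting before the seeded shuffle means the same set of names always yields the same split, regardless of the order in which they were listed.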

Texture Dataset

The DRL_GRASPING_PBR_TEXTURES_DIR environment variable can be exported if the ground plane texture should be randomized. It should point to a directory with the following structure.

./                      # Directory pointed to by `DRL_GRASPING_PBR_TEXTURES_DIR`
├── texture_0
│   ├── *albedo*.png || *basecolor*.png
│   ├── *normal*.png
│   ├── *roughness*.png
│   └── *specular*.png || *metalness*.png
├── ...
└── texture_n
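The layout above can be checked before training with a short script. The following is a sketch under the assumption that each texture needs all four maps; `find_textures` is a hypothetical helper and does not reflect how the project itself loads textures.

```python
import os
from pathlib import Path

# File name patterns taken from the directory structure above.
PATTERNS = {
    "albedo": ("*albedo*.png", "*basecolor*.png"),
    "normal": ("*normal*.png",),
    "roughness": ("*roughness*.png",),
    "specular": ("*specular*.png", "*metalness*.png"),
}


def find_textures(root):
    """Map each texture directory name to its maps, skipping incomplete textures."""
    textures = {}
    for texture_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        maps = {}
        for key, patterns in PATTERNS.items():
            matches = [m for pat in patterns for m in sorted(texture_dir.glob(pat))]
            if matches:
                maps[key] = matches[0]
        if len(maps) == len(PATTERNS):  # ASSUMPTION: all four maps are required
            textures[texture_dir.name] = maps
    return textures


if __name__ == "__main__":
    root = os.environ.get("DRL_GRASPING_PBR_TEXTURES_DIR")
    if root and Path(root).is_dir():
        print(sorted(find_textures(root)))
```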

There are several databases with free PBR textures that you can use. Alternatively, you can clone AndrejOrsula/pbr_textures with 80 training and 20 testing textures.

Supported Robots

Only Franka Emika Panda and UR5 with RG2 gripper are supported. This project currently lacks a more generic solution that would make it easy to use arbitrary models, e.g. full-on MoveIt 2 with a ros2_control implementation. Adding new models is not complicated though, just time-consuming.

Reinforcement Learning

This project makes direct use of stable-baselines3 as well as sb3_contrib. Furthermore, scripts for training and evaluation are largely inspired by rl-baselines3-zoo.

Octree CNN Features Extractor

The OctreeCnnFeaturesExtractor makes use of the O-CNN implementation to enable training on GPU. This feature extractor is part of the OctreeCnnPolicy policy, which is currently implemented for the TD3, SAC and TQC algorithms. The network architecture of this feature extractor is illustrated below.

Architecture of octree-based 3D CNN feature extractor
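For readers unfamiliar with octrees, the sketch below shows why they give a compact 3D observation: only cells that actually contain points are kept at each depth. This is a plain-Python illustration and not the O-CNN data structure or the project's perception code; `occupied_cells` and the cubic volume bounds are assumptions.

```python
def occupied_cells(points, depth, lower=0.0, upper=1.0):
    """Return the set of occupied cell indices at the given octree depth.

    The volume [lower, upper)^3 is split into 2**depth cells per axis.
    Points outside the volume are ignored.
    """
    cells_per_axis = 2 ** depth
    size = (upper - lower) / cells_per_axis
    cells = set()
    for point in points:
        if all(lower <= c < upper for c in point):
            # Clamp to guard against floating-point rounding at the upper edge.
            cells.add(tuple(min(int((c - lower) / size), cells_per_axis - 1) for c in point))
    return cells


if __name__ == "__main__":
    cloud = [(0.10, 0.10, 0.10), (0.12, 0.10, 0.10), (0.90, 0.90, 0.90)]
    for depth in range(4):
        total = (2 ** depth) ** 3
        print(f"depth {depth}: {len(occupied_cells(cloud, depth))} of {total} cells occupied")
```

A dense voxel grid at the same resolution would store every cell, whereas the octree representation only needs the occupied ones.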


Hyperparameters for training RL agents can be found in the hyperparams directory. Optuna was used to autotune some of them, but certain algorithm/environment combinations require far more tuning (especially TD3). If needed, you can try running Optuna yourself; see the ex_optimize example.

Directory Structure

├── drl_grasping        # Primary Python module of this project
    ├── algorithms      # Definitions of policies and slight modifications to RL algorithms
    ├── envs            # Environments for grasping (compatible with OpenAI Gym)
        ├── tasks       # Tasks for the agent that are identical for simulation
        ├── randomizers # Domain randomization of the tasks, which also populates the world
        └── models      # Functional models for the environment (Ignition Gazebo)
    ├── control         # Control for the agent
    ├── perception      # Perception for the agent
    └── utils           # Other utilities, used across the module
├── examples            # Examples for training and enjoying RL agents
├── hyperparams         # Hyperparameters for training RL agents
├── scripts             # Helpful scripts for training, evaluating, ... 
├── launch              # ROS 2 launch scripts that can be used to help with setup
├── docker              # Dockerfile for this project
└── drl_grasping.repos  # List of other dependencies created for `drl_grasping`

In case you have any problems or questions, feel free to open an Issue or a Discussion.
