This is our PyTorch implementation of the Multi-level Scene Description Network (MSDN) proposed in our ICCV 2017 paper.
Multi-level Scene Description Network

This is our implementation of the Multi-level Scene Description Network proposed in Scene Graph Generation from Objects, Phrases and Region Captions. The project is based on a PyTorch version of Faster R-CNN. (Update: the model links have been updated. Sorry for the inconvenience.)


We have released our newly proposed scene graph generation model from our ECCV 2018 paper:

Factorizable Net: An Efficient Subgraph-based Framework for Scene Graph Generation.

Check out the GitHub repo Factorizable Net if you are interested.


  • [x] README for training
  • [x] README for project settings
  • [x] our trained RPN
  • [x] our trained Full Model
  • [x] Our cleansed Visual Genome Dataset
  • [x] training codes
  • [x] evaluation codes
  • [x] Model acceleration (please refer to our ECCV project).
  • [x] Multi-GPU support: we have released a beta multi-GPU version of our FactorizableNet. If you want to speed up training, please check that project.

We are still working on the project. If you are interested, please follow our project.

Project Settings

  1. Install the requirements (you can use pip or Anaconda):

    conda install pip pyyaml sympy h5py cython numpy scipy
    conda install -c menpo opencv3
    conda install -c soumith pytorch torchvision cuda80 
    pip install easydict
  2. Clone the Faster R-CNN repository

    git clone [email protected]:yikang-li/MSDN.git
  3. Build the Cython modules for nms and the roi_pooling layer:

    cd MSDN/faster_rcnn
    ./make.sh
    cd ..
  4. Download the trained full model and trained RPN, and place them in output/trained_model

  5. Download our cleansed Visual Genome dataset and unzip it:

    tar xzvf top_150_50.tgz
  • p.s. Our IPython scripts for data cleansing are also released.
  6. Download the Visual Genome images

  7. Place the images and cleansed annotations in the corresponding folders:

    mkdir -p data/visual_genome
    cd data/visual_genome
    ln -s /path/to/VG_100K_images_folder VG_100K_images
    ln -s /path/to/downloaded_folder top_150_50
  • p.s. You can change the default data directory by modifying __C.IMG_DATA_DIR in faster_rcnn/fast_rcnn/


  • Training proceeds in multiple stages. (Single-GPU training may take about one week.)

    1. Training RPN for object proposals and caption region proposals (the shared conv layers are fixed). We also provide our pretrained RPN model.

    By default, training is done on a small subset of the full dataset:


    For training on the full dataset:

     CUDA_VISIBLE_DEVICES=0 python --max_epoch=10 --step_size=2 --dataset_option=normal --model_name=RPN_full_region

    --step_size indicates the number of epochs after which the learning rate is decayed, and --dataset_option selects the [ small | fat | normal ] subset.

    2. Training MSDN

    Here, we use SGD (controlled by --optimizer) by default:

     CUDA_VISIBLE_DEVICES=0 python --load_RPN --saved_model_path=./output/RPN/RPN_region_full_best.h5  --dataset_option=normal --enable_clip_gradient --step_size=2 --MPS_iter=1 --caption_use_bias --caption_use_dropout --rnn_type LSTM_normal 
  • Alternatively, the model can be trained end-to-end from scratch, but this is not recommended: the results are noticeably worse.

     CUDA_VISIBLE_DEVICES=0 python  --dataset_option=normal --enable_clip_gradient  --step_size=3 --MPS_iter=1 --caption_use_bias --caption_use_dropout --max_epoch=11 --optimizer=1 --lr=0.001
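The --step_size schedule used above can be sketched as a standard step decay; this assumes the common "multiply by gamma every step_size epochs" scheme, and the decay factor gamma=0.1 is an assumption, not necessarily the value hard-coded in the training scripts:

```python
def stepped_lr(base_lr, epoch, step_size, gamma=0.1):
    """Step learning-rate decay: the rate drops by a factor of `gamma`
    once every `step_size` epochs (illustrative, not the exact project code)."""
    return base_lr * gamma ** (epoch // step_size)

# With --lr=0.001 and --step_size=2, epochs 0-1 run at 1e-3,
# epochs 2-3 at 1e-4, and so on.
```

PyTorch's built-in torch.optim.lr_scheduler.StepLR implements the same schedule if you prefer not to compute it by hand.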


Our pretrained full model is provided for evaluation and further development. (Please download the related files in advance.)


Currently, the accuracy of our released version differs slightly from the results reported in the paper: Recall@50: 11.705%; Recall@100: 14.085%.
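For readers unfamiliar with the metric, Recall@K here is the fraction of ground-truth (subject, predicate, object) triplets recovered among the model's K highest-scoring predictions. A minimal sketch follows; note the real evaluation additionally requires the predicted subject and object boxes to overlap the ground-truth boxes (IoU matching), which this simplified version omits:

```python
def recall_at_k(pred_triplets, gt_triplets, k):
    """Simplified scene-graph Recall@K (label matching only, no box IoU).

    pred_triplets: list of (score, (subj, pred, obj)) tuples.
    gt_triplets:   iterable of ground-truth (subj, pred, obj) tuples.
    """
    # Keep the K highest-scoring predictions.
    top_k = sorted(pred_triplets, key=lambda t: t[0], reverse=True)[:k]
    # Count how many distinct ground-truth triplets they recover.
    hits = {triplet for _, triplet in top_k} & set(gt_triplets)
    return len(hits) / len(gt_triplets)
```

For example, if only one of two ground-truth triplets appears among the top-2 predictions, Recall@2 is 0.5.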


We thank longcw for generously releasing his PyTorch implementation of Faster R-CNN.


If our project is helpful to your research, please cite:

    @inproceedings{li2017msdn,
      author    = {Li, Yikang and Ouyang, Wanli and Zhou, Bolei and Wang, Kun and Wang, Xiaogang},
      title     = {Scene Graph Generation from Objects, Phrases and Region Captions},
      booktitle = {Proceedings of the IEEE International Conference on Computer Vision},
      year      = {2017}
    }

The pretrained models and the MSDN technique are released for non-commercial use only.

Contact Yikang LI if you have questions.
