Awesome Open Source

State-of-the-art Single Shot MultiBox Detector in TensorFlow

This repository contains a reimplementation of SSD: Single Shot MultiBox Detector in TensorFlow. If your goal is to reproduce the results in the original paper, please use the official code.

There are already several TensorFlow-based SSD reimplementations on GitHub. The main distinguishing features of this repo include:

  • state-of-the-art performance (77.8% mAP) when training from the pre-trained VGG-16 model (SSD300-VGG16).
  • the model is trained using the TensorFlow high-level API tf.estimator. Although TensorFlow provides many APIs, the Estimator API is highly recommended for building scalable, high-performance models.
  • all code is written with pure TensorFlow ops (no NumPy operations) to ensure performance and portability.
  • uses the SSD augmentation pipeline described in the original paper.
  • PyTorch-like model definition using the high-level tf.layers API for better readability ^-^.
  • high degree of modularity to ease further development.
  • using replicate_model_fn makes it flexible to use one or more GPUs.

New Update (77.9% mAP): using absolute bbox coordinates instead of normalized coordinates; check it out here.
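To make the distinction concrete, here is a minimal sketch of converting between normalized and absolute box coordinates. The (ymin, xmin, ymax, xmax) ordering and both function names are assumptions chosen for illustration; they are not taken from this repository's code.

```python
def to_absolute(box, height, width):
    """Scale a normalized (ymin, xmin, ymax, xmax) box to pixel units."""
    ymin, xmin, ymax, xmax = box
    return (ymin * height, xmin * width, ymax * height, xmax * width)


def to_normalized(box, height, width):
    """Scale a pixel-unit (ymin, xmin, ymax, xmax) box into the [0, 1] range."""
    ymin, xmin, ymax, xmax = box
    return (ymin / height, xmin / width, ymax / height, xmax / width)


# A box covering the central region of a 300x300 input (the SSD300 input size).
print(to_absolute((0.25, 0.25, 0.75, 0.75), 300, 300))
```

Normalized coordinates are independent of the image size, while absolute coordinates are expressed directly in pixels of the network input.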


  • Download Pascal VOC Dataset and reorganize the directory as follows:

     	   |    |->Annotations/
     	   |    |->ImageSets/
     	   |    |->...
     	   |    |->Annotations/
     	   |    |->ImageSets/
     	   |    |->...
     	   |    |->Annotations/
     	   |    |->...

    VOCROOT is the path to your Pascal VOC dataset.
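Before generating TFRecords, it can help to sanity-check the directory layout. The sketch below is a hypothetical helper (check_voc_layout is not part of this repository) that reports, for every sub-directory of VOCROOT, which of the expected folders are missing. The required folder names are an assumption based on the listing above, and it is assumed that VOCROOT contains only dataset folders.

```python
import os


def check_voc_layout(voc_root, required=("Annotations", "ImageSets")):
    """Return {sub-directory name: [missing folder names]} for each child of voc_root."""
    report = {}
    for name in sorted(os.listdir(voc_root)):
        child = os.path.join(voc_root, name)
        if not os.path.isdir(child):
            continue  # ignore stray files next to the dataset folders
        missing = [folder for folder in required
                   if not os.path.isdir(os.path.join(child, folder))]
        report[name] = missing
    return report


# Example usage (replace the path with your own VOCROOT):
# for sub, missing in check_voc_layout("/path/to/VOCROOT").items():
#     print(sub, "OK" if not missing else "missing: " + ", ".join(missing))
```

An empty list for a sub-directory means every expected folder was found.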

  • Run the following script to generate TFRecords.

     python dataset/ --dataset_directory=VOCROOT --output_directory=./dataset/tfrecords
  • Download the pre-trained VGG-16 model (reduced-fc) from here and put the files into a sub-directory named 'model' (SaverDef.V2 is supported by default; the V1 version is also available for the sake of compatibility).

  • Run the following script to start training:

  • Run the following script for evaluation and get mAP:


    Note: you first need to modify some directories in

  • Run the following script for visualization:


All the code was tested under TensorFlow 1.6, Python 3.5, and Ubuntu 16.04 with CUDA 8.0. If you want to run training yourself, one decent GPU is highly recommended. The whole training process for the VOC07+12 dataset took ~120k steps in total, and each step (32 samples per batch) took ~1s on my little workstation with a single GTX 1080 Ti GPU. If you need to run training without enough GPU memory, you can try half of the current batch size (e.g. 16), lower the learning rate, and run more steps, watching TensorBoard until convergence. BTW, the code here has also been tested under TensorFlow 1.4 with CUDA 8.0, but some modifications are needed to enable replicated model training. Take the following steps if you need them:

  • copy all the code of this file to a local file named ''
  • add one more line here to import the module 'tf_replicate_model_fn'
  • change 'tf.contrib.estimator' in here and here to 'tf_replicate_model_fn'
  • now the training process should run without problems
  • before you run '', you should also remove this line because of interface compatibility
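As a rough guide to the training cost quoted above (~120k steps at ~1s per step with 32 samples per batch), the following sketch works through the arithmetic. Keeping the total number of processed samples constant when the batch size is halved is an assumption made here for illustration; the README only says to run more steps and watch TensorBoard.

```python
def training_hours(steps, seconds_per_step):
    """Wall-clock training time in hours."""
    return steps * seconds_per_step / 3600.0


def steps_for_same_samples(steps, batch_size, new_batch_size):
    """Steps needed to process the same number of samples with a new batch size."""
    total_samples = steps * batch_size
    return -(-total_samples // new_batch_size)  # ceiling division


print(round(training_hours(120000, 1.0), 1))    # about 33.3 hours
print(steps_for_same_samples(120000, 32, 16))   # 240000 steps
```

The time per step at the smaller batch size is not stated in the source, so the sketch does not estimate the total time for the reduced batch size.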

This repo was created only recently; any contribution is welcome.

Results (VOC07 Metric)

This implementation (SSD300-VGG16) yields 77.8% mAP on the PASCAL VOC 2007 test dataset (the original performance described in the paper is 77.2% mAP). The details are as follows:

sofa   bird   pottedplant  bus      diningtable  cow   bottle     horse   aeroplane  motorbike
78.9   76.2   53.5         85.2     75.5         85.0  48.6       86.7    82.2       83.4
sheep  train  boat         bicycle  chair        cat   tvmonitor  person  car        dog
82.4   87.6   72.7         83.0     61.3         88.2  74.5       79.6    85.3       86.4
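The headline mAP is the unweighted mean of the per-class average precisions. As a quick check, the per-class values in the table above can be averaged directly (the variable names are illustrative only):

```python
# Per-class AP values (%) copied from the results table above.
per_class_ap = {
    "sofa": 78.9, "bird": 76.2, "pottedplant": 53.5, "bus": 85.2,
    "diningtable": 75.5, "cow": 85.0, "bottle": 48.6, "horse": 86.7,
    "aeroplane": 82.2, "motorbike": 83.4, "sheep": 82.4, "train": 87.6,
    "boat": 72.7, "bicycle": 83.0, "chair": 61.3, "cat": 88.2,
    "tvmonitor": 74.5, "person": 79.6, "car": 85.3, "dog": 86.4,
}

# mAP is the plain average over the 20 VOC classes.
mean_ap = sum(per_class_ap.values()) / len(per_class_ap)
print(round(mean_ap, 1))  # 77.8
```

The result agrees with the 77.8% mAP reported for this implementation.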

You can download the trained model (VOC07+12 Train) from GoogleDrive for further research.

For Chinese friends, you can also download both the trained model and pre-trained vgg16 weights from BaiduYun Drive, access code: tg64.

Here are the training logs and some detection results:

Too Busy TODO

  • Adapting for the COCO dataset
  • Update version SSD-512
  • Transfer to other backbone networks

Known Issues

  • Got 'TypeError: Expected binary or unicode string, got None' while training
    • Why: There may be some inconsistency between different TensorFlow versions.
    • How: If you get this error, try changing the default value of checkpoint_path to './model/vgg16.ckpt'. For more information, see issue6 and issue9.
  • NaN loss during training
    • Why: The default learning rate is a little too high for some TensorFlow versions.

    • How: I don't know the details of the behavioral differences between versions. There are two workarounds:

      • Adding warm-up: change some code here to the following snippet:
      tf.app.flags.DEFINE_string(
          'decay_boundaries', '2000, 80000, 100000',
          'Learning rate decay boundaries by global_step (comma-separated list).')
      tf.app.flags.DEFINE_string(
          'lr_decay_factors', '0.1, 1, 0.1, 0.01',
          'The values of learning_rate decay factor for each segment between boundaries (comma-separated list).')
      • Lower the learning rate and run more steps until convergence.
  • Why does this re-implementation perform better than the reported performance?
    • I don't know.
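To see what the warm-up workaround above does, here is a minimal pure-Python sketch of a piecewise-constant schedule built from the two flag values. The base learning rate of 0.001 and the helper names are assumptions for illustration, and a step equal to a boundary is assumed to stay in the earlier segment (the convention documented for tf.train.piecewise_constant); the repository's own training code may differ.

```python
import bisect


def parse_ints(text):
    """Parse a comma-separated flag value such as '2000, 80000, 100000'."""
    return [int(item) for item in text.split(",")]


def parse_floats(text):
    """Parse a comma-separated flag value such as '0.1, 1, 0.1, 0.01'."""
    return [float(item) for item in text.split(",")]


def learning_rate_at(step, base_lr, boundaries, factors):
    """Piecewise-constant learning rate: base_lr times the factor of the current segment."""
    if len(factors) != len(boundaries) + 1:
        raise ValueError("need exactly one more factor than boundaries")
    # bisect_left keeps a step equal to a boundary in the earlier segment.
    segment = bisect.bisect_left(boundaries, step)
    return base_lr * factors[segment]


boundaries = parse_ints("2000, 80000, 100000")
factors = parse_floats("0.1, 1, 0.1, 0.01")
base_lr = 0.001  # assumed value, for illustration only

for step in (0, 2001, 80001, 100001):
    print(step, learning_rate_at(step, base_lr, boundaries, factors))
```

With these values, the first 2000 steps run at one tenth of the base learning rate (the warm-up), after which the schedule uses the full rate and then decays it at the later boundaries.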


Use this bibtex to cite this repository:

  title={Single Shot MultiBox Detector in TensorFlow},
  author={Changan Wang},
  journal={GitHub repository},


You are welcome to join the QQ Group (758790869) for more discussion.

Apache License, Version 2.0
