A TensorFlow implementation of Google's MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
The official implementation is available at tensorflow/model.
The official implementation of object detection is now released, see tensorflow/model/object_detection.
The YellowFin optimizer has been integrated, but I have no GPU resources to train on ImageNet with it. Call for training ~_~
Official implementation: click here
|Network|Width Multiplier|Preprocessing|Top-1 Accuracy|Top-5 Accuracy|
|---|---|---|---|---|
|MobileNet|1.0|Same as Inception|66.51%|87.09%|
Environment: Ubuntu 16.04 LTS, Xeon E3-1231 v3, 4 Cores @ 3.40GHz, GTX1060.
TF 1.0.1 (native pip install), TF 1.1.0 (built from source with optimization flag '-mavx2')
|GPU|3ms|16ms|-|-|-|TF 1.0.1, CUDA8.0, CUDNN5.1|
|GPU|3ms|15ms|-|-|On|TF 1.0.1, CUDA8.0, CUDNN5.1|
Image Size: (224, 224, 3), Batch Size: 1
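As a rough illustration of how per-batch timings like these could be collected, here is a minimal sketch of a timing loop. It is not the script used for the numbers above; `benchmark` and its `warmup`/`runs` parameters are hypothetical names, and `fn` stands in for whatever call runs one forward (or forward-backward) pass on a single (224, 224, 3) image.

```python
import time


def benchmark(fn, warmup=10, runs=100):
    """Return the average wall-clock time of fn() in milliseconds.

    fn is any zero-argument callable, e.g. a closure around a session run
    (hypothetical; not part of this repository).
    """
    # Warm-up iterations are excluded so one-off setup costs
    # (memory allocation, kernel selection) do not skew the average.
    for _ in range(warmup):
        fn()

    start = time.perf_counter()
    for _ in range(runs):
        fn()
    elapsed = time.perf_counter() - start

    return elapsed / runs * 1000.0
```

For GPU runs, `fn` should block until the result is actually available, otherwise the loop only measures how long it takes to enqueue the work.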
Prepare the ImageNet data. Please refer to Google's tutorial for training Inception.
Modify './script/train_mobilenet_on_imagenet.sh' according to your environment.
After downloading the KITTI data, you need to split it into train/val sets.
    cd /path/to/kitti_root
    mkdir ImageSets
    cd ./ImageSets
    ls ../training/image_2/ | grep ".png" | sed s/.png// > trainval.txt
    python ./tools/kitti_random_split_train_val.py
The kitti_root folder should then look like this:
    kitti_root/
    |->training/
    |     |-> image_2/00****.png
    |     L-> label_2/00****.txt
    |->testing/
    |     L-> image_2/00****.png
    L->ImageSets/
          |-> trainval.txt
          |-> train.txt
          L-> val.txt
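For readers who want to see what the split step amounts to, here is a minimal sketch of a random train/val split over `trainval.txt`. It is not the repository's `kitti_random_split_train_val.py`; the function name `split_trainval`, the 50/50 default ratio, and the fixed seed are all assumptions made for illustration.

```python
import random
from pathlib import Path


def split_trainval(image_sets_dir, val_fraction=0.5, seed=0):
    """Randomly split the ids in trainval.txt into train.txt and val.txt.

    val_fraction and seed are illustrative defaults, not values taken
    from the repository's own script.
    """
    image_sets_dir = Path(image_sets_dir)
    lines = (image_sets_dir / "trainval.txt").read_text().splitlines()
    ids = [line.strip() for line in lines if line.strip()]

    # Shuffle with a private RNG so the split is reproducible.
    rng = random.Random(seed)
    rng.shuffle(ids)

    n_val = int(len(ids) * val_fraction)
    val_ids = sorted(ids[:n_val])
    train_ids = sorted(ids[n_val:])

    (image_sets_dir / "train.txt").write_text("\n".join(train_ids) + "\n")
    (image_sets_dir / "val.txt").write_text("\n".join(val_ids) + "\n")
    return train_ids, val_ids
```

It would be called as `split_trainval("/path/to/kitti_root/ImageSets")` once `trainval.txt` exists.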
Then convert it into TFRecord format.
The code for this project is largely based on SqueezeDet & SSD-Tensorflow. I would appreciate it if you could report any bugs.
According to the paper, MobileNet has 3.3 million parameters, a number that does not vary with the input resolution. This means the final model should have more than 3.3 million parameters, because of the FC layer.
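The point about resolution and the FC layer can be checked with a small back-of-the-envelope count. The sketch below assumes the standard layer table from the paper (width multiplier 1.0, 1000 classes) and counts only convolution and FC weights, ignoring batch-norm parameters; `BLOCK_OUTPUTS` and `count_mobilenet_params` are names introduced here for illustration.

```python
# Output channels of the 13 depthwise-separable blocks that follow the
# first standard convolution (width multiplier 1.0, per the paper's table).
BLOCK_OUTPUTS = [64, 128, 128, 256, 256, 512,
                 512, 512, 512, 512, 512,
                 1024, 1024]


def count_mobilenet_params(num_classes=1000, input_channels=3, first_filters=32):
    """Return (conv_weights, fc_params), ignoring batch-norm parameters."""
    # First layer: a standard 3x3 convolution.
    conv = 3 * 3 * input_channels * first_filters

    in_ch = first_filters
    for out_ch in BLOCK_OUTPUTS:
        conv += 3 * 3 * in_ch    # depthwise: one 3x3 filter per input channel
        conv += in_ch * out_ch   # pointwise: 1x1 convolution
        in_ch = out_ch

    # Fully connected classifier: weights plus biases.
    fc = in_ch * num_classes + num_classes
    return conv, fc


conv_weights, fc_params = count_mobilenet_params()
print("conv weights:", conv_weights)
print("fc params:   ", fc_params)
print("total:       ", conv_weights + fc_params)
```

No term in the count depends on the spatial size of the input, which is why the parameter count is independent of resolution, and the FC layer alone adds roughly one million parameters on top of the convolutional layers.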
When training with RMSProp, the checkpoint file should be almost 3 times as large as the model itself, because RMSProp stores auxiliary parameters alongside each model variable. You can use inspect_checkpoint.py to verify this.
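To make the factor of three concrete: in TF 1.x, RMSProp keeps two slot variables per trainable variable (a moving average of squared gradients and a momentum accumulator), each with the same shape as the variable. The sketch below tallies model versus optimizer entries from a name-to-shape map such as the one returned by a checkpoint reader's `get_variable_to_shape_map()`. The `/RMSProp` and `/RMSProp_1` suffixes are an assumption about how the slots are named in the checkpoint, and the variable names in the example map are hypothetical.

```python
from math import prod  # Python 3.8+


def summarize_checkpoint(var_to_shape, slot_suffixes=("/RMSProp", "/RMSProp_1")):
    """Return (model_params, optimizer_slot_params) for a name -> shape map.

    slot_suffixes is an assumed naming convention for RMSProp slots;
    adjust it to whatever inspect_checkpoint.py actually lists.
    """
    model = 0
    slots = 0
    for name, shape in var_to_shape.items():
        count = prod(shape)  # an empty shape (a scalar) counts as 1
        if name.endswith(slot_suffixes):
            slots += count
        else:
            model += count
    return model, slots


# Hypothetical checkpoint contents, for illustration only.
example = {
    "conv1/weights": [3, 3, 3, 32],
    "conv1/weights/RMSProp": [3, 3, 3, 32],
    "conv1/weights/RMSProp_1": [3, 3, 3, 32],
    "fc/weights": [1024, 1000],
    "fc/weights/RMSProp": [1024, 1000],
    "fc/weights/RMSProp_1": [1024, 1000],
}
model_params, slot_params = summarize_checkpoint(example)
print("model:", model_params, "slots:", slot_params,
      "ratio:", (model_params + slot_params) / model_params)
```

Non-trainable entries such as a global step or batch-norm moving averages have no slots, which is one reason the ratio in a real checkpoint comes out slightly under 3.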