Awesome Open Source


This is a PyTorch library with state-of-the-art architectures, pretrained models, and continuously updated results.

This repository aims to accelerate the advance of deep learning research, make results reproducible, and make research easier to do in PyTorch.

Included papers (to be updated):

Attention Models

  • SENet: Squeeze-and-excitation Networks (paper)
  • SKNet: Selective Kernel Networks (paper)
  • CBAM: Convolutional Block Attention Module (paper)
  • GCNet: GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond (paper)
  • BAM: Bottleneck Attention Module (paper)
  • SGENet: Spatial Group-wise Enhance: Enhancing Semantic Feature Learning in Convolutional Networks (paper)
  • SRMNet: SRM: A Style-based Recalibration Module for Convolutional Neural Networks (paper)

Non-Attention Models

  • OctNet: Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution (paper)
  • Bag of Tricks for Image Classification with Convolutional Neural Networks (paper)
  • Understanding the Disharmony between Weight Normalization Family and Weight Decay: e-shifted L2 Regularizer (to appear)
  • Generalization Bound Regularizer: A Unified Framework for Understanding Weight Decay (to appear)
  • mixup: Beyond Empirical Risk Minimization (paper)
  • CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features (paper)

Trained Models and Performance Table

Single-crop validation accuracy on ImageNet-1k (center 224x224 crop from an image resized so that its shorter side is 256).
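As a rough illustration of the evaluation protocol above, the sketch below computes the resized image size and the center-crop box in plain Python. The function names `resize_shorter_side` and `center_crop_box` are made up for this example, and the rounding choices are assumptions; an image library may round slightly differently.

```python
def resize_shorter_side(width, height, target=256):
    """Return (new_width, new_height) with the shorter side scaled to `target`.

    The aspect ratio is preserved; the longer side is rounded to the nearest
    integer (an assumption made for this sketch).
    """
    if width <= height:
        return target, int(round(height * target / width))
    return int(round(width * target / height)), target


def center_crop_box(width, height, crop=224):
    """Return (left, top, right, bottom) of a centered crop x crop window."""
    left = (width - crop) // 2
    top = (height - crop) // 2
    return left, top, left + crop, top + crop


if __name__ == "__main__":
    # Hypothetical 500x375 input image.
    w, h = resize_shorter_side(500, 375)
    print("resized:", (w, h))
    print("crop box:", center_crop_box(w, h))
```

With torchvision, the same protocol is usually written as `Resize(256)` followed by `CenterCrop(224)`.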

Classification training settings for medium and large models
  Details: RandomResizedCrop, RandomHorizontalFlip; initial lr 0.1, 100 epochs in total, lr decayed every 30 epochs; SGD with plain softmax cross-entropy loss, weight decay 1e-4, momentum 0.9, 8 GPUs, 32 images per GPU
  Examples: ResNet50
  Note: The newest code adds one default operation: it sets the weight decay (wd) of all biases to 0, which can slightly boost training accuracy. For the theoretical analysis, see "Generalization Bound Regularizer: A Unified Framework for Understanding Weight Decay" (to appear).
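The step schedule in the settings above (initial lr 0.1, decay every 30 epochs over 100 epochs) can be sketched as follows. This is a minimal illustration, not the repository's code: the decay factor of 0.1 and the milestones 30, 60, 90 are taken from the `--gamma 0.1` and `--schedule 30 60 90` flags in the commands in this README, the 0-indexed epoch counting is an assumption, and the function names are hypothetical.

```python
def step_lr(epoch, base_lr=0.1, milestones=(30, 60, 90), gamma=0.1):
    """Learning rate for a 0-indexed epoch under step decay.

    The rate is multiplied by `gamma` once for every milestone that the
    current epoch has reached.
    """
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * (gamma ** passed)


def global_batch_size(num_gpus=8, images_per_gpu=32):
    """Total number of images processed per optimizer step."""
    return num_gpus * images_per_gpu


if __name__ == "__main__":
    for epoch in (0, 29, 30, 60, 90, 99):
        print(epoch, step_lr(epoch))
    print("global batch:", global_batch_size())
```

Under these assumptions the global batch size is 8 x 32 = 256 images.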
Classification training settings for mobile/small models
  Details: RandomResizedCrop, RandomHorizontalFlip; initial lr 0.4, 300 epochs in total, 5 linear warm-up epochs, cosine lr decay; SGD with softmax cross-entropy loss and label smoothing 0.1, weight decay 4e-5 on conv weights and 0 on all other weights, momentum 0.9, 8 GPUs, 128 images per GPU
  Examples: ShuffleNetV2
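The warm-up plus cosine schedule for small models can be sketched in the same way. This is an illustrative reading of the settings above, not the repository's implementation: the exact warm-up starting value, the per-epoch (rather than per-iteration) update, and the function name `warmup_cosine_lr` are assumptions.

```python
import math


def warmup_cosine_lr(epoch, base_lr=0.4, total_epochs=300, warmup_epochs=5):
    """Learning rate for a 0-indexed epoch.

    During warm-up the rate grows linearly and reaches `base_lr` in the last
    warm-up epoch; afterwards it follows half a cosine wave down to zero.
    """
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for epoch in (0, 4, 5, 150, 299):
        print(epoch, round(warmup_cosine_lr(epoch), 6))
```

With 8 GPUs and 128 images per GPU, the global batch size in this setting is 1024 images.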

Typical Training & Testing Tips:

Small Models


python -m torch.distributed.launch --nproc_per_node=8 --cos -a shufflenetv2_1x --data /path/to/imagenet1k/ \
--epochs 300 --wd 4e-5 --gamma 0.1 -c checkpoints/imagenet/shufflenetv2_1x --train-batch 128 --opt-level O0 --nowd-bn # Training

python -m torch.distributed.launch --nproc_per_node=2 -a shufflenetv2_1x --data /path/to/imagenet1k/ \
-e --resume ../pretrain/shufflenetv2_1x.pth.tar --test-batch 100 --opt-level O0 # Testing, ~69.6% top-1 Acc

Large Models


python -W ignore -a sge_resnet101 --data /path/to/imagenet1k/ --epochs 100 --schedule 30 60 90 \
--gamma 0.1 -c checkpoints/imagenet/sge_resnet101 --gpu-id 0,1,2,3,4,5,6,7 # Training

python -m torch.distributed.launch --nproc_per_node=8 -a sge_resnet101 --data /path/to/imagenet1k/ \
--epochs 100 --schedule 30 60 90 --wd 1e-4 --gamma 0.1 -c checkpoints/imagenet/sge_resnet101 --train-batch 32 \
--opt-level O0 --wd-all --label-smoothing 0. --warmup 0 # Training (faster)

python -W ignore -a sge_resnet101 --data /path/to/imagenet1k/ --gpu-id 0,1 -e --resume ../pretrain/sge_resnet101.pth.tar # Testing, ~78.8% top-1 Acc

python -m torch.distributed.launch --nproc_per_node=2 -a sge_resnet101 --data /path/to/imagenet1k/ -e --resume \
../pretrain/sge_resnet101.pth.tar --test-batch 100 --opt-level O0 # Testing (faster) ~78.8% top-1 Acc

WS-ResNet with e-shifted L2 regularizer, e = 1e-3

python -m torch.distributed.launch --nproc_per_node=8 -a ws_resnet50 --data /share1/public/public/imagenet1k/ \
--epochs 100 --schedule 30 60 90 --wd 1e-4 --gamma 0.1 -c checkpoints/imagenet/es1e-3_ws_resnet50 --train-batch 32 \
--opt-level O0 --label-smoothing 0. --warmup 0 --nowd-conv --mineps 1e-3 --el2

Results of "SGENet: Spatial Group-wise Enhance: Enhancing Semantic Feature Learning in Convolutional Networks"

Note: the following (older) results for large models were obtained without setting the bias wd to 0.


Model            #P      GFLOPs  Top-1 Acc  Top-5 Acc  Download1         Download2    log
ShuffleNetV2_1x  2.28M   0.151   69.6420    88.7200    -                 GoogleDrive  shufflenetv2_1x.log
ResNet50         25.56M  4.122   76.3840    92.9080    BaiduDrive(zuvx)  GoogleDrive  old_resnet50.log
SE-ResNet50      28.09M  4.130   77.1840    93.6720    -                 -            -
SK-ResNet50*     26.15M  4.185   77.5380    93.7000    BaiduDrive(tfwn)  GoogleDrive  sk_resnet50.log
BAM-ResNet50     25.92M  4.205   76.8980    93.4020    BaiduDrive(z0h3)  GoogleDrive  bam_resnet50.log
CBAM-ResNet50    28.09M  4.139   77.6260    93.6600    BaiduDrive(bram)  GoogleDrive  cbam_resnet50.log
SGE-ResNet50     25.56M  4.127   77.5840    93.6640    BaiduDrive(gxo9)  GoogleDrive  sge_resnet50.log
ResNet101        44.55M  7.849   78.2000    93.9060    BaiduDrive(js5t)  GoogleDrive  old_resnet101.log
SE-ResNet101     49.33M  7.863   78.4680    94.1020    BaiduDrive(j2ox)  GoogleDrive  se_resnet101.log
SK-ResNet101*    45.68M  7.978   78.7920    94.2680    BaiduDrive(boii)  GoogleDrive  sk_resnet101.log
BAM-ResNet101    44.91M  7.933   78.2180    94.0180    BaiduDrive(4bw6)  GoogleDrive  bam_resnet101.log
CBAM-ResNet101   49.33M  7.879   78.3540    94.0640    BaiduDrive(syj3)  GoogleDrive  cbam_resnet101.log
SGE-ResNet101    44.55M  7.858   78.7980    94.3680    BaiduDrive(wqn6)  GoogleDrive  sge_resnet101.log

Here SK-ResNet* is a modified version of the original SKNet, adapted for a fairer comparison with the ResNet backbone. The original SKNets perform better; a PyTorch version is available in pppLang-SKNet.
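For readers unfamiliar with the Top-1 and Top-5 columns, the sketch below shows how top-k accuracy is commonly computed from per-class scores. It is a plain-Python illustration with made-up scores and a hypothetical function name, not the evaluation code used to produce the table.

```python
def topk_accuracy(scores, labels, k=1):
    """Percentage of samples whose true label is among the k highest scores.

    `scores` is a list of per-class score lists, `labels` the matching list
    of integer class indices.
    """
    correct = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            correct += 1
    return 100.0 * correct / len(labels)


if __name__ == "__main__":
    # Toy example: 3 samples, 3 classes.
    scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
    labels = [1, 1, 0]
    print("top-1:", topk_accuracy(scores, labels, k=1))
    print("top-2:", topk_accuracy(scores, labels, k=2))
```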


Model          #P      GFLOPs  Detector      Neck  AP50:95 (%)  AP50 (%)  AP75 (%)  Download
ResNet50       23.51M  88.0    Faster RCNN   FPN   37.5         59.1      40.6      GoogleDrive
SGE-ResNet50   23.51M  88.1    Faster RCNN   FPN   38.7         60.8      41.7      GoogleDrive
ResNet50       23.51M  88.0    Mask RCNN     FPN   38.6         60.0      41.9      GoogleDrive
SGE-ResNet50   23.51M  88.1    Mask RCNN     FPN   39.6         61.5      42.9      GoogleDrive
ResNet50       23.51M  88.0    Cascade RCNN  FPN   41.1         59.3      44.8      GoogleDrive
SGE-ResNet50   23.51M  88.1    Cascade RCNN  FPN   42.6         61.4      46.2      GoogleDrive
ResNet101      42.50M  167.9   Faster RCNN   FPN   39.4         60.7      43.0      GoogleDrive
SE-ResNet101   47.28M  168.3   Faster RCNN   FPN   40.4         61.9      44.2      GoogleDrive
SGE-ResNet101  42.50M  168.1   Faster RCNN   FPN   41.0         63.0      44.3      GoogleDrive
ResNet101      42.50M  167.9   Mask RCNN     FPN   40.4         61.6      44.2      GoogleDrive
SE-ResNet101   47.28M  168.3   Mask RCNN     FPN   41.5         63.0      45.3      GoogleDrive
SGE-ResNet101  42.50M  168.1   Mask RCNN     FPN   42.1         63.7      46.1      GoogleDrive
ResNet101      42.50M  167.9   Cascade RCNN  FPN   42.6         60.9      46.4      GoogleDrive
SE-ResNet101   47.28M  168.3   Cascade RCNN  FPN   43.4         62.2      47.2      GoogleDrive
SGE-ResNet101  42.50M  168.1   Cascade RCNN  FPN   44.4         63.2      48.4      GoogleDrive
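The AP columns depend on the intersection-over-union (IoU) between predicted and ground-truth boxes. Assuming the COCO-style convention, AP50 and AP75 count a detection as correct at IoU thresholds of 0.5 and 0.75, and AP50:95 averages AP over thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below only illustrates the IoU and threshold part; the helper names are hypothetical and this is not the evaluation code behind the table.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Width and height of the overlap are clamped at zero for disjoint boxes.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# The ten IoU thresholds 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def matched_thresholds(iou):
    """Thresholds at which a detection with this IoU counts as a match."""
    return [t for t in IOU_THRESHOLDS if iou >= t]


if __name__ == "__main__":
    iou = box_iou((0, 0, 10, 10), (5, 0, 15, 10))
    print("IoU:", iou)
    print("matches at:", matched_thresholds(iou))
```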

Results of "Understanding the Disharmony between Weight Normalization Family and Weight Decay: e-shifted L2 Regularizer"

Note that the following models are with bias wd = 0.


Model Top-1 Download
WS-ResNet50 76.74 GoogleDrive
WS-ResNet50(e = 1e-3) 76.86 GoogleDrive
WS-ResNet101 78.07 GoogleDrive
WS-ResNet101(e = 1e-6) 78.29 GoogleDrive
WS-ResNeXt50(e = 1e-3) 77.88 GoogleDrive
WS-ResNeXt101(e = 1e-3) 78.80 GoogleDrive
WS-DenseNet201(e = 1e-8) 77.59 GoogleDrive
WS-ShuffleNetV1(e = 1e-8) 68.09 GoogleDrive
WS-ShuffleNetV2(e = 1e-8) 69.70 GoogleDrive
WS-MobileNetV1(e = 1e-6) 73.60 GoogleDrive
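The "bias wd = 0" setting mentioned above can be sketched as a split of the parameters into two optimizer groups. This is a minimal illustration under assumptions, not the repository's implementation: it identifies biases purely by the `.bias` name suffix that PyTorch modules use, and the function name `split_weight_decay` is hypothetical.

```python
def split_weight_decay(named_params, weight_decay=1e-4):
    """Split (name, param) pairs into two optimizer parameter groups.

    Parameters whose name ends in "bias" get weight decay 0; all others
    keep the given weight decay.
    """
    decay, no_decay = [], []
    for name, param in named_params:
        if name == "bias" or name.endswith(".bias"):
            no_decay.append(param)
        else:
            decay.append(param)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]


if __name__ == "__main__":
    # Stand-in strings instead of real tensors, to keep the sketch stdlib-only.
    toy = [("conv1.weight", "w1"), ("conv1.bias", "b1"),
           ("fc.weight", "w2"), ("fc.bias", "b2")]
    for group in split_weight_decay(toy):
        print(group)
```

With PyTorch, the same function could be fed `model.named_parameters()`, and the returned list passed to `torch.optim.SGD(groups, lr=0.1, momentum=0.9)`, which accepts per-group `weight_decay` values.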

Results of "Generalization Bound Regularizer: A Unified Framework for Understanding Weight Decay"

To appear


If you find our related works useful in your research, please consider citing the corresponding papers:

  • Li, Xiang; Wang, Wenhai; Hu, Xiaolin; Yang, Jian. "Selective Kernel Networks." IEEE Conference on Computer Vision and Pattern Recognition.

  • Li, Xiang; Hu, Xiaolin; Xia, Yan; Yang, Jian. "Spatial Group-wise Enhance: Enhancing Semantic Feature Learning in Convolutional Networks." arXiv preprint arXiv:1905.09646.

  • Li, Xiang; Chen, Shuo; Yang, Jian. "Understanding the Disharmony between Weight Normalization Family and Weight Decay: e-shifted L2 Regularizer." arXiv preprint (to appear).

  • Li, Xiang; Chen, Shuo; Gong, Chen; Xia, Yan; Yang, Jian. "Generalization Bound Regularizer: A Unified Framework for Understanding Weight Decay." arXiv preprint (to appear).
