This is a PyTorch library with state-of-the-art architectures, pretrained models, and results that are updated in real time.

This repository aims to accelerate deep learning research in PyTorch by making results reproducible and research easier to carry out.

Included papers (to be updated):

Attention Models

  • SENet: Squeeze-and-excitation Networks (paper)
  • SKNet: Selective Kernel Networks (paper)
  • CBAM: Convolutional Block Attention Module (paper)
  • GCNet: GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond (paper)
  • BAM: Bottleneck Attention Module (paper)
  • SGENet: Spatial Group-wise Enhance: Enhancing Semantic Feature Learning in Convolutional Networks (paper)
  • SRMNet: SRM: A Style-based Recalibration Module for Convolutional Neural Networks (paper)

Non-Attention Models

  • OctNet: Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution (paper)
  • Bag of Tricks for Image Classification with Convolutional Neural Networks (paper)
  • Understanding the Disharmony between Weight Normalization Family and Weight Decay: e-shifted L2 Regularizer (to appear)
  • Generalization Bound Regularizer: A Unified Framework for Understanding Weight Decay (to appear)
  • mixup: Beyond Empirical Risk Minimization (paper)
  • CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features (paper)

Trained Models and Performance Table

Single-crop validation accuracy on ImageNet-1k (center 224x224 crop from an image resized so that its shorter side is 256).
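The evaluation protocol described above (resize so the shorter side is 256, then take the central 224x224 crop) can be sketched as plain geometry. The helper names below are hypothetical and do not come from this repository, and the rounding convention is an assumption.

```python
def resize_shorter_side(width, height, target=256):
    """Return (new_width, new_height) after scaling so the shorter side equals target."""
    scale = target / min(width, height)
    return round(width * scale), round(height * scale)


def center_crop_box(width, height, crop=224):
    """Return (left, top, right, bottom) of a centered crop x crop window."""
    left = (width - crop) // 2
    top = (height - crop) // 2
    return left, top, left + crop, top + crop


# Example: a 500x375 image (width x height).
w, h = resize_shorter_side(500, 375)  # shorter side 375 is scaled to 256
box = center_crop_box(w, h)
print((w, h), box)
```

In torchvision this roughly corresponds to `transforms.Resize(256)` followed by `transforms.CenterCrop(224)`; whether the repository uses exactly that pipeline is an assumption.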

Classification training settings for medium and large models
Details: RandomResizedCrop, RandomHorizontalFlip; initial lr 0.1, 100 epochs in total, lr decayed every 30 epochs; SGD with plain softmax cross-entropy loss, weight decay 1e-4, momentum 0.9; 8 GPUs, 32 images per GPU
Examples: ResNet50
Note: the newest code adds one default operation, setting the weight decay (wd) of all biases to 0, which slightly boosts training accuracy; see the theoretical analysis in "Generalization Bound Regularizer: A Unified Framework for Understanding Weight Decay" (to appear)
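As an illustration of the "bias wd = 0" default mentioned in the note, the sketch below splits parameters into two optimizer groups by name. The function name and the name-based rule (`endswith("bias")`) are assumptions made for this example, not the repository's actual implementation; the toy parameter list only stands in for a real model so that the snippet runs without PyTorch.

```python
def split_bias_no_decay(named_params, weight_decay=1e-4):
    """Split (name, param) pairs into two optimizer groups; biases get wd = 0."""
    decay, no_decay = [], []
    for name, param in named_params:
        if name.endswith("bias"):
            no_decay.append(param)
        else:
            decay.append(param)
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": no_decay, "weight_decay": 0.0},
    ]


# Stand-in (name, param) pairs; with a real model use model.named_parameters().
toy_params = [
    ("conv1.weight", "W_conv1"),
    ("bn1.weight", "W_bn1"),
    ("bn1.bias", "b_bn1"),
    ("fc.weight", "W_fc"),
    ("fc.bias", "b_fc"),
]
groups = split_bias_no_decay(toy_params)
print([len(g["params"]) for g in groups])
```

With a real model, the returned list can be passed directly to an optimizer that accepts per-group options, for example `torch.optim.SGD(split_bias_no_decay(model.named_parameters()), lr=0.1, momentum=0.9)`.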
Classification training settings for mobile/small models
Details: RandomResizedCrop, RandomHorizontalFlip; initial lr 0.4, 300 epochs in total, 5 linear warm-up epochs, cosine lr decay; SGD with softmax cross-entropy loss and label smoothing 0.1, weight decay 4e-5 on conv weights and 0 on all other weights, momentum 0.9; 8 GPUs, 128 images per GPU
Examples: ShuffleNetV2
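The small-model settings use label smoothing 0.1. A minimal pure-Python sketch of cross entropy against a smoothed target follows; it uses the common formulation in which the smoothing mass is spread uniformly over all classes, which is assumed (not confirmed) to match what this repository does. The function name is hypothetical.

```python
import math


def label_smoothing_cross_entropy(logits, target, smoothing=0.1):
    """Cross entropy of one sample against a smoothed one-hot target."""
    num_classes = len(logits)
    # Numerically stable log-softmax.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    log_probs = [z - log_sum for z in logits]
    # Smoothed target: smoothing / K on every class,
    # plus (1 - smoothing) on the true class.
    loss = 0.0
    for k, log_p in enumerate(log_probs):
        q = smoothing / num_classes
        if k == target:
            q += 1.0 - smoothing
        loss -= q * log_p
    return loss


plain = label_smoothing_cross_entropy([10.0, 0.0, 0.0], target=0, smoothing=0.0)
smooth = label_smoothing_cross_entropy([10.0, 0.0, 0.0], target=0, smoothing=0.1)
print(plain, smooth)
```

With `smoothing=0.0` this reduces to the plain softmax cross entropy used for the larger models. Recent PyTorch versions expose the same idea through `torch.nn.CrossEntropyLoss(label_smoothing=0.1)`.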

Typical Training & Testing Tips:

Small Models


python -m torch.distributed.launch --nproc_per_node=8 --cos -a shufflenetv2_1x --data /path/to/imagenet1k/ \
--epochs 300 --wd 4e-5 --gamma 0.1 -c checkpoints/imagenet/shufflenetv2_1x --train-batch 128 --opt-level O0 --nowd-bn # Training

python -m torch.distributed.launch --nproc_per_node=2 -a shufflenetv2_1x --data /path/to/imagenet1k/ \
-e --resume ../pretrain/shufflenetv2_1x.pth.tar --test-batch 100 --opt-level O0 # Testing, ~69.6% top-1 Acc
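The small-model recipe (initial lr 0.4, 300 epochs, 5 linear warm-up epochs, cosine decay, enabled with `--cos` in the training command above) corresponds to a schedule like the one sketched here. The function name is hypothetical, and several details are assumptions: that warm-up starts from 0, that the cosine phase spans only the epochs after warm-up, and that the rate decays to 0. The repository may also update the rate per iteration rather than per epoch.

```python
import math


def warmup_cosine_lr(epoch, base_lr=0.4, total_epochs=300, warmup_epochs=5):
    """Learning rate at a (possibly fractional) epoch index."""
    if epoch < warmup_epochs:
        # Linear ramp from 0 up to base_lr over the warm-up epochs.
        return base_lr * epoch / warmup_epochs
    # Cosine decay from base_lr down to 0 over the remaining epochs.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


for e in (0, 2.5, 5, 152.5, 300):
    print(e, round(warmup_cosine_lr(e), 4))
```

Passing a fractional epoch (for example `epoch + step / steps_per_epoch`) gives a per-iteration schedule with the same shape.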

Large Models


python -W ignore -a sge_resnet101 --data /path/to/imagenet1k/ --epochs 100 --schedule 30 60 90 \
--gamma 0.1 -c checkpoints/imagenet/sge_resnet101 --gpu-id 0,1,2,3,4,5,6,7 # Training

python -m torch.distributed.launch --nproc_per_node=8 -a sge_resnet101 --data /path/to/imagenet1k/ \
--epochs 100 --schedule 30 60 90 --wd 1e-4 --gamma 0.1 -c checkpoints/imagenet/sge_resnet101 --train-batch 32 \
--opt-level O0 --wd-all --label-smoothing 0. --warmup 0 # Training (faster)

python -W ignore -a sge_resnet101 --data /path/to/imagenet1k/ --gpu-id 0,1 -e --resume ../pretrain/sge_resnet101.pth.tar # Testing, ~78.8% top-1 Acc

python -m torch.distributed.launch --nproc_per_node=2 -a sge_resnet101 --data /path/to/imagenet1k/ -e --resume \
../pretrain/sge_resnet101.pth.tar --test-batch 100 --opt-level O0 # Testing (faster) ~78.8% top-1 Acc
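The large-model training commands above pass `--epochs 100 --schedule 30 60 90 --gamma 0.1`, which reads as a step schedule: the learning rate is multiplied by gamma at each listed epoch. The sketch below shows that interpretation along with the global batch size implied by 8 processes and `--train-batch 32`. Both function names are hypothetical, and treating `--train-batch` as a per-process batch size is an assumption based on the "32 images per gpu" setting.

```python
def step_lr(epoch, base_lr=0.1, milestones=(30, 60, 90), gamma=0.1):
    """Learning rate for a 0-indexed epoch under step decay."""
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** passed


def global_batch_size(num_processes, per_process_batch):
    """Total number of images consumed per optimizer step."""
    return num_processes * per_process_batch


for e in (0, 29, 30, 60, 90, 99):
    print(e, f"{step_lr(e):.0e}")
print(global_batch_size(8, 32))
```

In PyTorch the same shape is available as `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[30, 60, 90], gamma=0.1)`; whether this repository uses that scheduler or its own adjustment function is not stated here.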

WS-ResNet with e-shifted L2 regularizer, e = 1e-3

python -m torch.distributed.launch --nproc_per_node=8 -a ws_resnet50 --data /path/to/imagenet1k/ \
--epochs 100 --schedule 30 60 90 --wd 1e-4 --gamma 0.1 -c checkpoints/imagenet/es1e-3_ws_resnet50 --train-batch 32 \
--opt-level O0 --label-smoothing 0. --warmup 0 --nowd-conv --mineps 1e-3 --el2

Results of "SGENet: Spatial Group-wise Enhance: Enhancing Semantic Feature Learning in Convolutional Networks"

Note: the following (old) results were obtained without setting the bias wd to 0 for large models.


Model            Params  GFLOPs  Top-1 Acc (%)  Top-5 Acc (%)  Download1         Download2    Log
ShuffleNetV2_1x  2.28M   0.151   69.6420        88.7200        -                 GoogleDrive  shufflenetv2_1x.log
ResNet50         25.56M  4.122   76.3840        92.9080        BaiduDrive(zuvx)  GoogleDrive  old_resnet50.log
SE-ResNet50      28.09M  4.130   77.1840        93.6720        -                 -            -
SK-ResNet50*     26.15M  4.185   77.5380        93.7000        BaiduDrive(tfwn)  GoogleDrive  sk_resnet50.log
BAM-ResNet50     25.92M  4.205   76.8980        93.4020        BaiduDrive(z0h3)  GoogleDrive  bam_resnet50.log
CBAM-ResNet50    28.09M  4.139   77.6260        93.6600        BaiduDrive(bram)  GoogleDrive  cbam_resnet50.log
SGE-ResNet50     25.56M  4.127   77.5840        93.6640        BaiduDrive(gxo9)  GoogleDrive  sge_resnet50.log
ResNet101        44.55M  7.849   78.2000        93.9060        BaiduDrive(js5t)  GoogleDrive  old_resnet101.log
SE-ResNet101     49.33M  7.863   78.4680        94.1020        BaiduDrive(j2ox)  GoogleDrive  se_resnet101.log
SK-ResNet101*    45.68M  7.978   78.7920        94.2680        BaiduDrive(boii)  GoogleDrive  sk_resnet101.log
BAM-ResNet101    44.91M  7.933   78.2180        94.0180        BaiduDrive(4bw6)  GoogleDrive  bam_resnet101.log
CBAM-ResNet101   49.33M  7.879   78.3540        94.0640        BaiduDrive(syj3)  GoogleDrive  cbam_resnet101.log
SGE-ResNet101    44.55M  7.858   78.7980        94.3680        BaiduDrive(wqn6)  GoogleDrive  sge_resnet101.log

Here SK-ResNet* is a modified version of the original SKNet, adapted for a fairer comparison with the ResNet backbone used here. The original SKNets perform better; a PyTorch version is available in pppLang-SKNet.
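For reference, the Top-1 and Top-5 accuracy columns count a prediction as correct when the true label is among the 1 or 5 highest-scoring classes. The pure-Python sketch below illustrates the metric on toy scores; the function name and data are made up for this example and are not the repository's evaluation code.

```python
def topk_accuracy(scores, labels, k=1):
    """Percentage of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(labels)


# Toy example with 3 classes and 4 samples.
scores = [
    [0.10, 0.70, 0.20],  # highest score: class 1
    [0.50, 0.30, 0.20],  # highest score: class 0
    [0.20, 0.30, 0.50],  # highest score: class 2
    [0.40, 0.35, 0.25],  # highest score: class 0
]
labels = [1, 2, 2, 1]
top1 = topk_accuracy(scores, labels, k=1)
top2 = topk_accuracy(scores, labels, k=2)
print(top1, top2)
```

The corresponding error rate is simply 100 minus the accuracy.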


Model          Params  GFLOPs  Detector      Neck  AP50:95 (%)  AP50 (%)  AP75 (%)  Download
ResNet50       23.51M  88.0    Faster RCNN   FPN   37.5         59.1      40.6      GoogleDrive
SGE-ResNet50   23.51M  88.1    Faster RCNN   FPN   38.7         60.8      41.7      GoogleDrive
ResNet50       23.51M  88.0    Mask RCNN     FPN   38.6         60.0      41.9      GoogleDrive
SGE-ResNet50   23.51M  88.1    Mask RCNN     FPN   39.6         61.5      42.9      GoogleDrive
ResNet50       23.51M  88.0    Cascade RCNN  FPN   41.1         59.3      44.8      GoogleDrive
SGE-ResNet50   23.51M  88.1    Cascade RCNN  FPN   42.6         61.4      46.2      GoogleDrive
ResNet101      42.50M  167.9   Faster RCNN   FPN   39.4         60.7      43.0      GoogleDrive
SE-ResNet101   47.28M  168.3   Faster RCNN   FPN   40.4         61.9      44.2      GoogleDrive
SGE-ResNet101  42.50M  168.1   Faster RCNN   FPN   41.0         63.0      44.3      GoogleDrive
ResNet101      42.50M  167.9   Mask RCNN     FPN   40.4         61.6      44.2      GoogleDrive
SE-ResNet101   47.28M  168.3   Mask RCNN     FPN   41.5         63.0      45.3      GoogleDrive
SGE-ResNet101  42.50M  168.1   Mask RCNN     FPN   42.1         63.7      46.1      GoogleDrive
ResNet101      42.50M  167.9   Cascade RCNN  FPN   42.6         60.9      46.4      GoogleDrive
SE-ResNet101   47.28M  168.3   Cascade RCNN  FPN   43.4         62.2      47.2      GoogleDrive
SGE-ResNet101  42.50M  168.1   Cascade RCNN  FPN   44.4         63.2      48.4      GoogleDrive

Results of "Understanding the Disharmony between Weight Normalization Family and Weight Decay: e-shifted L2 Regularizer"

Note that the following models are trained with bias wd = 0.


Model                      Top-1 Acc (%)  Download
WS-ResNet50                76.74          GoogleDrive
WS-ResNet50(e = 1e-3)      76.86          GoogleDrive
WS-ResNet101               78.07          GoogleDrive
WS-ResNet101(e = 1e-6)     78.29          GoogleDrive
WS-ResNeXt50(e = 1e-3)     77.88          GoogleDrive
WS-ResNeXt101(e = 1e-3)    78.80          GoogleDrive
WS-DenseNet201(e = 1e-8)   77.59          GoogleDrive
WS-ShuffleNetV1(e = 1e-8)  68.09          GoogleDrive
WS-ShuffleNetV2(e = 1e-8)  69.70          GoogleDrive
WS-MobileNetV1(e = 1e-6)   73.60          GoogleDrive
Results of "Generalization Bound Regularizer: A Unified Framework for Understanding Weight Decay"

To appear


If you find our related work useful in your research, please consider citing the following papers:

  title={Selective Kernel Networks},
  author={Li, Xiang and Wang, Wenhai and Hu, Xiaolin and Yang, Jian},
  journal={IEEE Conference on Computer Vision and Pattern Recognition},

  title={Spatial Group-wise Enhance: Enhancing Semantic Feature Learning in Convolutional Networks},
  author={Li, Xiang and Hu, Xiaolin and Xia, Yan and Yang, Jian},
  journal={arXiv preprint arXiv:1905.09646},

  title={Understanding the Disharmony between Weight Normalization Family and Weight Decay: e-shifted L2 Regularizer},
  author={Li, Xiang and Chen, Shuo and Yang, Jian},
  journal={arXiv preprint arXiv:},

  title={Generalization Bound Regularizer: A Unified Framework for Understanding Weight Decay},
  author={Li, Xiang and Chen, Shuo and Gong, Chen and Xia, Yan and Yang, Jian},
  journal={arXiv preprint arXiv:},

