Alternatives To Pvt
Project NameStarsDownloadsRepos Using ThisPackages Using ThisMost Recent CommitTotal ReleasesLatest ReleaseOpen IssuesLicenseLanguage
Segmentation_models4,09512125 months ago8January 10, 2020237mitPython
Segmentation models with pretrained backbones. Keras and TensorFlow Keras.
Resnest3,07054 months ago897July 07, 202258apache-2.0Python
ResNeSt: Split-Attention Networks
4 years ago121mitPython
A pytorch implementation of Detectron. Both training from scratch and inferring directly from pretrained Detectron weights are available.
5 months ago26apache-2.0Python
3 years ago1otherPython
FastFCN: Rethinking Dilated Convolution in the Backbone for Semantic Segmentation.
4 years agoPython
Implementation code of the paper: FishNet: A Versatile Backbone for Image, Region, and Pixel Level Prediction, NeurIPS 2018
3 years ago12otherPython
CenterMask : Real-Time Anchor-Free Instance Segmentation, in CVPR 2020
Panoptic Deeplab333
2 years ago7apache-2.0Python
This is Pytorch re-implementation of our CVPR 2020 paper "Panoptic-DeepLab: A Simple, Strong, and Fast Baseline for Bottom-Up Panoptic Segmentation" (
Iccv2021 Papers With Code Demo212
a year ago1
ICCV 2021 paper with code
Ds Net181
a year ago6mitPython
[CVPR 2021] Rank 1st in the public leaderboard of SemanticKITTI Panoptic Segmentation (2020-11-16)
Alternatives To Pvt
Select To Compare

Alternative Project Comparisons


  • (2022/08/09) Application examples for polyp segmentation (polyp-pvt) and vision-language modeling.
  • (2020/06/21) Code of PVTv2 is released! PVTv2 largely improves PVTv1 and works better than Swin Transformer with ImageNet-1K pre-training.

Pyramid Vision Transformer

The image is from Transformers: Revenge of the Fallen.

This repository contains the official implementation of PVTv1 & PVTv2 in image classification, object detection, and semantic segmentation tasks.

Model Zoo

Image Classification

Classification configs & weights see >>>here<<<.

  • PVTv2 on ImageNet-1K
Method Size [email protected] #Params (M)
PVTv2-B0 224 70.5 3.7
PVTv2-B1 224 78.7 14.0
PVTv2-B2-Linear 224 82.1 22.6
PVTv2-B2 224 82.0 25.4
PVTv2-B3 224 83.1 45.2
PVTv2-B4 224 83.6 62.6
PVTv2-B5 224 83.8 82.0
  • PVTv1 on ImageNet-1K
Method Size [email protected] #Params (M)
PVT-Tiny 224 75.1 13.2
PVT-Small 224 79.8 24.5
PVT-Medium 224 81.2 44.2
PVT-Large 224 81.7 61.4

Object Detection

Detection configs & weights see >>>here<<<.

  • PVTv2 on COCO

Baseline Detectors

Method Backbone Pretrain Lr schd Aug box AP mask AP
RetinaNet PVTv2-b0 ImageNet-1K 1x No 37.2 -
RetinaNet PVTv2-b1 ImageNet-1K 1x No 41.2 -
RetinaNet PVTv2-b2 ImageNet-1K 1x No 44.6 -
RetinaNet PVTv2-b3 ImageNet-1K 1x No 45.9 -
RetinaNet PVTv2-b4 ImageNet-1K 1x No 46.1 -
RetinaNet PVTv2-b5 ImageNet-1K 1x No 46.2 -
Mask R-CNN PVTv2-b0 ImageNet-1K 1x No 38.2 36.2
Mask R-CNN PVTv2-b1 ImageNet-1K 1x No 41.8 38.8
Mask R-CNN PVTv2-b2 ImageNet-1K 1x No 45.3 41.2
Mask R-CNN PVTv2-b3 ImageNet-1K 1x No 47.0 42.5
Mask R-CNN PVTv2-b4 ImageNet-1K 1x No 47.5 42.7
Mask R-CNN PVTv2-b5 ImageNet-1K 1x No 47.4 42.5

Advanced Detectors

Method Backbone Pretrain Lr schd Aug box AP mask AP
Cascade Mask R-CNN PVTv2-b2-Linear ImageNet-1K 3x Yes 50.9 44.0
Cascade Mask R-CNN PVTv2-b2 ImageNet-1K 3x Yes 51.1 44.4
ATSS PVTv2-b2-Linear ImageNet-1K 3x Yes 48.9 -
ATSS PVTv2-b2 ImageNet-1K 3x Yes 49.9 -
GFL PVTv2-b2-Linear ImageNet-1K 3x Yes 49.2 -
GFL PVTv2-b2 ImageNet-1K 3x Yes 50.2 -
Sparse R-CNN PVTv2-b2-Linear ImageNet-1K 3x Yes 48.9 -
Sparse R-CNN PVTv2-b2 ImageNet-1K 3x Yes 50.1 -
  • PVTv1 on COCO
Detector Backbone Pretrain Lr schd box AP mask AP
RetinaNet PVT-Tiny ImageNet-1K 1x 36.7 -
RetinaNet PVT-Small ImageNet-1K 1x 40.4 -
Mask RCNN PVT-Tiny ImageNet-1K 1x 36.7 35.1
Mask RCNN PVT-Small ImageNet-1K 1x 40.4 37.8
DETR PVT-Small ImageNet-1K 50ep 34.7 -

Semantic Segmentation

Segmentation configs & weights see >>>here<<<.

PVT-v2 + Segmentation see >>>here<<<.

  • PVTv1 on ADE20K
Method Backbone Pretrain Iters mIoU
Semantic FPN PVT-Tiny ImageNet-1K 40K 35.7
Semantic FPN PVT-Small ImageNet-1K 40K 39.8
Semantic FPN PVT-Medium ImageNet-1K 40K 41.6
Semantic FPN PVT-Large ImageNet-1K 40K 42.1

Polyp Segmentation

Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers. pdf | code

Vision-Language Modeling

Masked Vision-Language Transformer in Fashion. pdf | code


This repository is released under the Apache 2.0 license as found in the LICENSE file.


If you use this code for a paper, please cite:


  title={Pyramid vision transformer: A versatile backbone for dense prediction without convolutions},
  author={Wang, Wenhai and Xie, Enze and Li, Xiang and Fan, Deng-Ping and Song, Kaitao and Liang, Ding and Lu, Tong and Luo, Ping and Shao, Ling},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},


  title={Pvtv2: Improved baselines with pyramid vision transformer},
  author={Wang, Wenhai and Xie, Enze and Li, Xiang and Fan, Deng-Ping and Song, Kaitao and Liang, Ding and Lu, Tong and Luo, Ping and Shao, Ling},
  journal={Computational Visual Media},


This repo is currently maintained by Wenhai Wang (@whai362), Enze Xie (@xieenze), and Zhe Chen (@czczup).

Popular Segmentation Projects
Popular Backbone Projects
Popular Machine Learning Categories
Related Searches

Get A Weekly Email With Trending Projects For These Categories
No Spam. Unsubscribe easily at any time.