Project Name | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language | Description |
---|---|---|---|---|---|---|---|---|---|---|---|
Cvat | 9,058 | | | | 21 hours ago | 2 | September 08, 2022 | 481 | mit | TypeScript | Annotate better with CVAT, the industry-leading data engine for machine learning. Used and trusted by teams at any scale, for data of any scale. |
Awesome Semantic Segmentation | 8,065 | | | | 2 years ago | | | 13 | | | :metal: awesome-semantic-segmentation |
Segmentation_models.pytorch | 6,982 | | 2 | 34 | 2 days ago | 10 | November 18, 2021 | 26 | mit | Python | Segmentation models with pretrained backbones. PyTorch. |
Pytorch Unet | 6,465 | | | | 20 days ago | | | 49 | gpl-3.0 | Python | PyTorch implementation of the U-Net for image semantic segmentation with high quality images |
Mmsegmentation | 5,463 | | 2 | | a day ago | 30 | July 01, 2022 | 296 | apache-2.0 | Python | OpenMMLab Semantic Segmentation Toolbox and Benchmark. |
Gluon Cv | 5,422 | | 15 | 44 | 2 months ago | 1,514 | July 07, 2022 | 61 | apache-2.0 | Python | Gluon CV Toolkit |
Semantic Segmentation Pytorch | 4,559 | | | | 2 years ago | 1 | September 09, 2021 | 56 | bsd-3-clause | Python | PyTorch implementation for semantic segmentation/scene parsing on the MIT ADE20K dataset |
Pytorch Semseg | 3,297 | | | | 2 months ago | 3 | February 09, 2018 | 131 | mit | Python | Semantic segmentation architectures implemented in PyTorch |
Imgclsmob | 2,399 | | 4 | | a year ago | 67 | September 21, 2021 | 6 | mit | Python | Sandbox for training deep learning networks |
Awesome Semantic Segmentation Pytorch | 2,399 | | | | 3 months ago | | | 114 | apache-2.0 | Python | Semantic segmentation on PyTorch (includes FCN, PSPNet, DeepLabv3, DeepLabv3+, DANet, DenseASPP, BiSeNet, EncNet, DUNet, ICNet, ENet, OCNet, CCNet, PSANet, CGNet, ESPNet, LEDNet, DFANet) |
This is a PyTorch implementation of semantic segmentation models on the MIT ADE20K scene parsing dataset (http://sceneparsing.csail.mit.edu/).
ADE20K is the largest open source dataset for semantic segmentation and scene parsing, released by the MIT Computer Vision team. Follow this link for the repository of our dataset and the Caffe and Torch7 implementations: CSAILVision/sceneparsing
If you simply want to play with our demo, please try this link: http://scenesegmentation.csail.mit.edu. You can upload your own photo and parse it!
You can also use the Colab notebook playground to tinker with the code for segmenting an image.
All pretrained models can be found at: http://sceneparsing.csail.mit.edu/model/pytorch
Color encoding of semantic categories can be found here: https://docs.google.com/spreadsheets/d/1se8YEtb2detS7OuPE86fXGyD269pMycAWe2mtKUj2W8/edit?usp=sharing
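For instance, a predicted label map can be converted to this color encoding with helpers shipped in the repo (a minimal sketch, assuming the `colorEncode` utility and the `data/color150.mat` palette from a checkout of this repository):

```python
import numpy
import scipy.io
from mit_semseg.utils import colorEncode

colors = scipy.io.loadmat('data/color150.mat')['colors']  # 150 RGB triples
pred = numpy.zeros((256, 256), dtype=numpy.int64)         # stand-in label map
pred_color = colorEncode(pred, colors)                    # HxWx3 uint8 image
```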
We use configuration files to store most options; the definitions of the options are detailed in `config/defaults.py`.

The synchronized batch normalization module computes the mean and standard deviation across all devices during training. We empirically find that a reasonably large batch size is important for segmentation. We thank Jiayuan Mao for his kind contributions; please refer to Synchronized-BatchNorm-PyTorch for details.
The implementation is easy to use.
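As a minimal sketch (assuming the `SynchronizedBatchNorm2d` class bundled with this repo under `mit_semseg.lib.nn`), it works as a drop-in replacement for `torch.nn.BatchNorm2d`:

```python
import torch
from mit_semseg.lib.nn import SynchronizedBatchNorm2d  # bundled module

# Same constructor arguments as nn.BatchNorm2d; on a single device it
# behaves identically, and across multiple GPUs it synchronizes the
# mean/variance statistics instead of computing them per device.
bn = SynchronizedBatchNorm2d(64)
y = bn(torch.randn(8, 64, 32, 32))
```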
For the task of semantic segmentation, it is good to keep the aspect ratio of images during training. So we re-implemented the `DataParallel` module and made it support distributing data to multiple GPUs as Python dicts, so that each GPU can process images of different sizes. At the same time, the dataloader also operates differently.

Now the batch size of a dataloader always equals the number of GPUs, and each element is sent to one GPU. It is also compatible with multi-processing. Note that the file index for the multi-processing dataloader is stored on the master process, which contradicts our goal that each worker maintain its own file list. So we use a trick: although the master process still gives the dataloader an index for the `__getitem__` function, we ignore that request and send a random batch dict instead. Also, because the multiple workers forked by the dataloader all share the same seed, they would yield exactly the same data if we used the above trick directly. Therefore, we add one line of code that sets the default seed for `numpy.random` before activating the multiple workers in the dataloader; the sketch below illustrates the idea.
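This is a hedged sketch of that per-worker seeding idea, not the repo's exact code; the `TensorDataset` is a stand-in so the snippet runs on its own:

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

def worker_init_fn(worker_id):
    # PyTorch gives each forked worker a distinct initial seed; reuse it
    # for numpy.random so workers stop yielding identical random batches.
    np.random.seed(torch.initial_seed() % 2**32)

dataset = TensorDataset(torch.randn(16, 3, 32, 32))  # stand-in data
loader = DataLoader(dataset, batch_size=2, num_workers=4,
                    worker_init_fn=worker_init_fn)
for (batch,) in loader:
    pass  # each worker now draws different numpy randomness
```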
We split our models into encoder and decoder, where encoders are usually modified directly from classification networks and decoders consist of final convolutions and upsampling. We provide some pre-configured models in the `config` folder.

Encoder:
- MobileNetV2dilated
- ResNet18dilated
- ResNet50dilated
- ResNet101dilated
- HRNetV2

Decoder:
- C1_deepsup (one convolution module + deep supervision trick)
- PPM_deepsup (Pyramid Pooling Module + deep supervision trick)
- UPerNet (Pyramid Pooling + FPN head)
IMPORTANT: The base ResNet in our repository is customized (different from the one in torchvision). The base models will be automatically downloaded when needed.
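For illustration, here is a sketch of how an encoder and a decoder are paired (assuming the `ModelBuilder` and `SegmentationModule` API from `mit_semseg.models`; leaving `weights` unset yields freshly initialized modules):

```python
import torch
from mit_semseg.models import ModelBuilder, SegmentationModule

# resnet50dilated outputs 2048-dim features, so the decoder's fc_dim
# must match; ADE20K has 150 classes.
net_encoder = ModelBuilder.build_encoder(arch='resnet50dilated', fc_dim=2048)
net_decoder = ModelBuilder.build_decoder(arch='ppm_deepsup',
                                         fc_dim=2048, num_class=150)
crit = torch.nn.NLLLoss(ignore_index=-1)
model = SegmentationModule(net_encoder, net_decoder, crit)
```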
Architecture | MultiScale Testing | Mean IoU | Pixel Accuracy (%) | Overall Score | Inference Speed (fps) |
---|---|---|---|---|---|
MobileNetV2dilated + C1_deepsup | No | 34.84 | 75.75 | 54.07 | 17.2 |
MobileNetV2dilated + C1_deepsup | Yes | 33.84 | 76.80 | 55.32 | 10.3 |
MobileNetV2dilated + PPM_deepsup | No | 35.76 | 77.77 | 56.27 | 14.9 |
MobileNetV2dilated + PPM_deepsup | Yes | 36.28 | 78.26 | 57.27 | 6.7 |
ResNet18dilated + C1_deepsup | No | 33.82 | 76.05 | 54.94 | 13.9 |
ResNet18dilated + C1_deepsup | Yes | 35.34 | 77.41 | 56.38 | 5.8 |
ResNet18dilated + PPM_deepsup | No | 38.00 | 78.64 | 58.32 | 11.7 |
ResNet18dilated + PPM_deepsup | Yes | 38.81 | 79.29 | 59.05 | 4.2 |
ResNet50dilated + PPM_deepsup | No | 41.26 | 79.73 | 60.50 | 8.3 |
ResNet50dilated + PPM_deepsup | Yes | 42.14 | 80.13 | 61.14 | 2.6 |
ResNet101dilated + PPM_deepsup | No | 42.19 | 80.59 | 61.39 | 6.8 |
ResNet101dilated + PPM_deepsup | Yes | 42.53 | 80.91 | 61.72 | 2.0 |
UperNet50 | No | 40.44 | 79.80 | 60.12 | 8.4 |
UperNet50 | Yes | 41.55 | 80.23 | 60.89 | 2.9 |
UperNet101 | No | 42.00 | 80.79 | 61.40 | 7.8 |
UperNet101 | Yes | 42.66 | 81.01 | 61.84 | 2.3 |
HRNetV2 | No | 42.03 | 80.77 | 61.40 | 5.8 |
HRNetV2 | Yes | 43.20 | 81.47 | 62.34 | 1.9 |
Training is benchmarked on a server with 8 NVIDIA Pascal Titan Xp GPUs (12GB GPU memory); inference speed is benchmarked on a single NVIDIA Pascal Titan Xp GPU, without visualization.
The code is developed under the following configurations.
- Hardware: >=4 GPUs for training, >=1 GPU for testing (set [--gpus GPUS] accordingly)
- Software: Ubuntu 16.04.3 LTS, CUDA>=8.0, Python>=3.5, PyTorch>=0.4.0
- Dependencies: numpy, scipy, opencv, yacs, tqdm

Here is a simple demo to do inference on a single image:

```bash
chmod +x demo_test.sh
./demo_test.sh
```
This script downloads a trained model (ResNet50dilated + PPM_deepsup) and a test image, runs the test script, and saves the predicted segmentation (.png) to the working directory.
To test on an image or a folder of images ($PATH_IMG), you can simply do the following:

```bash
python3 -u test.py --imgs $PATH_IMG --gpu $GPU --cfg $CFG
```
To train a model, first download the ADE20K scene parsing dataset:

```bash
chmod +x download_ADE20K.sh
./download_ADE20K.sh
```
Then train a model by selecting the GPUs ($GPUS) and configuration file ($CFG) to use. During training, checkpoints are saved by default in the `ckpt` folder.

```bash
python3 train.py --gpus $GPUS --cfg $CFG
```
To choose which GPUs to use, you can either do `--gpus 0-7` or `--gpus 0,2,4,6`. For example, you can start with our provided configurations:

```bash
python3 train.py --gpus GPUS --cfg config/ade20k-mobilenetv2dilated-c1_deepsup.yaml
python3 train.py --gpus GPUS --cfg config/ade20k-resnet50dilated-ppm_deepsup.yaml
python3 train.py --gpus GPUS --cfg config/ade20k-resnet101-upernet.yaml
```
You can also override options on the command line, for example:

```bash
python3 train.py TRAIN.num_epoch 10
```

To evaluate a trained model on the validation set, add `VAL.visualize True` to the arguments to output visualizations as shown in the teaser. For example:

```bash
python3 eval_multipro.py --gpus GPUS --cfg config/ade20k-mobilenetv2dilated-c1_deepsup.yaml
python3 eval_multipro.py --gpus GPUS --cfg config/ade20k-resnet50dilated-ppm_deepsup.yaml
python3 eval_multipro.py --gpus GPUS --cfg config/ade20k-resnet101-upernet.yaml
```
This library can be installed via pip to easily integrate with another codebase:

```bash
pip install git+https://github.com/CSAILVision/semantic-segmentation-pytorch.git@master
```
Now this library can easily be consumed programmatically. For example:

```python
from mit_semseg.config import cfg
from mit_semseg.dataset import TestDataset
from mit_semseg.models import ModelBuilder, SegmentationModule
```
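Building on those imports, here is a hedged sketch of end-to-end inference (the checkpoint paths are illustrative; download real weights from the model URL above, and replace the random tensor with a normalized image):

```python
import torch
from mit_semseg.models import ModelBuilder, SegmentationModule

ckpt = 'ckpt/ade20k-resnet50dilated-ppm_deepsup'  # illustrative path
net_encoder = ModelBuilder.build_encoder(
    arch='resnet50dilated', fc_dim=2048,
    weights=f'{ckpt}/encoder_epoch_20.pth')
net_decoder = ModelBuilder.build_decoder(
    arch='ppm_deepsup', fc_dim=2048, num_class=150,
    weights=f'{ckpt}/decoder_epoch_20.pth',
    use_softmax=True)  # softmax outputs for inference
crit = torch.nn.NLLLoss(ignore_index=-1)
segmentation_module = SegmentationModule(net_encoder, net_decoder, crit).eval()

img = torch.randn(1, 3, 384, 384)  # stand-in for a normalized image tensor
with torch.no_grad():
    scores = segmentation_module({'img_data': img}, segSize=(384, 384))
pred = scores.argmax(dim=1)  # (1, 384, 384) per-pixel class indices
```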
If you find the code or pre-trained models useful, please cite the following papers:
Semantic Understanding of Scenes through ADE20K Dataset. B. Zhou, H. Zhao, X. Puig, T. Xiao, S. Fidler, A. Barriuso and A. Torralba. International Journal of Computer Vision (IJCV), 2018. (https://arxiv.org/pdf/1608.05442.pdf)
@article{zhou2018semantic,
title={Semantic understanding of scenes through the ade20k dataset},
author={Zhou, Bolei and Zhao, Hang and Puig, Xavier and Xiao, Tete and Fidler, Sanja and Barriuso, Adela and Torralba, Antonio},
journal={International Journal of Computer Vision},
year={2018}
}
Scene Parsing through ADE20K Dataset. B. Zhou, H. Zhao, X. Puig, S. Fidler, A. Barriuso and A. Torralba. Computer Vision and Pattern Recognition (CVPR), 2017. (http://people.csail.mit.edu/bzhou/publication/scene-parse-camera-ready.pdf)
@inproceedings{zhou2017scene,
title={Scene Parsing through ADE20K Dataset},
author={Zhou, Bolei and Zhao, Hang and Puig, Xavier and Fidler, Sanja and Barriuso, Adela and Torralba, Antonio},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
year={2017}
}