Project Name | Description | Stars | Downloads | Repos Using This | Packages Using This | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language
---|---|---|---|---|---|---|---|---|---|---|---
Deep Learning For Image Processing | Deep learning for image processing, including classification, object detection, etc. | 14,599 | | | | 22 days ago | | | 28 | gpl-3.0 | Python
Labelme | Image polygonal annotation with Python (polygon, rectangle, circle, line, point and image-level flag annotation). | 9,896 | | 8 | 8 | 17 days ago | 177 | March 03, 2022 | 67 | other | Python
Jetson Inference | Hello AI World guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson. | 6,243 | | | | 2 days ago | | | 938 | mit | C++
Pyaudioanalysis | Python audio analysis library: feature extraction, classification, segmentation and applications. | 4,973 | | 11 | 8 | 6 months ago | 23 | February 07, 2022 | 184 | apache-2.0 | Python
Paddlex | PaddlePaddle end-to-end deep learning development toolkit. | 4,165 | | | 1 | a month ago | 54 | December 10, 2021 | 477 | apache-2.0 | Python
Pointnet | PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. | 3,907 | | | | 6 months ago | | | 174 | other | Python
Catalyst | Accelerated deep learning R&D. | 3,102 | | 19 | 10 | 10 hours ago | 108 | April 29, 2022 | 4 | apache-2.0 | Python
Imgclsmob | Sandbox for training deep learning networks. | 2,399 | | | 4 | a year ago | 67 | September 21, 2021 | 6 | mit | Python
Awesome Deeplearning | Introductory and advanced deep learning courses, featured courses, academic and industrial case studies, a deep learning knowledge encyclopedia, and an interview question bank. | 2,048 | | | | 15 days ago | | | 477 | apache-2.0 | Jupyter Notebook
Pointnet_pointnet2_pytorch | PointNet and PointNet++ implemented in PyTorch (pure Python), with experiments on ModelNet, ShapeNet and S3DIS. | 1,796 | | | | 7 months ago | | | 80 | mit | Python
This repository hosts the official TensorFlow implementation of MaxViT models:
MaxViT: Multi-Axis Vision Transformer. ECCV 2022.
Zhengzhong Tu, Hossein Talebi, Han Zhang, Feng Yang, Peyman Milanfar, Alan Bovik, and Yinxiao Li
Google Research, University of Texas at Austin
Disclaimer: This is not an officially supported Google product.
MaxViT is a family of hybrid (CNN + ViT) image classification models that achieves better performance across the board, in both parameter and FLOPs efficiency, than SoTA ConvNets and Transformers. MaxViT models also scale well to large datasets such as ImageNet-21K. Notably, thanks to the linear complexity of the grid attention used, MaxViT is able to "see" globally throughout the entire network, even in the earlier, high-resolution stages.
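The two partition schemes behind MaxViT's attention can be sketched with plain reshapes. The NumPy sketch below (illustrative only, not the repository's TensorFlow code) shows how block attention groups each local p×p window while grid attention groups g×g tokens strided across the whole feature map; attending within each group costs O(HW·p²) or O(HW·g²), i.e. linear in the number of pixels for fixed p and g, which is what lets grid attention mix information globally at high resolution.

```python
import numpy as np

def block_partition(x, p):
    # (H, W, C) -> (num_groups, p*p, C): each group is one local p x p window,
    # so attention within a group models local interactions only.
    H, W, C = x.shape
    x = x.reshape(H // p, p, W // p, p, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, p * p, C)

def grid_partition(x, g):
    # (H, W, C) -> (num_groups, g*g, C): each group holds g*g tokens spaced
    # H//g (resp. W//g) apart, so attention within a group spans the whole map.
    H, W, C = x.shape
    x = x.reshape(g, H // g, g, W // g, C)
    return x.transpose(1, 3, 0, 2, 4).reshape(-1, g * g, C)
```

On a 4×4 map with p = g = 2, `block_partition` puts the top-left 2×2 neighborhood in one group, while `grid_partition` groups the four tokens at stride 2 — same cost, global reach.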
MaxViT meta-architecture:
Results on ImageNet-1k train and test:
Results on ImageNet-21k and JFT pre-trained models:
We have released a Google Colab demo with a tutorial on how to run MaxViT on images. Try it here
We have provided a list of results and checkpoints as follows:
Name | Resolution | Top1 Acc. | #Params | FLOPs | Model |
---|---|---|---|---|---|
MaxViT-T | 224x224 | 83.62% | 31M | 5.6B | ckpt |
MaxViT-T | 384x384 | 85.24% | 31M | 17.7B | ckpt |
MaxViT-T | 512x512 | 85.72% | 31M | 33.7B | ckpt |
MaxViT-S | 224x224 | 84.45% | 69M | 11.7B | ckpt |
MaxViT-S | 384x384 | 85.74% | 69M | 36.1B | ckpt |
MaxViT-S | 512x512 | 86.19% | 69M | 67.6B | ckpt |
MaxViT-B | 224x224 | 84.95% | 119M | 24.2B | ckpt |
MaxViT-B | 384x384 | 86.34% | 119M | 74.2B | ckpt |
MaxViT-B | 512x512 | 86.66% | 119M | 138.5B | ckpt |
MaxViT-L | 224x224 | 85.17% | 212M | 43.9B | ckpt |
MaxViT-L | 384x384 | 86.40% | 212M | 133.1B | ckpt |
MaxViT-L | 512x512 | 86.70% | 212M | 245.4B | ckpt |
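For a quick read on the accuracy/efficiency tradeoff, the ImageNet-1K numbers above can be queried with a few lines of Python. The values are copied from the table; the helper itself is just an illustration, not part of this repository:

```python
# ImageNet-1K results copied from the table above:
# (model, resolution, top-1 accuracy %, params in M, FLOPs in B)
RESULTS = [
    ("MaxViT-T", 224, 83.62, 31, 5.6),
    ("MaxViT-T", 384, 85.24, 31, 17.7),
    ("MaxViT-T", 512, 85.72, 31, 33.7),
    ("MaxViT-S", 224, 84.45, 69, 11.7),
    ("MaxViT-S", 384, 85.74, 69, 36.1),
    ("MaxViT-S", 512, 86.19, 69, 67.6),
    ("MaxViT-B", 224, 84.95, 119, 24.2),
    ("MaxViT-B", 384, 86.34, 119, 74.2),
    ("MaxViT-B", 512, 86.66, 119, 138.5),
    ("MaxViT-L", 224, 85.17, 212, 43.9),
    ("MaxViT-L", 384, 86.40, 212, 133.1),
    ("MaxViT-L", 512, 86.70, 212, 245.4),
]

def best_under_flops(budget_b):
    """Return the most accurate (model, res, acc, params, flops) row
    whose FLOPs fit under budget_b (in billions), or None."""
    candidates = [r for r in RESULTS if r[4] <= budget_b]
    return max(candidates, key=lambda r: r[2]) if candidates else None
```

For instance, under a 40 GFLOPs budget the best entry is MaxViT-S at 384×384 (85.74% top-1), not a larger model at lower resolution.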
Here is a list of ImageNet-21K pretrained and ImageNet-1K finetuned models:
Name | Resolution | Top1 Acc. | #Params | FLOPs | 21k model | 1k model |
---|---|---|---|---|---|---|
MaxViT-B | 224x224 | - | 119M | 24.2B | ckpt | - |
MaxViT-B | 384x384 | - | 119M | 74.2B | - | ckpt |
MaxViT-B | 512x512 | - | 119M | 138.5B | - | ckpt |
MaxViT-L | 224x224 | - | 212M | 43.9B | ckpt | - |
MaxViT-L | 384x384 | - | 212M | 133.1B | - | ckpt |
MaxViT-L | 512x512 | - | 212M | 245.4B | - | ckpt |
MaxViT-XL | 224x224 | - | 475M | 97.8B | ckpt | - |
MaxViT-XL | 384x384 | - | 475M | 293.7B | - | ckpt |
MaxViT-XL | 512x512 | - | 475M | 535.2B | - | ckpt |
Should you find this repository useful, please consider citing:
@article{tu2022maxvit,
title={MaxViT: Multi-Axis Vision Transformer},
author={Tu, Zhengzhong and Talebi, Hossein and Zhang, Han and Yang, Feng and Milanfar, Peyman and Bovik, Alan and Li, Yinxiao},
journal={ECCV},
year={2022},
}
Acknowledgement: This repository is built on top of the EfficientNet and CoAtNet codebases.