Awesome Open Source
Search results for vision transformer
248 search results found
- Mmdetection (⭐ 26,886): OpenMMLab Detection Toolbox and Benchmark
- Latex Ocr (⭐ 8,088): pix2tex: Using a ViT to convert images of equations into LaTeX code.
- Transformers Tutorials (⭐ 7,486): This repository contains demos I made with the Transformers library by HuggingFace.
- Swinir (⭐ 4,024): SwinIR: Image Restoration Using Swin Transformer (official repository)
- Awesome Transformer Attention (⭐ 3,895): An ultimately comprehensive paper list of Vision Transformer/Attention, including papers, codes, and related websites
- Efficient Ai Backbones (⭐ 3,770): Efficient AI Backbones including GhostNet, TNT and MLP, developed by Huawei Noah's Ark Lab.
- Mmpretrain (⭐ 3,177): OpenMMLab Pre-training Toolbox and Benchmark
- Towhee (⭐ 2,903): Towhee is a framework dedicated to making neural data processing pipelines simple and fast.
- Scenic (⭐ 2,733): Scenic: A Jax Library for Computer Vision Research and Beyond
- Easycv (⭐ 1,614): An all-in-one toolkit for computer vision
- Transformer Explainability (⭐ 1,596): [CVPR 2021] Official PyTorch implementation for Transformer Interpretability Beyond Attention Visualization, a novel method to visualize classifications by Transformer-based networks.
- Cream (⭐ 1,446): This is a collection of our NAS and Vision Transformer work.
- Eva (⭐ 1,430): EVA Series: Visual Representation Fantasies from BAAI
- Vit Adapter (⭐ 1,003): [ICLR 2023 Spotlight] Vision Transformer Adapter for Dense Predictions
- Vitpose (⭐ 950): The official repo for [NeurIPS'22] "ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation" and [Arxiv'22] "ViTPose+: Vision Transformer Foundation Model for Generic Body Pose Estimation"
- Voxformer (⭐ 937): Official PyTorch implementation of VoxFormer [CVPR 2023 Highlight]
- T2t Vit (⭐ 921): ICCV 2021, Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet
- Vrt (⭐ 882): VRT: A Video Restoration Transformer (official repository)
- Videomae (⭐ 864): [NeurIPS 2022 Spotlight] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
- Yolos (⭐ 810): [NeurIPS 2021] You Only Look at One Sequence
- Internvideo (⭐ 736): InternVideo: General Video Foundation Models via Generative and Discriminative Learning (https://arxiv.org/abs/2212.03191)
- Efficientvit (⭐ 732): EfficientViT is a new family of vision models for efficient high-resolution vision.
- One Peace (⭐ 714): A general representation model across vision, audio, and language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
- Awesome Attention Mechanism In Cv (⭐ 686): Awesome List of Attention Modules and Plug&Play Modules in Computer Vision
- Dat (⭐ 649): Repository of Vision Transformer with Deformable Attention (CVPR 2022) and DAT++: Spatially Dynamic Vision Transformer with Deformable Attention
- Imagenet21k (⭐ 576): Official PyTorch implementation of the "ImageNet-21K Pretraining for the Masses" (NeurIPS 2021) paper
- How Do Vits Work (⭐ 571): (ICLR 2022 Spotlight) Official PyTorch implementation of "How Do Vision Transformers Work?"
- Vision Centric Bev Perception (⭐ 541): Vision-Centric BEV Perception: A Survey
- Fastervit (⭐ 539): [ICLR 2024] Official PyTorch implementation of FasterViT: Fast Vision Transformers with Hierarchical Attention
- Openmixup (⭐ 538): CAIRI Supervised, Semi- and Self-Supervised Visual Representation Learning Toolbox and Benchmark
- Parseq (⭐ 429): Scene Text Recognition with Permuted Autoregressive Sequence Models (ECCV 2022)
- Gcvit (⭐ 414): [ICML 2023] Official PyTorch implementation of Global Context Vision Transformers
- Transformer_for_medical_image_analysis (⭐ 412): A collection of papers about Transformer in the field of medical image analysis.
- Geoseg (⭐ 382): UNetFormer: A UNet-like transformer for efficient semantic segmentation of remote sensing urban scene imagery (ISPRS). Also includes other vision transformers and CNNs for satellite, aerial, and UAV image segmentation.
- Vitae Transformer Remote Sensing (⭐ 379): A comprehensive list of our research works related to remote sensing, including papers, codes, and citations. Note: The repo for [TGRS'22] "An Empirical Study of Remote Sensing Pretraining" has been moved to: https://github.com/ViTAE-Transformer/RSP
- Libai (⭐ 371): LiBai (李白): A Toolbox for Large-Scale Distributed Parallel Training
- Awesome Foundation Models (⭐ 366): A curated list of foundation models for vision and language tasks
- Hipt (⭐ 341): Hierarchical Image Pyramid Transformer (CVPR 2022 Oral)
- Maxvit (⭐ 340): [ECCV 2022] Official repository for "MaxViT: Multi-Axis Vision Transformer". SOTA foundation models for classification, detection, segmentation, image quality, and generative modeling...
- Splice (⭐ 334): Official PyTorch implementation for "Splicing ViT Features for Semantic Appearance Transfer", presenting "Splice" (CVPR 2022 Oral)
- Mimdet (⭐ 314): [ICCV 2023] You Only Look at One Partial Sequence
- Gfnet (⭐ 310): [NeurIPS 2021] [T-PAMI] Global Filter Networks for Image Classification
- Swin2sr (⭐ 303): Swin2SR: SwinV2 Transformer for Compressed Image Super-Resolution and Restoration. Advances in Image Manipulation (AIM) workshop, ECCV 2022. Try it out! Over 1.8M runs: https://replicate.com/mv-lab/swin2sr
- Crossformer (⭐ 302): The official code for the paper: https://openreview.net/forum?id=_PHymLIxuI
- Transmorph_transformer_for_medical_image_registration (⭐ 302): TransMorph: Transformer for Unsupervised Medical Image Registration (PyTorch)
- Hornet (⭐ 296): [NeurIPS 2022] HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions
- Actionformer_release (⭐ 285): Code release for ActionFormer (ECCV 2022)
- Fq Vit (⭐ 278): [IJCAI 2022] FQ-ViT: Post-Training Quantization for Fully Quantized Vision Transformer
- Tensorflow Image Models (⭐ 273): TensorFlow port of PyTorch Image Models (timm) - image models with pretrained weights.
- Alphaclip (⭐ 273): Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
- Deep Text Recognition Benchmark (⭐ 268): PyTorch code of my ICDAR 2021 paper Vision Transformer for Fast and Efficient Scene Text Recognition (ViTSTR)
- Vit Explain (⭐ 260): Explainability for Vision Transformers
- Passl (⭐ 234): PASSL includes image self-supervised learning algorithms such as SimCLR, MoCo v1/v2, BYOL, CLIP, PixPro, SimSiam, SwAV, BEiT, and MAE, as well as foundational vision models such as Vision Transformer, DeiT, Swin Transformer, CvT, T2T-ViT, MLP-Mixer, XCiT, ConvNeXt, and PV
- Dehazeformer (⭐ 229): [IEEE TIP] Vision Transformers for Single Image Dehazing
- Semantic Segmentation (⭐ 228): SOTA Semantic Segmentation Models in PyTorch
- Vitae Transformer Matting (⭐ 223): A comprehensive list [AIM@IJCAI'21, P3M@MM'21, GFM@IJCV'22, RIM@CVPR'23, P3MNet@IJCV'23] of our research works related to image matting, including papers, codes, datasets, demos, and citations. Note: The repo for [IJCV'23] "Rethinking Portrait Matting with Privacy Preserving" has been moved to: https://github.com/ViTAE-Transformer/P3M-Net
- Pytorch Vit (⭐ 217): An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
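The entry above refers to the original ViT paper, whose core move is flattening an image into a sequence of fixed-size patch "words" before feeding it to a standard Transformer. A minimal NumPy sketch of that patch-extraction step (illustrative only, not code from any listed repo; the 224x224 input and 16x16 patch size follow the paper's defaults):

```python
import numpy as np

def extract_patches(img, p=16):
    """Split an (H, W, C) image into non-overlapping p x p patches,
    each flattened to a vector of length p*p*C (the ViT 'words')."""
    H, W, C = img.shape
    assert H % p == 0 and W % p == 0, "image dims must be divisible by p"
    x = img.reshape(H // p, p, W // p, p, C)
    x = x.transpose(0, 2, 1, 3, 4)       # (H/p, W/p, p, p, C)
    return x.reshape(-1, p * p * C)      # (num_patches, patch_dim)

img = np.zeros((224, 224, 3))
patches = extract_patches(img)
# 224/16 = 14 patches per side -> 14*14 = 196 patches of dim 16*16*3 = 768
print(patches.shape)
```

In the full model, each 768-dimensional patch vector is then linearly projected to the embedding size and combined with position embeddings; the sketch stops at the tokenization step.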
- Visual_token_matching (⭐ 213): [ICLR'23 Oral] Universal Few-shot Learning of Dense Prediction Tasks with Visual Token Matching
- Sam Detr (⭐ 211): [CVPR'2022] SAM-DETR & SAM-DETR++: Official PyTorch Implementation
- Litv2 (⭐ 210): [NeurIPS 2022 Spotlight] This is the official PyTorch implementation of "Fast Vision Transformers with HiLo Attention"
- Vt Unet (⭐ 210): [MICCAI 2022] This is an official PyTorch implementation for A Robust Volumetric Transformer for Accurate 3D Tumor Segmentation
- Eccv2022 Papers With Code Demo (⭐ 207): Collects the latest ECCV results, including papers, code, and demo videos; recommendations welcome!
- V2x Vit (⭐ 205): [ECCV 2022] Official implementation of the paper "V2X-ViT: Vehicle-to-Everything Cooperative Perception with Vision Transformer"
- Interpretdl (⭐ 203): InterpretDL: Interpretation of Deep Learning Models, a model-interpretability algorithm library built on PaddlePaddle (飞桨).
- Cotnet (⭐ 201): This is an official implementation for "Contextual Transformer Networks for Visual Recognition".
- Seq2seqsharp (⭐ 188): Seq2SeqSharp is a tensor-based, fast and flexible deep neural network framework written in .NET (C#). It has many highlighted features, such as automatic differentiation, different network types (Transformer, LSTM, BiLSTM, and so on), multi-GPU support, cross-platform operation (Windows, Linux, x86, x64, ARM), a multimodal model for text and images, and more.
- Vitae Transformer (⭐ 187): The official repo for [NeurIPS'21] "ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias" and [IJCV'22] "ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond"
- Vit V Net_for_3d_image_registration_pytorch (⭐ 185): Vision Transformer for 3D medical image registration (PyTorch).
- Machinelearning Ai (⭐ 184): This repository contains all the work that I regularly did and studied from Medium blogs, several research papers, and other repos (related/unrelated to the research papers).
- Awesome Mim (⭐ 178): [Survey] Masked Modeling for Self-supervised Representation Learning on Vision and Beyond (https://arxiv.org/abs/2401.00897)
- Vision Transformer Pytorch (⭐ 164): PyTorch version of Vision Transformer (ViT) with pretrained models. This is part of CASL (https://casl-project.github.io/) and the ASYML project.
- Awesome Transformer In Cv (⭐ 162): A Survey on Transformer in CV.
- Maniqa (⭐ 159): [CVPRW 2022] MANIQA: Multi-dimension Attention Network for No-Reference Image Quality Assessment
- Mpvit (⭐ 158): MPViT: Multi-Path Vision Transformer for Dense Prediction (CVPR 2022)
- Mobilevit Pytorch (⭐ 152): A PyTorch implementation of "MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer".
- Cls_kd (⭐ 147): 'NKD and USKD' (ICCV 2023) and 'ViTKD'
- Lm4visualencoding (⭐ 144): [ICLR 2024 (Spotlight)] "Frozen Transformers in Language Models are Effective Visual Encoder Layers"
- Unicom (⭐ 142): Universal visual model trained on LAION-400M
- Visualization (⭐ 142): A collection of visualization functions
- Vformer (⭐ 138): A modular PyTorch library for vision transformer models
- Vit.cpp (⭐ 135): Inference Vision Transformer (ViT) in plain C/C++ with ggml
- Lamda Pilot (⭐ 134): 🎉 PILOT: A Pre-trained Model-Based Continual Learning Toolbox
- Greenmim (⭐ 129): [NeurIPS 2022] Official implementation of the paper "Green Hierarchical Vision Transformer for Masked Image Modeling".
- Maxvit (⭐ 123): PyTorch reimplementation of the paper "MaxViT: Multi-Axis Vision Transformer" [arXiv 2022].
- Absvit (⭐ 120): Official code for "Top-Down Visual Attention from Analysis by Synthesis" (CVPR 2023 Highlight)
- Koclip (⭐ 117): KoCLIP: Korean port of OpenAI CLIP, in Flax
- Multi Modal Transformer (⭐ 117): The repository collects various multi-modal transformer architectures, including image transformers, video transformers, image-language transformers, video-language transformers, and self-supervised learning models. It also collects many useful tutorials and tools in these domains.
- Llava Cpp Server (⭐ 116): LLaVA server (llama.cpp).
- Vts Drloc (⭐ 116): NeurIPS 2021, official code for "Efficient Training of Visual Transformers with Small Datasets".
- Swin Transformer V2 (⭐ 106): PyTorch reimplementation of the paper "Swin Transformer V2: Scaling Up Capacity and Resolution" [CVPR 2022].
- Awesome Transformer In Medical Imaging (⭐ 103): [MedIA Journal] An ultimately comprehensive paper list of Vision Transformer/Attention, including papers, codes, and related websites
- Imagenetmodel (⭐ 101): Official ImageNet Model repository
- Adaptformer (⭐ 101): [NeurIPS 2022] Implementation of "AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition"
- Moganet (⭐ 100): [ICLR 2024] MogaNet: Efficient Multi-order Gated Aggregation Network
- Robustvit (⭐ 96): [NeurIPS 2022] Official PyTorch implementation of Optimizing Relevance Maps of Vision Transformers Improves Robustness. This code allows finetuning the explainability maps of Vision Transformers to enhance robustness.
- Tutorial (⭐ 94): Tutorials on machine learning, artificial intelligence, and data science, with math explanations and reusable code (in Python and R)
- Aiatrack (⭐ 90): [ECCV'22] The official PyTorch implementation of our ECCV 2022 paper "AiATrack: Attention in Attention for Transformer Visual Tracking".
- Spvit (⭐ 89): [TPAMI 2024] This is the official repository for our paper "Pruning Self-attentions into Convolutional Layers in Single Path".
- Boosting Crowd Counting Via Multifaceted Attention (⭐ 87): Official implementation of the CVPR 2022 paper "Boosting Crowd Counting via Multifaceted Attention"
- Fgvc Pim (⭐ 86): PyTorch implementation of "A Novel Plug-in Module for Fine-Grained Visual Classification".
1-100 of 248 search results