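A collection of CVPR 2022 papers and open-source projects (papers with code)!
CVPR 2022 accepted paper ID list: https://drive.google.com/file/d/15JFhfPboKdUcIH9LdbCMUFmGq_JhaxhC/view
Note 1: Everyone is welcome to open an issue to share CVPR 2022 papers and open-source projects!
Note 2: For papers from previous top CV conferences, other high-quality CV papers, and roundups, see: https://awesomeopensource.com/project/amusi/daily-paper-computer-vision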
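If you want to keep up with the latest and best CV papers, open-source projects, and learning materials, scan the QR code to join the CVer academic discussion group! Let's learn from each other and improve together.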
A ConvNet for the 2020s
Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs
Code: https://awesomeopensource.com/project/megvii-research/RepLKNet
Code2: https://awesomeopensource.com/project/DingXiaoH/RepLKNet-pytorch
MPViT: Multi-Path Vision Transformer for Dense Prediction
Mobile-Former: Bridging MobileNet and Transformer
MetaFormer is Actually What You Need for Vision
Shunted Self-Attention via Multi-Scale Token Aggregation
TVConv: Efficient Translation Variant Convolution for Layout-aware Visual Processing
Learned Queries for Efficient Local Attention
RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality
HairCLIP: Design Your Hair by Text and Reference Image
PointCLIP: Point Cloud Understanding by CLIP
Blended Diffusion for Text-driven Editing of Natural Images
SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing
Homepage: https://semanticstylegan.github.io/
Style Transformer for Image Inversion and Editing
Unsupervised Image-to-Image Translation with Generative Prior
StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2
OSSGAN: Open-set Semi-supervised Image Generation
Neural Texture Extraction and Distribution for Controllable Person Image Synthesis
OrphicX: A Causality-Inspired Latent Variable Model for Interpreting Graph Neural Networks
RepMLPNet: Hierarchical Vision MLP with Re-parameterized Locality
β-DARTS: Beta-Decay Regularization for Differentiable Architecture Search
ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior
SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition
Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields
Homepage: https://jonbarron.info/mipnerf360/
Point-NeRF: Point-based Neural Radiance Fields
NeRF in the Dark: High Dynamic Range View Synthesis from Noisy Raw Images
Urban Radiance Fields
Homepage: https://urban-radiance-fields.github.io/
Pix2NeRF: Unsupervised Conditional π-GAN for Single Image to Neural Radiance Fields Translation
HumanNeRF: Free-viewpoint Rendering of Moving People from Monocular Video
ImFace: A Nonlinear 3D Morphable Face Model with Implicit Neural Representations
Retrieval Augmented Classification for Long-Tail Visual Recognition
MPViT: Multi-Path Vision Transformer for Dense Prediction
MetaFormer is Actually What You Need for Vision
Mobile-Former: Bridging MobileNet and Transformer
Shunted Self-Attention via Multi-Scale Token Aggregation
Learned Queries for Efficient Local Attention
Language-based Video Editing via Multi-Modal Multi-Level Transformer
MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video
Embracing Single Stride 3D Object Detector with Sparse Transformer
Multi-class Token Transformer for Weakly Supervised Semantic Segmentation
Spatio-temporal Relation Modeling for Few-shot Action Recognition
Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction
Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling
GroupViT: Semantic Segmentation Emerges from Text Supervision
Homepage: https://jerryxu.net/GroupViT/
Restormer: Efficient Transformer for High-Resolution Image Restoration
Splicing ViT Features for Semantic Appearance Transfer
Self-supervised Video Transformer
Homepage: https://kahnchana.github.io/svt/
Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers
Accelerating DETR Convergence via Semantic-Aligned Matching
DN-DETR: Accelerate DETR Training by Introducing Query DeNoising
Style Transformer for Image Inversion and Editing
MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer
Mask Transfiner for High-Quality Instance Segmentation
Language as Queries for Referring Video Object Segmentation
X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning
AdaMixer: A Fast-Converging Query-Based Object Detector
Omni-DETR: Omni-Supervised Object Detection with Transformers
SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition
TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting
Collaborative Transformers for Grounded Situation Recognition
NFormer: Robust Person Re-identification with Neighbor Transformer
Boosting Robustness of Image Matting with Context Assembling and Strong Data Augmentation
Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer
A New Dataset and Transformer for Stereoscopic Video Super-Resolution
Safe Self-Refinement for Transformer-based Domain Adaptation
Fast Point Transformer
Transformer Decoders with MultiModal Regularization for Cross-Modal Food Retrieval
DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation
Stratified Transformer for 3D Point Cloud Segmentation
Conditional Prompt Learning for Vision-Language Models
Bridging Video-text Retrieval with Multiple Choice Question
Visual Abductive Reasoning
UniVIP: A Unified Framework for Self-Supervised Visual Pre-training
Crafting Better Contrastive Views for Siamese Representation Learning
HCSC: Hierarchical Contrastive Selective Coding
DiRA: Discriminative, Restorative, and Adversarial Learning for Self-supervised Medical Image Analysis
TeachAugment: Data Augmentation Optimization Using Teacher Knowledge
AlignMixup: Improving Representations By Interpolating Aligned Features
Decoupled Knowledge Distillation
BoxeR: Box-Attention for 2D and 3D Transformers
DN-DETR: Accelerate DETR Training by Introducing Query DeNoising
Accelerating DETR Convergence via Semantic-Aligned Matching
Localization Distillation for Dense Object Detection
Focal and Global Knowledge Distillation for Detectors
A Dual Weighting Label Assignment Scheme for Object Detection
AdaMixer: A Fast-Converging Query-Based Object Detector
Omni-DETR: Omni-Supervised Object Detection with Transformers
SIGMA: Semantic-complete Graph Matching for Domain Adaptive Object Detection
Dense Learning based Semi-Supervised Object Detection
Correlation-Aware Deep Tracking
TCTrack: Temporal Contexts for Aerial Tracking
Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline
Learning of Global Objective for Network Flow in Multi-Object Tracking
DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion
Novel Class Discovery in Semantic Segmentation
Deep Hierarchical Semantic Segmentation
Rethinking Semantic Segmentation: A Prototype View
Class Re-Activation Maps for Weakly-Supervised Semantic Segmentation
Multi-class Token Transformer for Weakly Supervised Semantic Segmentation
Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers
CLIMS: Cross Language Image Matching for Weakly Supervised Semantic Segmentation
CCAM: Contrastive learning of Class-agnostic Activation Map for Weakly Supervised Object Localization and Semantic Segmentation
FIFO: Learning Fog-invariant Features for Foggy Scene Segmentation
Regional Semantic Contrast and Aggregation for Weakly Supervised Semantic Segmentation
ST++: Make Self-training Work Better for Semi-supervised Semantic Segmentation
Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels
Perturbed and Strict Mean Teachers for Semi-supervised Semantic Segmentation
Towards Fewer Annotations: Active Learning via Region Impurity and Prediction Uncertainty for Domain Adaptive Semantic Segmentation
DAFormer: Improving Network Architectures and Training Strategies for Domain-Adaptive Semantic Segmentation
GroupViT: Semantic Segmentation Emerges from Text Supervision
Generalized Few-shot Semantic Segmentation
BoxeR: Box-Attention for 2D and 3D Transformers
E2EC: An End-to-End Contour-based Method for High-Quality High-Speed Instance Segmentation
Mask Transfiner for High-Quality Instance Segmentation
Open-World Instance Segmentation: Exploiting Pseudo Ground Truth From Learned Pairwise Affinity
FreeSOLO: Learning to Segment Objects without Annotations
Efficient Video Instance Segmentation via Tracklet Query and Proposal
Temporally Efficient Vision Transformer for Video Instance Segmentation
Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers
Large-scale Video Panoptic Segmentation in the Wild: A Benchmark
Integrative Few-Shot Learning for Classification and Segmentation
Learning to Affiliate: Mutual Centralized Learning for Few-shot Classification
Learning What Not to Segment: A New Perspective on Few-Shot Segmentation
Integrative Few-Shot Learning for Classification and Segmentation
Dynamic Prototype Convolution Network for Few-Shot Semantic Segmentation
Boosting Robustness of Image Matting with Context Assembling and Strong Data Augmentation
Self-supervised Video Transformer
TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting
FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality Assessment
Paper(Oral): https://arxiv.org/abs/2204.03646
Dataset: https://awesomeopensource.com/project/xujinglin/FineDiving
Code: https://awesomeopensource.com/project/xujinglin/FineDiving
Dual-AI: Dual-path Actor Interaction Learning for Group Activity Recognition
Spatio-temporal Relation Modeling for Few-shot Action Recognition
End-to-End Semi-Supervised Learning for Video Action Detection
Style Transformer for Image Inversion and Editing
Blended Diffusion for Text-driven Editing of Natural Images
SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing
Homepage: https://semanticstylegan.github.io/
ISNAS-DIP: Image-Specific Neural Architecture Search for Deep Image Prior
Restormer: Efficient Transformer for High-Resolution Image Restoration
Robust Equivariant Imaging: a fully unsupervised framework for learning to image from noisy and partial measurements
Learning the Degradation Distribution for Blind Image Super-Resolution
BasicVSR++: Improving Video Super-Resolution with Enhanced Propagation and Alignment
Look Back and Forth: Video Super-Resolution with Explicit Temporal Difference Modeling
A New Dataset and Transformer for Stereoscopic Video Super-Resolution
Learning to Deblur using Light Field Generated and Real Defocus Images
Homepage: http://lyruan.com/Projects/DRBNet/
Paper(Oral): https://arxiv.org/abs/2204.00442
Code: https://awesomeopensource.com/project/lingyanruan/DRBNet
Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling
Homepage: https://point-bert.ivg-research.xyz/
Code: https://awesomeopensource.com/project/lulutang0608/Point-BERT
A Unified Query-based Paradigm for Point Cloud Understanding
CrossPoint: Self-Supervised Cross-Modal Contrastive Learning for 3D Point Cloud Understanding
PointCLIP: Point Cloud Understanding by CLIP
Fast Point Transformer
RCP: Recurrent Closest Point for Scene Flow Estimation on 3D Point Clouds
The Devil is in the Pose: Ambiguity-free 3D Rotation-invariant Learning via Pose-aware Convolution
Not All Points Are Equal: Learning Highly Efficient Point-based Detectors for 3D LiDAR Point Clouds
Paper(Oral): https://arxiv.org/abs/2203.11139
Code: https://awesomeopensource.com/project/yifanzhang713/IA-SSD
BoxeR: Box-Attention for 2D and 3D Transformers
Embracing Single Stride 3D Object Detector with Sparse Transformer
Canonical Voting: Towards Robust Oriented Bounding Box Detection in 3D Scenes
MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer
HyperDet3D: Learning a Scene-conditioned 3D Object Detector
OccAM's Laser: Occlusion-based Attribution Maps for 3D Object Detectors on LiDAR Data
DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection
Scribble-Supervised LiDAR Semantic Segmentation
Stratified Transformer for 3D Point Cloud Segmentation
Beyond 3D Siamese Tracking: A Motion-Centric Paradigm for 3D Single Object Tracking in Point Clouds
PTTR: Relational 3D Point Cloud Object Tracking with Transformer
MHFormer: Multi-Hypothesis Transformer for 3D Human Pose Estimation
MixSTE: Seq2seq Mixed Spatio-Temporal Encoder for 3D Human Pose Estimation in Video
Distribution-Aware Single-Stage Models for Multi-Person 3D Pose Estimation
BEV: Putting People in their Place: Monocular Regression of 3D People in Depth
MonoScene: Monocular 3D Semantic Scene Completion
BANMo: Building Animatable 3D Neural Models from Many Casual Videos
NFormer: Robust Person Re-identification with Neighbor Transformer
Zoom In and Out: A Mixed-scale Triplet Network for Camouflaged Object Detection
NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation
OmniFusion: 360 Monocular Depth Estimation via Geometry-Aware Fusion
Toward Practical Self-Supervised Monocular Indoor Depth Estimation
P3Depth: Monocular Depth Estimation with a Piecewise Planarity Prior
Multi-Frame Self-Supervised Depth with Transformers
Code: None
ACVNet: Attention Concatenation Volume for Accurate and Efficient Stereo Matching
ClusterGNN: Cluster-based Coarse-to-Fine Graph Neural Network for Efficient Feature Matching
Rethinking Efficient Lane Detection via Curve Modeling
A Keypoint-based Global Association Network for Lane Detection
Imposing Consistency for Optical Flow Estimation
Deep Equilibrium Optical Flow Estimation
GMFlow: Learning Optical Flow via Global Matching
Incremental Transformer Structure Enhanced Image Inpainting with Masking Positional Encoding
Correlation Verification for Image Retrieval
AdaFace: Quality Adaptive Margin for Face Recognition
Leveraging Self-Supervision for Cross-Domain Crowd Counting
BoostMIS: Boosting Medical Image Semi-supervised Learning with Adaptive Pseudo Labeling and Informative Active Annotation
Anti-curriculum Pseudo-labelling for Semi-supervised Medical Image Classification
DiRA: Discriminative, Restorative, and Adversarial Learning for Self-supervised Medical Image Analysis
StyleGAN-V: A Continuous Video Generator with the Price, Image Quality and Perks of StyleGAN2
Homepage: https://universome.github.io/stylegan-v
Code: https://awesomeopensource.com/project/universome/stylegan-v
Demo: https://kaust-cair.s3.amazonaws.com/stylegan-v/stylegan-v.mp4
SGTR: End-to-end Scene Graph Generation with Transformer
Language as Queries for Referring Video Object Segmentation
ReSTR: Convolution-free Referring Image Segmentation Using Transformers
Gait Recognition in the Wild with Dense 3D Representations and A Benchmark
StyleMesh: Style Transfer for Indoor 3D Scene Reconstructions
Homepage: https://lukashoel.github.io/stylemesh/
Code: https://awesomeopensource.com/project/lukasHoel/stylemesh
UBnormal: New Benchmark for Supervised Open-Set Video Anomaly Detection
Self-Supervised Predictive Convolutional Attentive Block for Anomaly Detection
Shadows can be Dangerous: Stealthy and Effective Physical-world Adversarial Attack by Natural Phenomenon
LAS-AT: Adversarial Training with Learnable Attack Strategy
Segment and Complete: Defending Object Detectors against Adversarial Patch Attacks with Robust Patch Detection
Weakly Supervised Object Localization as Domain Adaption
Exploiting Temporal Relations on Radar Perception for Autonomous Driving
Mask-guided Spectral-wise Transformer for Efficient Hyperspectral Image Reconstruction
Deep Rectangling for Image Stitching: A Learning Baseline
Paper(Oral): https://arxiv.org/abs/2203.03831
Code: https://awesomeopensource.com/project/nie-lang/DeepRectangling
Dataset: https://awesomeopensource.com/project/nie-lang/DeepRectangling
Deep 3D-to-2D Watermarking: Embedding Messages in 3D Meshes and Extracting Them from 2D Renderings
TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting
Collaborative Transformers for Grounded Situation Recognition
Unseen Classes at a Later Time? No Problem
Detecting Deepfakes with Self-Blended Images
Paper(Oral): https://arxiv.org/abs/2204.08376
Code: https://awesomeopensource.com/project/mapooon/SelfBlendedImages
It's About Time: Analog Clock Reading in the Wild
Toward Practical Self-Supervised Monocular Indoor Depth Estimation
Kubric: A scalable dataset generator
Scribble-Supervised LiDAR Semantic Segmentation
Deep Rectangling for Image Stitching: A Learning Baseline
ObjectFolder 2.0: A Multisensory Object Dataset for Sim2Real Transfer
Shape from Polarization for Complex Scenes in the Wild
Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline
TransRAC: Encoding Multi-scale Temporal Correlation with Transformers for Repetitive Action Counting
FineDiving: A Fine-grained Dataset for Procedure-aware Action Quality Assessment
Aesthetic Text Logo Synthesis via Content-aware Layout Inferring
DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection
A New Dataset and Transformer for Stereoscopic Video Super-Resolution
Putting People in their Place: Monocular Regression of 3D People in Depth
Homepage: https://arthur151.github.io/BEV/BEV.html
Dataset: https://awesomeopensource.com/project/Arthur151/Relative_Human
UBnormal: New Benchmark for Supervised Open-Set Video Anomaly Detection
DanceTrack: Multi-Object Tracking in Uniform Appearance and Diverse Motion
Visual Abductive Reasoning
Large-scale Video Panoptic Segmentation in the Wild: A Benchmark
Language-based Video Editing via Multi-Modal Multi-Level Transformer
It's About Time: Analog Clock Reading in the Wild
Splicing ViT Features for Semantic Appearance Transfer
Visual Abductive Reasoning
Kubric: A scalable dataset generator
X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning
Balanced MSE for Imbalanced Visual Regression
SNUG: Self-Supervised Neural Dynamic Garments
Shape from Polarization for Complex Scenes in the Wild
LASER: LAtent SpacE Rendering for 2D Visual Localization
Single-Photon Structured Light
3DeformRS: Certifying Spatial Deformations on Point Clouds
Aesthetic Text Logo Synthesis via Content-aware Layout Inferring
Self-Supervised Predictive Learning: A Negative-Free Method for Sound Source Localization in Visual Scenes
Robust and Accurate Superquadric Recovery: a Probabilistic Approach
Towards Bidirectional Arbitrary Image Rescaling: Joint Optimization and Cycle Idempotence
Not All Tokens Are Equal: Human-centric Visual Analysis via Token Clustering Transformer
DeepDPM: Deep Clustering With an Unknown Number of Clusters
ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic
Proto2Proto: Can you recognize the car, the way I do?
Putting People in their Place: Monocular Regression of 3D People in Depth
Light Field Neural Rendering
Neural Texture Extraction and Distribution for Controllable Person Image Synthesis
Locality-Aware Inter-and Intra-Video Reconstruction for Self-Supervised Correspondence Learning