Project Name | Description | Stars | Most Recent Commit | Total Releases | Latest Release | Open Issues | License | Language
---|---|---|---|---|---|---|---|---
Pwc | Papers with code. Sorted by stars. Updated weekly. | 14,522 | 3 years ago | | | 22 | |
Cvpr2023 Papers With Code | A collection of CVPR 2023 papers and open-source projects | 11,882 | 6 days ago | | | 14 | |
Daily Paper Computer Vision | Papers in computer vision, deep learning, and machine learning, curated and recorded daily | 5,383 | 5 months ago | | | 5 | |
Awesome Multimodal Ml | Reading list for research topics in multimodal machine learning | 4,269 | 4 days ago | | | 6 | mit |
Awesome Anomaly Detection | A curated list of awesome anomaly detection resources | 2,146 | 8 months ago | | | 10 | |
Image Text Localization Recognition | A general list of papers, resources, and implementations for image (scene) text localization and recognition | 883 | 10 months ago | | | | |
Awesome Gradient Boosting Papers | A curated list of gradient boosting research papers with implementations | 841 | 4 months ago | | | | cc0-1.0 | Python
Learning Deep Learning | Paper reading notes on Deep Learning and Machine Learning | 834 | 18 hours ago | | | 1 | | Jupyter Notebook
Context Encoder | [CVPR 2016] Unsupervised Feature Learning by Image Inpainting using GANs | 675 | 3 years ago | | | 20 | other | Lua
Mimicry | [CVPR 2020 Workshop] A PyTorch GAN library that reproduces research results for popular GANs | 556 | 10 months ago | 17 | August 14, 2020 | 9 | mit | Python
CVPR 2023 Papers and Open-Source Projects Collection (papers with code)!
25.78% = 2360 / 9155
CVPR 2023 decisions are now available on OpenReview! This year, we received a record number of 9155 submissions (a 12% increase over CVPR 2022), and accepted 2360 papers, for a 25.78% acceptance rate.
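The acceptance rate quoted above is simply accepted papers divided by submissions. A minimal Python check using the two counts from the announcement (the function name `acceptance_rate` is illustrative, not part of any library):

```python
def acceptance_rate(accepted: int, submitted: int) -> float:
    """Return the fraction of submissions that were accepted."""
    if submitted <= 0:
        raise ValueError("submitted must be positive")
    return accepted / submitted


# Counts quoted in the CVPR 2023 announcement: 2360 accepted out of 9155 submitted.
rate = acceptance_rate(2360, 9155)
print(f"{rate:.2%}")  # prints 25.78%
```

The `:.2%` format specifier multiplies by 100 and rounds to two decimals, which reproduces the 25.78% figure.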
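Note 1: Everyone is welcome to submit issues to share CVPR 2023 papers and open-source projects!
Note 2: For papers from previous years' top CV conferences, as well as other high-quality CV papers and roundups, see: amusi/daily-paper-computer-vision
If you want to keep up with the latest and best CV papers, open-source projects, and learning resources, scan the QR code to join the 【CVer学术交流群】 (CVer Academic Exchange Group)! Let's learn from each other and improve together~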
Integrally Pre-Trained Transformer Pyramid Networks
Stitchable Neural Networks
Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks
BiFormer: Vision Transformer with Bi-Level Routing Attention
DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network
Vision Transformer with Super Token Sampling
Hard Patches Mining for Masked Image Modeling
SMPConv: Self-moving Point Representations for Continuous Convolution
GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis
DeltaEdit: Exploring Text-free Training for Text-driven Image Manipulation
Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders
Generic-to-Specific Distillation of Masked Autoencoders
NoPe-NeRF: Optimising Neural Radiance Field with No Pose Prior
Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures
NeRF in the Palm of Your Hand: Corrective Augmentation for Robotics via Novel-View Synthesis
Panoptic Lifting for 3D Scene Understanding with Neural Fields
NeRFLiX: High-Quality Neural View Synthesis by Learning a Degradation-Driven Inter-viewpoint MiXer
HNeRV: A Hybrid Neural Representation for Videos
DETRs with Hybrid Matching
Diversity-Aware Meta Visual Prompting
PA&DA: Jointly Sampling PAth and DAta for Consistent NAS
Structured 3D Features for Reconstructing Relightable and Animatable Avatars
Learning Personalized High Quality Volumetric Head Avatars from Monocular RGB Videos
Clothing-Change Feature Augmentation for Person Re-Identification
MSINet: Twins Contrastive Search of Multi-Scale Interaction for Object ReID
Shape-Erased Feature Learning for Visible-Infrared Person Re-Identification
Video Probabilistic Diffusion Models in Projected Latent Space
Solving 3D Inverse Problems using Pre-trained 2D Diffusion Models
Imagic: Text-Based Real Image Editing with Diffusion Models
Parallel Diffusion Models of Operator and Image for Blind Inverse Problems
DiffRF: Rendering-guided 3D Radiance Field Diffusion
MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
HouseDiffusion: Vector Floorplan Generation via a Diffusion Model with Discrete and Continuous Denoising
TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets
Back to the Source: Diffusion-Driven Adaptation to Test-Time Corruption
DR2: Diffusion-based Robust Degradation Remover for Blind Face Restoration
Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion
Generative Diffusion Prior for Unified Image Restoration and Enhancement
Conditional Image-to-Video Generation with Latent Flow Diffusion Models
Long-Tailed Visual Recognition via Self-Heterogeneous Integration with Knowledge Excavation
Mask3D: Pre-training 2D Vision Transformers by Learning Masked 3D Priors
Vision Transformers are Parameter-Efficient Audio-Visual Learners
Where We Are and What We're Looking At: Query Based Worldwide Image Geo-localization Using Hierarchies and Scenes
DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets
DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting
BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision
BAEFormer: Bi-directional and Early Interaction Transformers for Bird’s Eye View Semantic Segmentation
Visual Dependency Transformers: Dependency Tree Emerges from Reversed Attention
GIVL: Improving Geographical Inclusivity of Vision-Language Models with Pre-Training Methods
Teaching Structured Vision&Language Concepts to Vision&Language Models
Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks
Towards Generalisable Video Moment Retrieval: Visual-Dynamic Injection to Image-Text Pre-Training
CapDet: Unifying Dense Captioning and Open-World Detection Pretraining
FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks
Meta-Explore: Exploratory Hierarchical Vision-and-Language Navigation Using Scene Object Spectrum Grounding
All in One: Exploring Unified Video-Language Pre-training
Position-guided Text Prompt for Vision Language Pre-training
EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding
Align and Attend: Multimodal Summarization with Dual Contrastive Losses
Multi-Modal Representation Learning with Text-Driven Soft Masks
Learning to Name Classes for Vision and Language Models
YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
Enhanced Training of Query-Based Object Detection via Selective Query Recollection
Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection
Simple Cues Lead to a Strong Multi-Object Tracker
Efficient Semantic Segmentation by Altering Resolutions for Compressed Videos
FREDOM: Fairness Domain Adaptation Approach to Semantic Scene Understanding
Label-Free Liver Tumor Segmentation
Directional Connectivity-based Segmentation of Medical Images
Bidirectional Copy-Paste for Semi-Supervised Medical Image Segmentation
Devil is in the Queries: Advancing Mask Transformers for Real-world Medical Image Segmentation and Out-of-Distribution Localization
Fair Federated Medical Image Segmentation via Client Contribution Estimation
Ambiguous Medical Image Segmentation using Diffusion Models
Orthogonal Annotation Benefits Barely-supervised Medical Image Segmentation
MagicNet: Semi-Supervised Multi-Organ Segmentation via Magic-Cube Partition and Recovery
MCF: Mutual Correction Framework for Semi-Supervised Medical Image Segmentation
Rethinking Few-Shot Medical Segmentation: A Vector Quantization View
Pseudo-label Guided Contrastive Learning for Semi-supervised Medical Image Segmentation
SDC-UDA: Volumetric Unsupervised Domain Adaptation Framework for Slice-Direction Continuous Cross-Modality Medical Image Segmentation
Two-shot Video Object Segmentation
PolyFormer: Referring Image Segmentation as Sequential Polygon Generation
Code: None
Physical-World Optical Adversarial Attacks on 3D Face Recognition
IterativePFN: True Iterative Point Cloud Filtering
FrustumFormer: Adaptive Instance-aware Resampling for Multi-view 3D Detection
3D Video Object Detection with Learnable Object-Centric Global Optimization
Hierarchical Supervision and Shuffle Data Augmentation for 3D Semi-Supervised Object Detection
Less is More: Reducing Task and Model Complexity for 3D Point Cloud Semantic Segmentation
Robust Outlier Rejection for 3D Registration with Variational Bayes
Causal-IR: Learning Distortion Invariant Representation for Image Restoration from A Causality Perspective
Burstormer: Burst Image Restoration and Enhancement Transformer
Super-Resolution Neural Operator
Learning Trajectory-Aware Transformer for Video Super-Resolution
Code: researchmm/TTVSR
MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis
Toward Verifiable and Reproducible Human Evaluation for Text-to-Image Generation
Few-shot Semantic Image Synthesis with Class Affinity Transfer
TopNet: Transformer-based Object Placement Network for Image Compositing
Learning Transferable Spatiotemporal Representations from Natural Script Knowledge
Frame Flexible Network
Masked Motion Encoding for Self-Supervised Video Representation Learning
TriDet: Temporal Action Detection with Relative Boundary Modeling
Learning to Retain while Acquiring: Combating Distribution-Shift in Adversarial Data-Free Knowledge Distillation
DepGraph: Towards Any Structural Pruning
Context-Based Trit-Plane Coding for Progressive Image Compression
Deep Feature In-painting for Unsupervised Anomaly Detection in X-ray Images
OReX: Object Reconstruction from Planar Cross-sections Using Neural Fields
SparsePose: Sparse-View Camera Pose Regression and Refinement
NeuDA: Neural Deformable Anchor for High-Fidelity Implicit Surface Reconstruction
Vid2Avatar: 3D Avatar Reconstruction from Videos in the Wild via Self-supervised Scene Decomposition
To fit or not to fit: Model-based Face Reconstruction and Occlusion Segmentation from Weak Supervision
Structural Multiplane Image: Bridging Neural View Synthesis and 3D Reconstruction
3D Cinemagraphy from a Single Image
Revisiting Rotation Averaging: Uncertainties and Robust Losses
FFHQ-UV: Normalized Facial UV-Texture Dataset for 3D Face Reconstruction
A Hierarchical Representation Network for Accurate and Detailed Face Reconstruction from In-The-Wild Images
Homepage: https://younglbw.github.io/HRN-homepage/
Code: youngLBW/HRN
Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation
IPCC-TP: Utilizing Incremental Pearson Correlation Coefficient for Joint Multi-Agent Trajectory Prediction
EqMotion: Equivariant Multi-agent Motion Prediction with Invariant Interaction Reasoning
Anchor3DLane: Learning to Regress 3D Anchors for Monocular 3D Lane Detection
BEV-LaneDet: An Efficient 3D Lane Detection Based on Virtual Camera via Key-Points
ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing
Cross-Domain Image Captioning with Discriminative Finetuning
Model-Agnostic Gender Debiased Image Captioning
MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering
Continuous Sign Language Recognition with Correlation Network
Paper: https://arxiv.org/abs/2303.03202
Code: hulianyuyy/CorrNet
MOSO: Decomposing MOtion, Scene and Object for Video Prediction
3D Video Loops from Asynchronous Input
Bi-directional Distribution Alignment for Transductive Zero-Shot Learning
Semantic Prompt for Few-Shot Learning
Iterative Geometry Encoding Volume for Stereo Matching
Learning the Distribution of Errors in Stereo Matching for Joint Disparity and Uncertainty Estimation
Prototype-based Embedding Network for Scene Graph Generation
Polynomial Implicit Neural Representations For Large Diverse Datasets
Re-IQA: Unsupervised Learning for Image Quality Assessment in the Wild
Human-Art: A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes
GeoNet: Benchmarking Unsupervised Adaptation across Geographies
CelebV-Text: A Large-Scale Facial Text-Video Dataset
Interactive Segmentation as Gaussian Process Classification
Backdoor Attacks Against Deep Image Compression via Adaptive Frequency Trigger
SplineCam: Exact Visualization and Characterization of Deep Network Geometry and Decision Boundaries
SCOTCH and SODA: A Transformer Video Shadow Detection Framework
DeepMapping2: Self-Supervised Large-Scale LiDAR Map Optimization
RelightableHands: Efficient Neural Relighting of Articulated Hand Models
Token Turing Machines
Single Image Backdoor Inversion via Robust Smoothed Classifiers
HOOD: Hierarchical Graphs for Generalized Modelling of Clothing Dynamics
A Whac-A-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others
Neuro-Modulated Hebbian Learning for Fully Test-Time Adaptation
Demystifying Causal Features on Adversarial Examples and Causal Inoculation for Robust Network by Adversarial Instrumental Variable Regression
UniDexGrasp: Universal Robotic Dexterous Grasping via Learning Diverse Proposal Generation and Goal-Conditioned Policy
Disentangling Orthogonal Planes for Indoor Panoramic Room Layout Estimation with Cross-Scale Distortion Awareness
Learning Neural Parametric Head Models
A Meta-Learning Approach to Predicting Performance and Data Requirements
MACARONS: Mapping And Coverage Anticipation with RGB Online Self-Supervision
Masked Images Are Counterfactual Samples for Robust Fine-tuning
HairStep: Transfer Synthetic to Real Using Strand and Depth Maps for Single-View 3D Hair Modeling
Decompose, Adjust, Compose: Effective Normalization by Playing with Frequency for Domain Generalization
Gradient Norm Aware Minimization Seeks First-Order Flatness and Improves Generalization
Unlearnable Clusters: Towards Label-agnostic Unlearnable Examples
UniHCP: A Unified Model for Human-Centric Perceptions
CUDA: Convolution-based Unlearnable Datasets
AdaptiveMix: Robust Feature Representation via Shrinking Feature Space
DPE: Disentanglement of Pose and Expression for General Video Portrait Editing
SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation
Intrinsic Physical Concepts Discovery with Object-Centric Predictive Models
Sharpness-Aware Gradient Matching for Domain Generalization
Mind the Label-shift for Augmentation-based Graph Out-of-distribution Generalization
Blind Video Deflickering by Neural Filtering with a Flawed Atlas
RiDDLE: Reversible and Diversified De-identification with Latent Encryptor
PoseExaminer: Automated Testing of Out-of-Distribution Robustness in Human Pose and Shape Estimation
Upcycling Models under Domain and Category Shift
Modality-Agnostic Debiasing for Single Domain Generalization
Progressive Open Space Expansion for Open-Set Model Attribution
Dynamic Neural Network for Multi-Task Learning Searching across Diverse Network Topologies
GFPose: Learning 3D Human Pose Prior with Gradient Fields
PRISE: Demystifying Deep Lucas-Kanade with Strongly Star-Convex Constraints for Multimodel Image Alignment
Sketch2Saliency: Learning to Detect Salient Objects from Human Drawings
Boundary Unlearning
ImageNet-E: Benchmarking Neural Network Robustness via Attribute Editing
Zero-shot Model Diagnosis
Quantum Multi-Model Fitting
DivClust: Controlling Diversity in Deep Clustering
Neural Volumetric Memory for Visual Locomotion Control
MonoHuman: Animatable Human Neural Field from Monocular Video
Bridging the Gap between Model Explanations in Partially Annotated Multi-label Classification
HyperCUT: Video Sequence from a Single Blurry Image using Unsupervised Ordering
On the Stability-Plasticity Dilemma of Class-Incremental Learning
Defending Against Patch-based Backdoor Attacks on Self-Supervised Learning
VNE: An Effective Method for Improving Deep Representation by Manipulating Eigenvalue Distribution
Detecting and Grounding Multi-Modal Media Manipulation
Meta-causal Learning for Single Domain Generalization
Disentangling Writer and Character Styles for Handwriting Generation