These are my personal notes on local and global descriptors, intended to help anyone get into these fields more easily. If you find anything you want to add, feel free to open an issue or email me.

This repo is also a by-product of the literature survey I did for our paper UR2KiD. If you find this repo useful, please consider citing our paper.

  title={UR2KiD: Unifying Retrieval, Keypoint Detection, and Keypoint Description without Local Correspondence Supervision},
  author={Yang*, Tsun-Yi; Nguyen*, Duy-Kien; Heijnen, Huub; Balntas, Vassileios},
  journal={arXiv preprint arXiv:2001.07252},

This repo will be constantly updated.

Author: Tsun-Yi Yang ([email protected])

Online talks

Year Topic Link
[ECCV20] MLAD Workshop morning, afternoon
[3DV20] 3DGV Talk: Marc Pollefeys - 3D geometric vision youtube
[CVPR20] Image Matching Workshop youtube
[CVPR20] CVPR2020 tutorial: Local Features: From SIFT to Differentiable Methods youtube
[CVPR20] Deep Visual SLAM Frontends: SuperPoint, SuperGlue, and SuperMaps youtube

Local matching pipeline

This section reviews sparse keypoint matching and its pipeline.

1. Keypoint detection

This subsection reviews keypoint detection and the estimation of keypoint orientation, scale, or affine transformation.

Year Paper Link Code
[CVPR20] Holistically-Attracted Wireframe Parsing arXiv github
[CVPR20] KeyPose: Multi-View 3D Labeling and Keypoint Estimation for Transparent Objects arXiv link
[3DV19] SIPs: Succinct Interest Points from Unsupervised Inlierness Probability Learning arXiv Github
[ICCV19] Key.Net: Keypoint Detection by Handcrafted and Learned CNN Filters PDF Github
[ECCV18] Repeatability Is Not Enough: Learning Discriminative Affine Regions via Discriminability arXiv Github
[CVPR17] Learning Discriminative and Transformation Covariant Local Feature Detectors PDF Github
[CVPR17] Quad-networks: unsupervised learning to rank for interest point detection PDF -
[CVPR16] Learning to Assign Orientations to Feature Points - Github
[CVPR15] TILDE: a Temporally Invariant Learned DEtector arXiv Github
  • 3D
Year Paper link Code
[ECCV20] DH3D: Deep Hierarchical 3D Descriptors for Robust Large-Scale 6DoF Relocalization link github
[ICCV19] USIP: Unsupervised Stable Interest Point Detection from 3D Point Clouds arXiv Github
[arXiv19] Self-Supervised 3D Keypoint Learning for Ego-motion Estimation arXiv Github

2. Keypoint description (local descriptor)

Over the last few decades, research has focused on patch descriptors.

  • Hand-crafted
Year Paper link Code
[CVPR16] Accumulated Stability Voting: A Robust Descriptor from Descriptors of Multiple Scales PDF Github
[CVPR15] Domain-Size Pooling in Local Descriptors: DSP-SIFT PDF -
[CVPR15] BOLD - Binary Online Learned Descriptor For Efficient Image Matching PDF Github
[CVPR13] Boosting binary keypoint descriptors - -
[CVPR12] Freak: Fast retina keypoint - -
[CVPR12] Three things everyone should know to improve object retrieval PDF -
[IPOL11] ASIFT: An Algorithm for Fully Affine Invariant Comparison - -
[ICCV11] BRISK: Binary robust invariant scalable keypoints - -
[ICCV11] ORB: An efficient alternative to SIFT or SURF - -
[ICCV11] Local intensity order pattern for feature description - -
[CVIU08] Speeded-up robust features (SURF) - -
[ECCV06] SURF: Speeded up robust features - -
[IJCV04] Distinctive image features from scale-invariant keypoints - Github
  • Deep learning
Year Paper link Code
[TIP19] Learning Local Descriptors by Optimizing the Keypoint-Correspondence Criterion: Applications to Face Matching, Learning from Unlabeled Videos and 3D-Shape Retrieval arXiv Github
[ICCV19] Beyond Cartesian Representations for Local Descriptors PDF -
[CVPR19] SOSNet: Second Order Similarity Regularization for Local Descriptor Learning arXiv,Page Github
[ECCV18] GeoDesc: Learning Local Descriptors by Integrating Geometry Constraints - Github
[CVPR18] Local Descriptors Optimized for Average Precision Page -
[NIPS17] Working hard to know your neighbor's margins: Local descriptor learning loss arXiv Github
[ICCV17] DeepCD: Learning Deep Complementary Descriptors for Patch Representations PDF Github
[CVPR17] L2-Net: Deep Learning of Discriminative Patch Descriptor in Euclidean Space PDF Github
[arXiv16] PN-Net: Conjoined Triple Deep Network for Learning Local Image Descriptors arXiv Github
[BMVC16] Learning local feature descriptors with triplets and shallow convolutional neural networks PDF Github
[ICCV15] Discriminative Learning of Deep Convolutional Feature Point Descriptors Page Github
[CVPR15] MatchNet: Unifying Feature and Metric Learning for Patch-Based Matching PDF -
[CVPR15] Learning to compare image patches via convolutional neural networks PDF Github
  • 3D
Year Paper link Code

3. End-to-end matching pipeline

Recently, more and more papers try to embed the whole matching pipeline (keypoint detection, keypoint description) into one framework.

Year Paper link Code
[arXiv20] Dense Semantic 3D Map Based Long-Term Visual Localization with Hybrid Features arXiv -
[arXiv20] D2D: Learning to find good correspondences for image matching and manipulation arXiv -
[arXiv20] DISK: Learning local features with policy gradient arXiv -
[arXiv20] D2D: Keypoint Extraction with Describe to Detect Approach arXiv -
[arXiv20] HDD-Net: Hybrid Detector Descriptor with Mutual Interactive Learning arXiv -
[arXiv20] Learning Feature Descriptors using Camera Pose Supervision arXiv -
[arXiv20] Efficient Neighbourhood Consensus Networks via Submanifold Sparse Convolutions arXiv github
[arXiv20] S2DNet: Learning Accurate Correspondences for Sparse-to-Dense Feature Matching arXiv -
[CVPR20] ASLFeat: Learning Local Features of Accurate Shape and Localization arXiv github,tfmatch
[CVPR20] Reinforced Feature Points: Optimizing Feature Detection and Description for a High-Level Task arXiv -
[WACV19] DGC-Net: Dense Geometric Correspondence Network arXiv github
[NIPS19] R2D2: Repeatable and Reliable Detector and Descriptor arXiv,Page Github
[ICCV19] ELF: Embedded Localisation of Features in Pre-Trained CNN PDF Github
[CVPR19] RF-Net: An End-to-End Image Matching Network based on Receptive Field arXiv Github
[CVPR19] D2-Net: A Trainable CNN for Joint Description and Detection of Local Features arXiv,Page Github
[BMVC19] Matching Features without Descriptors: Implicitly Matched Interest Points PDF github
[CVPRW18] SuperPoint: Self-Supervised Interest Point Detection and Description arXiv Github,3rd_party
[NIPS18] LF-Net: Learning Local Features from Images PDF Github
[ECCV16] LIFT: Learned Invariant Feature Points - Github
  • 3D
Year Paper link Code
[CVPR20] D3Feat: Joint Learning of Dense Detection and Description of 3D Local Features arXiv github
[arXiv20] StickyPillars: Robust feature matching on point clouds using Graph Neural Networks arXiv -

3.5. Dense descriptor

Unlike local keypoint descriptors, which depend on detected keypoints, some works compute a dense descriptor representation for the whole image.

Year Paper link Code
[ICRA20] GN-Net: The Gauss-Newton Loss for Multi-Weather Relocalization arXiv, MyNote Web
[ICCV17] CLKN: Cascaded Lucas-Kanade Networks for Image Alignment PDF -

4. Geometric verification or learning based matcher

After matching, standard RANSAC and its variants are usually adopted for outlier removal.
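To make the outlier-removal step concrete, here is a minimal RANSAC sketch in pure Python. It assumes the simplest possible motion model, a 2D translation between the two images, so a single correspondence is a minimal sample. Real pipelines fit a homography, fundamental, or essential matrix instead. The function name `ransac_translation`, its parameters, and the toy matches are hypothetical and not taken from any of the papers listed here.

```python
import random

def ransac_translation(matches, threshold=2.0, iterations=100, seed=0):
    """Estimate a 2D translation from putative matches and flag inliers.

    matches: list of ((x1, y1), (x2, y2)) keypoint pairs (toy assumption).
    Returns (translation, inlier_mask).
    """
    rng = random.Random(seed)
    best_mask = [False] * len(matches)
    for _ in range(iterations):
        # Minimal sample: one correspondence fully determines a translation.
        (x1, y1), (x2, y2) = rng.choice(matches)
        tx, ty = x2 - x1, y2 - y1
        # Score the hypothesis by counting the matches it explains.
        mask = [
            ((a[0] + tx - b[0]) ** 2 + (a[1] + ty - b[1]) ** 2) ** 0.5 <= threshold
            for a, b in matches
        ]
        if sum(mask) > sum(best_mask):
            best_mask = mask
    # Refit on all inliers of the best hypothesis by averaging their offsets.
    inliers = [m for m, keep in zip(matches, best_mask) if keep]
    tx = sum(b[0] - a[0] for a, b in inliers) / len(inliers)
    ty = sum(b[1] - a[1] for a, b in inliers) / len(inliers)
    return (tx, ty), best_mask

# Four matches consistent with a shift of roughly (10, 5), plus two outliers.
matches = [((0, 0), (10, 5)), ((1, 2), (11, 7)), ((3, 1), (13, 6)),
           ((4, 4), (14.5, 9)), ((2, 2), (40, -3)), ((5, 0), (0, 30))]
translation, inlier_mask = ransac_translation(matches)
```

The variants listed below mainly change how samples are drawn (e.g. PROSAC, NG-RANSAC) or how hypotheses are scored (e.g. MAGSAC), while the hypothesize-and-verify loop stays the same.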

  • Algorithm based
Year Paper link Code
[ECCV20] Making Affine Correspondences Work in Camera Geometry Computation arXiv github
[arXiv20] AdaLAM: Revisiting Handcrafted Outlier Detection arXiv github
[arXiv20] Multi-View Optimization of Local Feature Geometry arXiv -
[CVPR19] MAGSAC: Marginalizing Sample Consensus PDF Github
[CVPR16] Progressive Feature Matching with Alternate Descriptor Selection and Correspondence Enrichment PDF -
[CVPR13] Robust Feature Matching with Alternate Hough and Inverted Hough Transforms PDF -
[ECCV12] Improving Image-Based Localization by Active Correspondence Search PDF -
[CVPR05] Matching with PROSAC – Progressive Sample Consensus PDF -
[CVPR05] Two-View Geometry Estimation Unaffected by a Dominant Plane PDF Github
  • Deep learning based
Year Paper link Code
[ECCV20] Online Invariance Selection for Local Feature Descriptors arXiv github
[CVPR20] SuperGlue: Learning Feature Matching with Graph Neural Networks arXiv Github
[CVPR20] High-dimensional Convolutional Networks for Geometric Pattern Recognition arXiv, youtube -
[CVPR20] ACNe: Attentive Context Normalization for Robust Permutation-Equivariant Learning arXiv github
[arXiv20] RANSAC-Flow: generic two-stage image alignment arXiv, youtube page,Github
[ICCV19] NG-RANSAC for Epipolar Geometry from Sparse Correspondences arXiv Github
[ICCV19] Learning Two-View Correspondences and Geometry Using Order-Aware Network arXiv Github
[CVPR18] Learning to Find Good Correspondences - Github
  • Image registration
Year Paper link Code
[arXiv20] Deep Global Registration arXiv, youtube -
[Access18] Multi-Temporal Remote Sensing Image Registration Using Deep Convolutional Features PDF Github

Global retrieval

Because global retrieval usually searches over a large number of candidates, there are several ways to generate a single descriptor for an image.

1. Feature aggregation

  • Hand-crafted

When only hand-crafted local descriptors were available, the usual approach was to aggregate a set of local descriptors into a single description.
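As a rough illustration of aggregation, the sketch below builds a simplified VLAD-style vector: each local descriptor is assigned to its nearest visual word, the residuals are summed per word, and the concatenation is L2-normalized. The codebook here is written by hand as an assumption (in practice it would come from k-means), the intra-normalization tricks from the VLAD literature are left out, and the name `vlad` is only illustrative.

```python
import math

def vlad(descriptors, centers):
    """Aggregate local descriptors into one VLAD-style vector."""
    dim = len(centers[0])
    residual_sums = [[0.0] * dim for _ in centers]
    for d in descriptors:
        # Hard-assign the descriptor to its nearest visual word.
        k = min(range(len(centers)),
                key=lambda i: sum((x - c) ** 2 for x, c in zip(d, centers[i])))
        # Accumulate the residual (descriptor minus assigned center).
        for j in range(dim):
            residual_sums[k][j] += d[j] - centers[k][j]
    # Concatenate the per-word residual sums and L2-normalize the result.
    flat = [v for row in residual_sums for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat

# Toy 2-D "descriptors" and a hand-written two-word codebook (assumptions).
centers = [(0.0, 0.0), (10.0, 10.0)]
local_descriptors = [(1.0, 0.0), (0.0, 1.0), (11.0, 10.0)]
global_descriptor = vlad(local_descriptors, centers)
```

The output length is the number of visual words times the descriptor dimension, regardless of how many local descriptors the image has, which is what makes the result usable as one fixed-size global descriptor.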

Year Paper link Code
To aggregate or not to aggregate: Selective match kernels for image search; Image search with selective match kernels: aggregation across single and multiple images. Code: official (MATLAB), from DELF (TensorFlow)
[CVPR13] All about VLAD PDF -
[ECCV10] Improving the fisher kernel for large-scale image classification PDF -
[CVPR07] Object retrieval with large vocabularies and fast spatial matching PDF -
[CVPR07] Fisher kernels on visual vocabularies for image categorization PDF -
  • Deep learning

A similar idea, but using deep learning to adapt the classical algorithms.

Year Paper link Code
[ECCV16] CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples. PDF -
[CVPR16] NetVLAD: CNN architecture for weakly supervised place recognition Page Github

2. Real-valued descriptor

A single real-valued representation is computed from the whole image.

Year Paper link Code
[ECCV20] Learning and aggregating deep local descriptors for instance-level recognition arXiv github
[ECCV20] Predicting Visual Overlap of Images Through Interpretable Non-Metric Box Embeddings arXiv github
[ECCV20] Smooth-AP: Smoothing the Path Towards Large-Scale Image Retrieval arXiv github
[ECCV20] SOLAR: Second-Order Loss and Attention for Image Retrieval arXiv -
[ECCV20] Unifying Deep Local and Global Features for Efficient Image Search arXiv -
[arXiv19] ACTNET: end-to-end learning of feature activations and multi-stream aggregation for effective instance image retrieval arXiv -
[TIP19] REMAP: Multi-layer entropy-guided pooling of dense CNN features for image retrieval arXiv -
[ICCV19] Learning with Average Precision: Training Image Retrieval with a Listwise Loss arXiv Github
[CVPR19] Detect-to-Retrieve: Efficient Regional Aggregation for Image Search PDF Github
[TPAMI18] Fine-tuning CNN Image Retrieval with No Human Annotation arXiv Github
[IJCV17] End-to-end Learning of Deep Visual Representations for Image Retrieval arXiv Github
[ICCV17] Large-Scale Image Retrieval with Attentive Deep Local Features - Github
[ECCV16] CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples arXiv Github

3. Binary descriptor and quantization

For a more compact representation, a binary descriptor can be generated by hashing or thresholding. Quantization is also very popular in large-scale image retrieval.
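The thresholding route can be sketched in a few lines. The example below binarizes a real-valued descriptor by thresholding each dimension at zero, packs the bits into an integer, and ranks database images by Hamming distance. The threshold value, the toy 4-D descriptors, and the names `binarize` and `hamming` are assumptions for illustration. Learned hashing and product quantization, as in the papers below, are considerably more involved.

```python
def binarize(descriptor, threshold=0.0):
    """Threshold each dimension and pack the bits into an integer code."""
    code = 0
    for value in descriptor:
        code = (code << 1) | (1 if value > threshold else 0)
    return code

def hamming(a, b):
    """Number of differing bits between two binary codes."""
    return bin(a ^ b).count("1")

# Toy real-valued descriptors (hypothetical values).
query = binarize([0.3, -1.2, 0.8, 0.1])                   # bits 1011
database = {"img_a": binarize([0.5, -0.4, 0.9, -0.2]),    # bits 1010
            "img_b": binarize([-0.7, 0.6, -0.1, -0.3])}   # bits 0100
# Rank database images by increasing Hamming distance to the query.
ranking = sorted(database, key=lambda name: hamming(query, database[name]))
```

Comparing codes with XOR and a bit count is much cheaper than a floating-point distance, which is the main appeal of binary descriptors at large scale.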

Year Paper link Code
[ICCVW19] DAME WEB: DynAmic MEan with Whitening Ensemble Binarization for Landmark Retrieval without Human Annotation PDF Github
[CVPR19] FastAP: Deep Metric Learning to Rank PDF Github
[CVPR18] Hashing as Tie-Aware Learning to Rank PDF Github
[AAAI18] Deep Region Hashing for Generic Instance Search from Image - -
[TPAMI18] Supervised Learning of Semantics-Preserving Hash via Deep Convolutional Neural Networks - -
[TPAMI13] Iterative Quantization: A Procrustean Approach to Learning Binary Codes for Large-Scale Image Retrieval PDF -
[TPAMI10] Product quantization for nearest neighbor search PDF -

4. Pre-processing/Post-processing

This covers anything in the pre-/post-processing stage that can boost performance, such as rectification, re-ranking, or query expansion.
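As one example of post-processing, here is a minimal sketch of average query expansion: the query descriptor is averaged with its top-ranked results and the search is issued again. It assumes L2-normalized global descriptors compared by dot product and skips the spatial verification that is often applied before expansion. All names and the toy 3-D descriptors are hypothetical.

```python
import math

def normalize(v):
    """L2-normalize a vector."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def rank(query, database):
    """Return database keys sorted by descending dot-product similarity."""
    def score(name):
        return sum(q * d for q, d in zip(query, database[name]))
    return sorted(database, key=score, reverse=True)

def average_query_expansion(query, database, top_k=2):
    """Re-issue the query as the mean of itself and its top-k results."""
    top = rank(query, database)[:top_k]
    vectors = [query] + [database[name] for name in top]
    expanded = [sum(col) / len(vectors) for col in zip(*vectors)]
    return rank(normalize(expanded), database)

# Toy global descriptors (hypothetical values).
database = {
    "a": normalize([1.0, 0.1, 0.0]),
    "b": normalize([0.9, 0.4, 0.0]),
    "c": normalize([0.5, 0.8, 0.2]),
    "d": normalize([0.0, 0.0, 1.0]),
}
query = normalize([1.0, 0.0, 0.0])
initial = rank(query, database)
expanded = average_query_expansion(query, database)
```

The expanded query moves toward the top results, so images that resemble those results but not the original query can climb in the second ranking.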

Year Paper link Code
[arXiv20] Image Stylization for Robust Features arXiv -
[ECCV20] Single-Image Depth Prediction Makes Feature Matching Easier arXiv github
[CVPR19] Local features and visual words emerge in activations PDF -
[CVPR12] Object retrieval and localization with spatially-constrained similarity measure and k-NN re-ranking PDF -

5. 3D point cloud

Year Paper link Code
[CVPR18] PointNetVLAD: Deep Point Cloud Based Retrieval for Large-Scale Place Recognition arXiv Github

Multi-tasking local and global descriptors

Some works cover both local description and global retrieval, because the two tasks share similar network activations and applications.

Year Paper link Code
[arXiv20] UR2KiD: Unifying Retrieval, Keypoint Detection, and Keypoint Description without Local Correspondence Supervision arXiv -
[CVPR19] ContextDesc: Local Descriptor Augmentation with Cross-Modality Context - Github
[CVPR19] From Coarse to Fine: Robust Hierarchical Localization at Large Scale with HF-Net arXiv Github
[ICCV17] Large-Scale Image Retrieval with Attentive Deep Local Features (DELF) - Github

Review papers

Year Paper link Code
[arXiv18] From handcrafted to deep local features arXiv -
[CVPR17] Comparative Evaluation of Hand-Crafted and Learned Local Features PDF -

Metric learning

Year Paper link Code
[arXiv20] Metric learning: cross-entropy vs. pairwise losses arXiv -
[arXiv19] A Metric Learning Reality Check arXiv -


Year Paper link Code
[arXiv20] Reducing Drift in Structure from Motion using Extended Features arXiv -


Year Paper link Code
[CVPR20] Fast-MVSNet: Sparse-to-Dense Multi-View Stereo With Learned Propagation and Gauss-Newton Refinement arXiv github
[CVPR20] BlendedMVS: A Large-scale Dataset for Generalized Multi-view Stereo Networks arXiv github

View Synthesis/Novel view/Image completion

Year Paper link Code
[ECCV20] Flow-edge Guided Video Completion arXiv link
[arXiv20] Reference Pose Generation for Visual Localization via Learned Features and View Synthesis arXiv -
[CVPR20] BlendedMVS: A Large-scale Dataset for Generalized Multi-view Stereo Networks arXiv github

Segmentation localization

Year Paper link Code
[ICCV19] Fine-Grained Segmentation Networks: Self-Supervised Segmentation for Improved Long-Term Visual Localization arXiv github


Local matching

Year Paper link Code Note
[arXiv20] Image Matching across Wide Baselines: From Paper to Practice arXiv github
[CVPR17] HPatches: A benchmark and evaluation of handcrafted and learned local descriptors arXiv Github Hpatches
[TPAMI11] Discriminative learning of local image descriptors Page - UBC/Brown dataset (subsets:Liberty (New York), Notre Dame (Paris) and Half Dome (Yosemite))
[CVPR08] On Benchmarking Camera Calibration and MultiView Stereo for High Resolution Imagery

Global retrieval

Year Paper link Code Note
[CVPR18] Revisiting Oxford and Paris: Large-Scale Image Retrieval Benchmarking Page Github ROxford5k, RParis6k
[CVPR07] Object retrieval with large vocabularies and fast spatial matching Page - Oxford5k
[CVPR08] Lost in Quantization: Improving Particular Object Retrieval in Large Scale Image Databases Page - Paris6k

Localization (both local matching and global retrieval)

Year Paper link Code Note
[ECCV20] Map-based Localization for Autonomous Driving web github1, github2 -
[CVPR18] Benchmarking 6DOF Outdoor Visual Localization in Changing Conditions PDF,Page Github Aachen-day-night, Robotcar, CMU-seasons


Year Paper link
[2020] Kapture github
[2020] hloc - the hierarchical localization toolbox github
[2020] pyslamv2 github

