This is the code for the current state-of-the-art billion-scale nearest neighbor search system presented in the paper:
Revisiting the Inverted Indices for Billion-Scale Approximate Nearest Neighbors,
Dmitry Baranchuk, Artem Babenko, Yury Malkov
The code is built on top of the FAISS library.
Currently, we provide only a C++ implementation for the CPU, which requires a BLAS library.
The code requires a C++ compiler that understands:
git clone https://github.com/dbaranchuk/ivf-hnsw --recursive
There are a few example makefile.inc files in the faiss/example_makefiles/ subdirectory. Copy the one relevant to your system to faiss/ and adjust it to your needs. In particular, for the ivf-hnsw project, you need to set the proper BLAS library paths. There are also hints for specific configurations in the troubleshooting section of the FAISS wiki.
Replace faiss/CMakeLists.txt with CMakeLists.txt.faiss to disable building of unnecessary tests and the GPU version.
mv CMakeLists.txt.faiss faiss/CMakeLists.txt
cmake . && make
The proposed methods are tested on two datasets of one billion vectors each: SIFT1B and DEEP1B. To use the provided examples, all data files have to be in data/SIFT1B and data/DEEP1B.
cd data/SIFT1B && bash load_sift1b.sh
cd data/DEEP1B && python load_deep1b.py
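Both commands above begin with `cd data/...`, so each assumes it starts from the repository root; run back-to-back in one shell, the second `cd` would be attempted from inside data/SIFT1B. A minimal sketch, assuming a POSIX shell, that runs each step in a subshell so the caller's working directory is left unchanged. `run_in` is a hypothetical helper introduced here for illustration and is not part of the repository:

```shell
# Hypothetical helper (not part of ivf-hnsw): run a command inside a
# directory without changing the caller's working directory.
run_in() {
    dir="$1"
    shift
    # The parentheses start a subshell, so the cd is undone on exit.
    ( cd "$dir" && "$@" )
}

# From the repository root (left commented out here because the
# downloads are very large):
# run_in data/SIFT1B bash load_sift1b.sh
# run_in data/DEEP1B python load_deep1b.py
```

If the `cd` fails, for example because the directory is missing, the command is not run and `run_in` returns a non-zero status.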
Note: the precomputed indices are optional; they only let you skip the assignment step, which takes about 2-3 days for 2^20 centroids.
tests/ provides two tests for each dataset:
Each test takes many options, so we provide bash scripts in examples/ that run these tests. The scripts are commented, and the Parser class provides a short description of each option.
Make sure that:
mkdir models && mkdir models/SIFT1B && mkdir models/DEEP1B
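The command above stops with an error if models/ already exists, for instance on a second run. A hedged alternative, assuming a `mkdir` that supports the POSIX `-p` option, which can be repeated safely:

```shell
# Create both model directories in one call. With -p, missing parent
# directories are created and existing directories are not an error.
mkdir -p models/SIFT1B models/DEEP1B
```

Like the original command, this is meant to be run from the repository root.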
The doxygen documentation gives per-class information.