:mag: Awesome Vector Database
A curated list of awesome works related to high-dimensional structure/vector search and databases
Services
- Google Vector Search (Vertex AI)
- Pinecone
- Weaviate [Beginner Guide]
- Vespa
- txtai
- marqo
- vectara
- Epsilla
- algolia
- nucliadb
- OpenSearch
- MyScale
- QdrantCloud
- zilliz
- OpenSearch's AlibabaCloud
- Typesense's Cloud
- MongoDB Atlas Vector Search
- SuperDuperDB
- KBD.AI
Comparisons
Libraries & Engines
Multidimensional data / Vectors
- :star: 🥇 Vector DB Feature Matrix
- :star: Faiss Paper
- Typesense
- Qdrant
- annoy
- NGT
- pgvector
- Chroma
- LlamaIndex
- Epsilla
- jvector
- RAFT
- Vald
- Voyager
- tinyvector
- USearch
- vearch
- MRPT
- milvus
- infinity
- havenask
- chromem-go
- OasysDB [Notebook]
- arroy
- bleve
- cuVS
Texts
Others
- SimSIMD: Efficient Alternative to scipy.spatial.distance and numpy.inner
Benchmarks & Databases
- ANN Benchmarks [Paper]
- Billion-scale ANNS Benchmarks
- BEIR
- VectorDBBench - A Vector Database Benchmark Tool
- Qdrant's Vector Database Benchmarks
- MyScale's Vector Database Benchmark
- Li, Wen, et al. "Approximate nearest neighbor search on high dimensional data—experiments, analyses, and improvement." IEEE Transactions on Knowledge and Data Engineering 32.8 (2019): 1475-1488.
- Zeng, Xianzhi, et al. "CANDY: A Benchmark for Continuous Approximate Nearest Neighbor Search with Dynamic Data Ingestion." arXiv preprint arXiv:2406.19651 (2024).
📚 Books
- Foundations of Multidimensional and Metric Data Structures
- Introduction to Information Retrieval
- Deep Learning for Search
- Foundations of Vector Retrieval
Conferences & Workshops
- :star: VLDB
- :star: Image Retrieval in the Wild (CVPR20) [Video]
- Haystack
- Neural Search In Action
- ACM MM 2020: Effective and Efficient: Toward Open-world Instance Re-identification
- Retrieval Augmented Generation and Vespa [Slides]
- SISAP Indexing Challenge
Courses
- Long Term Memory in AI - Vector Search and Databases (COS 495 - Princeton) [Class Notes]
- Freiburg Information Retrieval WS 2022-2023 [Website, Video Lectures]
- Vector Similarity Search and Faiss Course [Youtube Playlist]
Others
- VectorHub: a free, open-source learning website for people (software developers to senior ML architects) interested in adding vector retrieval to their ML stack.
Publications
Survey
- :star: Pan, James Jie, Jianguo Wang, and Guoliang Li. "Survey of Vector Database Management Systems." arXiv preprint arXiv:2310.14021 (2023). [Paper]
- Aumüller, Martin, and Matteo Ceccarello. "Recent Approaches and Trends in Approximate Nearest Neighbor Search." IEEE Data Engineering Bulletin (2023).
- Nearest neighbor search: the old, the new, and the impossible. Andoni, Alexandr. [Paper]
Quantization
Source: A survey of product quantization.
- :star: PQ: Product quantization for nearest neighbor search. Jégou, Hervé, Matthijs Douze, and Cordelia Schmid. [Paper, Code, Julia Code, nanopq]
- :star: k-selection on GPU: Billion-scale similarity search with gpus. Johnson, Jeff, Matthijs Douze, and Hervé Jégou [Paper, Code]
- :star: A survey of product quantization. Matsui, Yusuke, Yusuke Uchida, Hervé Jégou, and Shin'ichi Satoh [Paper]
- OPQ: Optimized Product Quantization. Ge, Tiezheng, Kaiming He, Qifa Ke, and Jian Sun [Homepage, Paper, Code, nanopq]
- Quicker ADC: Unlocking the hidden potential of product quantization with SIMD. André, Fabien, Anne-Marie Kermarrec, and Nicolas Le Scouarnec [Paper, Code]
- ScaNN: Accelerating Large-Scale Inference with Anisotropic Vector Quantization. Guo, Ruiqi, Philip Sun, Erik Lindgren, Quan Geng, David Simcha, Felix Chern, and Sanjiv Kumar [Paper, Python/C++ Inference, Julia Training/Inference]
- The inverted multi-index. Babenko, Artem, and Victor Lempitsky [Paper, Code]
- Are We There Yet? Product Quantization and its Hardware Acceleration. Fernandez-Marques, Javier, Ahmed F. AbouElhamayed, Nicholas D. Lane, and Mohamed S. Abdelfattah. [Paper]
- LibVQ: A Toolkit for Optimizing Vector Quantization and Efficient Neural Retrieval. Li, Chaofan, Zheng Liu, Shitao Xiao, Yingxia Shao, Defu Lian, and Zhao Cao. [Paper, Code]
- Matsui, Yusuke, Ryota Hinami, and Shin'ichi Satoh. "Reconfigurable Inverted Index." Proceedings of the 26th ACM international conference on Multimedia. 2018. [Paper, Project, Code]
- Aguerrebere, Cecilia, et al. "Similarity search in the blink of an eye with compressed indices." arXiv preprint arXiv:2304.04759 (2023).
- Huijben, Iris, et al. "Residual Quantization with Implicit Neural Codebooks." arXiv preprint arXiv:2401.14732 (2024). [Code]
- Rege, Aniket, et al. "AdANNS: A framework for adaptive semantic search." Advances in Neural Information Processing Systems 36 (2024).
- Amara, Kenza, et al. "Nearest neighbor search with compact codes: A decoder perspective." Proceedings of the 2022 International Conference on Multimedia Retrieval. 2022.
- Krishnan, Aditya, and Edo Liberty. "Projective Clustering Product Quantization." arXiv preprint arXiv:2112.02179 (2021).
- Noh, Haechan, Taeho Kim, and Jae-Pil Heo. "Product quantizer aware inverted index for scalable nearest neighbor search." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021.
- Zhan, Jingtao, et al. "Jointly optimizing query encoder and product quantization to improve retrieval performance." Proceedings of the 30th ACM International Conference on Information & Knowledge Management. 2021.
- Wang, Runhui, and Dong Deng. "DeltaPQ: lossless product quantization code compression for high dimensional similarity search." Proceedings of the VLDB Endowment 13.13 (2020): 3603-3616.
- Jang, Young Kyun, and Nam Ik Cho. "[Generalized product quantization network for