Milvus (vector database)

Milvus
Developer(s): Zilliz
Initial release: October 19, 2019
Stable release: v2.4.17 / November 22, 2024 [1]
Repository: github.com/milvus-io/milvus
Written in: C++, Go
Operating system: Linux, macOS
Platform: x86, ARM
Type: Vector database
License: Apache License 2.0
Website: milvus.io

Milvus is a distributed vector database developed by Zilliz. It is available as both open-source software and a cloud service.

Milvus is an open-source project under the LF AI & Data Foundation, [2] distributed under the Apache License 2.0.

History

Milvus has been developed by Zilliz since 2017. [3]

Milvus joined the Linux Foundation's LF AI & Data Foundation as an incubation project in January 2020 and became a graduated project in June 2021. [2] Details about its architecture and possible applications were presented at the ACM SIGMOD Conference in 2021. [4]

Milvus 2.0, a major redesign of the whole product with a new architecture, [5] was released in January 2022.

Features

Major similarity-search features available in the active 2.4.x Milvus branch: [6]

The Milvus similarity search engine relies on heavily modified forks of third-party open-source similarity search libraries, such as Faiss, [7] [8] DiskANN [9] [10] and hnswlib. [11]
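
These libraries surface to users as index types that are selected when an index is built. The following is a minimal sketch, assuming the official Python SDK (pymilvus) and an existing collection with a vector field; the collection name, field name and parameter values are illustrative only, and the exact client API may differ between versions.

    # Sketch: building an hnswlib-derived HNSW index with pymilvus.
    # Collection/field names and parameter values are illustrative only.
    from pymilvus import MilvusClient

    client = MilvusClient(uri="http://localhost:19530")  # standalone server

    index_params = client.prepare_index_params()
    index_params.add_index(
        field_name="vector",          # vector field to index
        index_type="HNSW",            # graph index derived from hnswlib
        metric_type="L2",             # Euclidean distance
        params={"M": 16, "efConstruction": 200},  # graph degree / build effort
    )
    client.create_index(collection_name="documents", index_params=index_params)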

Milvus includes I/O data-layout optimizations specific to graph-based search indices. [12]

Database

As a database, Milvus provides the following features: [6]

Deployment options

Milvus can be deployed as an embedded database, a standalone server, or a distributed cluster. Zilliz Cloud offers a fully managed version. [16]
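
In the Python client, the deployment mode is largely a matter of the connection target. A minimal sketch, assuming pymilvus with the bundled embedded engine (Milvus Lite); the URIs and file name are illustrative.

    # Sketch: the same client API covers embedded and server deployments.
    # URIs and the local database file name are illustrative.
    from pymilvus import MilvusClient

    # Embedded (Milvus Lite): data is stored in a local file.
    embedded = MilvusClient("./milvus_demo.db")

    # Standalone server or distributed cluster: connect to the service endpoint.
    server = MilvusClient(uri="http://localhost:19530")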

GPU support

Milvus provides GPU-accelerated index building and search using Nvidia CUDA [17] [18] via the Nvidia RAFT library, [19] including the recent GPU-based graph indexing algorithm Nvidia CAGRA. [20]
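
On a GPU-enabled deployment, a CAGRA index is requested in the same way as a CPU index, by index type. A minimal sketch, assuming a GPU-enabled Milvus instance and the pymilvus client; the collection and field names are illustrative, and the parameter names and values are release-dependent.

    # Sketch: requesting a GPU-built CAGRA graph index (GPU-enabled Milvus only).
    # Parameter names and values are illustrative and release-dependent.
    from pymilvus import MilvusClient

    client = MilvusClient(uri="http://localhost:19530")

    index_params = client.prepare_index_params()
    index_params.add_index(
        field_name="vector",
        index_type="GPU_CAGRA",       # GPU graph index based on Nvidia CAGRA
        metric_type="L2",
        params={
            "intermediate_graph_degree": 64,  # build-time candidate graph degree
            "graph_degree": 32,               # final graph degree
        },
    )
    client.create_index(collection_name="documents", index_params=index_params)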

Integration

Milvus provides official SDK clients for Java, Node.js, Python and Go; [21] an additional C# SDK client was contributed by Microsoft. [6] [22] The database integrates with Prometheus and Grafana for monitoring and alerting, with the frameworks Haystack [23] and LangChain, [24] with IBM watsonx, [25] and with OpenAI models. [26] [27]
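
A minimal end-to-end sketch with the Python SDK (create a collection, insert vectors, run a similarity search); the collection name, dimensionality and data are illustrative.

    # Sketch: basic workflow with the pymilvus Python SDK.
    # Collection name, dimension and data are illustrative.
    import random
    from pymilvus import MilvusClient

    client = MilvusClient(uri="http://localhost:19530")

    # Create a collection with 8-dimensional vectors and an auto-generated schema.
    client.create_collection(collection_name="demo", dimension=8)

    # Insert a few entities; each entity is a dict with an id and a vector.
    rows = [{"id": i, "vector": [random.random() for _ in range(8)]} for i in range(100)]
    client.insert(collection_name="demo", data=rows)

    # Search for the 3 nearest neighbours of a query vector.
    query = [[random.random() for _ in range(8)]]
    hits = client.search(collection_name="demo", data=query, limit=3)
    print(hits)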


References

  1. "Release notes for Milvus v2.4.17". GitHub .
  2. 1 2 "LF AI & Data Foundation Announces Graduation of Milvus Project". June 23, 2021.
  3. Liao, Ingrid Lunden and Rita (2022-08-24). "Zilliz raises $60M, relocates to SF". TechCrunch. Retrieved 2024-10-21.
  4. "Milvus: A Purpose-Built Vector Data Management System". SIGMOD '21: Proceedings of the 2021 International Conference on Management of Data. June 18, 2021. pp. 2614–2627. doi:10.1145/3448016.3457550. ISBN   978-1-4503-8343-1.
  5. Guo, Rentong; Luan, Xiaofan; Xiang, Long; Yan, Xiao; Yi, Xiaomeng; Luo, Jigao; Cheng, Qianya; Xu, Weizhi; Luo, Jiarui; Liu, Frank; Cao, Zhenshan; Qiao, Yanliang; Wang, Ting; Tang, Bo; Xie, Charles (2022). "Manu: A Cloud Native Vector Database Management System". arXiv: 2206.13843 [cs.DB].
  6. 1 2 3 "Milvus overview" . Retrieved September 23, 2024.
  7. "Faiss". GitHub . Retrieved September 23, 2024.
  8. Douze, Matthijs; Guzhva, Alexandr; Deng, Chengqi; Johnson, Jeff; Szilvasy, Gergely; Mazaré, Pierre-Emmanuel; Lomeli, Maria; Hosseini, Lucas; Jégou, Hervé (2024). "The Faiss library". arXiv: 2401.08281 [cs.LG].
  9. "DiskANN library". GitHub . Retrieved September 23, 2024.
  10. Subramanya, Suhas Jayaram; Kadekodi, Rohan; Krishaswamy, Ravishankar; Simhadri, Harsha Vardhan (8 December 2019). "DiskANN: fast accurate billion-point nearest neighbor search on a single node". Proceedings of the 33rd International Conference on Neural Information Processing Systems. Curran Associates Inc.: 13766–13776.
  11. "Hnswlib - fast approximate nearest neighbor search". GitHub . Retrieved September 23, 2024.
  12. Wang, Mengzhao; Xu, Weizhi; Yi, Xiaomeng; Wu, Songlin; Peng, Zhangyang; Ke, Xiangyu; Gao, Yunjun; Xu, Xiaoliang; Guo, Rentong; Xie, Charles (2024). "Starling: An I/O-Efficient Disk-Resident Graph Index Framework for High-Dimensional Vector Similarity Search on Data Segment". Proceedings of the ACM on Management of Data. 2: 1–27. arXiv: 2401.02116 . doi:10.1145/3639269.
  13. "Consistency levels in Milvus" . Retrieved September 29, 2024.
  14. "Multi-tenancy strategies" . Retrieved September 29, 2024.
  15. "Hybrid Search" . Retrieved September 23, 2024.
  16. "Zilliz cloud" . Retrieved October 10, 2024.
  17. "What's New In Milvus 2.3 Beta - 10X faster with GPUs" . Retrieved September 29, 2024.
  18. "Milvus 2.3 Launches with Support for Nvidia GPUs". 23 March 2023. Retrieved September 29, 2024.
  19. "NVIDIA RAFT library". GitHub .
  20. Ootomo, Hiroyuki; Naruse, Akira; Nolet, Corey; Wang, Ray; Feher, Tamas; Wang, Yong (August 2023). "CAGRA: Highly Parallel Graph Construction and Approximate Nearest Neighbor Search for GPUs". arXiv: 2308.15136 [cs.DS].
  21. "Install Milvus Go SDK" . Retrieved September 29, 2024.
  22. "Get Started with Milvus Vector DB in .NET". March 6, 2024. Retrieved September 29, 2024.
  23. "Integration HayStack + Milvus" . Retrieved September 23, 2024.
  24. "Milvus connector for LangChain" . Retrieved September 23, 2024.
  25. "IBM watsonx.data's integrated vector database: unify, prepare, and deliver your data for AI". IBM . April 9, 2024. Retrieved September 29, 2024.
  26. "Getting started with Milvus and OpenAI". Mar 28, 2023. Retrieved September 23, 2024.
  27. "OpenAI and Milvus simple app". GitHub . Retrieved September 23, 2024.