NebulaGraph

NebulaGraph
Developer(s)	Vesoft Inc.
Initial release	2018; 6 years ago
Stable release	3.7.0 / March 2024; 4 months ago
Repository	github.com/vesoft-inc/nebula
Written in	C++, Go, Java, Python
Platform	Java SE
Type	Open-source distributed graph database, multi-model database
License	Apache 2.0, Commons Clause 1.0
Website	www.nebula-graph.io

Last updated July 17, 2024

NebulaGraph is an open-source distributed graph database designed for very large-scale graphs with millisecond latency.^[1] NebulaGraph is released under the Apache 2.0 license and comes with a wide range of data visualization tools.^[2]

History

NebulaGraph was developed in 2018 by Vesoft Inc.^[3] In May 2019, NebulaGraph was open-sourced on GitHub, and its alpha version was released the same year.^[4]

In June 2020, NebulaGraph raised $8M in a pre-Series A funding round led by Redpoint China Ventures and Matrix Partners China.^[5]^[6]

NebulaGraph 1.0 GA was released in June 2020, and version 2.0 GA followed in March 2021.^[7] Version 3.0.2 was released in March 2022.^[8]

In September 2023, NebulaGraph and LlamaIndex introduced Graph RAG, an approach to retrieval-augmented generation based on knowledge graphs.^[9]
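As the title of the cited announcement indicates, Graph RAG bases retrieval-augmented generation on a knowledge graph. The following is a minimal sketch of that general idea, not NebulaGraph's or LlamaIndex's implementation: it assumes a hypothetical in-memory list of (subject, relation, object) triplets, finds entities mentioned verbatim in the question, collects the triplets within a given number of hops, and formats them as context for a prompt. The names `TRIPLETS`, `related_triplets` and `build_prompt` are illustrative, and no NebulaGraph or LlamaIndex API is used.

```python
from typing import List, Set, Tuple

# A (subject, relation, object) fact, as stored in a knowledge graph.
Triplet = Tuple[str, str, str]

# Hypothetical toy knowledge graph used only for illustration.
TRIPLETS: List[Triplet] = [
    ("Alice", "works_at", "Acme"),
    ("Acme", "located_in", "Berlin"),
    ("Bob", "works_at", "Globex"),
]


def related_triplets(question: str, triplets: List[Triplet], hops: int = 1) -> List[Triplet]:
    """Collect triplets within `hops` edges of any entity named in the question."""
    q = question.lower()
    # Seed entities by naive verbatim matching (an assumption; real systems use NER or an LLM).
    seeds: Set[str] = {e for s, _, o in triplets for e in (s, o) if e.lower() in q}
    seen = set(seeds)
    frontier = set(seeds)
    found: List[Triplet] = []
    for _ in range(hops):
        next_frontier: Set[str] = set()
        for t in triplets:
            s, _, o = t
            if (s in frontier or o in frontier) and t not in found:
                found.append(t)
                for e in (s, o):
                    if e not in seen:
                        seen.add(e)
                        next_frontier.add(e)
        frontier = next_frontier
    return found


def build_prompt(question: str, triplets: List[Triplet], hops: int = 1) -> str:
    """Format the retrieved facts as context placed ahead of the question."""
    facts = "\n".join(
        f"({s}) -[{r}]-> ({o})" for s, r, o in related_triplets(question, triplets, hops)
    )
    return f"Answer using only these facts:\n{facts}\n\nQuestion: {question}"


print(build_prompt("Where does Alice work?", TRIPLETS, hops=2))
```

A real system would replace the substring matching with entity extraction and the list scan with a query against the graph store; the prompt would then be sent to the language model.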

Related Research Articles


MySQL is an open-source relational database management system (RDBMS). Its name is a combination of "My", the name of co-founder Michael Widenius's daughter My, and "SQL", the acronym for Structured Query Language. A relational database organizes data into one or more data tables in which data may be related to each other; these relations help structure the data. SQL is a language that programmers use to create, modify and extract data from the relational database, as well as control user access to the database. In addition to relational databases and SQL, an RDBMS like MySQL works with an operating system to implement a relational database in a computer's storage system, manages users, allows for network access and facilitates testing database integrity and creation of backups.

Db2 is a family of data management products, including database servers, developed by IBM. It initially supported the relational model, but was extended to support object–relational features and non-relational structures like JSON and XML. The brand name was originally styled as DB2 until 2017, when it changed to its present form.

Ceph is a free and open-source software-defined storage platform that provides object storage, block storage, and file storage built on a common distributed cluster foundation. Ceph is designed for fully distributed operation without a single point of failure and scales to the exabyte level. Since version 12 (Luminous), Ceph no longer relies on a conventional filesystem: its own storage backend, BlueStore, manages HDDs and SSDs directly. Ceph can also expose a POSIX filesystem.

MongoDB is a source-available, cross-platform, document-oriented database program. Classified as a NoSQL database product, MongoDB utilizes JSON-like documents with optional schemas. MongoDB is developed by MongoDB Inc. and current versions are licensed under the Server Side Public License (SSPL). MongoDB is a member of the MACH Alliance.

NoSQL is an approach to database design that focuses on providing a mechanism for storage and retrieval of data modeled in ways other than the tabular relations used in relational databases. Instead of the tabular structure of a relational database, NoSQL databases house data within one data structure. Since this non-relational design does not require a schema, it offers rapid scalability for managing large and typically unstructured data sets. NoSQL systems are also sometimes called "Not only SQL" to emphasize that they may support SQL-like query languages or sit alongside SQL databases in polyglot-persistent architectures.

A graph database (GDB) is a database that uses graph structures for semantic queries with nodes, edges, and properties to represent and store data. A key concept of the system is the graph. The graph relates the data items in the store to a collection of nodes and edges, the edges representing the relationships between the nodes. The relationships allow data in the store to be linked together directly and, in many cases, retrieved with one operation. Graph databases hold the relationships between data as a priority. Querying relationships is fast because they are perpetually stored in the database. Relationships can be intuitively visualized using graph databases, making them useful for heavily inter-connected data.
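The point about relationships being stored directly can be illustrated with a toy property graph in which each node keeps a list of its outgoing edges, so finding a node's neighbors, or everything within k hops, is a traversal rather than a join. This is a minimal sketch under that assumption; the `PropertyGraph` class and its methods are hypothetical and do not correspond to any graph database's API or storage format.

```python
from collections import defaultdict, deque
from typing import DefaultDict, Dict, List, Optional, Set, Tuple


class PropertyGraph:
    """Toy in-memory property graph; illustrative only, not a real product's storage engine."""

    def __init__(self) -> None:
        self.nodes: Dict[str, dict] = {}
        # Adjacency list: node id -> outgoing (edge type, target id) pairs,
        # so a node's relationships are found without scanning every edge.
        self.out_edges: DefaultDict[str, List[Tuple[str, str]]] = defaultdict(list)

    def add_node(self, node_id: str, **props) -> None:
        self.nodes[node_id] = props

    def add_edge(self, src: str, edge_type: str, dst: str) -> None:
        self.out_edges[src].append((edge_type, dst))

    def neighbors(self, node_id: str, edge_type: Optional[str] = None) -> List[str]:
        return [dst for t, dst in self.out_edges.get(node_id, [])
                if edge_type is None or t == edge_type]

    def within_hops(self, start: str, max_hops: int) -> Set[str]:
        """Breadth-first traversal returning every node reachable in 1..max_hops steps."""
        seen = {start}
        queue = deque([(start, 0)])
        reached: Set[str] = set()
        while queue:
            node, depth = queue.popleft()
            if depth == max_hops:
                continue
            for nxt in self.neighbors(node):
                if nxt not in seen:
                    seen.add(nxt)
                    reached.add(nxt)
                    queue.append((nxt, depth + 1))
        return reached


# Usage: a small social graph.
g = PropertyGraph()
for name in ("alice", "bob", "carol", "dave"):
    g.add_node(name, name=name.title())
g.add_edge("alice", "follows", "bob")
g.add_edge("bob", "follows", "carol")
g.add_edge("carol", "follows", "dave")
g.add_edge("alice", "likes", "dave")
```

Here `g.within_hops("alice", 2)` walks edges outward from one node instead of joining tables, which is the access pattern graph databases are built to make cheap.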

Redis is a source-available, in-memory data store used as a distributed key–value database, cache and message broker, with optional durability. Because it holds all data in memory, and because of its design, Redis offers low-latency reads and writes, which makes it particularly suitable for caching. Redis is the most popular NoSQL database and one of the most popular databases overall. It is used by companies such as Twitter, Airbnb, Tinder, Yahoo, Adobe, Hulu, Amazon and OpenAI.
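The caching use case usually pairs in-memory storage with per-key expiry; Redis exposes this through, for example, `SET key value EX seconds`. The sketch below is a hypothetical Python stand-in, not a Redis client: `TTLCache` keeps values in a dictionary and, as a simplifying assumption, expires keys only lazily, when they are read. The clock is injectable so expiry can be demonstrated without sleeping.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """Hypothetical in-memory key-value cache with optional per-key expiry (not Redis)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # key -> (value, absolute expiry time, or None if the key never expires)
        self._data: Dict[str, Tuple[Any, Optional[float]]] = {}

    def set(self, key: str, value: Any, ttl: Optional[float] = None) -> None:
        expires_at = None if ttl is None else self._clock() + ttl
        self._data[key] = (value, expires_at)

    def get(self, key: str, default: Any = None) -> Any:
        item = self._data.get(key)
        if item is None:
            return default
        value, expires_at = item
        if expires_at is not None and self._clock() >= expires_at:
            # Lazy expiry: the stale entry is dropped only when it is read.
            del self._data[key]
            return default
        return value


now = [0.0]  # fake clock for the usage example
cache = TTLCache(clock=lambda: now[0])
cache.set("session:42", "alice", ttl=10)
cache.set("config", "v1")  # no TTL: kept until overwritten
```

Lazy expiry alone lets never-read keys linger in memory, which is why production caches typically also evict expired keys in the background.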

Neo4j is a graph database management system (GDBMS) developed by Neo4j Inc.

OpenNebula is an open source cloud computing platform for managing heterogeneous data center, public cloud and edge computing infrastructure resources. OpenNebula manages on-premises and remote virtual infrastructure to build private, public, or hybrid implementations of Infrastructure as a Service and multi-tenant Kubernetes deployments. The two primary uses of the OpenNebula platform are data center virtualization and cloud deployments based on the KVM hypervisor, LXD/LXC system containers, and AWS Firecracker microVMs. The platform can also provide the cloud infrastructure needed to operate a cloud on top of existing VMware infrastructure. In early June 2020, OpenNebula announced a new Enterprise Edition for corporate users, along with a Community Edition. OpenNebula CE is free and open-source software, released under the Apache License version 2. OpenNebula CE comes with free access to patch releases containing critical bug fixes but not to the regular EE maintenance releases. Upgrades to the latest minor or major version are only available to CE users with non-commercial deployments or with significant open source contributions to the OpenNebula Community. OpenNebula EE is distributed under a closed-source license and requires a commercial subscription.

Elasticsearch is a search engine based on the Lucene library. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents. Elasticsearch is developed in Java and is dual-licensed under the source-available Server Side Public License and the Elastic License, while other parts fall under the proprietary, source-available Elastic License. Official clients are available in Java, .NET (C#), PHP, Python, Ruby and many other languages. According to the DB-Engines ranking, Elasticsearch is the most popular enterprise search engine.
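Full-text engines built on Lucene, including Elasticsearch, answer keyword queries from an inverted index that maps each term to the documents containing it. The following is a minimal sketch of that data structure only: tokenizing into lowercase alphanumeric runs and AND semantics for multi-word queries are simplifying assumptions, and the relevance scoring that Lucene performs is not attempted. `InvertedIndex` is a hypothetical name, not an Elasticsearch API.

```python
import re
from collections import defaultdict
from typing import DefaultDict, List, Set


def tokenize(text: str) -> List[str]:
    # Simplifying assumption: terms are lowercase runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Hypothetical term -> document-id index (no scoring, no persistence)."""

    def __init__(self) -> None:
        self.postings: DefaultDict[str, Set[int]] = defaultdict(set)

    def add(self, doc_id: int, text: str) -> None:
        for term in tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query: str) -> Set[int]:
        """Return ids of documents containing every query term (AND semantics)."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


index = InvertedIndex()
index.add(1, "Distributed graph database")
index.add(2, "Distributed search engine")
index.add(3, "Graph analytics")
```

With this index, `index.search("distributed graph")` intersects the posting sets for the two terms and returns `{1}` without scanning any document text.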

DataStax, Inc. is a company based in Santa Clara, California, that provides real-time data products for AI. Its product Astra DB is a cloud database-as-a-service based on Apache Cassandra. DataStax also offers DataStax Enterprise (DSE), an on-premises database built on Apache Cassandra, and Astra Streaming, a messaging and event streaming cloud service based on Apache Pulsar. As of June 2022, the company had roughly 800 customers in over 50 countries.

Apache Spark is an open-source unified analytics engine for large-scale data processing. Spark provides an interface for programming clusters with implicit data parallelism and fault tolerance. Originally developed at the University of California, Berkeley's AMPLab, the Spark codebase was later donated to the Apache Software Foundation, which has maintained it since.

TensorFlow is a free and open-source software library for machine learning and artificial intelligence. It can be used across a range of tasks but has a particular focus on training and inference of deep neural networks.

Reynold Xin is a computer scientist and engineer specializing in big data, distributed systems, and cloud computing. He is a co-founder and Chief Architect of Databricks. He is best known for his work on Apache Spark, a leading open-source big data project. He was the designer and lead developer of the GraphX, Project Tungsten, and Structured Streaming components and co-designed DataFrames, all of which are part of the core Apache Spark distribution; he also served as the release manager for Spark 2.0.

JanusGraph is an open source, distributed graph database under The Linux Foundation. JanusGraph is available under the Apache License 2.0. The project is supported by IBM, Google, Hortonworks and Grakn Labs.

Prompt engineering is the process of structuring an instruction that can be interpreted and understood by a generative AI model. A prompt is natural language text describing the task that an AI should perform.

RavenDB is an open-source document-oriented database written in C#, developed by Hibernating Rhinos Ltd. It is cross-platform, supported on Windows, Linux, and macOS. RavenDB stores data as JSON documents and can be deployed in distributed clusters with master-master replication.

DuckDB is an open-source column-oriented relational database management system (RDBMS) originally developed by Mark Raasveldt and Hannes Mühleisen at the Centrum Wiskunde & Informatica (CWI) in the Netherlands and first released in 2019. According to its developers, it has millions of downloads per month. It is designed to provide high performance on complex queries against large databases in an embedded configuration, such as combining tables with hundreds of columns and billions of rows. Unlike other embedded databases, DuckDB does not focus on transactional (OLTP) applications; instead, it is specialized for online analytical processing (OLAP) workloads.
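Column orientation means the values of one column are stored together, so an analytical query that filters and aggregates a few columns only has to scan those columns. The sketch below illustrates that layout difference in plain Python. It is not how DuckDB is implemented (a real engine adds compression, vectorized execution and much more), and `to_columns`, `sum_where` and the sample rows are hypothetical.

```python
from typing import Any, Dict, List

# Row-oriented input: each record's fields are stored together (hypothetical sample data).
rows: List[Dict[str, Any]] = [
    {"order_id": 1, "region": "eu", "amount": 10, "note": "first order"},
    {"order_id": 2, "region": "us", "amount": 5, "note": ""},
    {"order_id": 3, "region": "eu", "amount": 7, "note": "gift"},
]


def to_columns(records: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot row records into one list per column (a columnar layout)."""
    columns: Dict[str, List[Any]] = {name: [] for name in records[0]}
    for record in records:
        for name in columns:
            columns[name].append(record[name])
    return columns


def sum_where(columns: Dict[str, List[Any]], value_col: str, filter_col: str, wanted: Any) -> Any:
    """Roughly SELECT SUM(value_col) WHERE filter_col = wanted.

    Only the two referenced columns are scanned; the others are never read.
    """
    return sum(v for v, f in zip(columns[value_col], columns[filter_col]) if f == wanted)


cols = to_columns(rows)
```

Calling `sum_where(cols, "amount", "region", "eu")` touches only the `amount` and `region` lists, whereas a row store would have to read every field of every record.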

A vector database, vector store or vector search engine is a database that can store vectors along with other data items. Vector databases typically implement one or more Approximate Nearest Neighbor (ANN) algorithms, so that one can search the database with a query vector to retrieve the closest matching database records.
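Searching with a query vector can be shown with exact (brute-force) nearest-neighbor search by cosine similarity. This is deliberately not an ANN algorithm: it scores every stored vector, producing the exact result that ANN indexes such as HNSW approximate in order to avoid a full scan. `BruteForceVectorStore` and its methods are hypothetical names for illustration, not any vector database's API.

```python
import heapq
import math
from typing import Dict, List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


class BruteForceVectorStore:
    """Hypothetical store of vectors plus metadata, queried by exact k-NN."""

    def __init__(self) -> None:
        self.items: Dict[str, Tuple[List[float], dict]] = {}

    def upsert(self, item_id: str, vector: Sequence[float], metadata: Optional[dict] = None) -> None:
        self.items[item_id] = (list(vector), metadata or {})

    def metadata(self, item_id: str) -> dict:
        return self.items[item_id][1]

    def query(self, vector: Sequence[float], k: int = 3) -> List[str]:
        """Return ids of the k most similar stored vectors, best first."""
        scored = ((cosine(vector, v), item_id) for item_id, (v, _) in self.items.items())
        return [item_id for _, item_id in heapq.nlargest(k, scored)]


store = BruteForceVectorStore()
store.upsert("a", [1.0, 0.0], {"title": "doc A"})
store.upsert("b", [0.0, 1.0], {"title": "doc B"})
store.upsert("c", [0.9, 0.1], {"title": "doc C"})
```

The cost of `query` grows linearly with the number of stored vectors, which is the scan that ANN algorithms trade a little accuracy to avoid.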

Retrieval-augmented generation (RAG) is a type of information retrieval process. It modifies interactions with a large language model (LLM) so that it responds to queries with reference to a specified set of documents, using them in preference to information drawn from its own vast, static training data. This allows LLMs to use domain-specific and/or updated information. Use cases include providing chatbot access to internal company data, or giving factual information only from an authoritative source.
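The retrieve-then-generate flow can be sketched in two steps: pick the documents most relevant to the query, then place them in the prompt ahead of the question so the model answers from them. In this minimal sketch, relevance is scored by word overlap purely as a stand-in for embedding similarity (an assumption made to keep it dependency-free), the documents are made up, and the call to an actual LLM is left out. `retrieve` and `build_rag_prompt` are hypothetical names.

```python
import re
from typing import List, Set


def _words(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: List[str], k: int = 2) -> List[str]:
    """Return up to k documents sharing the most words with the question."""
    q = _words(question)
    # Stand-in relevance score: number of shared words (real systems typically use embeddings).
    ranked = sorted(documents, key=lambda d: len(q & _words(d)), reverse=True)
    return [d for d in ranked[:k] if q & _words(d)]


def build_rag_prompt(question: str, documents: List[str], k: int = 2) -> str:
    """Prepend retrieved context to the question; the result would be sent to an LLM."""
    context = "\n".join(f"- {d}" for d in retrieve(question, documents, k))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Hypothetical internal company documents.
DOCS = [
    "The cafeteria opens at 8 am.",
    "Expense reports are due on Friday.",
    "The VPN requires two-factor authentication.",
]
```

Restricting the prompt to retrieved text is what lets the model answer from internal or updated sources rather than from its training data alone.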

References

[1] Timothy Prickett Morgan, "Third Time Is The Charm For NebulaGraph Database". nextplatform.com. 19 February 2021. Retrieved 14 December 2022.
[2] George Leopold, "NebulaGraph Joins Database Race". datanami.com. 15 June 2020. Retrieved 14 December 2022.
[3] Wu, Min; Yi, Xinglu; Yu, Hui; Liu, Yu; Wang, Yujue (2022). "NebulaGraph: An open source distributed graph database". arXiv:2206.07278.
[4] "NebulaGraph: An open source distributed graph database". deepai.org. 15 June 2022. Retrieved 14 December 2022.
[5] "NebulaGraph Completes Series A to Scale Its Distributed Graph Database". datanami.com. 29 June 2020. Retrieved 14 December 2022.
[6] Jaime Hampton, "NebulaGraph Debuts for Big Data Analytics Discovery". datanami.com. 16 September 2022. Retrieved 14 December 2022.
[7] Rita Liao, "NebulaGraph reaps from China's growing appetite for graph databases". techcrunch.com. 16 September 2022. Retrieved 14 December 2022.
[8] "NebulaGraph Takes Another Step to Lead Global Graph Database Market With the Release of V3.0.0". martechseries.com. 18 February 2022. Retrieved 14 December 2022.
[9] "NebulaGraph Launches Industry-First Graph RAG: Retrieval-Augmented Generation with LLM Based on Knowledge Graphs". nebula-graph.io.

External links

Official website

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
