ArangoDB

Last updated
ArangoDB
Developer(s) ArangoDB GmbH
Initial release2011;13 years ago (2011)
Stable release
3.11.5 / November 9, 2023;4 months ago (2023-11-09)
Repository
Written in C++, JavaScript
Type Multi-model database, Graph database, Document-oriented database, Key/Value database, Full-text Search Engine
License Business Source License 1.1 and Arango Community License
Website arangodb.com

ArangoDB is a graph database system developed by ArangoDB Inc. ArangoDB is a multi-model database system since it supports three data models (graphs, JSON documents, key/value) [1] with one database core and a unified query language AQL (ArangoDB Query Language). AQL is mainly a declarative language [2] and allows the combination of different data access patterns in a single query. [3]

Contents

ArangoDB is a NoSQL database system [4] but AQL is similar in many ways to SQL, [5] it uses RocksDB as a storage engine.

History

ArangoDB Inc. was founded in 2015 by Claudius Weinberger and Frank Celler. [6] They originally called the database system “A Versatile Object Container", or AVOC for short, leading them to call the database AvocadoDB. [7] [8] [9] Later, they changed the name to ArangoDB. [10] The word "arango" refers to a little-known avocado variety grown in Cuba. [11]

In January 2017 ArangoDB raised a seed round investment of 4.2 million Euros led by Target Partners. In March 2019 ArangoDB raised 10 million dollars in series A funding [12] led by Bow Capital. In October 2021 ArangoDB raised 27.8 million dollars in series B funding led by Iris Capital. [13]

Release history

ReleaseFirst ReleaseLatest Minor VersionLatest ReleaseFeature NotesReference
3.112023-05-303.11.52023-11-09
  • Faster query performance across search and graph.
  • Data science and analytics operational enhancements.
  • Improved user experience for database administration and management.
Release Notes
3.102022-10-043.10.112023-10-19
  • Native ARM support, including native support for Apple Silicon.
  • Support for computed values (persistent document attributes that are generated when a document is created or updated).
  • Parallelism for sharded graphs.
  • A graph traversal algorithm to query for all paths with the shortest value, between two documents.
Release Notes
3.92022-02-153.9.122023-08-23
  • Collections replicated on all cluster nodes can be combined with graphs sharded by document attributes to enable more local execution of graph queries ("Hybrid SmartGraphs", "Hybrid Disjoint SmartGraphs").
  • Language-agnostic tokenization of text ("Segmentation Analyzer").
Release Notes
3.82021-07-293.8.92023-03-27
  • Graph traversal algorithms to enumerate all paths between two vertices ("k Paths") and to emit paths in order of increasing edge weights ("Weighted Traversals").
  • Support for sliding window queries to aggregate adjacent documents, value ranges and time intervals.
  • Geo-spatial queries can be combined with full-text search.
  • Flexible data field pre-processing with custom queries ("AQL Analyzer") and the ability to chain built-in and custom analyzers ("Pipeline Analyzer").
  • Hardware-accelerated on-disk encryption.
Release Notes
3.72020-09-163.7.172022-02-01
  • Graphs replicated on all cluster nodes to execute graph traversals locally ("SatelliteGraphs").
  • Document validation using JSON Schema.
  • Wildcard and fuzzy search support for full-text search.
  • Key rotation for superuser JWT tokens, TLS certificates, and on-disk encryption keys.
Release Notes
3.62020-01-083.6.162021-09-06
  • Option to store all collections of a database on a single cluster node, to combine the performance of a single server and ACID semantics with a fault-tolerant cluster setup ("OneShard").
  • Parallel execution of queries on several cluster nodes.
  • Late document materialization to only fetch the relevant documents from SORT/LIMIT queries and early pruning of non-matching documents in full collection scans.
  • Inlining of certain subqueries to improve execution time.
Release Notes
3.52019-08-213.5.72020-12-30
  • Multi-document transactions with individual begin and commit / abort commands ("Stream Transactions").
  • Time-based removal of expired documents ("Time-to-live Index").
  • Stop condition support for graph traversals ("Pruning in Traversals").
  • Graph traversal algorithm to get multiple shortest paths ("k Shortest Paths").
  • Co-located joins in a cluster using identically sharded collections ("SmartJoins").
  • Consistent snapshot backup in cluster mode.
  • Custom text pre-processors for full-text search ("Configurable Analyzers").
  • Data masking capabilities for attributes containing sensitive data / PII when creating backups.
Release Notes
3.42018-12-063.4.112020-09-09
  • Integrated full-text search and information retrieval engine ("ArangoSearch").
  • Improved geo-spatial index with GeoJSON support.
  • Insert operations can be turned into a replace automatically, in case that the target document already exists ("Repsert").
  • Round-robin load-balancer support for cloud environments.
  • Query profiling to show detailed runtime information.
  • Cluster-distributed aggregation queries.
  • Native implementations in C++ of all built-in query functions.
  • Multi-threaded dump and restore operations.
Release Notes
3.32017-12-223.3.252020-02-28
  • Datacenter to Datacenter Replication for disaster recovery ("DC2DC").
  • Encrypted backups.
  • Deployment mode for single servers with automatic failover.
Release Notes
3.22017-07-203.2.182019-02-02
  • Distributed iterative graph processing with Pregel in single server and cluster.
  • Collections replicated on all cluster nodes to execute joins with sharded data locally ("SatelliteCollections").
  • Fault-tolerant microservices.
  • Support for composable, distance-based geo-queries.
  • Export utility for multiple formats.
  • Encryption of on-disk data.
  • LDAP authentication.
Release Notes
3.12016-11-033.1.292018-06-23
  • Value-based sharding of large graph datasets for better data locality when traversing graphs ("SmartGraphs").
  • Support for vertex-centric indexes for more efficient graph traversals with filter conditions.
  • New viewer for large graphs, supporting WebGL.
  • Binary wire format ("VelocyStream").
  • Low-latency request handling using a boost-ASIO server infrastructure.
  • Improved query editor and query explain output.
  • Audit logging.
Release Notes
3.02016-07-233.0.122016-11-23
  • Cluster support with synchronous replication and automatic failover.
  • Binary storage format ("VelocyPack").
  • Persistent indexes that are stored on disk for faster restarts.
Release

Notes

Features

AQL (ArangoDB Query Language) is the SQL-like query language [26] used in ArangoDB. It supports CRUD operations for both documents (nodes) and edges, but it is not a data definition language (DDL). AQL does support geospatial queries.

AQL is JSON-oriented:

// Return every document in a collectionFORdocINcollectionRETURNdoc// Count the number of documents in a collectionFORdocINcollectionCOLLECTWITHCOUNTINTOlengthRETURNlength// Add a new document into our collectionINSERT{_key:"john",name:"John",age:45}INTOcollection// Update document with key of “john” to have age 46.UPDATE{_key:"john",age:46}INcollection// Add an attribute numberOfLogins for all users with status active:FORuINusersFILTERu.active==trueUPDATEuWITH{numberOfLogins:0}INusers

Editions

See also

Related Research Articles

<span class="mw-page-title-main">Informix</span> Database management software product family

Informix is a product family within IBM's Information Management division that is centered on several relational database management system (RDBMS) and multi-model database offerings. The Informix products were originally developed by Informix Corporation, whose Informix Software subsidiary was acquired by IBM in 2001.

A query language, also known as data query language or database query language (DQL), is a computer language used to make queries in databases and information systems. In database systems, query languages rely on strict theory to retrieve information. A well known example is the Structured Query Language (SQL).

<span class="mw-page-title-main">Apache Solr</span> Open-source enterprise-search platform

Solr is an open-source enterprise-search platform, written in Java. Its major features include full-text search, hit highlighting, faceted search, real-time indexing, dynamic clustering, database integration, NoSQL features and rich document handling. Providing distributed search and index replication, Solr is designed for scalability and fault tolerance. Solr is widely used for enterprise search and analytics use cases and has an active development community and regular releases.

<span class="mw-page-title-main">Apache CouchDB</span> Document-oriented NoSQL database

Apache CouchDB is an open-source document-oriented NoSQL database, implemented in Erlang.

A document-oriented database, or document store, is a computer program and data storage system designed for storing, retrieving and managing document-oriented information, also known as semi-structured data.

NoSQL is an approach to database design that focuses on providing a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases. Instead of the typical tabular structure of a relational database, NoSQL databases house data within one data structure. Since this non-relational database design does not require a schema, it offers rapid scalability to manage large and typically unstructured data sets. NoSQL systems are also sometimes called "Not only SQL" to emphasize that they may support SQL-like query languages or sit alongside SQL databases in polyglot-persistent architectures.

A graph database (GDB) is a database that uses graph structures for semantic queries with nodes, edges, and properties to represent and store data. A key concept of the system is the graph. The graph relates the data items in the store to a collection of nodes and edges, the edges representing the relationships between the nodes. The relationships allow data in the store to be linked together directly and, in many cases, retrieved with one operation. Graph databases hold the relationships between data as a priority. Querying relationships is fast because they are perpetually stored in the database. Relationships can be intuitively visualized using graph databases, making them useful for heavily inter-connected data.

<span class="mw-page-title-main">Redis</span> Source available in-memory key–value database

Redis is a source available in-memory storage, used as a distributed, in-memory key–value database, cache and message broker, with optional durability. Because it holds all data in memory and because of its design, Redis offers low-latency reads and writes, making it particularly suitable for use cases that require a cache. Redis is the most popular NoSQL database, and one of the most popular databases overall. Redis is used in companies like Twitter, Airbnb, Tinder, Yahoo, Adobe, Hulu, Amazon and OpenAI.

In computing, Open Data Protocol (OData) is an open protocol that allows the creation and consumption of queryable and interoperable Web service APIs in a standard way. Microsoft initiated OData in 2007. Versions 1.0, 2.0, and 3.0 are released under the Microsoft Open Specification Promise. Version 4.0 was standardized at OASIS, with a release in March 2014. In April 2015 OASIS submitted OData v4 and OData JSON Format v4 to ISO/IEC JTC 1 for approval as an international standard. In December 2016, ISO/IEC published OData 4.0 Core as ISO/IEC 20802-1:2016 and the OData JSON Format as ISO/IEC 20802-2:2016.

<span class="mw-page-title-main">Couchbase Server</span> Open-source NoSQL database

Couchbase Server, originally known as Membase, is a source-available, distributed multi-model NoSQL document-oriented database software package optimized for interactive applications. These applications may serve many concurrent users by creating, storing, retrieving, aggregating, manipulating and presenting data. In support of these kinds of application needs, Couchbase Server is designed to provide easy-to-scale key-value, or JSON document access, with low latency and high sustainability throughput. It is designed to be clustered from a single machine to very large-scale deployments spanning many machines.

Greenqloud is a cloud computing software company with headquarters in Reykjavik, Iceland, and office in Seattle, Washington, offering cloud computing software and services. Greenqloud develops and sells the cloud and infrastructure management software Qstack for the global market.

<span class="mw-page-title-main">OpenShift</span> Cloud computing software

OpenShift is a family of containerization software products developed by Red Hat. Its flagship product is the OpenShift Container Platform — a hybrid cloud platform as a service built around Linux containers orchestrated and managed by Kubernetes on a foundation of Red Hat Enterprise Linux. The family's other products provide this platform through different environments: OKD serves as the community-driven upstream, Several deployment methods are available including self-managed, cloud native under ROSA, ARO and RHOIC on AWS, Azure, and IBM Cloud respectively, OpenShift Online as software as a service, and OpenShift Dedicated as a managed service.

<span class="mw-page-title-main">Apache Spark</span> Open-source data analytics cluster computing framework

Apache Spark is an open-source unified analytics engine for large-scale data processing. Spark provides an interface for programming clusters with implicit data parallelism and fault tolerance. Originally developed at the University of California, Berkeley's AMPLab, the Spark codebase was later donated to the Apache Software Foundation, which has maintained it since.

Kubernetes is an open-source container orchestration system for automating software deployment, scaling, and management. Originally designed by Google, the project is now maintained by a worldwide community of contributors, and the trademark is held by the Cloud Native Computing Foundation.

<span class="mw-page-title-main">RocksDB</span> Embedded key-value database

RocksDB is a high performance embedded database for key-value data. It is a fork of Google's LevelDB optimized to exploit multi-core processors (CPUs), and make efficient use of fast storage, such as solid-state drives (SSD), for input/output (I/O) bound workloads. It is based on a log-structured merge-tree data structure. It is written in C++ and provides official language bindings for C++, C, and Java. Many third-party language bindings exist. RocksDB is free and open-source software, released originally under a BSD 3-clause license. However, in July 2017 the project was migrated to a dual license of both Apache 2.0 and GPLv2 license. This change helped its adoption in Apache Software Foundation's projects after blacklist of the previous BSD+Patents license clause.

<span class="mw-page-title-main">Prometheus (software)</span> Application used for event monitoring and alerting

Prometheus is a free software application used for event monitoring and alerting. It records metrics in a time series database built using an HTTP pull model, with flexible queries and real-time alerting. The project is written in Go and licensed under the Apache 2 License, with source code available on GitHub, and is a graduated project of the Cloud Native Computing Foundation, along with Kubernetes and Envoy.

Azure Data Explorer is a fully-managed big data analytics cloud platform and data-exploration service, developed by Microsoft, that ingests structured, semi-structured and unstructured data. The service then stores this data and answers analytic ad hoc queries on it with seconds of latency. It is a full text indexing and retrieval database, including time series analysis capabilities and regular expression evaluation and text parsing.

TerminusDB is an open source knowledge graph and document store. It is used to build versioned data products. It is a native revision control database that is architecturally similar to Git. It is listed on DB-Engines.

The Cloud Native Computing Foundation (CNCF) is a Linux Foundation project that was founded in 2015 to help advance container technology and align the tech industry around its evolution.

RavenDB is an open-source document-oriented database written in C#, developed by Hibernating Rhinos Ltd. It is cross-platform, supported on Windows, Linux, and Mac OS. RavenDB stores data as JSON documents and can be deployed in distributed clusters with master-master replication.

References

  1. "Advantages of native multi-model in ArangoDB". ArangoDB. Retrieved 2022-07-26.
  2. "ArangoDB Query Language (AQL) Introduction | ArangoDB Documentation". www.arangodb.com. Retrieved 2022-07-26.
  3. "AQL Query Patterns & Examples | ArangoDB Documentation". www.arangodb.com. Retrieved 2022-07-26.
  4. Celler, Frank (2012-03-07). "ArangoDB's design objectives". ArangoDB. Retrieved 2022-07-26.
  5. "ArangoDB Query Language (AQL) Introduction | ArangoDB Documentation". www.arangodb.com. Retrieved 2022-07-26.
  6. "Variety Database". www.avocadosource.com. Retrieved 2022-07-27.
  7. Ortell, Bill (2021-03-08), AvocadoDB , retrieved 2022-07-27
  8. AvocadoDB explained , retrieved 2022-07-27
  9. AvocadoDB Query Language Jan Steemann in english , retrieved 2022-07-27
  10. ""AvocadoDB" becomes "ArangoDB"". ArangoDB. 2012-05-09. Retrieved 2022-07-27.
  11. "Variety Database". www.avocadosource.com. Retrieved 2022-08-05.
  12. Weinberger, Claudius (2019-03-14). "ArangoDB receives Series A Funding led by Bow Capital". ArangoDB. Retrieved 2022-07-27.
  13. "ArangoDB Announces $27.8 Million Series B Investment to Accelerate Development of Next-Generation Graph ML, Providing Advanced Analytics and AI Capabilities at Enterprise Scale". ArangoDB. Retrieved 2022-07-27.
  14. AvocadoDB explained , retrieved 2022-08-05
  15. AvocadoDB Query Language Jan Steemann in english , retrieved 2022-08-05
  16. ArangoDB, ArangoDB, 2022-08-05, retrieved 2022-08-05
  17. "Cluster | ArangoDB Deployment Modes | Architecture | Manual | ArangoDB Documentation". www.arangodb.com. Retrieved 2022-08-05.
  18. "DC2DC Replication | ArangoDB Documentation". www.arangodb.com. Retrieved 2022-08-05.
  19. "Kubernetes | Tutorials | Manual | ArangoDB Documentation". www.arangodb.com. Retrieved 2022-08-05.
  20. "Foxx Microservices | ArangoDB Documentation". www.arangodb.com. Retrieved 2022-08-05.
  21. ArangoDB, ArangoDB, 2022-08-05, retrieved 2022-08-05
  22. "ArangoSearch - Full-text search engine including similarity ranking capabilities". ArangoDB. Retrieved 2022-08-05.
  23. "Stanford University Pregel White paper" (PDF).
  24. "Pregel | Data Science | Manual | ArangoDB Documentation". www.arangodb.com. Retrieved 2022-08-05.
  25. "Transactions | Manual | ArangoDB Documentation". www.arangodb.com. Retrieved 2022-08-05.
  26. "Cluster | ArangoDB Deployment Modes | Architecture | Manual | ArangoDB Documentation". www.arangodb.com. Retrieved 2022-08-11.
  27. ArangoDB, ArangoDB, 2023-10-13, retrieved 2023-10-13
  28. "ArangoDB SmartGraphs | ArangoDB Documentation". www.arangodb.com. Retrieved 2022-08-11.
  29. "ArangoDB SatelliteCollections | ArangoDB Documentation". www.arangodb.com. Retrieved 2022-08-11.
  30. "ArangoDB Enterprise Features". ArangoDB. Retrieved 2022-08-11.
  31. "Getting Started with ArangoDB Oasis | ArangoDB Documentation". www.arangodb.com. Retrieved 2022-08-11.
  32. "ArangoDB Oasis". ArangoDB Oasis. Retrieved 2022-08-11.