Developer(s) | TerminusDB |
---|---|
Initial release | 2019 |
Stable release | 11.0.0 / January 30, 2023 [1] |
Repository | |
Written in | Rust, Prolog [2] |
Type | Graph database |
License | |
Website | terminusdb |
TerminusDB is an open source knowledge graph and document store. It is used to build versioned data products. It is a native revision control database that is architecturally similar to Git. It is listed on DB-Engines.
TerminusDB provides a document API for building via the JSON exchange format. It implements both GraphQL and a datalog variant called WOQL. TerminusCMS is a cloud self-serve content and data platform built on TerminusDB.
TerminusDB is available under the Apache 2.0 license. TerminusDB is implemented in Prolog and Rust.
TerminusDB, previously known as DataChemist, [3] [4] was founded in Dublin, Ireland. Starting in Trinity College Dublin, [5] the development team behind TerminusDB ran the Horizon 2020 project ALIGNED that worked from February 2015 to January 2018. An open-access e-book entitled Engineering Agile Big-Data Systems was published on completion of the ALIGNED project. [6]
Version 1.0 was released in October 2019. [7] TerminusDB was first released under the GPLv3 license with the client libraries released with the Apache 2 license. With v4.0, which was released in December 2020, TerminusDB switched to the Apache 2.0 license. The shift was discussed extensively. [8]
Version | Release date | Feature notes | Refs |
---|---|---|---|
1.0 | October 2019 |
| [9] |
1.1 | January 2020 |
| [10] |
2.0 | June 2020 |
| [11] |
3.0 | September 2020 |
| [13] |
4.0 | December 2020 |
| |
4.1 | December 2020 |
| |
4.2 | February 2021 |
| [14] |
10.0 | September 2021 |
| [15] |
11.0 | January 2023 |
| [16] [17] |
11.0 | June 2023 |
| [18] |
TerminusDB is named after the Roman God of Boundaries, Terminus. It is also named after the home planet of the Foundation in the series of science-fiction novel by Issac Asimov. [19] TerminusDB uses a CowDuck mascot - the motif finds its origins in the examples used by core engineer Matthijs van Otterdijk when first demonstrating the append only immutable data store [20]
TerminusDB is an in-memory graph database management system with a rich query language. The design of the underlying data structure, which is implemented in a Rust library, uses a succinct data structures and delta encoding approach drawing inspiration from software source control systems like Git. [21] This allows all of the Git semantics to be used in TerminusDB.
TerminusDB is based on the RDF standard. This standard specifies finite labelled directed graphs which are parameterized in some universe of datatypes. The names for nodes and labels are drawn from a set of IRIs (Internationalized Resource Identifiers). TerminusDB uses the XSD datatypes as its universe of concrete values. For schema design, TerminusDB used the OWL language until version 10.0. Since version 10 it uses a JSON schema interface allowing users to build schemas using a simple JSON format. This provides a rich modelling language which enables constraints on the allowable shapes in the graph.
TerminusDB has a promise based client for the browser and node.js it is available through the npm registry, or can be directly included in web-sites. [22] It also has a Python client for the TerminusDB RESTful API and a python version of the web object query language, WOQLpy. [23]
GraphQL is implemented to allow users to query TerminusDB projects in such a way that deep linking can be discovered. [24]
WOQL (web object query language) is a datalog-based query language. It allows TerminusDB to treat the database as a document store or a graph interchangeably, and provides query features to make relationship traversals easy. This gives a relatively straightforward human-readable format which can be easily stored in TerminusDB itself.
A simple query which creates a document in the database, along with labels and cardinality constraints. [25]
WOQL.doctype("BankAccount").label("Bank Account") .property("owner","xsd:string") .label("owner") .cardinality(1) .property("balance","xsd:nonNegativeInteger") .label("owner") .cardinality(1)
TerminusDB published a sidecar vector database called VectorLink. It is a data tool to provide large language models with semantic context about data. Drawing on the features of TerminusDB, it provides versioned indexing of data and content..
A query language, also known as data query language or database query language (DQL), is a computer language used to make queries in databases and information systems. In database systems, query languages rely on strict theory to retrieve information. A well known example is the Structured Query Language (SQL).
JSON is an open standard file format and data interchange format that uses human-readable text to store and transmit data objects consisting of attribute–value pairs and arrays. It is a commonly used data format with diverse uses in electronic data interchange, including that of web applications with servers.
Apache CouchDB is an open-source document-oriented NoSQL database, implemented in Erlang.
A document-oriented database, or document store, is a computer program and data storage system designed for storing, retrieving and managing document-oriented information, also known as semi-structured data.
A graph database (GDB) is a database that uses graph structures for semantic queries with nodes, edges, and properties to represent and store data. A key concept of the system is the graph. The graph relates the data items in the store to a collection of nodes and edges, the edges representing the relationships between the nodes. The relationships allow data in the store to be linked together directly and, in many cases, retrieved with one operation. Graph databases hold the relationships between data as a priority. Querying relationships is fast because they are perpetually stored in the database. Relationships can be intuitively visualized using graph databases, making them useful for heavily inter-connected data.
Redis is a source-available, in-memory storage, used as a distributed, in-memory key–value database, cache and message broker, with optional durability. Because it holds all data in memory and because of its design, Redis offers low-latency reads and writes, making it particularly suitable for use cases that require a cache. Redis is the most popular NoSQL database, and one of the most popular databases overall. Redis is used in companies like Twitter, Airbnb, Tinder, Yahoo, Adobe, Hulu, Amazon and OpenAI.
FlockDB was an open-source distributed, fault-tolerant graph database for managing wide but shallow network graphs. It was initially used by Twitter to store relationships between users, e.g. followings and favorites. FlockDB differs from other graph databases, e.g. Neo4j in that it was not designed for multi-hop graph traversal but rather for rapid set operations, not unlike the primary use-case for Redis sets. FlockDB was posted on GitHub shortly after Twitter released its Gizzard framework, which it used to query the FlockDB distributed datastore. The database is licensed under the Apache License.
Apache Hive is a data warehouse software project. It is built on top of Apache Hadoop for providing data query and analysis. Hive gives an SQL-like interface to query data stored in various databases and file systems that integrate with Hadoop. Traditional SQL queries must be implemented in the MapReduce Java API to execute SQL applications and queries over distributed data.
Azure Cosmos DB is a globally distributed, multi-model database service offered by Microsoft. It is designed to provide high availability, scalability, and low-latency access to data for modern applications. Unlike traditional relational databases, Cosmos DB is a NoSQL and vector database, which means it can handle unstructured, semi-structured, structured, and vector data types.
RethinkDB is a free and open-source, distributed document-oriented database originally created by the company of the same name. The database stores JSON documents with dynamic schemas, and is designed to facilitate pushing real-time updates for query results to applications. Initially seed funded by Y Combinator in June 2009, the company announced in October 2016 that it had been unable to build a sustainable business and its products would be entirely open-sourced without commercial support.
ArangoDB is a graph database system developed by ArangoDB Inc. ArangoDB is a multi-model database system since it supports three data models with one database core and a unified query language AQL. AQL is mainly a declarative language and allows the combination of different data access patterns in a single query.
RocksDB is a high performance embedded database for key-value data. It is a fork of Google's LevelDB optimized to exploit multi-core processors (CPUs), and make efficient use of fast storage, such as solid-state drives (SSD), for input/output (I/O) bound workloads. It is based on a log-structured merge-tree data structure. It is written in C++ and provides official language bindings for C++, C, and Java. Many third-party language bindings exist. RocksDB is free and open-source software, released originally under a BSD 3-clause license. However, in July 2017 the project was migrated to a dual license of both Apache 2.0 and GPLv2 license. This change helped its adoption in Apache Software Foundation's projects after blacklist of the previous BSD+Patents license clause.
GraphQL is a data query and manipulation language for APIs, that allows a client to specify what data it needs. A GraphQL server can fetch data from separate sources for a single client query and present the results in a unified graph, so it is not tied to any specific database or storage engine.
gRPC is a cross-platform high-performance remote procedure call (RPC) framework. gRPC was initially created by Google, but is open source and is used in many organizations. Use cases range from microservices to the "last mile" of computing. gRPC uses HTTP/2 for transport, Protocol Buffers as the interface description language, and provides features such as authentication, bidirectional streaming and flow control, blocking or nonblocking bindings, and cancellation and timeouts. It generates cross-platform client and server bindings for many languages. Most common usage scenarios include connecting services in a microservices style architecture, or connecting mobile device clients to backend services.
DBeaver is a SQL client software application and a database administration tool. For relational databases it uses the JDBC application programming interface (API) to interact with databases via a JDBC driver. For other databases (NoSQL) it uses proprietary database drivers. It provides an editor that supports code completion and syntax highlighting. It provides a plug-in architecture that allows users to modify much of the application's behavior to provide database-specific functionality or features that are database-independent. This is a desktop application written in Java and based on Eclipse platform.
TypeDB is an open-source, distributed database management system that relies on a user-defined type system to model, manage, and query data.
Prometheus is a free software application used for event monitoring and alerting. It records metrics in a time series database built using an HTTP pull model, with flexible queries and real-time alerting. The project is written in Go and licensed under the Apache 2 License, with source code available on GitHub, and is a graduated project of the Cloud Native Computing Foundation, along with Kubernetes and Envoy.
Blazegraph is an open source triplestore and graph database, developed by Systap, which is used in the Wikidata SPARQL endpoint and by other large customers. It is licensed under the GNU GPL.
RavenDB is an open-source document-oriented database written in C#, developed by Hibernating Rhinos Ltd. It is cross-platform, supported on Windows, Linux, and Mac OS. RavenDB stores data as JSON documents and can be deployed in distributed clusters with master-master replication.
A vector database, vector store or vector search engine is a database that can store vectors along with other data items. Vector databases typically implement one or more Approximate Nearest Neighbor (ANN) algorithms, so that one can search the database with a query vector to retrieve the closest matching database records.