TerminusDB

Developer(s): TerminusDB
Initial release: 2019
Stable release: 11.0.0 / January 30, 2023 [1]
Written in: Rust, Prolog [2]
Type: Graph database
License: Apache 2.0
Website: terminusdb.com

TerminusDB is an open source knowledge graph and document store. It is used to build versioned data products. It is a native revision control database that is architecturally similar to Git. It is listed on DB-Engines.

TerminusDB provides a document API that exchanges data in the JSON format. It implements both GraphQL and a datalog variant called WOQL. TerminusCMS is a self-serve cloud content and data platform built on TerminusDB.

TerminusDB is available under the Apache 2.0 license. TerminusDB is implemented in Prolog and Rust.

History

TerminusDB, previously known as DataChemist, [3] [4] was founded in Dublin, Ireland. Starting in Trinity College Dublin, [5] the development team behind TerminusDB ran the Horizon 2020 project ALIGNED, which lasted from February 2015 to January 2018. An open-access e-book entitled Engineering Agile Big-Data Systems was published on completion of the ALIGNED project. [6]

Version 1.0 was released in October 2019. [7] TerminusDB was first released under the GPLv3 license, with the client libraries released under the Apache 2.0 license. With v4.0, released in December 2020, TerminusDB itself switched to the Apache 2.0 license. The shift was discussed extensively. [8]

Release history

TerminusDB release history (version, release date, references, feature notes)

Version 1.0 (October 2019) [9]
  • First server release with HDT backend

Version 1.1 (January 2020) [10]
  • Instance and schema checking

Version 2.0 (June 2020) [11]
  • Rust based storage backend
  • Delta encoding
  • Commit Graph
  • Time-travel on databases
  • Regular path queries

Version 3.0 (September 2020) [13]
  • Added reverse path queries
  • Non-backtracking side-effect
  • Reset API allows resetting branch to arbitrary commit
  • Squash API operation now available
  • Default branch is now called main and not master [12]

Version 4.0 (December 2020)
  • CSV support
  • Automatic CSV schema generation
  • Extraction or filtering of types from arbitrary nodes
  • New CLI interface
  • Graphical Model Building Tool

Version 4.1 (December 2020)
  • Multiple witness flag
  • Add deb repo

Version 4.2 (February 2021) [14]
  • Delta rollups for optimize
  • Large data transfers over TUS protocol
  • Document interface with CRUD actions
  • New frame for adding class choices
  • New branch management actions: squash, reset, delete, optimize

Version 10.0 (September 2021) [15]
  • JSON schema interface
  • Radically simplified document interface
  • JSON documents can refer to other documents in the graph

Version 11.0 (January 2023) [16] [17]
  • New TerminusDB dashboard
  • GraphQL Integration
  • New storage backend introduces a layer archive format reducing storage use and latency and simplifying interchange
  • Added typed storage for a wide variety of XSD types, reducing storage overhead and improving search performance

Version 11.0 (June 2023) [18]
  • VectorLink sidecar vector database launched
  • Vector sidecar that integrates with TerminusDB or can be used independently

Name

TerminusDB is named after Terminus, the Roman god of boundaries. It is also named after the home planet of the Foundation in the series of science-fiction novels by Isaac Asimov. [19] TerminusDB uses a CowDuck mascot. The motif originates in the examples used by core engineer Matthijs van Otterdijk when first demonstrating the append-only immutable data store. [20]

Software design

TerminusDB is an in-memory graph database management system with a rich query language. The underlying data structure, which is implemented in a Rust library, combines succinct data structures with delta encoding, drawing inspiration from software source control systems like Git. [21] This allows Git-style operations, such as branching, squashing, and resetting a branch to an earlier commit, to be used in TerminusDB.
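The idea of an append-only store built from deltas can be illustrated with a small Python sketch. This is a toy model only, not the design of the Rust storage library: the `Layer` class, its methods, and the sample triples are all hypothetical, and plain Python sets stand in for the succinct data structures the real store uses. Each layer records only what changed relative to its parent, and a branch is simply a name pointing at a layer.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple

Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class Layer:
    """One immutable commit: triples added and removed relative to its parent."""
    parent: Optional["Layer"]
    additions: FrozenSet[Triple] = frozenset()
    removals: FrozenSet[Triple] = frozenset()

    def commit(self, add=(), remove=()):
        """Return a new child layer; the current layer is never modified."""
        return Layer(self, frozenset(add), frozenset(remove))

    def triples(self):
        """Replay the chain of deltas from the root to materialise this state."""
        base = self.parent.triples() if self.parent else set()
        return (base - self.removals) | self.additions


# Hypothetical sample data: two commits on top of an empty root layer.
root = Layer(None)
c1 = root.commit(add=[("acct:1", "owner", "alice"), ("acct:1", "balance", "100")])
c2 = c1.commit(add=[("acct:1", "balance", "80")],
               remove=[("acct:1", "balance", "100")])

# A branch is only a label for a layer, so older states stay reachable.
branches = {"main": c2, "audit": c1}
```

Because layers are never rewritten, reading the `audit` branch still returns the state as of the first commit, which is the property that makes time-travel and branch resets cheap in this kind of design.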

Data model

TerminusDB is based on the RDF standard. This standard specifies finite labelled directed graphs which are parameterized in some universe of datatypes. The names for nodes and labels are drawn from a set of IRIs (Internationalized Resource Identifiers). TerminusDB uses the XSD datatypes as its universe of concrete values. For schema design, TerminusDB used the OWL language until version 10.0. Since version 10 it uses a JSON schema interface allowing users to build schemas using a simple JSON format. This provides a rich modelling language which enables constraints on the allowable shapes in the graph.
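As a rough illustration of how a JSON schema can constrain the shapes allowed in the graph, the Python sketch below defines a class document and checks instance documents against it. The `@type`/`@id` layout is an assumption about the schema format rather than something stated in this article, and `XSD_CHECKS` and `conforms` are hypothetical helpers written for this example; they are not part of any TerminusDB client.

```python
# Assumed shape of a class definition in the JSON schema interface.
schema = {
    "@type": "Class",
    "@id": "BankAccount",
    "owner": "xsd:string",
    "balance": "xsd:nonNegativeInteger",
}

# Hypothetical mapping from XSD datatype names to Python-level checks.
XSD_CHECKS = {
    "xsd:string": lambda v: isinstance(v, str),
    "xsd:nonNegativeInteger": (
        lambda v: isinstance(v, int) and not isinstance(v, bool) and v >= 0
    ),
}


def conforms(document, schema):
    """Return True if the document has exactly the schema's properties, each with a valid value."""
    if document.get("@type") != schema["@id"]:
        return False
    properties = {k: v for k, v in schema.items() if not k.startswith("@")}
    fields = {k: v for k, v in document.items() if not k.startswith("@")}
    if fields.keys() != properties.keys():
        return False
    return all(XSD_CHECKS[datatype](fields[name])
               for name, datatype in properties.items())


good = {"@type": "BankAccount", "owner": "alice", "balance": 100}
bad = {"@type": "BankAccount", "owner": "alice", "balance": -5}
```

Here `conforms(good, schema)` holds, while `bad` is rejected because a negative balance is not a valid `xsd:nonNegativeInteger`.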

TerminusDB has a promise-based client for the browser and Node.js. It is available through the npm registry or can be included directly in websites. [22] It also has a Python client for the TerminusDB RESTful API and a Python version of the web object query language, WOQLpy. [23]

Query language

GraphQL is implemented to allow users to query TerminusDB projects in such a way that deep linking can be discovered. [24]

WOQL (web object query language) is a datalog-based query language. It allows TerminusDB to treat the database as a document store or a graph interchangeably, and provides query features to make relationship traversals easy. This gives a relatively straightforward human-readable format which can be easily stored in TerminusDB itself.
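The datalog style of querying can be sketched in plain Python as pattern matching over triples with logical variables. This is a conceptual toy, not WOQL or its evaluation strategy: the `match` and `query` functions and the sample data are hypothetical, and the `v:` prefix for variables is borrowed only as a naming convention.

```python
# Hypothetical sample graph, stored as (subject, predicate, object) triples.
TRIPLES = {
    ("acct:1", "owner", "alice"),
    ("acct:2", "owner", "bob"),
    ("acct:1", "balance", 100),
    ("acct:2", "balance", 40),
}


def is_var(term):
    """Terms written as 'v:name' are treated as logical variables."""
    return isinstance(term, str) and term.startswith("v:")


def match(pattern, triple, bindings):
    """Unify one pattern with one triple; return extended bindings or None."""
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if is_var(term):
            # Bind the variable, or fail if it is already bound to another value.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def query(patterns, triples=TRIPLES):
    """Conjunction of patterns: thread the bindings through each pattern in turn."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for bindings in solutions
            for triple in triples
            if (extended := match(pattern, triple, bindings)) is not None
        ]
    return solutions


# Join owner and balance on the shared variable v:acct.
results = query([("v:acct", "owner", "v:who"), ("v:acct", "balance", "v:amount")])
```

Each solution is a dictionary of variable bindings, so the same query can be read as a graph traversal (following edges from `v:acct`) or as assembling a document-like record per account.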

Example

The following query creates a document type in the database, along with labels and cardinality constraints: [25]

WOQL.doctype("BankAccount").label("Bank Account")
    .property("owner", "xsd:string")
        .label("owner")
        .cardinality(1)
    .property("balance", "xsd:nonNegativeInteger")
        .label("balance")
        .cardinality(1)

TerminusDB published a sidecar vector database called VectorLink. It is a data tool that provides large language models with semantic context about data. Drawing on the features of TerminusDB, it provides versioned indexing of data and content.

References

  1. "Releases · terminusdb/terminusdb". GitHub. Retrieved 27 September 2022.
  2. "TerminusDB Repository". GitHub.
  3. "DataChemist wants to make sense of big-picture intelligence in the data analytics 'arms race'". Fora.ie. 7 March 2019. Retrieved 2020-05-06.
  4. "Innovadores | Cómo lograr la paz en el mundo con ayuda del big data". Innovadores (in Spanish). Retrieved 2020-05-06.
  5. "Show HN: TerminusDB – An open source in-memory graph database | Hacker News". news.ycombinator.com. Retrieved 2020-05-06.
  6. Feeney, Kevin; Davies, Jim; Welch, James; Hellmann, Sebastian; Dirschl, Christian; Koller, Andreas; Francois, Pieter; Marciniak, Arkadiusz (2018-10-30). Engineering Agile Big-Data Systems. River Publishers. ISBN 978-87-7022-016-3.
  7. Feeney, Luke (2019-10-07). "Today we release TerminusDB — the database for data people". Medium. Retrieved 2019-12-06.
  8. "We Love GPLv3, but Are Switching License to Apache 2.0 | Hacker News". news.ycombinator.com. Retrieved 2020-12-09.
  9. "GitHub - terminusdb/terminusdb at v1.0.0". GitHub. Retrieved 2021-09-27.
  10. "GitHub - terminusdb/terminusdb at v1.0.0". GitHub. Retrieved 2021-09-27.
  11. "Release Notes for TerminusDB 2.0 to 10.0". github.com. 24 November 2022.
  12. Feeney, Luke (31 August 2020). "TerminusDB: From 'Master' to Main | Graph Database Blog - News and Tutorials from TerminusDB". terminusdb.com/blog. Retrieved 2021-09-27.
  13. terminusdb/terminusdb, TerminusDB, 2021-09-27, retrieved 2021-09-27
  14. terminusdb/terminusdb, TerminusDB, 2021-09-27, retrieved 2021-09-27
  15. "Release Notes for TerminusDB 10.0". github.com. 24 November 2022.
  16. "Releases · terminusdb/terminusdb". GitHub. Retrieved 2023-02-19.
  17. "Releases · terminusdb/terminusdb". GitHub. Retrieved 2023-02-19.
  18. Oliver (2023-06-21). "Building a Vector Database to Make Use of Vector Embeddings". TerminusDB Community. Retrieved 2024-08-13.
  19. Feeney, Luke (2019-10-01). "TerminusDB — what's in a name?". Medium. Retrieved 2019-12-06.
  20. terminusdb/terminus-store, TerminusDB, 2020-05-06, retrieved 2020-05-06
  21. "Succinct Data Structures and Delta Encoding for Modern Databases" (PDF). GitHub. 24 November 2022.
  22. terminusdb/terminus-client, TerminusDB, 2020-04-29, retrieved 2020-05-06
  23. terminusdb/terminus-client-python, TerminusDB, 2020-05-06, retrieved 2020-05-06
  24. Gavin (2023-02-07). "Putting the Graph in GraphQL Query". TerminusDB. Retrieved 2023-02-19.
  25. "Taking TerminusDB to The Bank (Part I)". Graph Database Blog - News and Tutorials from TerminusDB. Retrieved 2020-12-09.