TerminusDB

Developer(s): TerminusDB
Initial release: 2019
Stable release: 11.0.0 / January 30, 2023 [1]
Repository: github.com/terminusdb/terminusdb
Written in: Rust, Prolog [2]
Type: Graph database
License: Apache 2.0
Website: terminusdb.com

TerminusDB is an open source knowledge graph and document store. It is used to build versioned data products. It is a native revision control database that is architecturally similar to Git. It is listed on DB-Engines.

TerminusDB provides a document API that exchanges data in JSON format. It implements both GraphQL and a datalog variant called WOQL. TerminusCMS is a cloud self-serve content and data platform built on TerminusDB.

TerminusDB is available under the Apache 2.0 license. TerminusDB is implemented in Prolog and Rust.

History

TerminusDB, previously known as DataChemist, [3] [4] was founded in Dublin, Ireland. Starting in Trinity College Dublin, [5] the development team behind TerminusDB led the Horizon 2020 project ALIGNED, which ran from February 2015 to January 2018. An open-access e-book entitled Engineering Agile Big-Data Systems was published on completion of the ALIGNED project. [6]

Version 1.0 was released in October 2019. [7] TerminusDB was first released under the GPLv3 license, with the client libraries released under the Apache 2.0 license. With v4.0, released in December 2020, TerminusDB switched to the Apache 2.0 license. The shift was discussed extensively. [8]

Release history

TerminusDB release history

Version 1.0 (October 2019) [9]
  • First server release with HDT backend

Version 1.1 (January 2020) [10]
  • Instance and schema checking

Version 2.0 (June 2020) [11]
  • Rust-based storage backend
  • Delta encoding
  • Commit graph
  • Time travel on databases
  • Regular path queries

Version 3.0 (September 2020) [13]
  • Added reverse path queries
  • Non-backtracking side-effect
  • Reset API allows resetting a branch to an arbitrary commit
  • Squash API operation now available
  • Default branch is now called main and not master [12]

Version 4.0 (December 2020)
  • CSV support
  • Automatic CSV schema generation
  • Extraction or filtering of types from arbitrary nodes
  • New CLI interface
  • Graphical model building tool

Version 4.1 (December 2020)
  • Multiple witness flag
  • Added deb repository

Version 4.2 (February 2021) [14]
  • Delta rollups for optimize
  • Large data transfers over the TUS protocol
  • Document interface with CRUD actions
  • New frame for adding class choices
  • New branch management actions: squash, reset, delete, optimize

Version 10.0 (September 2021) [15]
  • JSON schema interface
  • Radically simplified document interface
  • JSON documents can refer to other documents in the graph

Version 11.0 (January 2023) [16]
  • New TerminusDB dashboard
  • GraphQL integration
  • New storage backend introduces a layer archive format, reducing storage use and latency and simplifying interchange
  • Added typed storage for a wide variety of XSD types, reducing storage overhead and improving search performance

Name

TerminusDB is named after Terminus, the Roman god of boundaries. It is also named after the home planet of the Foundation in the series of science-fiction novels by Isaac Asimov. [17] TerminusDB uses a CowDuck mascot; the motif originates in the examples used by core engineer Matthijs van Otterdijk when first demonstrating the append-only immutable data store. [18]

Software design

TerminusDB is an in-memory graph database management system with a rich query language. The underlying data structure, implemented in a Rust library, combines succinct data structures with delta encoding, drawing inspiration from software source control systems like Git. [19] This design supports Git-like operations on data, such as branching, squashing, resetting a branch to an earlier commit, and time travel across a database's history.
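The append-only, delta-encoded design described above can be illustrated with a toy model. The sketch below is not the terminus-store API: `makeLayer` and `materialize` are hypothetical helpers, and the layout of a layer (a parent pointer plus lists of added and removed triples) is an assumption made for illustration. Each layer is immutable, so earlier states remain readable after later changes.

```javascript
// Sketch only: a toy append-only layer store, not the terminus-store API.
// Each layer is immutable and records the triples added and removed
// relative to its parent layer.
function makeLayer(parent, additions, deletions) {
  return Object.freeze({
    parent,
    additions: Object.freeze([...additions]),
    deletions: Object.freeze([...deletions]),
  });
}

// Serialize a triple so it can be used as a Set member.
const key = ([s, p, o]) => JSON.stringify([s, p, o]);

// Rebuild the full set of triples visible at a layer by replaying
// the deltas from the root layer down to it.
function materialize(layer) {
  const chain = [];
  for (let l = layer; l !== null; l = l.parent) chain.unshift(l);
  const triples = new Set();
  for (const l of chain) {
    for (const t of l.deletions) triples.delete(key(t));
    for (const t of l.additions) triples.add(key(t));
  }
  return [...triples].map((k) => JSON.parse(k));
}

const base = makeLayer(null, [["cow", "says", "moo"]], []);
const next = makeLayer(
  base,
  [["duck", "says", "quack"]],
  [["cow", "says", "moo"]]
);

console.log(materialize(base)); // state at the first layer
console.log(materialize(next)); // state after the second layer is applied
```

Because `base` is never modified, reading an older state amounts to materializing an older layer, which is the idea behind time travel in this toy model.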

Data model

TerminusDB is based on the RDF standard. This standard specifies finite labelled directed graphs which are parameterized in some universe of datatypes. The names for nodes and labels are drawn from a set of IRIs (Internationalized Resource Identifiers). TerminusDB uses the XSD datatypes as its universe of concrete values. For schema design, TerminusDB used the OWL language until version 10.0. Since version 10 it uses a JSON schema interface allowing users to build schemas using a simple JSON format. This provides a rich modelling language which enables constraints on the allowable shapes in the graph.
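As a rough illustration of how a JSON schema can constrain the shapes allowed in the graph, the sketch below defines a schema document and checks instance documents against it. The `@type`/`@id` layout of the schema is an assumption here, and `xsdChecks` and `checkDocument` are hypothetical helpers rather than part of any TerminusDB client; the real schema checker covers far more than these two datatypes.

```javascript
// Illustrative schema document; the "@type"/"@id" layout is an assumption.
const bankAccountSchema = {
  "@type": "Class",
  "@id": "BankAccount",
  owner: "xsd:string",
  balance: "xsd:nonNegativeInteger",
};

// Toy checks for two XSD datatypes (hypothetical helper, not a TerminusDB API).
const xsdChecks = {
  "xsd:string": (v) => typeof v === "string",
  "xsd:nonNegativeInteger": (v) => Number.isInteger(v) && v >= 0,
};

// Return a list of violations; an empty list means the document conforms.
function checkDocument(schema, doc) {
  const errors = [];
  if (doc["@type"] !== schema["@id"]) {
    errors.push(`expected @type ${schema["@id"]}`);
  }
  for (const [field, datatype] of Object.entries(schema)) {
    if (field.startsWith("@")) continue; // skip schema keywords
    const check = xsdChecks[datatype];
    if (!(field in doc)) {
      errors.push(`missing field ${field}`);
    } else if (!check || !check(doc[field])) {
      errors.push(`${field} is not a valid ${datatype}`);
    }
  }
  return errors;
}

const good = { "@type": "BankAccount", owner: "alice", balance: 100 };
const bad = { "@type": "BankAccount", owner: "bob", balance: -5 };

console.log(checkDocument(bankAccountSchema, good));
console.log(checkDocument(bankAccountSchema, bad));
```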

TerminusDB has a promise-based client for the browser and Node.js; it is available through the npm registry or can be included directly in web sites. [20] It also has a Python client for the TerminusDB RESTful API, together with WOQLpy, a Python version of the Web Object Query Language. [21]

Query language

GraphQL is implemented to allow users to query TerminusDB projects in such a way that deep linking can be discovered. [22]

WOQL (Web Object Query Language) is a datalog-based query language. It allows TerminusDB to treat the database as a document store or a graph interchangeably, and provides query features that make relationship traversals easy. Queries have a relatively straightforward, human-readable format that can itself be stored in TerminusDB.
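One way to picture the document and graph views as interchangeable is to flatten a JSON document into subject-predicate-object triples and rebuild it again. The sketch below is plain JavaScript and does not use WOQL or any TerminusDB client; the `@id` key, the triple layout, and both helper functions are assumptions made for illustration.

```javascript
// Sketch only: flatten a JSON document into subject-predicate-object triples
// and rebuild it again. The "@id" key and triple layout are assumptions.
function documentToTriples(doc) {
  const subject = doc["@id"];
  return Object.entries(doc)
    .filter(([field]) => field !== "@id")
    .map(([field, value]) => [subject, field, value]);
}

// Collect every triple about one subject back into a single document.
function triplesToDocument(subject, triples) {
  const doc = { "@id": subject };
  for (const [s, p, o] of triples) {
    if (s === subject) doc[p] = o;
  }
  return doc;
}

const account = { "@id": "BankAccount/1", owner: "alice", balance: 100 };
const triples = documentToTriples(account);

console.log(triples); // graph view: one triple per field
console.log(triplesToDocument("BankAccount/1", triples)); // document view
```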

Example

A simple query that creates a document type in the database, with labels and cardinality constraints on its properties. [23]

WOQL.doctype("BankAccount").label("Bank Account")
    .property("owner", "xsd:string")
        .label("owner")
        .cardinality(1)
    .property("balance", "xsd:nonNegativeInteger")
        .label("balance")
        .cardinality(1)

TerminusCMS

TerminusCMS is a headless open-source content management system that uses a model-driven, API-first approach to content management. The CMS is built on TerminusDB and is aimed at software engineers. It was released in February 2023. [24]

References

  1. "Releases · terminusdb/terminusdb". GitHub. Retrieved 27 September 2022.
  2. "TerminusDB Repository". GitHub.
  3. "DataChemist wants to make sense of big-picture intelligence in the data analytics 'arms race'". Fora.ie. Retrieved 2020-05-06.
  4. "Innovadores | Cómo lograr la paz en el mundo con ayuda del big data". Innovadores (in Spanish). Retrieved 2020-05-06.
  5. "Show HN: TerminusDB – An open source in-memory graph database | Hacker News". news.ycombinator.com. Retrieved 2020-05-06.
  6. Feeney, Kevin; Davies, Jim; Welch, James; Hellmann, Sebastian; Dirschl, Christian; Koller, Andreas; Francois, Pieter; Marciniak, Arkadiusz (2018-10-30). Engineering Agile Big-Data Systems. River Publishers. ISBN 978-87-7022-016-3.
  7. Feeney, Luke (2019-10-07). "Today we release TerminusDB — the database for data people". Medium. Retrieved 2019-12-06.
  8. "We Love GPLv3, but Are Switching License to Apache 2.0 | Hacker News". news.ycombinator.com. Retrieved 2020-12-09.
  9. "GitHub - terminusdb/terminusdb at v1.0.0". GitHub. Retrieved 2021-09-27.
  10. "GitHub - terminusdb/terminusdb at v1.0.0". GitHub. Retrieved 2021-09-27.
  11. "Release Notes for TerminusDB 2.0 to 10.0". github.com. 24 November 2022.
  12. Feeney, Luke (31 August 2020). "TerminusDB: From 'Master' to Main | Graph Database Blog - News and Tutorials from TerminusDB". terminusdb.com/blog. Retrieved 2021-09-27.
  13. terminusdb/terminusdb, TerminusDB, 2021-09-27, retrieved 2021-09-27
  14. terminusdb/terminusdb, TerminusDB, 2021-09-27, retrieved 2021-09-27
  15. "Release Notes for TerminusDB 10.0". github.com. 24 November 2022.
  16. "Releases · terminusdb/terminusdb". GitHub. Retrieved 2023-02-19.
  17. Feeney, Luke (2019-10-01). "TerminusDB — what's in a name?". Medium. Retrieved 2019-12-06.
  18. terminusdb/terminus-store, TerminusDB, 2020-05-06, retrieved 2020-05-06
  19. "Succinct Data Structures and Delta Encoding for Modern Databases" (PDF). GitHub. 24 November 2022.
  20. terminusdb/terminus-client, TerminusDB, 2020-04-29, retrieved 2020-05-06
  21. terminusdb/terminus-client-python, TerminusDB, 2020-05-06, retrieved 2020-05-06
  22. Gavin (2023-02-07). "Putting the Graph in GraphQL Query". TerminusDB. Retrieved 2023-02-19.
  23. "Taking TerminusDB to The Bank (Part I)". Graph Database Blog - News and Tutorials from TerminusDB. Retrieved 2020-12-09.
  24. "TerminusDB Introduces TerminusCMS, Brand Networks Acquired by Augeo, More News". CMSWire.com. Retrieved 2023-02-19.