JanusGraph

Initial release: May 3, 2017 [1]
Stable release: 0.6.3 / February 18, 2023 [2]
Repository: https://github.com/JanusGraph/janusgraph/
Written in: Java
Type: Graph database
License: Apache License 2.0
Website: janusgraph.org

JanusGraph is an open source, distributed graph database under The Linux Foundation. [3] JanusGraph is available under the Apache License 2.0. The project is supported by IBM, Google, Hortonworks and Grakn Labs. [4]


JanusGraph supports several storage backends (Apache Cassandra, Apache HBase, Google Cloud Bigtable, Oracle Berkeley DB, ScyllaDB). [5] [6] Its scalability depends on the underlying storage technology. For example, with Apache Cassandra as the storage backend, scaling across multiple datacenters is available out of the box.
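As a rough illustration of how a storage backend is selected, the sketch below renders a JanusGraph-style properties file from a Python dictionary. The keys `storage.backend`, `storage.hostname` and `storage.directory` follow JanusGraph's configuration naming; the helper `render_properties`, the host address, the data directory and the choice of backends are assumptions made for this example, not a recommended setup.

```python
def render_properties(settings):
    """Render a dict as sorted key=value lines, Java-properties style."""
    return "\n".join(f"{key}={value}" for key, value in sorted(settings.items()))


# Hypothetical settings for a Cassandra-backed graph; the host is a placeholder.
cassandra_settings = {
    "storage.backend": "cql",
    "storage.hostname": "127.0.0.1",
}

# Hypothetical settings for an embedded Berkeley DB graph; the path is a placeholder.
berkeley_settings = {
    "storage.backend": "berkeleyje",
    "storage.directory": "/tmp/janusgraph-data",
}

print(render_properties(cassandra_settings))
```

The point of the sketch is that switching backends is a configuration change: only the `storage.*` entries differ between the two dictionaries.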

JanusGraph supports global graph data analytics, reporting, and ETL through integration with big data platforms (Apache Spark, Apache Giraph, Apache Hadoop). [7]

JanusGraph supports geo, numeric range, and full-text search via external index backends (Elasticsearch, Apache Solr, Apache Lucene). [8]
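External index backends are likewise chosen through configuration. The following is a minimal sketch, assuming the `index.<name>.backend` and `index.<name>.hostname` key pattern from JanusGraph's configuration reference; the function `index_settings`, the index name `search` and the host address are hypothetical choices for illustration.

```python
def index_settings(index_name, backend, **options):
    """Build index.<name>.* configuration keys for one external index backend."""
    prefix = f"index.{index_name}"
    settings = {f"{prefix}.backend": backend}
    for option, value in options.items():
        settings[f"{prefix}.{option}"] = value
    return settings


# "search" is a user-chosen index name; the host is a placeholder.
search = index_settings("search", "elasticsearch", hostname="127.0.0.1")
for key, value in search.items():
    print(f"{key}={value}")
```

The middle segment of each key is the name under which the index backend is registered, so several index backends could in principle be configured side by side under different names.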

JanusGraph has native integration with the Apache TinkerPop [9] graph stack (Gremlin graph query language, Gremlin graph server, Gremlin applications). [7]
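To give a feel for the Gremlin integration, the sketch below builds an HTTP request that carries a Gremlin traversal as a script. It assumes a Gremlin Server with its HTTP endpoint enabled on the default port 8182; the helper `build_gremlin_request`, the URL, and the `name` and `knows` schema elements are hypothetical and not taken from the JanusGraph documentation. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_gremlin_request(query, bindings=None, url="http://localhost:8182"):
    """Build (but do not send) an HTTP POST carrying a Gremlin script."""
    payload = {"gremlin": query, "bindings": bindings or {}}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical traversal: names of the vertices that "alice" knows.
request = build_gremlin_request(
    "g.V().has('name', name).out('knows').values('name')",
    bindings={"name": "alice"},
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Against a running server the request could be submitted with `urllib.request.urlopen(request)`. Passing values through `bindings` rather than string concatenation keeps the script text constant across calls.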

History

JanusGraph is a fork of the TitanDB [10] graph database, which had been in development since 2012. [11] [3]

Licensing and contributions

JanusGraph is available under the Apache License 2.0.

To contribute, an individual or an organisation must sign a Contributor License Agreement (CLA). [29]



References

  1. "JanusGraph version 0.1.0". April 20, 2017 via Github.
  2. "JanusGraph version 0.6.3". February 18, 2023 via Github.
  3. "JanusGraph joining The Linux Foundation". www.linuxfoundation.org. The Linux Foundation. 12 January 2017. Archived from the original on 2018-08-24. Retrieved 2018-10-01.
  4. "GRAKN.AI Announces Collaboration with Expero, Google, Hortonworks and IBM on JanusGraph". 10 January 2019.
  5. "JanusGraph storage backends". Archived from the original on 2018-10-02. Retrieved 2018-09-19.
  6. "Powering a Graph Data System with Scylla + JanusGraph". 14 May 2019. Retrieved 2019-11-08.
  7. "JanusGraph site". Archived from the original on 2018-08-27. Retrieved 2018-09-19.
  8. "JanusGraph index storages". Archived from the original on 2018-10-02. Retrieved 2018-09-19.
  9. TinkerPop, Apache. "Apache TinkerPop". tinkerpop.apache.org. Archived from the original on 2018-08-29. Retrieved 2018-09-19.
  10. "Titan: Distributed Graph Database". titan.thinkaurelius.com. Archived from the original on 2018-07-31. Retrieved 2018-09-19.
  11. "JanusGraph Picks Up Where TitanDB Left Off". datanami.com. Datanami. 13 January 2017. Archived from the original on 2018-08-24. Retrieved 2018-09-30.
  12. "JanusGraph version 0.1.1". May 16, 2017 via Github.
  13. "JanusGraph version 0.2.0". October 12, 2017. Archived from the original on 2017-10-22. Retrieved 2018-09-19 via Github.
  14. "JanusGraph version 0.2.1". July 10, 2018 via Github.
  15. "JanusGraph version 0.2.2". October 9, 2018 via Github.
  16. "JanusGraph version 0.2.3". May 21, 2019 via Github.
  17. "JanusGraph version 0.3.0". July 31, 2018 via Github.
  18. "JanusGraph version 0.3.1". October 2, 2018 via Github.
  19. "JanusGraph version 0.3.2". June 16, 2019 via Github.
  20. "JanusGraph version 0.3.3". January 11, 2020 via Github.
  21. "JanusGraph version 0.4.0". July 1, 2019 via Github.
  22. "JanusGraph version 0.4.1". January 14, 2020 via Github.
  23. "JanusGraph version 0.5.0". March 10, 2020 via Github.
  24. "JanusGraph version 0.5.1". March 25, 2020 via Github.
  25. "JanusGraph version 0.5.2". May 3, 2020 via Github.
  26. "JanusGraph version 0.5.3". December 24, 2020 via Github.
  27. "JanusGraph version 0.6.0". September 3, 2021 via Github.
  28. "JanusGraph version 0.6.1". January 18, 2022 via Github.
  29. "JanusGraph contribution rules". Archived from the original on 2017-06-08. Retrieved 2018-10-01 via Github.