Comparison of structured storage software

Structured storage is computer storage for structured data, often in the form of a distributed database. [1] Software formally described as a structured storage system includes Apache Cassandra, [2] Google's Bigtable [3] and Apache HBase. [4]

Comparison

The following is a comparison of notable structured storage systems.

| Project Name | Type | Persistence | Replication | High Availability | Transactions | Rack-locality Awareness | Implementation Language | Influences, Sponsors | License |
|---|---|---|---|---|---|---|---|---|---|
| Aerospike | NoSQL database | Yes, Hybrid DRAM and flash for persistence | Yes | Yes, Distributed for scale | Yes | Yes | C (small bits of assembly language) | Aerospike | AGPL v3 |
| AllegroGraph | Graph database | Yes | No - v5, 2010 | Yes | Yes | No | Common Lisp | Franz Inc. | Proprietary |
| Apache Ignite | Key-value | To and from an underlying persistent storage (e.g. an RDBMS) | Yes | Yes | Yes | Yes | Java | Apache, GridGain Systems | Apache 2.0 |
| Apache Jackrabbit | Key-value & Hierarchical & Document | Yes | Yes | Yes | Yes | likely | Java | Apache, Roy Fielding, Day Software | Apache 2.0 |
| Berkeley DB/Dbm 1.x | Key-value | Yes | No | No | No | No | C | old school | Various |
| Berkeley DB Sleepycat/Oracle Berkeley DB 5.x | Key-value | Yes | Yes | Yes | Yes | No | C, C++, or Java | dbm, Sleepycat/Oracle | dual GPL-like Sleepycat License |
| Apache Cassandra | Key-value | Yes | Yes | Distributed | Partial. Only supports CAS (Check And Set) after 2.1.1 and later [5] [6] | Yes | Java | Dynamo and Bigtable, Facebook/Digg/Rackspace | Apache 2.0 |
| ClustrixDB | scale-out relational | Yes | Yes | Distributed and Replication | Yes | No | C | Clustrix | Proprietary |
| Coherence | Key-value | Persistent data typically in an RDBMS | Yes | Yes | Yes | Yes | Java | Oracle (previously Tangosol) | Proprietary |
| Oracle NoSQL Database | Key-value | Yes | Yes | Yes | Yes | No | Java | Oracle | AGPLv3 License or proprietary |
| Couchbase | Document | Yes | Yes | Yes | Yes, with two-phase commits [7] | Yes | C++, Erlang, C, [8] Go | CouchDB, Memcached | Apache 2.0 |
| CouchDB | Document | Yes | Yes | replication + load balancing | Atomicity is per document, per CouchDB instance [9] | No | Erlang | Lotus Notes / Ubuntu, Mozilla, IBM | Apache 2.0 |
| Extensible Storage Engine (ESE/NT) | Document or Key-value | Yes | No | No | Yes | No | C++, Assembly | Microsoft | Proprietary |
| FoundationDB | Ordered Key-value | Yes | Yes | Yes | Yes | Depends on user configuration | C++ | FoundationDB | Proprietary |
| GT.M | Key-value | Yes | Yes | Yes | Yes | Depends on user configuration | C (small bits of assembly language) | FIS | AGPL v3 |
| Apache HBase | Key-value | Yes. Major version upgrades require re-import. | Yes. HDFS, [10] Amazon S3 [11] or Amazon Elastic Block Store. [12] | Yes [13] | Yes [14] | See HDFS, S3 or EBS. | Java | Bigtable | Apache 2.0 |
| Information Management System (IBM IMS, aka DB1) | Key-value. Multi-level | Yes | Yes | Yes, with HALDB | Yes, with IMS TM | Unknown | Assembler | IBM since 1966 | Proprietary |
| Infinispan | Key-value | Yes | Yes | Yes | Yes | Yes | Java | Red Hat | Apache 2.0 |
| Memcached | Key-value | No | No | No | Partial. Only supports CAS (Check And Set - or Compare And Swap) [15] [16] | No | C | Six Apart/Couchbase/Fotolog/Facebook | BSD-like permissive copyright by Danga |
| LevelDB | Key-value, Bigtable | Yes | No | No | Partial. Multiple writes can be combined into single operation | No | C++ | Google | New BSD License |
| LightningDB | Key-value, memory-mapped files | Yes | No | No | Yes, ACID, MVCC | No | C | Symas | OpenLDAP Public License |
| MongoDB | Document (JSON) | Yes | Yes | fail-over | Partial. Single document atomicity [17] | No | C++ | 10gen | GNU AGPL v3.0 |
| Neo4j | Graph database | Yes | Yes | Yes | Yes | No | Java | Neo Technology | GNU GPL v3.0 |
| OrientDB | Multi-Model (Graph-Document-Object-Key/Value) | Yes | Yes [18] | Yes [19] | Yes [20] | Yes | Java | Orient Technologies | Apache 2.0 |
| Redis | Key-value | Yes. But last few queries can be lost. [21] | Yes | Yes [22] | Yes [23] | No | Ansi-C | VMWare, Memcache | BSD |
| ScyllaDB | Key-value | Yes | Yes | Distributed and Replication [24] | No [25] | Unknown | C++ | Apache Cassandra | AGPL v3 |
| SimpleDB (Amazon.com) | Document & Key-value | Yes | Yes (automatic) | Yes | Unknown | likely | Erlang | Amazon.com | Amazon internal only |
| Tarantool | Free-dimensional tuples with primary and secondary keys | Yes. (Asynchronous) | Yes | Yes | Yes | No | C, Lua [26] | Memcached, Mnesia, MySQL, Mail.ru | BSD |
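
Several systems in the comparison are marked "Partial" for transactions because they offer only a check-and-set (CAS) primitive rather than multi-statement transactions. The general idea can be illustrated with a minimal in-memory sketch in Python. The class `VersionedStore` and its methods are hypothetical names chosen for illustration; this is not the API of Cassandra, Memcached or any other system listed, and real implementations differ in how they track versions and coordinate across nodes.

```python
import threading


class VersionedStore:
    """Toy in-memory key-value store with check-and-set.

    Hypothetical illustration only; not a real database API.
    """

    def __init__(self):
        self._data = {}                 # key -> (version, value)
        self._lock = threading.Lock()   # makes check + write one atomic step

    def get(self, key):
        """Return (version, value); a missing key reads as (0, None)."""
        with self._lock:
            return self._data.get(key, (0, None))

    def check_and_set(self, key, expected_version, new_value):
        """Write new_value only if the key is still at expected_version."""
        with self._lock:
            current_version, _ = self._data.get(key, (0, None))
            if current_version != expected_version:
                return False            # another writer updated the key first
            self._data[key] = (current_version + 1, new_value)
            return True


store = VersionedStore()
version, _ = store.get("counter")                      # version 0, no value yet
first = store.check_and_set("counter", version, 1)     # succeeds, version becomes 1
second = store.check_and_set("counter", version, 2)    # fails, version 0 is stale
print(first, second, store.get("counter"))
```

A caller whose `check_and_set` fails would typically re-read the key and retry. This gives atomicity for a single key but no way to group updates to several keys, which is why such support is classed as partial.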

References

  1. Hamilton, James (3 November 2009). "Perspectives: One Size Does Not Fit All". Retrieved 13 November 2009.
  2. Lakshman, Avinash; Malik, Prashant. "Cassandra - A Decentralized Structured Storage System" (PDF). Cornell University. Retrieved 13 November 2009.
  3. Chang, Fay; Jeffrey Dean; Sanjay Ghemawat; Wilson C. Hsieh; Deborah A. Wallach; Mike Burrows; Tushar Chandra; Andrew Fikes; Robert E. Gruber. "Bigtable: A Distributed Storage System for Structured Data" (PDF). Archived from the original (PDF) on 11 May 2008. Retrieved 13 November 2009.
  4. Kellerman, Jim. "HBase: structured storage of sparse data for Hadoop" (PDF). Retrieved 20 February 2016.
  5. java - Cassandra - transaction support - Stack Overflow
  6. Lightweight transactions
  7. Providing transactional logic
  8. Damien Katz (January 8, 2013). "The Unreasonable Effectiveness of C". Retrieved September 30, 2016.
  9. "How do I use transactions with CouchDB?". Archived from the original on 2012-07-16. Retrieved 2012-07-12.
  10. HBase: Bigtable-like structured storage for Hadoop HDFS
  11. HBase on EC2 [permanent dead link]
  12. HBase on EC2 using EBS volumes : Lessons Learned | My AWS Musings
  13. Hbase/MultipleMasters - Hadoop Wiki
  14. ACID in HBase
  15. sql - Memcache with transactions? - Stack Overflow
  16. Memcached
  17. Atomic Operations - MongoDB
  18. "OrientDB Replication". Archived from the original on 2014-12-28. Retrieved 2015-01-08.
  19. "OrientDB Distributed Architecture Lifecycle". Archived from the original on 2015-01-19. Retrieved 2015-01-08.
  20. "OrientDB Transactions". Archived from the original on 2015-01-18. Retrieved 2015-01-08.
  21. Redis Persistence
  22. high availability - Redis master/slave replication - single point of failure? - Stack Overflow
  23. Transactions – Redis
  24. "Scylla Architecture - Fault Tolerance". Scylla Docs. Retrieved 2018-07-07.
  25. "Scylla Apache Cassandra Compatibility". Scylla Docs. Retrieved 2018-07-07.
  26. "Tarantool". GitHub. 29 April 2022.