Comparison of structured storage software

Last updated March 05, 2024

Structured storage is computer storage for structured data, often in the form of a distributed database.^[1] Software formally known as structured storage systems includes Apache Cassandra,^[2] Google's Bigtable^[3] and Apache HBase.^[4]

Comparison

The following is a comparison of notable structured storage systems.

Project Name	Type	Persistence	Replication	High Availability	Transactions	Rack-locality Awareness	Implementation Language	Influences, Sponsors	License
Aerospike	NoSQL database	Yes, Hybrid DRAM and flash for persistence	Yes	Yes, Distributed for scale	Yes	Yes	C (small bits of assembly language)	Aerospike	AGPL v3
AllegroGraph	Graph database	Yes	No - v5, 2010	Yes	Yes	No	Common Lisp	Franz Inc.	Proprietary
Apache Ignite	Key-value	To and from an underlying persistent storage (e.g. an RDBMS)	Yes	Yes	Yes	Yes	Java	Apache, GridGain Systems	Apache 2.0
Apache Jackrabbit	Key-value & Hierarchical & Document	Yes	Yes	Yes	Yes	likely	Java	Apache, Roy Fielding, Day Software	Apache 2.0
Berkeley DB/Dbm 1.x	Key-value	Yes	No	No	No	No	C	old school	Various
Berkeley DB Sleepycat/Oracle Berkeley DB 5.x	Key-value	Yes	Yes	Yes	Yes	No	C, C++, or Java	dbm, Sleepycat/Oracle	dual GPL-like Sleepycat License
Apache Cassandra	Key-value	Yes	Yes	Distributed	Partial. Only supports CAS (Check And Set), in 2.1.1 and later^[5]^[6]	Yes	Java	Dynamo and Bigtable, Facebook/Digg/Rackspace	Apache 2.0
ClustrixDB	scale-out relational	Yes	Yes	Distributed and Replication	Yes	No	C	Clustrix	Proprietary
Coherence	Key-value	Persistent data typically in an RDBMS	Yes	Yes	Yes	Yes	Java	Oracle (previously Tangosol)	Proprietary
Oracle NoSQL Database	Key-value	Yes	Yes	Yes	Yes	No	Java	Oracle	AGPLv3 License or proprietary
Couchbase	Document	Yes	Yes	Yes	Yes, with two-phase commits^[7]	Yes	C++, Erlang, C,^[8] Go	CouchDB, Memcached	Apache 2.0
CouchDB	Document	Yes	Yes	replication + load balancing	Atomicity is per document, per CouchDB instance^[9]	No	Erlang	Lotus Notes / Ubuntu, Mozilla, IBM	Apache 2.0
Extensible Storage Engine (ESE/NT)	Document or Key-value	Yes	No	No	Yes	No	C++, Assembly	Microsoft	Proprietary
FoundationDB	Ordered Key-value	Yes	Yes	Yes	Yes	Depends on user configuration	C++	FoundationDB	Proprietary
GT.M	Key-value	Yes	Yes	Yes	Yes	Depends on user configuration	C (small bits of assembly language)	FIS	AGPL v3
Apache HBase	Key-value	Yes. Major version upgrades require re-import.	Yes, via HDFS,^[10] Amazon S3^[11] or Amazon Elastic Block Store^[12]	Yes^[13]	Yes^[14]	See HDFS, S3 or EBS.	Java	Bigtable	Apache 2.0
Information Management System IBM IMS aka DB1	Key-value. Multi-level	Yes	Yes	Yes, with HALDB	Yes, with IMS TM	Unknown	Assembler	IBM since 1966	Proprietary
Infinispan	Key-value	Yes	Yes	Yes	Yes	Yes	Java	Red Hat	Apache 2.0
Memcached	Key-value	No	No	No	Partial. Only supports CAS (Check And Set, or Compare And Swap)^[15]^[16]	No	C	Six Apart/Couchbase/Fotolog/Facebook	BSD-like permissive copyright by Danga
LevelDB	Key-value, Bigtable	Yes	No	No	Partial. Multiple writes can be combined into a single operation	No	C++	Google	New BSD License
LightningDB	Key-value, memory-mapped files	Yes	No	No	Yes, ACID, MVCC	No	C	Symas	OpenLDAP Public License
MongoDB	Document (JSON)	Yes	Yes	fail-over	Partial. Single-document atomicity^[17]	No	C++	10gen	GNU AGPL v3.0
Neo4j	Graph database	Yes	Yes	Yes	Yes	No	Java	Neo Technology	GNU GPL v3.0
OrientDB	Multi-Model (Graph-Document-Object-Key/Value)	Yes	Yes^[18]	Yes^[19]	Yes^[20]	Yes	Java	Orient Technologies	Apache 2.0
Redis	Key-value	Yes, but the last few queries can be lost.^[21]	Yes	Yes^[22]	Yes^[23]	No	ANSI C	VMware, Memcache	BSD
ScyllaDB	Key-value	Yes	Yes	Distributed and Replication^[24]	No^[25]	Unknown	C++	Apache Cassandra	AGPL v3
SimpleDB (Amazon.com)	Document & Key-value	Yes	Yes (automatic)	Yes	Unknown	likely	Erlang	Amazon.com	Amazon internal only
Tarantool	Free-dimensional tuples with primary and secondary keys	Yes. (Asynchronous)	Yes	Yes	Yes	No	C, Lua^[26]	Memcached, Mnesia, MySQL, Mail.ru	BSD

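The Transactions column above marks Apache Cassandra and Memcached as "Partial" because they offer only CAS. As a rough illustration of what that guarantee means, here is a minimal in-memory sketch. The `CasStore` class and its version counter are hypothetical and are not the API of either product: a write succeeds only if the stored version still matches the one the caller read.

```python
import threading


class CasStore:
    """Toy key-value store with compare-and-set (hypothetical, for illustration)."""

    def __init__(self):
        self._data = {}  # key -> (version, value)
        self._lock = threading.Lock()

    def get(self, key):
        """Return (version, value); (0, None) if the key is absent."""
        with self._lock:
            return self._data.get(key, (0, None))

    def cas(self, key, expected_version, new_value):
        """Write new_value only if the key is still at expected_version."""
        with self._lock:
            version, _ = self._data.get(key, (0, None))
            if version != expected_version:
                return False  # someone else wrote first; caller must re-read
            self._data[key] = (version + 1, new_value)
            return True


store = CasStore()
version, _ = store.get("counter")
first = store.cas("counter", version, 1)  # version matches, write is applied
stale = store.cas("counter", version, 2)  # version is now stale, write is refused
```

A single conditional write like this protects one key at a time. It does not group several reads and writes into one atomic unit, which is why the table distinguishes it from full transaction support.
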
Related Research Articles

A spatial database is a general-purpose database that has been enhanced to include spatial data that represents objects defined in a geometric space, along with tools for querying and analyzing such data.

Apache Hadoop is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model. Hadoop was originally designed for computer clusters built from commodity hardware, which is still the common use. It has since also found use on clusters of higher-end hardware. All the modules in Hadoop are designed with a fundamental assumption that hardware failures are common occurrences and should be automatically handled by the framework.

HBase is an open-source non-relational distributed database modeled after Google's Bigtable and written in Java. It is developed as part of Apache Software Foundation's Apache Hadoop project and runs on top of HDFS or Alluxio, providing Bigtable-like capabilities for Hadoop. That is, it provides a fault-tolerant way of storing large quantities of sparse data.

Cassandra is a free and open-source, distributed, wide-column store, NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra offers support for clusters spanning multiple data centers, with asynchronous masterless replication allowing low-latency operations for all clients. Cassandra was designed to combine Amazon's Dynamo distributed storage and replication techniques with Google's Bigtable data and storage engine model.

Sector/Sphere is an open source software suite for high-performance distributed data storage and processing. It can be broadly compared to Google's GFS and MapReduce technology. Sector is a distributed file system targeting data storage over a large number of commodity computers. Sphere is the programming architecture framework that supports in-storage parallel data processing for data stored in Sector. Sector/Sphere operates in a wide area network (WAN) setting.

Redis is an open-source in-memory storage, used as a distributed, in-memory key–value database, cache and message broker, with optional durability. Because it holds all data in memory and because of its design, Redis offers low-latency reads and writes, making it particularly suitable for use cases that require a cache. Redis is the most popular NoSQL database, and one of the most popular databases overall. Redis is used in companies like Twitter, Airbnb, Tinder, Yahoo, Adobe, Hulu, Amazon and OpenAI.

Couchbase Server, originally known as Membase, is a source-available, distributed multi-model NoSQL document-oriented database software package optimized for interactive applications. These applications may serve many concurrent users by creating, storing, retrieving, aggregating, manipulating and presenting data. In support of these kinds of application needs, Couchbase Server is designed to provide easy-to-scale key-value or JSON document access with low latency and high sustained throughput. It is designed to be clustered from a single machine to very large-scale deployments spanning many machines.

Patrick Eugene O'Neil was an American computer scientist, an expert on databases, and a professor of computer science at the University of Massachusetts Boston.

Apache Hive is a data warehouse software project, built on top of Apache Hadoop for providing data query and analysis. Hive gives an SQL-like interface to query data stored in various databases and file systems that integrate with Hadoop. Traditional SQL queries must be implemented in the MapReduce Java API to execute SQL applications and queries over distributed data. Hive provides the necessary SQL abstraction to integrate SQL-like queries into the underlying Java without the need to implement queries in the low-level Java API. Since most data warehousing applications work with SQL-based querying languages, Hive aids the portability of SQL-based applications to Hadoop. While initially developed by Facebook, Apache Hive is used and developed by other companies such as Netflix and the Financial Industry Regulatory Authority (FINRA). Amazon maintains a software fork of Apache Hive included in Amazon Elastic MapReduce on Amazon Web Services.

A cloud database is a database that typically runs on a cloud computing platform and access to the database is provided as-a-service. There are two common deployment models: users can run databases on the cloud independently, using a virtual machine image, or they can purchase access to a database service, maintained by a cloud database provider. Of the databases available on the cloud, some are SQL-based and some use a NoSQL data model.

Apache Accumulo is a highly scalable sorted, distributed key-value store based on Google's Bigtable. It is a system built on top of Apache Hadoop, Apache ZooKeeper, and Apache Thrift. Written in Java, Accumulo has cell-level access labels and server-side programming mechanisms. According to the DB-Engines ranking, as of 2018 Accumulo was the third most popular NoSQL wide-column store, behind Apache Cassandra and HBase, and the 67th most popular database engine of any type in the complete ranking.

Apache Drill is an open-source software framework that supports data-intensive distributed applications for interactive analysis of large-scale datasets. Built chiefly by contributions from developers from MapR, Drill is inspired by Google's Dremel system. Drill is an Apache top-level project. Tomer Shiran is the founder of the Apache Drill Project. It was designated an Apache Software Foundation top-level project in December 2016.

Elliptics is an open-source distributed key–value data store. By default it is a classic distributed hash table (DHT) with multiple replicas placed in different groups. Elliptics was created to meet the requirements of multi-datacenter and physically distributed storage locations when storing huge amounts of medium and large files.

Apache Trafodion is an open-source Top-Level Project at the Apache Software Foundation. It was originally developed by the information technology division of Hewlett-Packard Company and HP Labs to provide the SQL query language on Apache HBase targeting big data transactional or operational workloads. The project was named after the Welsh word for transactions. As of April 2021, it is no longer actively developed.

The Yahoo! Cloud Serving Benchmark (YCSB) is an open-source specification and program suite for evaluating retrieval and maintenance capabilities of computer programs. It is often used to compare the relative performance of NoSQL database management systems.

A wide-column store is a column-oriented DBMS and therefore a special type of NoSQL database. It uses tables, rows, and columns, but unlike a relational database, the names and format of the columns can vary from row to row in the same table. A wide-column store can be interpreted as a two-dimensional key–value store. Google's Bigtable is one of the prototypical examples of a wide-column store.
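The "two-dimensional key–value store" reading can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the `WideColumnTable` class and its method names are hypothetical, and real systems such as Bigtable add sorting, timestamps, column families and distribution on top of this idea.

```python
from collections import defaultdict


class WideColumnTable:
    """Toy wide-column table: row key -> {column name -> value} (hypothetical)."""

    def __init__(self):
        self._rows = defaultdict(dict)

    def put(self, row_key, column, value):
        """Store a value under the (row key, column name) pair."""
        self._rows[row_key][column] = value

    def get(self, row_key, column, default=None):
        """Look up one cell; missing rows or columns return the default."""
        return self._rows.get(row_key, {}).get(column, default)

    def columns(self, row_key):
        """Return the column names present in this particular row."""
        return sorted(self._rows.get(row_key, {}))


table = WideColumnTable()
table.put("user:1", "name", "Ada")
table.put("user:1", "email", "ada@example.org")
table.put("user:2", "name", "Grace")
table.put("user:2", "last_login", "2024-03-05")  # a column that user:1 does not have
```

Each row carries its own set of columns, so `table.columns("user:1")` and `table.columns("user:2")` differ even though both rows live in the same table.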

Presto is a distributed query engine for big data using the SQL query language. Its architecture allows users to query data sources such as Hadoop, Cassandra, Kafka, AWS S3, Alluxio, MySQL, MongoDB and Teradata, and allows use of multiple data sources within a query. Presto is community-driven open-source software released under the Apache License.

JanusGraph is an open source, distributed graph database under The Linux Foundation. JanusGraph is available under the Apache License 2.0. The project is supported by IBM, Google, Hortonworks and Grakn Labs.

YugabyteDB is a high-performance transactional distributed SQL database for cloud-native applications, developed by Yugabyte.

References

[1] Hamilton, James (3 November 2009). "Perspectives: One Size Does Not Fit All". Retrieved 13 November 2009.
[2] Lakshman, Avinash; Malik, Prashant. "Cassandra - A Decentralized Structured Storage System" (PDF). Cornell University. Retrieved 13 November 2009.
[3] Chang, Fay; Jeffrey Dean; Sanjay Ghemawat; Wilson C. Hsieh; Deborah A. Wallach; Mike Burrows; Tushar Chandra; Andrew Fikes; Robert E. Gruber. "Bigtable: A Distributed Storage System for Structured Data" (PDF). Archived from the original (PDF) on 11 May 2008. Retrieved 13 November 2009.
[4] Kellerman, Jim. "HBase: structured storage of sparse data for Hadoop" (PDF). Retrieved 20 February 2016.
[5] java - Cassandra - transaction support - Stack Overflow
[6] Lightweight transactions
[7] Providing transactional logic
[8] Damien Katz (January 8, 2013). "The Unreasonable Effectiveness of C". Retrieved September 30, 2016.
[9] "How do I use transactions with CouchDB?". Archived from the original on 2012-07-16. Retrieved 2012-07-12.
[10] HBase: Bigtable-like structured storage for Hadoop HDFS
[11] HBase on EC2 [permanent dead link]
[12] HBase on EC2 using EBS volumes: Lessons Learned | My AWS Musings
[13] Hbase/MultipleMasters - Hadoop Wiki
[14] ACID in HBase
[15] sql - Memcache with transactions? - Stack Overflow
[16] Memcached
[17] Atomic Operations - MongoDB
[18] "OrientDB Replication". Archived from the original on 2014-12-28. Retrieved 2015-01-08.
[19] "OrientDB Distributed Architecture Lifecycle". Archived from the original on 2015-01-19. Retrieved 2015-01-08.
[20] "OrientDB Transactions". Archived from the original on 2015-01-18. Retrieved 2015-01-08.
[21] Redis Persistence
[22] high availability - Redis master/slave replication - single point of failure? - Stack Overflow
[23] Transactions – Redis
[24] "Scylla Architecture - Fault Tolerance". Scylla Docs. Retrieved 2018-07-07.
[25] "Scylla Apache Cassandra Compatibility". Scylla Docs. Retrieved 2018-07-07.
[26] "Tarantool". GitHub. 29 April 2022.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
