Structured storage is computer storage for structured data, often in the form of a distributed database. [1] Software formally known as structured storage systems includes Apache Cassandra, [2] Google's Bigtable [3] and Apache HBase. [4]
The following is a comparison of notable structured storage systems.
Project Name | Type | Persistence | Replication | High Availability | Transactions | Rack-locality Awareness | Implementation Language | Influences, Sponsors | License |
---|---|---|---|---|---|---|---|---|---|
Aerospike | NoSQL database | Yes, Hybrid DRAM and flash for persistence | Yes | Yes, Distributed for scale | Yes | Yes | C (small bits of assembly language) | Aerospike | AGPL v3 |
AllegroGraph | Graph database | Yes | No - v5, 2010 | Yes | Yes | No | Common Lisp | Franz Inc. | Proprietary |
Apache Ignite | Key-value | To and from an underlying persistent storage (e.g. an RDBMS) | Yes | Yes | Yes | Yes | Java | Apache, GridGain Systems | Apache 2.0 |
Apache Jackrabbit | Key-value & Hierarchical & Document | Yes | Yes | Yes | Yes | likely | Java | Apache, Roy Fielding, Day Software | Apache 2.0 |
Berkeley DB/Dbm 1.x | Key-value | Yes | No | No | No | No | C | old school | Various |
Berkeley DB Sleepycat/Oracle Berkeley DB 5.x | Key-value | Yes | Yes | Yes | Yes | No | C, C++, or Java | dbm, Sleepycat/Oracle | dual GPL-like Sleepycat License |
Apache Cassandra | Key-value | Yes | Yes | Distributed | Partial. Supports only CAS (Check And Set), in version 2.1.1 and later [5] [6] | Yes | Java | Dynamo and Bigtable, Facebook/Digg/Rackspace | Apache 2.0 |
ClustrixDB | scale-out relational | Yes | Yes | Distributed and Replication | Yes | No | C | Clustrix | Proprietary |
Coherence | Key-value | Persistent data typically in an RDBMS | Yes | Yes | Yes | Yes | Java | Oracle (previously Tangosol) | Proprietary |
Oracle NoSQL Database | Key-value | Yes | Yes | Yes | Yes | No | Java | Oracle | AGPLv3 License or proprietary |
Couchbase | Document | Yes | Yes | Yes | Yes, with two-phase commits [7] | Yes | C++, Erlang, C, [8] Go | CouchDB, Memcached | Apache 2.0 |
CouchDB | Document | Yes | Yes | replication + load balancing | Atomicity is per document, per CouchDB instance [9] | No | Erlang | Lotus Notes / Ubuntu, Mozilla, IBM | Apache 2.0 |
Extensible Storage Engine (ESE/NT) | Document or Key-value | Yes | No | No | Yes | No | C++, Assembly | Microsoft | Proprietary |
FoundationDB | Ordered Key-value | Yes | Yes | Yes | Yes | Depends on user configuration | C++ | FoundationDB | Proprietary |
GT.M | Key-value | Yes | Yes | Yes | Yes | Depends on user configuration | C (small bits of assembly language) | FIS | AGPL v3 |
Apache HBase | Key-value | Yes. Major version upgrades require re-import. | Yes. HDFS, [10] Amazon S3 [11] or Amazon Elastic Block Store. [12] | Yes [13] | Yes [14] | See HDFS, S3 or EBS. | Java | Bigtable | Apache 2.0 |
Information Management System IBM IMS aka DB1 | Key-value. Multi-level | Yes | Yes | Yes, with HALDB | Yes, with IMS TM | Unknown | Assembler | IBM since 1966 | Proprietary |
Infinispan | Key-value | Yes | Yes | Yes | Yes | Yes | Java | Red Hat | Apache 2.0 |
Memcached | Key-value | No | No | No | Partial. Only supports CAS (Check And Set, also called Compare And Swap) [15] [16] | No | C | Six Apart/Couchbase/Fotolog/Facebook | BSD-like permissive copyright by Danga |
LevelDB | Key-value, Bigtable | Yes | No | No | Partial. Multiple writes can be combined into a single operation | No | C++ | Google | New BSD License |
LightningDB | Key-value, memory-mapped files | Yes | No | No | Yes, ACID, MVCC | No | C | Symas | OpenLDAP Public License |
MongoDB | Document (JSON) | Yes | Yes | fail-over | Partial. Single-document atomicity [17] | No | C++ | 10gen | GNU AGPL v3.0 |
Neo4j | Graph database | Yes | Yes | Yes | Yes | No | Java | Neo Technology | GNU GPL v3.0 |
OrientDB | Multi-Model (Graph-Document-Object-Key/Value) | Yes | Yes [18] | Yes [19] | Yes [20] | Yes | Java | Orient Technologies | Apache 2.0 |
Redis | Key-value | Yes, but the last few queries can be lost. [21] | Yes | Yes [22] | Yes [23] | No | ANSI C | VMWare, Memcache | BSD |
ScyllaDB | Key-value | Yes | Yes | Distributed and Replication [24] | No [25] | Unknown | C++ | Apache Cassandra | AGPL v3 |
SimpleDB (Amazon.com) | Document & Key-value | Yes | Yes (automatic) | Yes | Unknown | likely | Erlang | Amazon.com | Amazon internal only |
Tarantool | Free-dimensional tuples with primary and secondary keys | Yes. (Asynchronous) | Yes | Yes | Yes | No | C, Lua [26] | Memcached, Mnesia, MySQL, Mail.ru | BSD |
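Several "Partial" entries in the Transactions column, such as those for Memcached and Apache Cassandra, refer to check-and-set (CAS): a write succeeds only if the stored item has not changed since the client last read it. The following is a minimal in-memory sketch of that idea. The `VersionedStore` class and its `gets` and `cas` methods are hypothetical names chosen for illustration and are not the client API of any system in the table.

```python
import threading


class VersionedStore:
    """Toy in-memory key-value store with check-and-set (illustrative only)."""

    def __init__(self):
        self._data = {}                  # key -> (value, version)
        self._lock = threading.Lock()

    def gets(self, key):
        """Return (value, version); a missing key reads as (None, 0)."""
        with self._lock:
            return self._data.get(key, (None, 0))

    def cas(self, key, new_value, expected_version):
        """Write only if the key's version still equals expected_version."""
        with self._lock:
            _, current = self._data.get(key, (None, 0))
            if current != expected_version:
                return False             # another writer got there first
            self._data[key] = (new_value, current + 1)
            return True


store = VersionedStore()
_, v0 = store.gets("counter")
first = store.cas("counter", 1, v0)      # version unchanged since the read
stale = store.cas("counter", 99, v0)     # version has moved on since the read
```

In this sketch the second call is rejected because it reuses a version token that the first write has already invalidated. A client would normally re-read the item and retry. CAS protects a single item and does not by itself provide multi-item transactions, which is consistent with the "Partial" label.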
A spatial database is a general-purpose database that has been enhanced to include spatial data that represents objects defined in a geometric space, along with tools for querying and analyzing such data.
Apache Hadoop is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model. Hadoop was originally designed for computer clusters built from commodity hardware, which is still the common use. It has since also found use on clusters of higher-end hardware. All the modules in Hadoop are designed with a fundamental assumption that hardware failures are common occurrences and should be automatically handled by the framework.
HBase is an open-source non-relational distributed database modeled after Google's Bigtable and written in Java. It is developed as part of Apache Software Foundation's Apache Hadoop project and runs on top of HDFS or Alluxio, providing Bigtable-like capabilities for Hadoop. That is, it provides a fault-tolerant way of storing large quantities of sparse data.
Cassandra is a free and open-source, distributed, wide-column store, NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra offers support for clusters spanning multiple data centers, with asynchronous masterless replication allowing low-latency operations for all clients. Cassandra was designed to combine Amazon's Dynamo distributed storage and replication techniques with Google's Bigtable data and storage engine model.
Sector/Sphere is an open source software suite for high-performance distributed data storage and processing. It can be broadly compared to Google's GFS and MapReduce technology. Sector is a distributed file system targeting data storage over a large number of commodity computers. Sphere is the programming architecture framework that supports in-storage parallel data processing for data stored in Sector. Sector/Sphere operates in a wide area network (WAN) setting.
Redis is an open-source in-memory data store, used as a distributed, in-memory key–value database, cache and message broker, with optional durability. Because it holds all data in memory and because of its design, Redis offers low-latency reads and writes, making it particularly suitable for use cases that require a cache. Redis is the most popular NoSQL database, and one of the most popular databases overall. Redis is used by companies such as Twitter, Airbnb, Tinder, Yahoo, Adobe, Hulu, Amazon and OpenAI.
Couchbase Server, originally known as Membase, is a source-available, distributed multi-model NoSQL document-oriented database software package optimized for interactive applications. These applications may serve many concurrent users by creating, storing, retrieving, aggregating, manipulating and presenting data. In support of these kinds of application needs, Couchbase Server is designed to provide easy-to-scale key-value or JSON document access with low latency and high sustained throughput. It is designed to be clustered from a single machine to very large-scale deployments spanning many machines.
Patrick Eugene O'Neil was an American computer scientist, an expert on databases, and a professor of computer science at the University of Massachusetts Boston.
Apache Hive is a data warehouse software project, built on top of Apache Hadoop for providing data query and analysis. Hive gives an SQL-like interface to query data stored in various databases and file systems that integrate with Hadoop. Without Hive, traditional SQL queries must be implemented in the MapReduce Java API to execute SQL applications and queries over distributed data. Hive provides the necessary SQL abstraction to integrate SQL-like queries into the underlying Java without the need to implement queries in the low-level Java API. Since most data warehousing applications work with SQL-based querying languages, Hive aids the portability of SQL-based applications to Hadoop. While initially developed by Facebook, Apache Hive is used and developed by other companies such as Netflix and the Financial Industry Regulatory Authority (FINRA). Amazon maintains a software fork of Apache Hive included in Amazon Elastic MapReduce on Amazon Web Services.
A cloud database is a database that typically runs on a cloud computing platform, with access to the database provided as a service. There are two common deployment models: users can run databases on the cloud independently, using a virtual machine image, or they can purchase access to a database service maintained by a cloud database provider. Of the databases available on the cloud, some are SQL-based and some use a NoSQL data model.
Apache Accumulo is a highly scalable, sorted, distributed key-value store based on Google's Bigtable. It is a system built on top of Apache Hadoop, Apache ZooKeeper, and Apache Thrift. Written in Java, Accumulo has cell-level access labels and server-side programming mechanisms. According to the DB-Engines ranking, as of 2018 Accumulo was the third most popular NoSQL wide-column store, behind Apache Cassandra and HBase, and the 67th most popular database engine of any type.
Apache Drill is an open-source software framework that supports data-intensive distributed applications for interactive analysis of large-scale datasets. Built chiefly by contributions from developers from MapR, Drill is inspired by Google's Dremel system. Tomer Shiran is the founder of the Apache Drill project, which was designated an Apache Software Foundation top-level project in December 2016.
Elliptics is an open-source distributed key–value data store. By default it is a classic distributed hash table (DHT) with multiple replicas placed in different groups. Elliptics was created to meet the requirements of multi-datacenter and physically distributed storage locations when storing huge numbers of medium and large files.
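The idea of a hash table whose replicas live in different groups can be sketched as follows. This is a deliberately simplified illustration and not Elliptics' actual routing: it uses plain modulo hashing rather than hash-ring ID ranges, and the group names, node names and the `placement` function are all assumptions made for the example.

```python
import hashlib

# Hypothetical layout: each group (for example, one per datacenter)
# holds one replica of every key.
GROUPS = {
    "dc-east": ["east-1", "east-2", "east-3"],
    "dc-west": ["west-1", "west-2"],
}


def key_hash(key: str) -> int:
    """Map a key to a stable 64-bit integer using SHA-256."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def placement(key: str) -> dict:
    """Pick one node per group, so each group stores one replica of the key."""
    h = key_hash(key)
    return {group: nodes[h % len(nodes)] for group, nodes in GROUPS.items()}


replicas = placement("video-0042.mp4")
```

Because every group receives one copy, losing an entire group still leaves a readable replica elsewhere. Modulo hashing remaps most keys when a node is added or removed, which is why production systems generally prefer consistent hashing or fixed ID ranges.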
Apache Trafodion is an open-source Top-Level Project at the Apache Software Foundation. It was originally developed by the information technology division of Hewlett-Packard Company and HP Labs to provide the SQL query language on Apache HBase targeting big data transactional or operational workloads. The project was named after the Welsh word for transactions. As of April 2021, it is no longer actively developed.
The Yahoo! Cloud Serving Benchmark (YCSB) is an open-source specification and program suite for evaluating retrieval and maintenance capabilities of computer programs. It is often used to compare the relative performance of NoSQL database management systems.
A wide-column store is a column-oriented DBMS and therefore a special type of NoSQL database. It uses tables, rows, and columns, but unlike a relational database, the names and format of the columns can vary from row to row in the same table. A wide-column store can be interpreted as a two-dimensional key–value store. Google's Bigtable is one of the prototypical examples of a wide-column store.
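The "two-dimensional key–value store" reading can be illustrated with a nested mapping in which the first key selects a row and the second selects a column within that row. This is a toy model only. The `put` and `get` helpers and the sample rows are hypothetical, and real wide-column stores add column families, timestamps and on-disk sorting that are omitted here.

```python
from collections import defaultdict

# Toy model: table[row_key][column_name] -> value
table = defaultdict(dict)


def put(row_key, column, value):
    """Store a value under a (row key, column name) pair."""
    table[row_key][column] = value


def get(row_key, column, default=None):
    """Look up a cell without creating empty rows for unknown row keys."""
    return table.get(row_key, {}).get(column, default)


# Rows in the same table need not share the same set of columns.
put("user:1", "name", "Ada")
put("user:1", "email", "ada@example.org")
put("user:2", "name", "Lin")
put("user:2", "last_login", "2020-01-01")

columns_1 = sorted(table["user:1"])
columns_2 = sorted(table["user:2"])
```

Here `user:1` has an `email` column and `user:2` has a `last_login` column, yet both rows belong to the same table, which is the property that distinguishes this model from a fixed relational schema.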
Presto is a distributed query engine for big data using the SQL query language. Its architecture allows users to query data sources such as Hadoop, Cassandra, Kafka, AWS S3, Alluxio, MySQL, MongoDB and Teradata, and allows use of multiple data sources within a query. Presto is community-driven open-source software released under the Apache License.
JanusGraph is an open source, distributed graph database under The Linux Foundation. JanusGraph is available under the Apache License 2.0. The project is supported by IBM, Google, Hortonworks and Grakn Labs.
YugabyteDB is a high-performance transactional distributed SQL database for cloud-native applications, developed by Yugabyte.