PACELC theorem

The tradeoff between availability, consistency and latency, as described by the PACELC theorem.

In database theory, the PACELC theorem is an extension to the CAP theorem. It states that in the case of a network partition (P) in a distributed computer system, one has to choose between availability (A) and consistency (C), as per the CAP theorem; else (E), even when the system is running normally in the absence of partitions, one has to choose between latency (L) and consistency (C).

Overview

PACELC builds on the CAP theorem. Both theorems describe the limitations and trade-offs that distributed databases face regarding consistency, availability, and partition tolerance. PACELC goes further and states that an additional trade-off exists between latency and consistency, even in the absence of partitions, thus providing a more complete portrayal of the potential consistency trade-offs for distributed systems. [1]
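
The two-part choice can be written down as a small data type. The following is only an illustrative sketch: the class and method names are hypothetical, and the "PA/EL" style label is the shorthand commonly used alongside the theorem rather than anything defined in this article.

```python
from dataclasses import dataclass
from enum import Enum


class PartitionChoice(Enum):
    """What a system keeps when a partition (P) occurs."""
    AVAILABILITY = "A"
    CONSISTENCY = "C"


class ElseChoice(Enum):
    """What a system favours else (E), during normal operation."""
    LATENCY = "L"
    CONSISTENCY = "C"


@dataclass(frozen=True)
class PacelcRating:
    """Hypothetical container for one system's PACELC classification."""
    on_partition: PartitionChoice
    otherwise: ElseChoice

    def label(self) -> str:
        # Combine both halves into a label such as "PA/EL" or "PC/EC".
        return f"P{self.on_partition.value}/E{self.otherwise.value}"

    def prioritised(self, partitioned: bool) -> str:
        # Pick the half of the rating that applies to the current state.
        choice = self.on_partition if partitioned else self.otherwise
        return choice.name.lower()


rating = PacelcRating(PartitionChoice.AVAILABILITY, ElseChoice.LATENCY)
print(rating.label())                         # PA/EL
print(rating.prioritised(partitioned=True))   # availability
print(rating.prioritised(partitioned=False))  # latency
```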

A high availability requirement implies that the system must replicate data. As soon as a distributed system replicates data, a trade-off between consistency and latency arises.
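
A toy model may help show why replication forces this trade-off. The sketch below is not taken from the cited sources: the class `ReplicatedValue`, the fixed per-replica delays, and the assumption that a local write costs nothing are all simplifications chosen for illustration. A synchronous write waits for the slowest replica, while an asynchronous write returns at once and leaves the replicas temporarily stale.

```python
class ReplicatedValue:
    """Toy model: one primary copy plus replicas with fixed network delays."""

    def __init__(self, replica_delays_ms):
        self.replica_delays_ms = list(replica_delays_ms)
        self.primary = None
        self.replicas = [None] * len(self.replica_delays_ms)

    def write(self, value, synchronous):
        """Apply a write and return the latency (ms) the client observes."""
        self.primary = value
        if synchronous:
            # Wait for every replica: latency is set by the slowest one.
            self.replicas = [value] * len(self.replicas)
            return max(self.replica_delays_ms, default=0)
        # Acknowledge immediately; replicas are updated later by propagate().
        return 0

    def propagate(self):
        """Background replication eventually catches the replicas up."""
        self.replicas = [self.primary] * len(self.replicas)

    def stale_replicas(self):
        """Number of replicas that would currently serve an old value."""
        return sum(1 for v in self.replicas if v != self.primary)


store = ReplicatedValue([5, 40, 120])

print(store.write("x=1", synchronous=True))   # 120 (consistent but slow)
print(store.stale_replicas())                 # 0

print(store.write("x=2", synchronous=False))  # 0 (fast but inconsistent)
print(store.stale_replicas())                 # 3
store.propagate()
print(store.stale_replicas())                 # 0
```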

The PACELC theorem was first described by Daniel Abadi from Yale University in 2010 in a blog post, [2] which he later clarified in a paper in 2012. [1] The purpose of PACELC is to address his thesis that "Ignoring the consistency/latency trade-off of replicated systems is a major oversight [in CAP], as it is present at all times during system operation, whereas CAP is only relevant in the arguably rare case of a network partition." The PACELC theorem was proved formally in 2018 in a SIGACT News article. [3]

Database PACELC ratings

The original database PACELC ratings are from [1] [4]. Subsequent updates were contributed by the Wikipedia community.

| DDBS | P+A | P+C | E+L | E+C |
|---|---|---|---|---|
| Aerospike [8] | Yes | paid only | optional | Yes |
| Bigtable/HBase | | Yes | | Yes |
| Cassandra | Yes | | Yes [note 1] | |
| Cosmos DB | | Yes | Yes [note 2] | |
| Couchbase | | Yes | Yes | Yes |
| Dynamo | Yes | | Yes [note 1] | |
| DynamoDB | | Yes | Yes | Yes |
| FaunaDB [10] | | Yes | Yes | Yes |
| Hazelcast IMDG [6] [7] | Yes | Yes | Yes | Yes |
| Megastore | | Yes | | Yes |
| MongoDB | Yes | | | Yes |
| MySQL Cluster | | Yes | | Yes |
| PNUTS | | Yes | Yes | |
| PostgreSQL | Yes | Yes | Yes | Yes |
| Riak | Yes | | Yes [note 1] | |
| SpiceDB [11] | | Yes | Yes | Yes |
| VoltDB/H-Store | | Yes | | Yes |
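
Several of the systems above expose user-adjustable settings for the latency/consistency trade-off. One common form such settings take in Dynamo-style stores is a pair of read and write quorum sizes over N replicas. The sketch below is a generic illustration of that idea, not the configuration API of any listed database; the function name and parameters are hypothetical.

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Return True if every read quorum intersects every write quorum.

    n: number of replicas holding each item
    r: replicas that must answer a read
    w: replicas that must acknowledge a write

    When r + w > n, any read contacts at least one replica that saw the
    latest acknowledged write, at the cost of waiting for more replicas.
    """
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must each be between 1 and n")
    return r + w > n


# Larger quorums lean towards consistency, smaller ones towards latency.
for r, w in [(1, 1), (2, 2), (1, 3)]:
    print(f"N=3 R={r} W={w} overlap={quorums_overlap(3, r, w)}")
```

With three replicas, R=1 and W=1 waits for a single replica per operation and allows stale reads, whereas R=2 and W=2 guarantees overlap but each operation waits for a second replica.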

Notes

  1. Dynamo, Cassandra, and Riak have user-adjustable settings to control the LC tradeoff. [4]
  2. Cosmos DB has five selectable consistency levels to control the LC tradeoff. [9]

References

  1. Abadi, Daniel J. "Consistency Tradeoffs in Modern Distributed Database System Design" (PDF). Yale University.
  2. Abadi, Daniel J. (2010-04-23). "DBMS Musings: Problems with CAP, and Yahoo's little known NoSQL system". Retrieved 2016-09-11.
  3. Golab, Wojciech (2018). "Proving PACELC". ACM SIGACT News. 49 (1): 73–81. doi:10.1145/3197406.3197420. S2CID 3989621.
  4. Abadi, Daniel J.; Murdopo, Arinto (2012-04-17). "Consistency Tradeoffs in Modern Distributed Database System Design". Retrieved 2022-07-18.
  5. "Global tables - multi-Region replication for DynamoDB". AWS Documentation. Retrieved 2023-01-04.
  6. Abadi, Daniel (2017-10-08). "DBMS Musings: Hazelcast and the Mythical PA/EC System". DBMS Musings. Retrieved 2017-10-20.
  7. "Hazelcast IMDG Reference Manual". docs.hazelcast.org. Retrieved 2020-09-17.
  8. Porter, Kevin (2023-03-29). "Where does aerospike fall in PACELC?". Aerospike Community Forum. Retrieved 2023-03-30.
  9. "Consistency Levels in Azure Cosmos DB". Retrieved 2021-06-21.
  10. Abadi, Daniel (2018-09-21). "DBMS Musings: NewSQL database systems are failing to guarantee consistency, and I blame Spanner". DBMS Musings. Retrieved 2019-02-23.
  11. Zelinskie, Jimmy (2024-04-23). "SpiceDB Concepts: Consistency". SpiceDB documentation. Retrieved 2024-05-02.