Causal consistency

Last updated

Causal consistency is one of the major memory consistency models. In concurrent programming, where concurrent processes are accessing a shared memory, a consistency model restricts which accesses are legal. This is useful for defining correct data structures in distributed shared memory or distributed transactions.

Contents

Causal Consistency is “Available under Partition”, meaning that a process can read and write the memory (memory is Available) even while there is no functioning network connection (network is Partitioned) between processes; it is an asynchronous model. Contrast to strong consistency models, such as sequential consistency or linearizability, which cannot be both safe and live under partition, and are slow to respond because they require synchronisation.

Causal consistency was proposed in 1990s [1] as a weaker consistency model for shared memory models. Causal consistency is closely related to the concept of Causal Broadcast in communication protocols. [2] In these models, a distributed execution is represented as a partial order, based on Lamport's happened-before concept of potential causality. [3]

Causal consistency is a useful consistency model because it matches programmers' intuitions about time, is more available than strong consistency models, yet provides more useful guarantees than eventual consistency. For instance, in distributed databases, causal consistency supports the ordering of operations, in contrast to eventual consistency. [4] Also, causal consistency helps with the development of abstract data types such as queues or counters. [5]

Since time and ordering are so fundamental to our intuition, it is hard to reason about a system that does not enforce causal consistency. However, many distributed databases lack this guarantee, even ones that provide serialisability. [6] Spanner does guarantee causal consistency, but it also forces strong consistency, thus eschewing availability under partition. More available databases that ensure causal consistency include MongoDB and AntidoteDB.

Definition

Causal consistency captures the potential causal relationships between operations, and guarantees that all processes observe causally-related operations in a common order. In other words, all processes in the system agree on the order of the causally-related operations. They may disagree on the order of operations that are causally unrelated. [1]

Let us define the following relation. If some process performs a write operation A, and some (the same or another) process that observed A then performs a write operation B, then it is possible that A is the cause of B; we say that A “potentially causes” or “causally precedes” B. Causal Consistency guarantees that if A causally-precedes B, then every process in the system observes A before observing B. Conversely, two write operations C and D are said concurrent, or causally independent, if neither causally precedes the other. In this case, a process may observe either C before D, or D before C. The Causal Precedence relation in shared memory is related to the happened-before relation in message-based communication. [3]

Thus, a system provides causal consistency if this following condition holds: write operations that are related by potential causality are seen by each process of the system in their causal precedence order. Different processes may observe concurrent writes in different orders. [7]

The Causal Consistency model is weaker than sequential consistency, which ensures that all processes observe all write operations in common order, whether causally related or not. [8] However, causal consistency is stronger than PRAM consistency, which requires only the write operations that are done by a single process to be observed in common order by each other process. [9] It follows that when a system is sequentially consistent, it is also causally consistent. Additionally, causal consistency implies PRAM consistency, but not vice versa.

Example

Here is an example of causal consistency. [10]

Causal relations are respected in the following event sequence:

P1 :W(x)1W(x)3
P2 :R(x)1W(x)2
P3 :R(x)1R(x)3R(x)2
P4 :R(x)1R(x)2R(x)3

Process P2 observes, reads, the earlier write W(x)1 that is done by process P1. Therefore, the two writes W(x)1 and W(x)2 are causally related. Under causal consistency, every process observes W(x)1 first, before observing W(x)2. Notice that the two write operations W(x)2 and W(x)3, with no intervening read operations, are concurrent, and processes P3 and P4 observe (read) them in different orders.

Session Guarantees

The causal consistency model can be refined into four session guarantees. [11] They can be summarised as follows:

Transactional session guarantees for serialisability and snapshot isolation are presented by Daudjee and Salem. [12]

Implementation

The system is abstracted as a set of communicating processes. When a process writes into the shared memory, the implementation sends this event to the other processes (via shared memory or as a message). Because of concurrency and failures, a process might receive events in any order. The implementation delivers an event, i.e., makes it visible to the process, only if all the events that causally precede it have themselves been delivered. This requires the implementation to maintain meta-data that represents the causal relationships between memory accesses.

In brief, the implementation includes the following steps: (1) Maintain causal context meta-data at every process to summarise what updates causally precede the current state. (2) When a process updates memory, tag the update event with the causal context of that process, to summarise what updates causally precede this update. (3) A process that has received some update event may deliver it only if the event's tag causally precedes the causal context of the receiving process. (As a side effect of delivery, add the new event to the causal context of the receiving process.) Otherwise, the update was received too early, and must remain buffered until event matches the context. In the meantime, the implementation either passively waits to receive the missing events, or actively fetches them from their source.

This approach enables availability under partition. [13]

There are two common representations for the causal context meta-data. One is to maintain an explicit dependency graph of the causal dependence relation. Because such a graph can grow arbitrarily large, an event is often tagged with only its immediate predecessors; determining its transitive predecessors requires a distributed graph traversal. The other is to maintain a vector clock, with one entry per process (or group of processes), counting the number of events generated by the process or group. This representation has a fixed size, and the ordering of events can be inferred by a simple comparison of the vectors.

To precisely determine which events are dependent and which are concurrent in a fully peer-to-peer system, the size of the metadata is at least proportional to the number of active writers. [14] However, a precise determination of concurrency is generally overkill. Causal consistency requires only that causally-dependent events be delivered in order; it does not matter if two concurrent events end up being ordered. Therefore, the size can be decreased arbitrarily by using safe approximation techniques. [15] In the limit, a single scalar (a Lamport clock [3] ) suffices, at the cost of removing any concurrency. The size of metadata can also be decreased by restricting the communication topology; for instance, in a star, tree or linear topology, a single scalar suffices.

The search for efficient implementations of causal consistency is a very active research area.

Related Research Articles

Safe semantics is a computer hardware consistency model. It describes one type of guarantee that a data register provides when it is shared by several processors in a parallel computer or in a network of computers working together.

<span class="mw-page-title-main">Mutual exclusion</span> In computing, restricting data to be accessible by one thread at a time

In computer science, mutual exclusion is a property of concurrency control, which is instituted for the purpose of preventing race conditions. It is the requirement that one thread of execution never enters a critical section while a concurrent thread of execution is already accessing said critical section, which refers to an interval of time during which a thread of execution accesses a shared resource or shared memory.

In computer science, ACID is a set of properties of database transactions intended to guarantee data validity despite errors, power failures, and other mishaps. In the context of databases, a sequence of database operations that satisfies the ACID properties is called a transaction. For example, a transfer of funds from one bank account to another, even involving multiple changes such as debiting one account and crediting another, is a single transaction.

In computer science, a consistency model specifies a contract between the programmer and a system, wherein the system guarantees that if the programmer follows the rules for operations on memory, memory will be consistent and the results of reading, writing, or updating memory will be predictable. Consistency models are used in distributed systems like distributed shared memory systems or distributed data stores. Consistency is different from coherence, which occurs in systems that are cached or cache-less, and is consistency of data with respect to all processors. Coherence deals with maintaining a global order in which writes to a single location or single variable are seen by all processors. Consistency deals with the ordering of operations to multiple locations with respect to all processors.

In computer science, the happened-before relation is a relation between the result of two events, such that if one event should happen before another event, the result must reflect that, even if those events are in reality executed out of order. This involves ordering events based on the potential causal relationship of pairs of events in a concurrent system, especially asynchronous distributed systems. It was formulated by Leslie Lamport.

Memory coherence is an issue that affects the design of computer systems in which two or more processors or cores share a common area of memory.

<span class="mw-page-title-main">Race condition</span> When a systems behavior depends on timing of uncontrollable events

A race condition or race hazard is the condition of an electronics, software, or other system where the system's substantive behavior is dependent on the sequence or timing of other uncontrollable events. It becomes a bug when one or more of the possible behaviors is undesirable.

In computer science, an algorithm is called non-blocking if failure or suspension of any thread cannot cause failure or suspension of another thread; for some operations, these algorithms provide a useful alternative to traditional blocking implementations. A non-blocking algorithm is lock-free if there is guaranteed system-wide progress, and wait-free if there is also guaranteed per-thread progress. "Non-blocking" was used as a synonym for "lock-free" in the literature until the introduction of obstruction-freedom in 2003.

<span class="mw-page-title-main">Concurrency (computer science)</span> Ability to execute a task in a non-serial manner

In computer science, concurrency is the ability of different parts or units of a program, algorithm, or problem to be executed out-of-order or in partial order, without affecting the outcome. This allows for parallel execution of the concurrent units, which can significantly improve overall speed of the execution in multi-processor and multi-core systems. In more technical terms, concurrency refers to the decomposability of a program, algorithm, or problem into order-independent or partially-ordered components or units of computation.

<span class="mw-page-title-main">Linearizability</span> Property of some operation(s) in concurrent programming

In concurrent programming, an operation is linearizable if it consists of an ordered list of invocation and response events, that may be extended by adding response events such that:

  1. The extended list can be re-expressed as a sequential history.
  2. That sequential history is a subset of the original unextended list.
<span class="mw-page-title-main">Vector clock</span> Algorithm for partial ordering of events and detecting causality in distributed systems

A vector clock is a data structure used for determining the partial ordering of events in a distributed system and detecting causality violations. Just as in Lamport timestamps, inter-process messages contain the state of the sending process's logical clock. A vector clock of a system of N processes is an array/vector of N logical clocks, one clock per process; a local "largest possible values" copy of the global clock-array is kept in each process.

Eventual consistency is a consistency model used in distributed computing to achieve high availability that informally guarantees that, if no new updates are made to a given data item, eventually all accesses to that item will return the last updated value. Eventual consistency, also called optimistic replication, is widely deployed in distributed systems and has origins in early mobile computing projects. A system that has achieved eventual consistency is often said to have converged, or achieved replica convergence. Eventual consistency is a weak guarantee – most stronger models, like linearizability, are trivially eventually consistent.

The Lamport timestamp algorithm is a simple logical clock algorithm used to determine the order of events in a distributed computer system. As different nodes or processes will typically not be perfectly synchronized, this algorithm is used to provide a partial ordering of events with minimal overhead, and conceptually provide a starting point for the more advanced vector clock method. The algorithm is named after its creator, Leslie Lamport.

Concurrent computing is a form of computing in which several computations are executed concurrently—during overlapping time periods—instead of sequentially—with one completing before the next starts.

Replication in computing involves sharing information so as to ensure consistency between redundant resources, such as software or hardware components, to improve reliability, fault-tolerance, or accessibility.

Paxos is a family of protocols for solving consensus in a network of unreliable or fallible processors. Consensus is the process of agreeing on one result among a group of participants. This problem becomes difficult when the participants or their communications may experience failures.

Operational transformation (OT) is a technology for supporting a range of collaboration functionalities in advanced collaborative software systems. OT was originally invented for consistency maintenance and concurrency control in collaborative editing of plain text documents. Its capabilities have been extended and its applications expanded to include group undo, locking, conflict resolution, operation notification and compression, group-awareness, HTML/XML and tree-structured document editing, collaborative office productivity tools, application-sharing, and collaborative computer-aided media design tools. In 2009 OT was adopted as a core technique behind the collaboration features in Apache Wave and Google Docs.

In distributed computing, a shared snapshot object is a type of data structure, which is shared between several threads or processes. For many tasks, it is important to have a data structure, that can provide a consistent view of the state of the memory. In practice, it turns out that it is not possible to get such a consistent state of the memory by just accessing one shared register after another, since the values stored in individual registers can be changed at any time during this process. To solve this problem, snapshot objects store a vector of n components and provide the following two atomic operations: update(i,v) changes the value in the ith component to v, and scan returns the values stored in all n components. Snapshot objects can be constructed using atomic single-writer multi-reader shared registers.

In distributed computing, shared-memory systems and message-passing systems are two means of interprocess communication which have been heavily studied. In shared-memory systems, processes communicate by accessing shared data structures. A shared (read/write) register, sometimes just called a register, is a fundamental type of shared data structure which stores a value and has two operations: read, which returns the value stored in the register, and write, which updates the value stored. Other types of shared data structures include read–modify–write, test-and-set, compare-and-swap etc. The memory location which is concurrently accessed is sometimes called a register.

Processor Consistency is one of the consistency models used in the domain of concurrent computing.

References

  1. 1 2 Ahamad, Mustaque; Neiger, Gil; Burns, James E.; Kohli, Prince; Hutto, Phillip W. (March 1995), "Causal memory: definitions, implementation, and programming", Distributed Computing, 9 (1): 37–49, doi:10.1007/bf01784241, hdl: 1853/6781 , S2CID   6435056
  2. Birman, Kenneth P.; Joseph, Thomas A. (January 1987), "Reliable communication in the presence of failures", ACM Transactions on Computer Systems, 5 (1): 47–76, doi:10.1145/7351.7478, hdl: 1813/6534 , S2CID   11224827
  3. 1 2 3 Lamport, Leslie (1978), "Time, clocks, and the ordering of events in a distributed system", Communications of the ACM, 21 (7): 558–565, doi: 10.1145/359545.359563 , S2CID   215822405
  4. Elbushra, Mawahib Musa; Lindström, Jan (2015), "Causal consistent databases", Open Journal of Databases, 2 (1): 17–35
  5. Perrin, Matthieu; Mostéfaoui, Achour; Jard, Claude (2016), "Causal consistency: beyond memory", in Asenjo, Rafael; Harris, Tim (eds.), Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2016, Barcelona, Spain, March 12-16, 2016, pp. 26:1–26:12, arXiv: 1603.04199 , doi:10.1145/2851141.2851170, S2CID   3010991
  6. Daudjee, Khuzaima; Salem, Kenneth (2004), "Lazy database replication with ordering guarantees", in Özsoyoglu, Z. Meral; Zdonik, Stanley B. (eds.), Proceedings of the 20th International Conference on Data Engineering, ICDE 2004, 30 March – 2 April 2004, Boston, MA, USA, IEEE Computer Society, pp. 424–435, doi:10.1109/ICDE.2004.1320016, S2CID   1850131
  7. Gogia, R., Chhabra, P., & Kumari, R. (2014). Consistency Models in Distributed Shared Memory Systems. International Journal of Computer Science and Mobile Computing,196-201
  8. Lamport, L. (1979). How to make a multiprocessor computer that correctly executes multiprocess programs. IEEE transactions on computers, 100(9), 690-691.
  9. Lipton, R. J., & Sandberg, J. S. (1988). PRAM: A scalable shared memory. Princeton University, Department of Computer Science.Chicago
  10. Mosberger, D. (1993). Memory consistency models. ACM SIGOPS Operating Systems Review, 27(1), 18-26.
  11. Terry, D. B., Demers, A. J., Petersen, K., Spreitzer, M. J., Theimer, M. M., & Welch, B. B. (1994, September). Session guarantees for weakly consistent replicated data. In Parallel and Distributed Information Systems, 1994., Proceedings of the Third International Conference on (pp. 140-149). IEEE.
  12. K. Daudjee and K. Salem. Lazy database replication with snapshot isolation. VLDB 2006.
  13. Carlos Baquero and Nuno Preguiça. Why Logical Clocks Are Easy. Comm. ACM 59(4), pp. 43–47, April 2016.
  14. Charron-Bost, Bernadette (July 1991), "Concerning the size of logical clocks in distributed systems", Information Processing Letters, 39 (1): 11–16, doi:10.1016/0020-0190(91)90055-m
  15. Torres-Rojas, Francisco J.; Ahamad, Mustaque (September 1999), "Plausible clocks: constant size logical clocks for distributed systems", Distributed Computing, 12 (4): 179–195, doi:10.1007/s004460050065, S2CID   2936350