Optimistic replication

Last updated August 24, 2023

Optimistic replication, also known as lazy replication,^[1]^[2] is a strategy for replication, in which replicas are allowed to diverge.^[3]

Traditional pessimistic replication systems try to guarantee from the beginning that all of the replicas are identical to each other, as if there was only a single copy of the data all along. Optimistic replication does away with this in favor of eventual consistency, meaning that replicas are guaranteed to converge only when the system has been quiesced for a period of time. As a result, there is no longer a need to wait for all of the copies to be synchronized when updating data, which helps concurrency and parallelism. The trade-off is that different replicas may require explicit reconciliation later on, which might then prove difficult or even insoluble.

Algorithms

An optimistic replication algorithm consists of five elements:

Operation submission: Users submit operations at independent sites.
Propagation: Each site shares the operations it knows about with the rest of the system.
Scheduling: Each site decides on an order for the operations it knows about.
Conflict resolution: If there are any conflicts among the operations a site has scheduled, it must modify them in some way.
Commitment: The sites agree on a final schedule and conflict resolution result, and the operations are made permanent.

There are two strategies for propagation: state transfer, where sites propagate a representation of the current state, and operation transfer, where sites propagate the operations that were performed (essentially, a list of instructions on how to reach the new state).

Scheduling and conflict resolution can either be syntactic or semantic. Syntactic systems rely on general information, such as when or where an operation was submitted. Semantic systems are able to make use of application-specific information to make smarter decisions. Note that state transfer systems generally have no information about the semantics of the data being transferred, and so they have to use syntactic scheduling and conflict resolution.

Examples

One well-known example of a system based on optimistic replication is the CVS version control system, or any other version control system which uses the copy-modify-merge paradigm. CVS covers each of the five elements:

Operation submission: Users edit local versions of files.
Propagation: Users manually pull updates from a central server, or push changes out once the user feels they are ready.
Scheduling: Operations are scheduled in the order that they are received by the central server.
Conflict resolution: When a user pushes to or pulls from the central repository, any conflicts will be flagged for that user to fix manually.
Commitment: Once the central server accepts the changes which a user pushes, they are permanently committed.

A special case of replication is synchronization, where there are only two replicas. For example, personal digital assistants (PDAs) allow users to edit data either on the PDA or a computer, and then to merge these two datasets together. Note, however, that replication is a broader problem than synchronization, since there may be more than two replicas.

Other examples include:

Usenet, and other systems which use the Thomas Write Rule (See Rfc677)
Multi-master database replication ^[4]
The Coda distributed filesystem
Operational Transformation, a theoretical framework for group editing
Peer-to-peer wikis
Conflict-free replicated data types
The Bayou^[5] distributed database
IceCube^[6]

Implications

Applications built on top of optimistic replicated databases need to be careful about ensuring that the delayed updates observed do not impair the correctness of the application.

As a simple example, if an application contains a way of viewing some part of the database state, and a way of editing it, then users may edit that state but then not see it changing in the viewer. Alarmed that their edit "didn't work", they may try it again, potentially more than once. If the updates are not idempotent (e.g., they increment a value), this can lead to disaster. Even if they are idempotent, the spurious database updates can lead to performance bottlenecks, especially when the database systems are processing heavy loads; this can become a vicious circle.

Testing of applications is often done on a testing environment, smaller in size (perhaps only a single server) and less loaded than the "live" environment. The replication behaviour of such an installation may differ from a live environment in ways that mean that replication lag is unlikely to be observed in testing, masking replication-sensitive bugs. Application developers must be very careful about the assumptions they make about the effect of a database update, and must be sure to simulate lag in their testing environments.

Optimistically replicated databases have to be very careful about offering features such as validity constraints on data. If any given update may or may not be accepted based on the current state of the record, then two updates (A and B) may be individually legal against the starting state of the system, but one or more of the updates may not be legal against the state of the system after the other update (e.g., A and B are both legal, but AB or BA are illegal). If A and B are both initiated at roughly the same time within the database, then A may be successfully applied on some nodes and B on others, but as soon as A and B "meet" and one is attempted on a node which has already applied the other, a conflict will be found. The system must, in this case, decide which update finally "wins", and arrange for any nodes that have already applied the losing update to revert it. However, some nodes may temporarily expose the state with the reverted update, and there may be no way to inform the user who initiated the update of its failure, without requiring them to wait (potentially forever) for confirmation of acceptance at every node.

Related Research Articles

In information technology and computer science, especially in the fields of computer programming, operating systems, multiprocessors, and databases, concurrency control ensures that correct results for concurrent operations are generated, while getting those results as quickly as possible.

Optimistic concurrency control (OCC), also known as optimistic locking, is a concurrency control method applied to transactional systems such as relational database management systems and software transactional memory. OCC assumes that multiple transactions can frequently complete without interfering with each other. While running, transactions use data resources without acquiring locks on those resources. Before committing, each transaction verifies that no other transaction has modified the data it has read. If the check reveals conflicting modifications, the committing transaction rolls back and can be restarted. Optimistic concurrency control was first proposed in 1979 by H. T. Kung and John T. Robinson.

In computer science, a consistency model specifies a contract between the programmer and a system, wherein the system guarantees that if the programmer follows the rules for operations on memory, memory will be consistent and the results of reading, writing, or updating memory will be predictable. Consistency models are used in distributed systems like distributed shared memory systems or distributed data stores. Consistency is different from coherence, which occurs in systems that are cached or cache-less, and is consistency of data with respect to all processors. Coherence deals with maintaining a global order in which writes to a single location or single variable are seen by all processors. Consistency deals with the ordering of operations to multiple locations with respect to all processors.

MySQL Cluster is a technology providing shared-nothing clustering and auto-sharding for the MySQL database management system. It is designed to provide high availability and high throughput with low latency, while allowing for near linear scalability. MySQL Cluster is implemented through the NDB or NDBCLUSTER storage engine for MySQL.

Barbara Liskov is an American computer scientist who has made pioneering contributions to programming languages and distributed computing. Her notable work includes the development of the Liskov substitution principle which describes the fundamental nature of data abstraction, and is used in type theory and in object-oriented programming. Her work was recognized with the 2008 Turing Award, the highest distinction in computer science.

Eventual consistency is a consistency model used in distributed computing to achieve high availability that informally guarantees that, if no new updates are made to a given data item, eventually all accesses to that item will return the last updated value. Eventual consistency, also called optimistic replication, is widely deployed in distributed systems and has origins in early mobile computing projects. A system that has achieved eventual consistency is often said to have converged, or achieved replica convergence. Eventual consistency is a weak guarantee – most stronger models, like linearizability, are trivially eventually consistent.

Multi-master replication is a method of database replication which allows data to be stored by a group of computers, and updated by any member of the group. All members are responsive to client data queries. The multi-master replication system is responsible for propagating the data modifications made by each member to the rest of the group and resolving any conflicts that might arise between concurrent changes made by different members.

Replication in computing involves sharing information so as to ensure consistency between redundant resources, such as software or hardware components, to improve reliability, fault-tolerance, or accessibility.

In concurrency control of databases, transaction processing, and various transactional applications, both centralized and distributed, a transaction schedule is serializable if its outcome is equal to the outcome of its transactions executed serially, i.e. without overlapping in time. Transactions are normally executed concurrently, since this is the most efficient way. Serializability is the major correctness criterion for concurrent transactions' executions. It is considered the highest level of isolation between transactions, and plays an essential role in concurrency control. As such it is supported in all general purpose database systems. Strong strict two-phase locking (SS2PL) is a popular serializability mechanism utilized in most of the database systems since their early days in the 1970s.

In computer science, state machine replication (SMR) or state machine approach is a general method for implementing a fault-tolerant service by replicating servers and coordinating client interactions with server replicas. The approach also provides a framework for understanding and designing replication management protocols.

Paxos is a family of protocols for solving consensus in a network of unreliable or fallible processors. Consensus is the process of agreeing on one result among a group of participants. This problem becomes difficult when the participants or their communications may experience failures.

Operational transformation (OT) is a technology for supporting a range of collaboration functionalities in advanced collaborative software systems. OT was originally invented for consistency maintenance and concurrency control in collaborative editing of plain text documents. Its capabilities have been extended and its applications expanded to include group undo, locking, conflict resolution, operation notification and compression, group-awareness, HTML/XML and tree-structured document editing, collaborative office productivity tools, application-sharing, and collaborative computer-aided media design tools. In 2009 OT was adopted as a core technique behind the collaboration features in Apache Wave and Google Docs.

In concurrency control of databases, transaction processing, and other transactional distributed applications, global serializability is a property of a global schedule of transactions. A global schedule is the unified schedule of all the individual database schedules in a multidatabase environment. Complying with global serializability means that the global schedule is serializable, has the serializability property, while each component database (module) has a serializable schedule as well. In other words, a collection of serializable components provides overall system serializability, which is usually incorrect. A need in correctness across databases in multidatabase systems makes global serializability a major goal for global concurrency control. With the proliferation of the Internet, Cloud computing, Grid computing, and small, portable, powerful computing devices, as well as increase in systems management sophistication, the need for atomic distributed transactions and thus effective global serializability techniques, to ensure correctness in and among distributed transactional applications, seems to increase.

A version vector is a mechanism for tracking changes to data in a distributed system, where multiple agents might update the data at different times. The version vector allows the participants to determine if one update preceded another (happened-before), followed it, or if the two updates happened concurrently. In this way, version vectors enable causality tracking among data replicas and are a basic mechanism for optimistic replication. In mathematical terms, the version vector generates a preorder that tracks the events that precede, and may therefore influence, later updates.

A distributed operating system is system software over a collection of independent software, networked, communicating, and physically separate computational nodes. They handle jobs which are serviced by multiple CPUs. Each individual node holds a specific software subset of the global aggregate operating system. Each subset is a composite of two distinct service provisioners. The first is a ubiquitous minimal kernel, or microkernel, that directly controls that node's hardware. Second is a higher-level collection of system management components that coordinate the node's individual and collaborative activities. These components abstract microkernel functions and support user applications.

A data grid is an architecture or set of services that gives individuals or groups of users the ability to access, modify and transfer extremely large amounts of geographically distributed data for research purposes. Data grids make this possible through a host of middleware applications and services that pull together data and resources from multiple administrative domains and then present it to users upon request. The data in a data grid can be located at a single site or multiple sites where each site can be its own administrative domain governed by a set of security restrictions as to who may access the data. Likewise, multiple replicas of the data may be distributed throughout the grid outside their original administrative domain and the security restrictions placed on the original data for who may access it must be equally applied to the replicas. Specifically developed data grid middleware is what handles the integration between users and the data they request by controlling access while making it available as efficiently as possible. The adjacent diagram depicts a high level view of a data grid.

<span class="mw-page-title-main">Oracle NoSQL Database</span>

Oracle NoSQL Database is a NoSQL-type distributed key-value database from Oracle Corporation. It provides transactional semantics for data manipulation, horizontal scalability, and simple administration and monitoring.

Liuba Shrira is a professor of computer science at Brandeis University, whose research interests primarily involve distributed systems.

A distributed file system for cloud is a file system that allows many clients to have access to data and supports operations on that data. Each data file may be partitioned into several parts called chunks. Each chunk may be stored on different remote machines, facilitating the parallel execution of applications. Typically, data is stored in files in a hierarchical tree, where the nodes represent directories. There are several ways to share files in a distributed architecture: each solution must be suitable for a certain type of application, depending on how complex the application is. Meanwhile, the security of the system must be ensured. Confidentiality, availability and integrity are the main keys for a secure system.

In distributed computing, a conflict-free replicated data type (CRDT) is a data structure that is replicated across multiple computers in a network, with the following features:

The application can update any replica independently, concurrently and without coordinating with other replicas.
An algorithm automatically resolves any inconsistencies that might occur.
Although replicas may have different state at any particular point in time, they are guaranteed to eventually converge.

References

↑ Ladin, R.; Liskov, B.; Shrira, L.; Ghemawat, S. (1992). "Providing high availability using lazy replication". ACM Transactions on Computer Systems. 10 (4): 360–391. CiteSeerX 10.1.1.586.7749 . doi:10.1145/138873.138877. S2CID 2219840.
↑ Ladin, R.; Liskov, B.; Shrira, L. (1990). Lazy replication: exploiting the semantics of distributed services. Proceedings of the Ninth Annual ACM Symposium on Principles of Distributed Computing. pp. 43–57. doi: 10.1145/93385.93399 .
↑ Saito, Yasushi; Shapiro, Marc (2005). "Optimistic replication". ACM Computing Surveys . 37 (1): 42–81. CiteSeerX 10.1.1.324.3599 . doi:10.1145/1057977.1057980. S2CID 1503367.
↑ Gray, J.; Helland, P.; O’Neil, P.; Shasha, D. (1996). The dangers of replication and a solution (PDF). Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data. pp. 173–182. doi:10.1145/233269.233330.^{[ permanent dead link ]}
↑ Terry, D.B.; Theimer, M.M.; Petersen, K.; Demers, A.J.; Spreitzer, M.J.; Hauser, C.H. (1995). Managing update conflicts in Bayou, a weakly connected replicated storage system. Proceedings of the Fifteenth ACM Symposium on Operating Systems Principles. pp. 172–182. doi: 10.1145/224056.224070 .
↑ Kermarrec, A.M.; Rowstron, A.; Shapiro, M.; Druschel, P. (2001). The IceCube approach to the reconciliation of divergent replicas. Proceedings of the Twentieth Annual ACM Symposium on Principles of Distributed Computing. pp. 210–218. doi:10.1145/383962.384020.

External links

Saito, Yasushi; Shapiro, Marc (September 2003). "Optimistic Replication" (PDF). Microsoft.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[Ladin1992-1] Ladin, R.; Liskov, B.; Shrira, L.; Ghemawat, S. (1992). "Providing high availability using lazy replication". ACM Transactions on Computer Systems. 10 (4): 360–391. CiteSeerX 10.1.1.586.7749 . doi:10.1145/138873.138877. S2CID 2219840.

[Ladin1990-2] Ladin, R.; Liskov, B.; Shrira, L. (1990). Lazy replication: exploiting the semantics of distributed services. Proceedings of the Ninth Annual ACM Symposium on Principles of Distributed Computing. pp. 43–57. doi: 10.1145/93385.93399 .

[saito2005-3] Saito, Yasushi; Shapiro, Marc (2005). "Optimistic replication". ACM Computing Surveys . 37 (1): 42–81. CiteSeerX 10.1.1.324.3599 . doi:10.1145/1057977.1057980. S2CID 1503367.

[Gray1996-4] Gray, J.; Helland, P.; O’Neil, P.; Shasha, D. (1996). The dangers of replication and a solution (PDF). Proceedings of the 1996 ACM SIGMOD International Conference on Management of Data. pp. 173–182. doi:10.1145/233269.233330.^{[ permanent dead link ]}

[Terry1995-5] Terry, D.B.; Theimer, M.M.; Petersen, K.; Demers, A.J.; Spreitzer, M.J.; Hauser, C.H. (1995). Managing update conflicts in Bayou, a weakly connected replicated storage system. Proceedings of the Fifteenth ACM Symposium on Operating Systems Principles. pp. 172–182. doi: 10.1145/224056.224070 .

[Kermarrec2001-6] Kermarrec, A.M.; Rowstron, A.; Shapiro, M.; Druschel, P. (2001). The IceCube approach to the reconciliation of divergent replicas. Proceedings of the Twentieth Annual ACM Symposium on Principles of Distributed Computing. pp. 210–218. doi:10.1145/383962.384020.

[1]

[2]

[3]

[4]

[5]

[6]