Atomic commit

Last updated

In the field of computer science, an atomic commit is an operation that applies a set of distinct changes as a single operation. If the changes are applied, then the atomic commit is said to have succeeded. If there is a failure before the atomic commit can be completed, then all of the changes completed in the atomic commit are reversed. This ensures that the system is always left in a consistent state. The other key property of isolation comes from their nature as atomic operations. Isolation ensures that only one atomic commit is processed at a time. The most common uses of atomic commits are in database systems and version control systems.

Contents

The problem with atomic commits is that they require coordination between multiple systems. [1] As computer networks are unreliable services, this means no algorithm can coordinate with all systems as proven in the Two Generals Problem. As databases become more and more distributed, this coordination will increase the difficulty of making truly atomic commits. [2]

Usage

Atomic commits are essential for multi-step updates to data. This can be clearly shown in a simple example of a money transfer between two checking accounts. [3]

This example is complicated by a transaction to check the balance of account Y during a transaction for transferring 100 dollars from account X to Y. To start, first 100 dollars is removed from account X. Second, 100 dollars is added to account Y. If the entire operation is not completed as one atomic commit, then several problems could occur. If the system fails in the middle of the operation, after removing the money from X and before adding into Y, then 100 dollars has just disappeared. Another issue is if the balance of Y is checked before the 100 dollars is added, the wrong balance for Y will be reported.

With atomic commits neither of these cases can happen, in the first case of the system failure, the atomic commit would be rolled back and the money returned to X. In the second case, the request of the balance of Y cannot occur until the atomic commit is fully completed.

Database systems

Atomic commits in database systems fulfil two of the key properties of ACID, [4] atomicity and consistency. Consistency is only achieved if each change in the atomic commit is consistent.

As shown in the example atomic commits are critical to multistep operations in databases. Due to modern hardware design of the physical disk on which the database resides true atomic commits cannot exist. The smallest area that can be written to on disk is known as a sector. A single database entry may span several different sectors. Only one sector can be written at a time. This writing limit is why true atomic commits are not possible. After the database entries in memory have been modified they are queued up to be written to disk. This means the same problems identified in the example have reoccurred. Any algorithmic solution to this problem will still encounter the Two Generals’ Problem. The two-phase commit protocol and three-phase commit protocol attempt to solve this and some of the other problems associated with atomic commits.

The two-phase commit protocol requires a coordinator to maintain all the information needed to recover the original state of the database if something goes wrong. As the name indicates there are two phases, voting and commit.

During the voting phase each node writes the changes in the atomic commit to its own disk. The nodes then report their status to the coordinator. If any node does not report to the coordinator or their status message is lost the coordinator assumes the node's write failed. Once all of the nodes have reported to the coordinator the second phase begins.

During the commit phase the coordinator sends a commit message to each of the nodes to record in their individual logs. Until this message is added to a node's log, any changes made will be recorded as incomplete. If any of the nodes reported a failure the coordinator will instead send a rollback message. This will remove any changes the nodes have written to disk. [5] [6]

The three-phase commit protocol seeks to remove the main problem with the two phase commit protocol, which occurs if a coordinator and another node fail at the same time during the commit phase neither can tell what action should occur. To solve this problem a third phase is added to the protocol. The prepare to commit phase occurs after the voting phase and before the commit phase.

In the voting phase, similar to the two-phase commit, the coordinator requests that each node is ready to commit. If any node fails the coordinator will timeout while waiting for the failed node. If this happens the coordinator sends an abort message to every node. The same action will be undertaken if any of the nodes return a failure message.

Upon receiving success messages from each node in the voting phase the prepare to commit phase begins. During this phase the coordinator sends a prepare message to each node. Each node must acknowledge the prepare message and reply. If any reply is missed or any node return that they are not prepared then the coordinator sends an abort message. Any node that does not receive a prepare message before the timeout expires aborts the commit.

After all nodes have replied to the prepare message then the commit phase begins. In this phase the coordinator sends a commit message to each node. When each node receives this message it performs the actual commit. If the commit message does not reach a node due to the message being lost or the coordinator fails they will perform the commit if the timeout expires. If the coordinator fails upon recovery it will send a commit message to each node. [7]

Revision control

Atomic commits are a common feature of version control software, and crucial for maintaining a consistent state in the repository. [8] Most version control software will not apply any part of a commit that fails. Notable exceptions are CVS, VSS and IBM Rational ClearCase (when in UCM mode). [9]

For instance, if version control software encounters a merge conflict that can not be automatically resolved, then no part of the changeset is merged. Instead, the developer is given an opportunity to either revert their changes or manually resolve the conflict.

This prevents the entire project from entering a broken state due to a partially applied change set, where one file from a commit is successfully committed, but another file with dependent changes fails. [10]

Atomic commits may also refer to the ability to simultaneously make changes across multiple projects using version control software in a single operation, using a version control software development strategy known as a monorepo. [8]

Atomic commit convention

When using a revision control systems a common convention is to use small commits. These are sometimes referred to as atomic commits as they (ideally) only affect a single aspect of the system. These atomic commits allow for greater understandability, less effort to roll back changes, easier bug identification. [11]

The greater understandability comes from the small size and focused nature of the commit. It is much easier to understand what is changed and reasoning behind the changes if one is only looking for one kind of change. This becomes especially important when making format changes to the source code. If format and functional changes are combined it becomes very difficult to identify useful changes. Imagine if the spacing in a file is changed from using tabs to three spaces every tab in the file will show as having been changed. This becomes critical if some functional changes are also made as a reviewer may simply not see the functional changes. [12] [13]

If only atomic commits are made then commits that introduce errors become much simpler to identify. One need not look through every commit to see if it was the cause of the error, only the commits dealing with that functionality need to be examined. If the error is to be rolled back, atomic commits again make the job much simpler. Instead of having to revert to the offending revision and remove the changes manually before integrating any later changes; the developer can simply revert any changes in the identified commit. This also reduces the risk of a developer accidentally removing unrelated changes that happened to be in the same commit.

Atomic commits also allow bug fixes to be easily reviewed if only a single bug fix is committed at a time. Instead of having to check multiple potentially unrelated files the reviewer must only check files and changes that directly impact the bug being fixed. This also means that bug fixes can be easily packaged for testing as only the changes that fix the bug are in the commit.

See also

Related Research Articles

Concurrent Versions System is a revision control system originally developed by Dick Grune in July 1986.

In software engineering, version control is a class of systems responsible for managing changes to computer programs, documents, large web sites, or other collections of information. Version control is a component of software configuration management.

In computer science, ACID is a set of properties of database transactions intended to guarantee data validity despite errors, power failures, and other mishaps. In the context of databases, a sequence of database operations that satisfies the ACID properties is called a transaction. For example, a transfer of funds from one bank account to another, even involving multiple changes such as debiting one account and crediting another, is a single transaction.

<span class="mw-page-title-main">Apache Subversion</span> Free and open-source software versioning and revision control system

Apache Subversion is a software versioning and revision control system distributed as open source under the Apache License. Software developers use Subversion to maintain current and historical versions of files such as source code, web pages, and documentation. Its goal is to be a mostly compatible successor to the widely used Concurrent Versions System (CVS).

In database systems, durability is the ACID property that guarantees that the effects of transactions that have been committed will survive permanently, even in case of failures, including incidents and catastrophic events. For example, if a flight booking reports that a seat has successfully been booked, then the seat will remain booked even if the system crashes.

In transaction processing, databases, and computer networking, the two-phase commit protocol is a type of atomic commitment protocol (ACP). It is a distributed algorithm that coordinates all the processes that participate in a distributed atomic transaction on whether to commit or abort the transaction. This protocol achieves its goal even in many cases of temporary system failure, and is thus widely used. However, it is not resilient to all possible failure configurations, and in rare cases, manual intervention is needed to remedy an outcome. To accommodate recovery from failure the protocol's participants use logging of the protocol's states. Log records, which are typically slow to generate but survive failures, are used by the protocol's recovery procedures. Many protocol variants exist that primarily differ in logging strategies and recovery mechanisms. Though usually intended to be used infrequently, recovery procedures compose a substantial portion of the protocol, due to many possible failure scenarios to be considered and supported by the protocol.

In computer science and networking in particular, a session is a time-delimited two-way link, a practical layer in the TCP/IP protocol enabling interactive expression and information exchange between two or more communication devices or ends – be they computers, automated systems, or live active users. A session is established at a certain point in time, and then ‘torn down’ - brought to an end - at some later point. An established communication session may involve more than one message in each direction. A session is typically stateful, meaning that at least one of the communicating parties needs to hold current state information and save information about the session history to be able to communicate, as opposed to stateless communication, where the communication consists of independent requests with responses.

<span class="mw-page-title-main">Git</span> Software for version control of files

Git is a distributed version control system that tracks changes in any set of computer files, usually used for coordinating work among programmers who are collaboratively developing source code during software development. Its goals include speed, data integrity, and support for distributed, non-linear workflows.

In computer networking and databases, the three-phase commit protocol (3PC) is a distributed algorithm which lets all nodes in a distributed system agree to commit a transaction. It is a more failure-resilient refinement of the two-phase commit protocol (2PC).

Replication in computing involves sharing information so as to ensure consistency between redundant resources, such as software or hardware components, to improve reliability, fault-tolerance, or accessibility.

A distributed algorithm is an algorithm designed to run on computer hardware constructed from interconnected processors. Distributed algorithms are used in different application areas of distributed computing, such as telecommunications, scientific computing, distributed information processing, and real-time process control. Standard problems solved by distributed algorithms include leader election, consensus, distributed search, spanning tree generation, mutual exclusion, and resource allocation.

Commitment ordering (CO) is a class of interoperable serializability techniques in concurrency control of databases, transaction processing, and related applications. It allows optimistic (non-blocking) implementations. With the proliferation of multi-core processors, CO has also been increasingly utilized in concurrent programming, transactional memory, and software transactional memory (STM) to achieve serializability optimistically. CO is also the name of the resulting transaction schedule (history) property, defined in 1988 with the name dynamic atomicity. In a CO compliant schedule, the chronological order of commitment events of transactions is compatible with the precedence order of the respective transactions. CO is a broad special case of conflict serializability and effective means to achieve global serializability across any collection of database systems that possibly use different concurrency control mechanisms.

In software development, version control is a class of systems responsible for managing changes to computer programs or other collections of information such that revisions have a logical and consistent organization. The following tables include general and technical information on notable version control and software configuration management (SCM) software. For SCM software not suitable for source code, see Comparison of open-source configuration management software.

A fundamental problem in distributed computing and multi-agent systems is to achieve overall system reliability in the presence of a number of faulty processes. This often requires coordinating processes to reach consensus, or agree on some data value that is needed during computation. Example applications of consensus include agreeing on what transactions to commit to a database in which order, state machine replication, and atomic broadcasts. Real-world applications often requiring consensus include cloud computing, clock synchronization, PageRank, opinion formation, smart power grids, state estimation, control of UAVs, load balancing, blockchain, and others.

Paxos is a family of protocols for solving consensus in a network of unreliable or fallible processors. Consensus is the process of agreeing on one result among a group of participants. This problem becomes difficult when the participants or their communications may experience failures.

A clustered file system (CFS) is a file system which is shared by being simultaneously mounted on multiple servers. There are several approaches to clustering, most of which do not employ a clustered file system. Clustered file systems can provide features like location-independent addressing and redundancy which improve reliability or reduce the complexity of the other parts of the cluster. Parallel file systems are a type of clustered file system that spread data across multiple storage nodes, usually for redundancy or performance.

In version control software, a changeset is a set of alterations packaged together, along with meta-information about the alterations. A changeset describes the exact differences between two successive versions in the version control system's repository of changes. Changesets are typically treated as an atomic unit, an indivisible set, by version control systems. This is one synchronization model.

A VMScluster, originally known as a VAXcluster, is a computer cluster involving a group of computers running the OpenVMS operating system. Whereas tightly coupled multiprocessor systems run a single copy of the operating system, a VMScluster is loosely coupled: each machine runs its own copy of OpenVMS, but the disk storage, lock manager, and security domain are all cluster-wide, providing a single system image abstraction. Machines can join or leave a VMScluster without affecting the rest of the cluster. For enhanced availability, VMSclusters support the use of dual-ported disks connected to two machines or storage controllers simultaneously.

In version control systems, a repository is a data structure that stores metadata for a set of files or directory structure. Depending on whether the version control system in use is distributed, like Git or Mercurial, or centralized, like Subversion, CVS, or Perforce, the whole set of information in the repository may be duplicated on every user's system or may be maintained on a single server. Some of the metadata that a repository contains includes, among other things, a historical record of changes in the repository, a set of commit objects, and a set of references to commit objects, called heads.

Gbcast is a reliable multicast protocol that provides ordered, fault-tolerant (all-or-none) message delivery in a group of receivers within a network of machines that experience crash failure. The protocol is capable of solving Consensus in a network of unreliable processors, and can be used to implement state machine replication. Gbcast can be used in a standalone manner, or can support the virtual synchrony execution model, in which case Gbcast is normally used for group membership management while other, faster, protocols are often favored for routine communication tasks.

References

  1. Bocchi, Wischik (2004). A Process Calculus of Atomic Commit.
  2. Garcia-Molina, Hector; Ullman, Jeff; Widom, Jennifer (2009). Database Systems The Complete Book . Prentice Hall. pp.  1008–1009. ISBN   9780131873254.
  3. Garcia-Molina, Hector; Ullman, Jeff; Widom, Jennifer (2009). Database Systems The Complete Book . Prentice Hall. p.  299. ISBN   9780131873254.
  4. Elmasri, Ramez (2006). Fundamentals of Database Systems 5th Edition. Addison Wesley. p. 620.
  5. Elmasri, Ramez (2006). Fundamentals of Database Systems 5th Edition. Addison Wesley. p. 688.
  6. Bernstein, Philip A.; Hadzilacos, Vassos; Goodman, Nathan (1987). "Chapter 7". Concurrency Control and Recovery in Database Systems. Addison Wesley Publishing Company.
  7. Gaddam, Srinivas R. Three-Phase Commit Protocol.
  8. 1 2 Levenberg, Rachel Potvin, Josh (July 2016). "Why Google Stores Billions of Lines of Code in a Single Repository". Communications of the ACM. Retrieved 20 July 2018.{{cite web}}: CS1 maint: multiple names: authors list (link)
  9. Smart, John Ferguson (2008). Java Power Tools. "O'Reilly Media, Inc.". p. 301. ISBN   9781491954546 . Retrieved 26 July 2018.
  10. Vesperman, Jennifer (2009). Essential CVS (2nd ed.). Sebastopol: O'Reilly Media, Inc. p. 7. ISBN   9780596551407. A feature that CVS doesn't have, and that many teams like, is atomic commits. This feature ensures that while one person is committing changes to the repository, no one else can. Thus, each commit is a separate process, and the repository is never in a state where it has mismatched files.
  11. "Subversion Best Practices". Apache.
  12. Barney, Boisvert. Atomic Commits to Version Control.
  13. "The Benefits of Small Commits". Conifer Systems.