Distributed transaction

Last updated

A distributed transaction is a database transaction in which two or more network hosts are involved. Usually, hosts provide transactional resources, while a transaction manager creates and manages a global transaction that encompasses all operations against such resources. Distributed transactions, as any other transactions, must have all four ACID (atomicity, consistency, isolation, durability) properties, where atomicity guarantees all-or-nothing outcomes for the unit of work (operations bundle).

Contents

The Open Group, a vendor consortium, proposed the X/Open Distributed Transaction Processing Model (X/Open XA), which became a de facto standard for the behavior of transaction model components.

Databases are common transactional resources and, often, transactions span a couple of such databases. In this case, a distributed transaction can be seen as a database transaction that must be synchronized (or provide ACID properties) among multiple participating databases which are distributed among different physical locations. The isolation property (the I of ACID) poses a special challenge for multi database transactions, since the (global) serializability property could be violated, even if each database provides it (see also global serializability). In practice most commercial database systems use strong strict two phase locking (SS2PL) for concurrency control, which ensures global serializability, if all the participating databases employ it. (see also commitment ordering for multidatabases.)

A common algorithm for ensuring correct completion of a distributed transaction is the two-phase commit (2PC). This algorithm is usually applied for updates able to commit in a short period of time, ranging from couple of milliseconds to couple of minutes.

There are also long-lived distributed transactions, for example a transaction to book a trip, which consists of booking a flight, a rental car and a hotel. Since booking the flight might take up to a day to get a confirmation, two-phase commit is not applicable here, it will lock the resources for this long. In this case more sophisticated techniques that involve multiple undo levels are used. The way you can undo the hotel booking by calling a desk and cancelling the reservation, a system can be designed to undo certain operations (unless they are irreversibly finished).

In practice, long-lived distributed transactions are implemented in systems based on web services. Usually these transactions utilize principles of compensating transactions, Optimism and Isolation Without Locking. The X/Open standard does not cover long-lived distributed transactions.[ citation needed ]

Several technologies, including Jakarta Enterprise Beans and Microsoft Transaction Server fully support distributed transaction standards.

See also

Related Research Articles

The Jakarta Transactions, one of the Jakarta EE APIs, enables distributed transactions to be done across multiple X/Open XA resources in a Java environment. JTA was a specification developed under the Java Community Process as JSR 907. JTA provides for:

In computer science, ACID is a set of properties of database transactions intended to guarantee data validity despite errors, power failures, and other mishaps. In the context of databases, a sequence of database operations that satisfies the ACID properties is called a transaction. For example, a transfer of funds from one bank account to another, even involving multiple changes such as debiting one account and crediting another, is a single transaction.

In computer science, transaction processing is information processing that is divided into individual, indivisible operations called transactions. Each transaction must succeed or fail as a complete unit; it can never be only partially complete.

In information technology and computer science, especially in the fields of computer programming, operating systems, multiprocessors, and databases, concurrency control ensures that correct results for concurrent operations are generated, while getting those results as quickly as possible.

Optimistic concurrency control (OCC), also known as optimistic locking, is a concurrency control method applied to transactional systems such as relational database management systems and software transactional memory. OCC assumes that multiple transactions can frequently complete without interfering with each other. While running, transactions use data resources without acquiring locks on those resources. Before committing, each transaction verifies that no other transaction has modified the data it has read. If the check reveals conflicting modifications, the committing transaction rolls back and can be restarted. Optimistic concurrency control was first proposed in 1979 by H. T. Kung and John T. Robinson.

A database transaction symbolizes a unit of work, performed within a database management system against a database, that is treated in a coherent and reliable way independent of other transactions. A transaction generally represents any change in a database. Transactions in a database environment have two main purposes:

  1. To provide reliable units of work that allow correct recovery from failures and keep a database consistent even in cases of system failure. For example: when execution prematurely and unexpectedly stops in which case many operations upon a database remain uncompleted, with unclear status.
  2. To provide isolation between programs accessing a database concurrently. If this isolation is not provided, the programs' outcomes are possibly erroneous.

In databases and transaction processing, two-phase locking (2PL) is a concurrency control method that guarantees serializability. It is also the name of the resulting set of database transaction schedules (histories). The protocol uses locks, applied by a transaction to data, which may block other transactions from accessing the same data during the transaction's life.

In database systems, durability is the ACID property that guarantees that the effects of transactions that have been committed will survive permanently, even in case of failures, including incidents and catastrophic events. For example, if a flight booking reports that a seat has successfully been booked, then the seat will remain booked even if the system crashes.

In the field of databases in computer science, a transaction log is a history of actions executed by a database management system used to guarantee ACID properties over crashes or hardware failures. Physically, a log is a file listing changes to the database, stored in a stable storage format.

In database systems, isolation determines how transaction integrity is visible to other users and systems.

In database systems, atomicity is one of the ACID transaction properties. An atomic transaction is an indivisible and irreducible series of database operations such that either all occurs, or nothing occurs. A guarantee of atomicity prevents updates to the database occurring only partially, which can cause greater problems than rejecting the whole series outright. As a consequence, the transaction cannot be observed to be in progress by another database client. At one moment in time, it has not yet happened, and at the next it has already occurred in whole.

In the fields of databases and transaction processing, a schedule of a system is an abstract model to describe execution of transactions running in the system. Often it is a list of operations (actions) ordered by time, performed by a set of transactions that are executed together in the system. If the order in time between certain operations is not determined by the system, then a partial order is used. Examples of such operations are requesting a read operation, reading, writing, aborting, committing, requesting a lock, locking, etc. Not all transaction operation types should be included in a schedule, and typically only selected operation types are included, as needed to reason about and describe certain phenomena. Schedules and schedule properties are fundamental concepts in database concurrency control theory.

In transaction processing, databases, and computer networking, the two-phase commit protocol is a type of atomic commitment protocol (ACP). It is a distributed algorithm that coordinates all the processes that participate in a distributed atomic transaction on whether to commit or abort the transaction. This protocol achieves its goal even in many cases of temporary system failure, and is thus widely used. However, it is not resilient to all possible failure configurations, and in rare cases, manual intervention is needed to remedy an outcome. To accommodate recovery from failure the protocol's participants use logging of the protocol's states. Log records, which are typically slow to generate but survive failures, are used by the protocol's recovery procedures. Many protocol variants exist that primarily differ in logging strategies and recovery mechanisms. Though usually intended to be used infrequently, recovery procedures compose a substantial portion of the protocol, due to many possible failure scenarios to be considered and supported by the protocol.

In computer science, software transactional memory (STM) is a concurrency control mechanism analogous to database transactions for controlling access to shared memory in concurrent computing. It is an alternative to lock-based synchronization. STM is a strategy implemented in software, rather than as a hardware component. A transaction in this context occurs when a piece of code executes a series of reads and writes to shared memory. These reads and writes logically occur at a single instant in time; intermediate states are not visible to other (successful) transactions. The idea of providing hardware support for transactions originated in a 1986 paper by Tom Knight. The idea was popularized by Maurice Herlihy and J. Eliot B. Moss. In 1995 Nir Shavit and Dan Touitou extended this idea to software-only transactional memory (STM). Since 2005, STM has been the focus of intense research and support for practical implementations is growing.

In computer science and data management, a commit is the making of a set of tentative changes permanent, marking the end of a transaction and providing Durability to ACID transactions. A commit is an act of committing. The record of commits is called the commit log.

For transaction processing in computing, the X/Open XA standard is a specification released in 1991 by X/Open for distributed transaction processing (DTP).

In concurrency control of databases, transaction processing, and various transactional applications, both centralized and distributed, a transaction schedule is serializable if its outcome is equal to the outcome of its transactions executed serially, i.e. without overlapping in time. Transactions are normally executed concurrently, since this is the most efficient way. Serializability is the major correctness criterion for concurrent transactions' executions. It is considered the highest level of isolation between transactions, and plays an essential role in concurrency control. As such it is supported in all general purpose database systems. Strong strict two-phase locking (SS2PL) is a popular serializability mechanism utilized in most of the database systems since their early days in the 1970s.

Commitment ordering (CO) is a class of interoperable serializability techniques in concurrency control of databases, transaction processing, and related applications. It allows optimistic (non-blocking) implementations. With the proliferation of multi-core processors, CO has also been increasingly utilized in concurrent programming, transactional memory, and software transactional memory (STM) to achieve serializability optimistically. CO is also the name of the resulting transaction schedule (history) property, defined in 1988 with the name dynamic atomicity. In a CO compliant schedule, the chronological order of commitment events of transactions is compatible with the precedence order of the respective transactions. CO is a broad special case of conflict serializability and effective means to achieve global serializability across any collection of database systems that possibly use different concurrency control mechanisms.

In concurrency control of databases, transaction processing, and other transactional distributed applications, global serializability is a property of a global schedule of transactions. A global schedule is the unified schedule of all the individual database schedules in a multidatabase environment. Complying with global serializability means that the global schedule is serializable, has the serializability property, while each component database (module) has a serializable schedule as well. In other words, a collection of serializable components provides overall system serializability, which is usually incorrect. A need in correctness across databases in multidatabase systems makes global serializability a major goal for global concurrency control. With the proliferation of the Internet, Cloud computing, Grid computing, and small, portable, powerful computing devices, as well as increase in systems management sophistication, the need for atomic distributed transactions and thus effective global serializability techniques, to ensure correctness in and among distributed transactional applications, seems to increase.

Distributed concurrency control is the concurrency control of a system distributed over a computer network.

References

    Further reading