Database transaction

Last updated

A database transaction symbolizes a unit of work, performed within a database management system (or similar system) against a database, that is treated in a coherent and reliable way independent of other transactions. A transaction generally represents any change in a database. Transactions in a database environment have two main purposes:

Contents

  1. To provide reliable units of work that allow correct recovery from failures and keep a database consistent even in cases of system failure. For example: when execution prematurely and unexpectedly stops (completely or partially) in which case many operations upon a database remain uncompleted, with unclear status.
  2. To provide isolation between programs accessing a database concurrently. If this isolation is not provided, the programs' outcomes are possibly erroneous.

In a database management system, a transaction is a single unit of logic or work, sometimes made up of multiple operations. Any logical calculation done in a consistent mode in a database is known as a transaction. One example is a transfer from one bank account to another: the complete transaction requires subtracting the amount to be transferred from one account and adding that same amount to the other.

A database transaction, by definition, must be atomic (it must either be complete in its entirety or have no effect whatsoever), consistent (it must conform to existing constraints in the database), isolated (it must not affect other transactions) and durable (it must get written to persistent storage). [1] Database practitioners often refer to these properties of database transactions using the acronym ACID.

Purpose

Databases and other data stores which treat the integrity of data as paramount often include the ability to handle transactions to maintain the integrity of data. A single transaction consists of one or more independent units of work, each reading and/or writing information to a database or other data store. When this happens it is often important to ensure that all such processing leaves the database or data store in a consistent state.

Examples from double-entry accounting systems often illustrate the concept of transactions. In double-entry accounting every debit requires the recording of an associated credit. If one writes a check for $100 to buy groceries, a transactional double-entry accounting system must record the following two entries to cover the single transaction:

  1. Debit $100 to Groceries Expense Account
  2. Credit $100 to Checking Account

A transactional system would make both entries pass or both entries would fail. By treating the recording of multiple entries as an atomic transactional unit of work the system maintains the integrity of the data recorded. In other words, nobody ends up with a situation in which a debit is recorded but no associated credit is recorded, or vice versa.

Transactional databases

A transactional database is a DBMS that provides the ACID properties for a bracketed set of database operations (begin-commit). Transactions ensure that the database is always in a consistent state, even in the event of concurrent updates and failures. [2] All the write operations within a transaction have an all-or-nothing effect, that is, either the transaction succeeds and all writes take effect, or otherwise, the database is brought to a state that does not include any of the writes of the transaction. Transactions also ensure that the effect of concurrent transactions satisfies certain guarantees, known as isolation level. The highest isolation level is serializability, which guarantees that the effect of concurrent transactions is equivalent to their serial (i.e. sequential) execution.

Most modern relational database management systems support transactions. NoSQL databases prioritize scalability along with supporting transactions in order to guarantee data consistency in the event of concurrent updates and accesses.

In a database system, a transaction might consist of one or more data-manipulation statements and queries, each reading and/or writing information in the database. Users of database systems consider consistency and integrity of data as highly important. A simple transaction is usually issued to the database system in a language like SQL wrapped in a transaction, using a pattern similar to the following:

  1. Begin the transaction.
  2. Execute a set of data manipulations and/or queries.
  3. If no error occurs, then commit the transaction.
  4. If an error occurs, then roll back the transaction.

A transaction commit operation persists all the results of data manipulations within the scope of the transaction to the database. A transaction rollback operation does not persist the partial results of data manipulations within the scope of the transaction to the database. In no case can a partial transaction be committed to the database since that would leave the database in an inconsistent state.

Internally, multi-user databases store and process transactions, often by using a transaction ID or XID.

There are multiple varying ways for transactions to be implemented other than the simple way documented above. Nested transactions, for example, are transactions which contain statements within them that start new transactions (i.e. sub-transactions). Multi-level transactions are a variant of nested transactions where the sub-transactions take place at different levels of a layered system architecture (e.g., with one operation at the database-engine level, one operation at the operating-system level). [3] Another type of transaction is the compensating transaction.

In SQL

Transactions are available in most SQL database implementations, though with varying levels of robustness. For example, MySQL began supporting transactions from early version 3.23, but the InnoDB storage engine was not default before version 5.5. The earlier available storage engine, MyISAM does not support transactions.

A transaction is typically started using the command BEGIN (although the SQL standard specifies START TRANSACTION). When the system processes a COMMIT statement, the transaction ends with successful completion. A ROLLBACK statement can also end the transaction, undoing any work performed since BEGIN. If autocommit was disabled with the start of a transaction, autocommit will also be re-enabled with the end of the transaction.

One can set the isolation level for individual transactional operations as well as globally. At the highest level (READ COMMITTED), the result of any operation performed after a transaction has started will remain invisible to other database users until the transaction has ended. At the lowest level (READ UNCOMMITTED), which may occasionally be used to ensure high concurrency, such changes will be immediately visible.

Object databases

Relational databases are traditionally composed of tables with fixed-size fields and records. Object databases comprise variable-sized blobs, possibly serializable or incorporating a mime-type. The fundamental similarities between Relational and Object databases are the start and the commit or rollback.

After starting a transaction, database records or objects are locked, either read-only or read-write. Reads and writes can then occur. Once the transaction is fully defined, changes are committed or rolled back atomically, such that at the end of the transaction there is no inconsistency.

Distributed transactions

Database systems implement distributed transactions [4] as transactions accessing data over multiple nodes. A distributed transaction enforces the ACID properties over multiple nodes, and might include systems such as databases, storage managers, file systems, messaging systems, and other data managers. In a distributed transaction there is typically an entity coordinating all the process to ensure that all parts of the transaction are applied to all relevant systems. Moreover, the integration of Storage as a Service (StaaS) within these environments is crucial, as it offers a virtually infinite pool of storage resources, accommodating a range of cloud-based data store classes with varying availability, scalability, and ACID properties. This integration is essential for achieving higher availability, lower response time, and cost efficiency in data-intensive applications deployed across cloud-based data stores. [5]

Transactional filesystems

The Namesys Reiser4 filesystem for Linux [6] supports transactions, and as of Microsoft Windows Vista, the Microsoft NTFS filesystem [7] supports distributed transactions across networks. There is occurring research into more data coherent filesystems, such as the Warp Transactional Filesystem (WTF). [8]

See also

Related Research Articles

In computer science, ACID is a set of properties of database transactions intended to guarantee data validity despite errors, power failures, and other mishaps. In the context of databases, a sequence of database operations that satisfies the ACID properties is called a transaction. For example, a transfer of funds from one bank account to another, even involving multiple changes such as debiting one account and crediting another, is a single transaction.

In computer science, transaction processing is information processing that is divided into individual, indivisible operations called transactions. Each transaction must succeed or fail as a complete unit; it can never be only partially complete.

In information technology and computer science, especially in the fields of computer programming, operating systems, multiprocessors, and databases, concurrency control ensures that correct results for concurrent operations are generated, while getting those results as quickly as possible.

Multiversion concurrency control, is a non-locking concurrency control method commonly used by database management systems to provide concurrent access to the database and in programming languages to implement transactional memory.

In database systems, durability is the ACID property that guarantees that the effects of transactions that have been committed will survive permanently, even in case of failures, including incidents and catastrophic events. For example, if a flight booking reports that a seat has successfully been booked, then the seat will remain booked even if the system crashes.

In database systems, isolation determines how transaction integrity is visible to other users and systems. A lower isolation level increases the ability of many users to access the same data at the same time, but also increases the number of concurrency effects users might encounter. Conversely, a higher isolation level reduces the types of concurrency effects that users may encounter, but requires more system resources and increases the chances that one transaction will block another.

In database systems, atomicity is one of the ACID transaction properties. An atomic transaction is an indivisible and irreducible series of database operations such that either all occur, or none occur. A guarantee of atomicity prevents partial database updates from occurring, because they can cause greater problems than rejecting the whole series outright. As a consequence, the transaction cannot be observed to be in progress by another database client. At one moment in time, it has not yet happened, and at the next it has already occurred in whole.

In database technologies, a rollback is an operation which returns the database to some previous state. Rollbacks are important for database integrity, because they mean that the database can be restored to a clean copy even after erroneous operations are performed. They are crucial for recovering from database server crashes; by rolling back any transaction which was active at the time of the crash, the database is restored to a consistent state.

<span class="mw-page-title-main">HSQLDB</span> Java-based database engine

HSQLDB is a relational database management system written in Java. It has a JDBC driver and supports a large subset of SQL-92, SQL:2008, SQL:2011, and SQL:2016 standards. It offers a fast, small database engine which offers both in-memory and disk-based tables. Both embedded and server modes are available.

In database systems, consistency refers to the requirement that any given database transaction must change affected data only in allowed ways. Any data written to the database must be valid according to all defined rules, including constraints, cascades, triggers, and any combination thereof. This does not guarantee correctness of the transaction in all ways the application programmer might have wanted but merely that any programming errors cannot result in the violation of any defined database constraints.

In computer science and data management, a commit is the making of a set of tentative changes permanent, marking the end of a transaction and providing Durability to ACID transactions. A commit is an act of committing. The record of commits is called the commit log.

Extensible Storage Engine (ESE), also known as JET Blue, is an ISAM data storage technology from Microsoft. ESE is the core of Microsoft Exchange Server, Active Directory, and Windows Search. It is also used by a number of Windows components including Windows Update client and Help and Support Center. Its purpose is to allow applications to store and retrieve data via indexed and sequential access.

The Access Database Engine is a database engine on which several Microsoft products have been built. The first version of Jet was developed in 1992, consisting of three modules which could be used to manipulate a database.

Replication in computing involves sharing information so as to ensure consistency between redundant resources, such as software or hardware components, to improve reliability, fault-tolerance, or accessibility.

<span class="mw-page-title-main">H2 (database)</span>

H2 is a relational database management system written in Java. It can be embedded in Java applications or run in client-server mode.

In databases, and transaction processing, snapshot isolation is a guarantee that all reads made in a transaction will see a consistent snapshot of the database, and the transaction itself will successfully commit only if no updates it has made conflict with any concurrent updates made since that snapshot.

In the context of data management, autocommit is a mode of operation of a database connection. Each individual database interaction submitted through the database connection in autocommit mode will be executed in its own transaction that is implicitly committed. A SQL statement executed in autocommit mode cannot be rolled back.

The following is provided as an overview of and topical guide to databases:

FoundationDB is a free and open-source multi-model distributed NoSQL database developed by Apple Inc. with a shared-nothing architecture. The product was designed around a "core" database, with additional features supplied in "layers." The core database exposes an ordered key–value store with transactions. The transactions are able to read or write multiple keys stored on any machine in the cluster while fully supporting ACID properties. Transactions are used to implement a variety of data models via layers.

<span class="mw-page-title-main">Apache Trafodion</span> Relational database management system for Apache Hadoop

Apache Trafodion is an open-source Top-Level Project at the Apache Software Foundation. It was originally developed by the information technology division of Hewlett-Packard Company and HP Labs to provide the SQL query language on Apache HBase targeting big data transactional or operational workloads. The project was named after the Welsh word for transactions. As of April 2021, it is no longer actively developed.

References

  1. "What is a Transaction? (Windows)". msdn.microsoft.com.
  2. DINCĂ, Ana-Maria; AXINTE, Sabina-Daniela; BACIVAROV, Ioan (2022-12-29). "Performance Enhancements for Database Transactions". International Journal of Information Security and Cybercrime. 11 (2): 29–34. doi:10.19107/ijisc.2022.02.02. ISSN   2285-9225.
  3. Beeri, C.; Bernstein, P. A.; Goodman, N. (1989). "A model for concurrency in nested transactions systems". Journal of the ACM. 36 (1): 230–269. doi: 10.1145/62044.62046 . S2CID   12956480.
  4. Özsu, M. Tamer; Valduriez, Patrick (2011). Principles of Distributed Database Systems, Third Edition. Springer. Bibcode:2011podd.book.....O. doi:10.1007/978-1-4419-8834-8. ISBN   978-1-4419-8833-1.
  5. Mansouri, Yaser; Toosi, Adel Nadjaran; Buyya, Rajkumar (2017-12-11). "Data Storage Management in Cloud Environments: Taxonomy, Survey, and Future Directions". ACM Computing Surveys. 50 (6): 91:1–91:51. doi:10.1145/3136623. ISSN   0360-0300.
  6. "Linux.org". Linux.org.
  7. "MSDN Library" . Retrieved 16 October 2014.
  8. https://www.usenix.org/system/files/conference/nsdi16/nsdi16-paper-escriva.pdf [ bare URL PDF ]

Further reading