Isolation (database systems)

Last updated

In database systems, isolation is one of the ACID (Atomicity, Consistency, Isolation, Durability ) transaction properties. It determines how transaction integrity is visible to other users and systems. A lower isolation level increases the ability of many users to access the same data at the same time, but also increases the number of concurrency effects (such as dirty reads or lost updates) users might encounter. Conversely, a higher isolation level reduces the types of concurrency effects that users may encounter, but requires more system resources and increases the chances that one transaction will block another. [1]

Contents

DBMS Concurrency Control

Concurrency control comprises the underlying mechanisms in a DBMS which handle isolation and guarantee related correctness. It is heavily used by the database and storage engines both to guarantee the correct execution of concurrent transactions, and (via different mechanisms) the correctness of other DBMS processes. The transaction-related mechanisms typically constrain the database data access operations' timing (transaction schedules) to certain orders characterized as the serializability and recoverability schedule properties. Constraining database access operation execution typically means reduced performance (measured by rates of execution), and thus concurrency control mechanisms are typically designed to provide the best performance possible under the constraints. Often, when possible without harming correctness, the serializability property is compromised for better performance. However, recoverability cannot be compromised, since such typically results in a quick database integrity violation.[ citation needed ]

Two-phase locking is the most common transaction concurrency control method in DBMSs, used to provide both serializability and recoverability for correctness. In order to access a database object a transaction first needs to acquire a lock for this object. Depending on the access operation type (e.g., reading or writing an object) and on the lock type, acquiring the lock may be blocked and postponed, if another transaction is holding a lock for that object.[ citation needed ]

Client Side Isolation

Isolation is typically enforced at the database level. However, various client-side systems can also be used. It can be controlled in application frameworks or runtime containers such as J2EE Entity Beans [2] On older systems, it may be implemented systemically (by the application developers), for example through the use of temporary tables.[ failed verification ] In two-tier, three-tier, or n-tier web applications a transaction manager can be used to maintain isolation. A transaction manager is middleware which sits between an app service (back-end application service) and the operating system. A transaction manager can provide global isolation and atomicity. It tracks when new servers join a transaction and coordinates an atomic commit protocol among the servers. The details are abstracted from the app, making transactions simpler and easier to code. A transaction processing monitor (TPM) is a collection of middle-ware including a transaction manager. A TPM might provide local isolation to an app with a lock manager. [2]

Read Phenomena

The ANSI/ISO standard SQL 92 refers to three different read phenomena when a transaction retrieves data that another transaction might have updated.

In the following examples, two transactions take place. In transaction 1, a query is performed, then in transaction 2, an update is performed, and finally in transaction 1, the same query is performed again.

The examples use the following relation:

users
idnameage
1Alice20
2Bob25

Dirty reads

Main article: Write-read conflict

A dirty read (aka uncommitted dependency) occurs when a transaction retrieves a row that has been updated by another transaction that is not yet committed.

In this example, transaction 1 retrieves the row with id 1, then transaction 2 updates the row with id 1, and finally transaction 1 retrieves the row with id 1 again. Now if transaction 2 rolls back its update (already retrieved by transaction 1) or performs other updates, then the view of the row may be wrong in transaction 1. At the READ UNCOMMITTED isolation level, the second SELECT in transaction 1 retrieves the updated row: this is a dirty read. At the READ COMMITTED, REPEATABLE READ, and SERIALIZABLE isolation levels, the second SELECT in transaction 1 retrieves the initial row.

Transaction 1Transaction 2
BEGIN;SELECTageFROMusersWHEREid=1;-- retrieves 20
BEGIN;UPDATEusersSETage=21WHEREid=1;-- no commit here
SELECTageFROMusersWHEREid=1;-- READ UNCOMMITTED retrieves 21 (dirty read)-- READ COMMITTED retrieves 20 (dirty read has been avoided)-- REPEATABLE READ retrieves 20 (dirty read has been avoided)-- SERIALIZABLE retrieves 20 (dirty read has been avoided)COMMIT;
ROLLBACK;

Non-repeatable reads

A non-repeatable read occurs when a transaction retrieves a row twice and that row is updated by another transaction that is committed in between.

In this example, transaction 1 retrieves the row with id 1, then transaction 2 updates the row with id 1 and is committed, and finally transaction 1 retrieves the row with id 1 again. At the READ UNCOMMITTED and READ COMMITTED isolation levels, the second SELECT in transaction 1 retrieves the updated row: this is a non-repeatable read. At the REPEATABLE READ and SERIALIZABLE isolation levels, the second SELECT in transaction 1 retrieves the initial row.

Transaction 1Transaction 2
BEGIN;SELECTageFROMusersWHEREid=1;-- retrieves 20
BEGIN;UPDATEusersSETage=21WHEREid=1;COMMIT;
SELECTageFROMusersWHEREid=1;-- READ UNCOMMITTED retrieves 21 (non-repeatable read)-- READ COMMITTED retrieves 21 (non-repeatable read)-- REPEATABLE READ retrieves 20 (non-repeatable read has been avoided)-- SERIALIZABLE retrieves 20 (non-repeatable read has been avoided)COMMIT;

Phantom reads

A phantom read occurs when a transaction retrieves a set of rows twice and new rows are inserted into or removed from that set by another transaction that is committed in between.

In this example, transaction 1 retrieves the set of rows with age greater than 17, then transaction 2 inserts a row with age 26 and is committed, and finally transaction 1 retrieves the set of rows with age greater than 17 again. At the READ UNCOMMITTED, READ COMMITTED, and REPEATABLE READ isolation levels, the second SELECT in transaction 1 retrieves the new set of rows that includes the inserted row: this is a phantom read. At the SERIALIZABLE isolation level, the second SELECT in transaction 1 retrieves the initial set of rows.

Transaction 1Transaction 2
BEGIN;SELECTnameFROMusersWHEREage>17;-- retrieves Alice and Bob
BEGIN;INSERTINTOusersVALUES(3,'Carol',26);COMMIT;
SELECTnameFROMusersWHEREage>17;-- READ UNCOMMITTED retrieves Alice, Bob and Carol (phantom read)-- READ COMMITTED retrieves Alice, Bob and Carol (phantom read)-- REPEATABLE READ retrieves Alice, Bob and Carol (phantom read)-- SERIALIZABLE retrieves Alice and Bob (phantom read has been avoided)COMMIT;

There are two basic strategies used to prevent non-repeatable reads and phantom reads. In the first strategy, lock-based concurrency control, transaction 2 is committed after transaction 1 is committed or rolled back. It produces the serial schedule T1, T2. In the other strategy, multiversion concurrency control , transaction 2 is committed immediately while transaction 1, which started before transaction 2, continues to operate on an old snapshot of the database taken at the start of transaction 1, and when transaction 1 eventually tries to commit, if the result of committing would be equivalent to the serial schedule T1, T2, then transaction 1 is committed; otherwise, there is a commit conflict and transaction 1 is rolled back with a serialization failure.

Under lock-based concurrency control, non-repeatable reads and phantom reads may occur when read locks are not acquired when performing a SELECT, or when the acquired locks on affected rows are released as soon as the SELECT is performed. Under multiversion concurrency control, non-repeatable reads and phantom reads may occur when the requirement that a transaction affected by a commit conflict must be rolled back is relaxed.

Isolation levels

Of the four ACID properties in a DBMS (Database Management System), the isolation property is the one most often relaxed. When attempting to maintain the highest level of isolation, a DBMS usually acquires locks on data which may result in a loss of concurrency, or implements multiversion concurrency control. This requires adding logic for the application to function correctly.

Most DBMSs offer a number of transaction isolation levels, which control the degree of locking that occurs when selecting data. For many database applications, the majority of database transactions can be constructed to avoid requiring high isolation levels (e.g. SERIALIZABLE level), thus reducing the locking overhead for the system. The programmer must carefully analyze database access code to ensure that any relaxation of isolation does not cause software bugs that are difficult to find. Conversely, if higher isolation levels are used, the possibility of deadlock is increased, which also requires careful analysis and programming techniques to avoid.

Since each isolation level is stronger than those below, in that no higher isolation level allows an action forbidden by a lower one, the standard permits a DBMS to run a transaction at an isolation level stronger than that requested (e.g., a "Read committed" transaction may actually be performed at a "Repeatable read" isolation level).

The isolation levels defined by the ANSI/ISO SQL standard are listed as follows.

Serializable

This is the highest isolation level.

With a lock-based concurrency control DBMS implementation, serializability requires read and write locks (acquired on selected data) to be released at the end of the transaction. Also range-locks must be acquired when a SELECT query uses a ranged WHERE clause, especially to avoid the phantom reads phenomenon.

When using non-lock based concurrency control, no locks are acquired; however, if the system detects a write collision among several concurrent transactions, only one of them is allowed to commit. See snapshot isolation for more details on this topic.

From : (Second Informal Review Draft) ISO/IEC 9075:1992, Database Language SQL- July 30, 1992: The execution of concurrent SQL-transactions at isolation level SERIALIZABLE is guaranteed to be serializable. A serializable execution is defined to be an execution of the operations of concurrently executing SQL-transactions that produces the same effect as some serial execution of those same SQL-transactions. A serial execution is one in which each SQL-transaction executes to completion before the next SQL-transaction begins.

Repeatable reads

In this isolation level, a lock-based concurrency control DBMS implementation keeps read and write locks (acquired on selected data) until the end of the transaction. However, range-locks are not managed, so phantom reads can occur.

Write skew is possible at this isolation level in some systems. Write skew is a phenomenon where two writes are allowed to the same column(s) in a table by two different writers (who have previously read the columns they are updating), resulting in the column having data that is a mix of the two transactions. [3] [4]

Read committed

In this isolation level, a lock-based concurrency control DBMS implementation keeps write locks (acquired on selected data) until the end of the transaction, but read locks are released as soon as the SELECT operation is performed (so the non-repeatable reads phenomenon can occur in this isolation level). As in the previous level, range-locks are not managed.

Putting it in simpler words, read committed is an isolation level that guarantees that any data read is committed at the moment it is read. It simply restricts the reader from seeing any intermediate, uncommitted, 'dirty' read. It makes no promise whatsoever that if the transaction re-issues the read, it will find the same data; data is free to change after it is read.

Read uncommitted

This is the lowest isolation level. In this level, dirty reads are allowed, so one transaction may see not-yet-committed changes made by other transactions.

Default isolation level

The default isolation level of different DBMS's varies quite widely. Most databases that feature transactions allow the user to set any isolation level. Some DBMS's also require additional syntax when performing a SELECT statement to acquire locks (e.g. SELECT … FOR UPDATE to acquire exclusive write locks on accessed rows).

However, the definitions above have been criticized as being ambiguous, and as not accurately reflecting the isolation provided by many databases:

This paper shows a number of weaknesses in the anomaly approach to defining isolation levels. The three ANSI phenomena are ambiguous, and even in their loosest interpretations do not exclude some anomalous behavior … This leads to some counter-intuitive results. In particular, lock-based isolation levels have different characteristics than their ANSI equivalents. This is disconcerting because commercial database systems typically use locking implementations. Additionally, the ANSI phenomena do not distinguish between a number of types of isolation level behavior that are popular in commercial systems. [5]

There are also other criticisms concerning ANSI SQL's isolation definition, in that it encourages implementors to do "bad things":

... it relies in subtle ways on an assumption that a locking schema is used for concurrency control, as opposed to an optimistic or multi-version concurrency scheme. This implies that the proposed semantics are ill-defined. [6]

Isolation levels vs read phenomena

Read phenomenon



Isolation level
Dirty readNon-repeatable readPhantom read
Serializablenonono
Repeatable readnonoyes
Read committednoyesyes
Read uncommittedyesyesyes

Anomaly serializable is not the same as serializable. That is, it is necessary, but not sufficient that a serializable schedule should be free of all three phenomena types. [5]

See also

Related Research Articles

In computer science, ACID is a set of properties of database transactions intended to guarantee data validity despite errors, power failures, and other mishaps. In the context of databases, a sequence of database operations that satisfies the ACID properties is called a transaction. For example, a transfer of funds from one bank account to another, even involving multiple changes such as debiting one account and crediting another, is a single transaction.

<span class="mw-page-title-main">Ingres (database)</span> Database software

Ingres Database is a proprietary SQL relational database management system intended to support large commercial and government applications.

In information technology and computer science, especially in the fields of computer programming, operating systems, multiprocessors, and databases, concurrency control ensures that correct results for concurrent operations are generated, while getting those results as quickly as possible.

Multiversion concurrency control, is a non-locking concurrency control method commonly used by database management systems to provide concurrent access to the database and in programming languages to implement transactional memory.

Optimistic concurrency control (OCC), also known as optimistic locking, is a non-locking concurrency control method applied to transactional systems such as relational database management systems and software transactional memory. OCC assumes that multiple transactions can frequently complete without interfering with each other. While running, transactions use data resources without acquiring locks on those resources. Before committing, each transaction verifies that no other transaction has modified the data it has read. If the check reveals conflicting modifications, the committing transaction rolls back and can be restarted. Optimistic concurrency control was first proposed in 1979 by H. T. Kung and John T. Robinson.

A database transaction symbolizes a unit of work, performed within a database management system against a database, that is treated in a coherent and reliable way independent of other transactions. A transaction generally represents any change in a database. Transactions in a database environment have two main purposes:

  1. To provide reliable units of work that allow correct recovery from failures and keep a database consistent even in cases of system failure. For example: when execution prematurely and unexpectedly stops in which case many operations upon a database remain uncompleted, with unclear status.
  2. To provide isolation between programs accessing a database concurrently. If this isolation is not provided, the programs' outcomes are possibly erroneous.

In computer science, a lock or mutex is a synchronization primitive that prevents state from being modified or accessed by multiple threads of execution at once. Locks enforce mutual exclusion concurrency control policies, and with a variety of possible methods there exist multiple unique implementations for different applications.

In databases and transaction processing, two-phase locking (2PL) is a pessimistic concurrency control method that guarantees conflict-serializability. It is also the name of the resulting set of database transaction schedules (histories). The protocol uses locks, applied by a transaction to data, which may block other transactions from accessing the same data during the transaction's life.

In the fields of databases and transaction processing, a schedule of a system is an abstract model to describe the order of executions in a set of transactions running in the system. Often it is a list of operations (actions) ordered by time, performed by a set of transactions that are executed together in the system. If the order in time between certain operations is not determined by the system, then a partial order is used. Examples of such operations are requesting a read operation, reading, writing, aborting, committing, requesting a lock, locking, etc. Often, only a subset of the transaction operation types are included in a schedule.

<span class="mw-page-title-main">HSQLDB</span> Java-based database engine

HSQLDB is a relational database management system written in Java. It has a JDBC driver and supports a large subset of SQL-92, SQL:2008, SQL:2011, and SQL:2016 standards. It offers a fast, small database engine which offers both in-memory and disk-based tables. Both embedded and server modes are available.

A nested transaction is a database transaction that is started by an instruction within the scope of an already started transaction.

The Access Database Engine is a database engine on which several Microsoft products have been built. The first version of Jet was developed in 1992, consisting of three modules which could be used to manipulate a database.

In computer science, a database cursor is a mechanism that enables traversal over the records in a database. Cursors facilitate subsequent processing in conjunction with the traversal, such as retrieval, addition and removal of database records. The database cursor characteristic of traversal makes cursors akin to the programming language concept of iterator.

Commitment ordering (CO) is a class of interoperable serializability techniques in concurrency control of databases, transaction processing, and related applications. It allows optimistic (non-blocking) implementations. With the proliferation of multi-core processors, CO has also been increasingly utilized in concurrent programming, transactional memory, and software transactional memory (STM) to achieve serializability optimistically. CO is also the name of the resulting transaction schedule (history) property, defined in 1988 with the name dynamic atomicity. In a CO compliant schedule, the chronological order of commitment events of transactions is compatible with the precedence order of the respective transactions. CO is a broad special case of conflict serializability and effective means to achieve global serializability across any collection of database systems that possibly use different concurrency control mechanisms.

In databases, and transaction processing, snapshot isolation is a guarantee that all reads made in a transaction will see a consistent snapshot of the database, and the transaction itself will successfully commit only if no updates it has made conflict with any concurrent updates made since that snapshot.

<span class="mw-page-title-main">Virtuoso Universal Server</span> Computer software

Virtuoso Universal Server is a middleware and database engine hybrid that combines the functionality of a traditional relational database management system (RDBMS), object–relational database (ORDBMS), virtual database, RDF, XML, free-text, web application server and file server functionality in a single system. Rather than have dedicated servers for each of the aforementioned functionality realms, Virtuoso is a "universal server"; it enables a single multithreaded server process that implements multiple protocols. The free and open source edition of Virtuoso Universal Server is also known as OpenLink Virtuoso. The software has been developed by OpenLink Software with Kingsley Uyi Idehen and Orri Erling as the chief software architects.

Microsoft SQL Server is a proprietary relational database management system developed by Microsoft. As a database server, it is a software product with the primary function of storing and retrieving data as requested by other software applications—which may run either on the same computer or on another computer across a network. Microsoft markets at least a dozen different editions of Microsoft SQL Server, aimed at different audiences and for workloads ranging from small single-machine applications to large Internet-facing applications with many concurrent users.

The following is provided as an overview of and topical guide to databases:

FoundationDB is a free and open-source multi-model distributed NoSQL database developed by Apple Inc. with a shared-nothing architecture. The product was designed around a "core" database, with additional features supplied in "layers." The core database exposes an ordered key–value store with transactions. The transactions are able to read or write multiple keys stored on any machine in the cluster while fully supporting ACID properties. Transactions are used to implement a variety of data models via layers.

<span class="mw-page-title-main">Apache Trafodion</span> Relational database management system for Apache Hadoop

Apache Trafodion is an open-source Top-Level Project at the Apache Software Foundation. It was originally developed by the information technology division of Hewlett-Packard Company and HP Labs to provide the SQL query language on Apache HBase targeting big data transactional or operational workloads. The project was named after the Welsh word for transactions. As of April 2021, it is no longer actively developed.

References

  1. "Isolation Levels in the Database Engine", TechNet, Microsoft, https://technet.microsoft.com/en-us/library/ms189122(v=SQL.105).aspx
  2. 1 2 "The Architecture of Transaction Processing Systems", Chapter 23, Evolution of Processing Systems, Department of Computer Science, Stony Brook University, retrieved 20 March 2014, http://www.cs.sunysb.edu/~liu/cse315/23.pdf
  3. Vlad Mihalcea (2015-10-20). "A beginner's guide to read and write skew phenomena".
  4. "Postgresql wiki - SSI".
  5. 1 2 "A Critique of ANSI SQL Isolation Levels" (PDF). Retrieved 29 July 2012.
  6. salesforce (2010-12-06). "Customer testimonials (SimpleGeo, CLOUDSTOCK 2010)". www.DataStax.com: DataStax. Retrieved 2010-03-09. (see above at about 13:30 minutes of the webcast!)