Durability (database systems)

Last updated January 01, 2025

In database systems, durability is the ACID property that guarantees that the effects of committed transactions survive permanently, even in the event of failures,^[1] including incidents and catastrophic events. For example, if a flight booking reports that a seat has successfully been booked, then the seat will remain booked even if the system crashes.^[2]

Formally, a database system ensures the durability property if it tolerates three types of failures: transaction, system, and media failures.^[1] A transaction fails if its execution is interrupted before all its operations have been processed by the system.^[3] At the transaction level, such interruptions can originate from data-entry errors, operator cancellation, timeouts, or application-specific errors, like withdrawing money from a bank account with insufficient funds.^[1] At the system level, a failure occurs if the contents of the volatile storage are lost, for instance because of system crashes such as out-of-memory events.^[3] At the media level, where media means stable storage that withstands system failures, a failure occurs when the stable storage, or part of it, is lost.^[3] Disk failures are the typical example.^[1]

Thus, to be durable, the database system should implement strategies and operations that guarantee two things. First, the effects of transactions committed before the failure survive the event, if necessary by reconstruction. Second, the changes of incomplete transactions, which had not been committed at the time of the failure, are reverted and do not affect the state of the database system. These behaviours are proven to be correct when the execution of transactions has, respectively, the resilience and recoverability properties.^[3]

Mechanisms

In transaction-based systems, the mechanisms that assure durability are historically associated with the concept of reliability of systems, as proposed by Jim Gray in 1981.^[1] This concept includes durability, but it also relies on aspects of the atomicity and consistency properties.^[4] Specifically, a reliability mechanism requires primitives that explicitly state the beginning, the end, and the rollback of transactions,^[1] which the other two properties also require. This article considers only the mechanisms strictly related to durability. These mechanisms are divided into three levels: transaction, system, and media level. The same three levels describe the scenarios in which failures can happen and that the design of a database system has to consider in order to address durability.^[3]

Transaction level

Durability against failures that occur at the transaction level, such as canceled calls and inconsistent actions that constraints and triggers may block before commit, is guaranteed by the serializability property of the execution of transactions. The state generated by the effects of previously committed transactions is available in main memory and is therefore resilient, while the changes made by non-committed transactions can be undone. Thanks to serializability, those changes can be told apart from the changes of other transactions and, therefore, discarded.^[3] In addition, in-place changes, which overwrite old values without keeping any kind of history, are discouraged.^[1] Multiple approaches keep track of the history of changes, such as timestamp-based solutions^[5] or logging and locking.^[1]
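The idea of keeping a history instead of overwriting values in place can be illustrated with a minimal sketch. The class `VersionedStore`, its method names, and the transaction identifiers are hypothetical and chosen only for illustration; the sketch ignores concurrency control entirely and is not taken from any particular database system.

```python
class VersionedStore:
    """Toy key-value store that appends versions instead of overwriting in place."""

    def __init__(self):
        self._versions = {}      # key -> list of (txn_id, value), oldest first
        self._committed = set()  # identifiers of committed transactions

    def write(self, txn_id, key, value):
        # Append a new version; older versions stay untouched.
        self._versions.setdefault(key, []).append((txn_id, value))

    def commit(self, txn_id):
        self._committed.add(txn_id)

    def abort(self, txn_id):
        # Undo: drop every version written by the aborted transaction.
        for key in list(self._versions):
            kept = [(t, v) for (t, v) in self._versions[key] if t != txn_id]
            if kept:
                self._versions[key] = kept
            else:
                del self._versions[key]

    def read(self, key):
        # Return the newest value written by a committed transaction, if any.
        for txn_id, value in reversed(self._versions.get(key, [])):
            if txn_id in self._committed:
                return value
        return None


store = VersionedStore()
store.write("t1", "balance", 100)
store.commit("t1")

# Transaction t2 violates an application rule and is aborted before commit.
store.write("t2", "balance", -50)
visible_before_abort = store.read("balance")  # uncommitted write is not visible
store.abort("t2")
visible_after_abort = store.read("balance")
```

Because each version is tagged with the transaction that wrote it, the changes of an aborted transaction can be identified and discarded without touching the committed state.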

System level

At the system level, failures happen, by definition,^[3] when the contents of the volatile storage are lost. This can occur in events like system crashes or power outages. Existing database systems use volatile storage (i.e. the main memory of the system) for different purposes: some store their whole state and data in it, even without any durability guarantee; others keep the state and the data, or part of them, in memory, but also use non-volatile storage for data; still others keep only the state in main memory and all the data on disk.^[6] Systems combine volatile storage, which is subject to this type of failure, with non-volatile storage because of the performance differences between the technologies used to implement the two kinds of storage. However, the situation is likely to evolve as the popularity of non-volatile memory (NVM) technologies grows.^[7]

In systems that include non-volatile storage, durability can be achieved by keeping an immutable sequential log of the transactions and flushing it to non-volatile storage before acknowledging commitment. Thanks to their atomicity property, transactions can be considered the unit of work in the recovery process that uses the log to guarantee durability. In particular, the logging mechanism is called write-ahead log (WAL): it achieves durability by recording changes in the log on disk before the modified data is synchronized from main memory to the database files. In this way, by reconstruction from the log file, all committed transactions are resilient to system-level failures, because they can be redone. Non-committed transactions, instead, are recoverable, since their operations are logged to non-volatile storage before they effectively modify the state of the database.^[8] The partially executed operations can therefore be undone without affecting the state of the system. After that, the transactions that were incomplete can be redone. Therefore, the transaction log from non-volatile storage can be reprocessed to recreate the system state right before any later system-level failure. Logging is done as a combination of tracking data and operations (i.e. transactions) for performance reasons.^[9]
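The write-ahead rule can be sketched in a few lines. The following is a simplified, redo-only illustration and not the full undo/redo scheme described above: the class `TinyWAL`, the function `recover`, the JSON record layout, and the file name are all assumptions made for this example. The only system calls it relies on are the standard `flush` and `os.fsync`.

```python
import json
import os
import tempfile


class TinyWAL:
    """Toy redo-only write-ahead log for an in-memory key-value state."""

    def __init__(self, path):
        self.state = {}
        self._log = open(path, "a", encoding="utf-8")

    def commit(self, txn_id, writes):
        # 1. Append the changes and a commit record to the sequential log.
        for key, value in writes.items():
            record = {"txn": txn_id, "op": "set", "key": key, "value": value}
            self._log.write(json.dumps(record) + "\n")
        self._log.write(json.dumps({"txn": txn_id, "op": "commit"}) + "\n")
        # 2. Force the log to non-volatile storage before acknowledging.
        self._log.flush()
        os.fsync(self._log.fileno())
        # 3. Only now apply the changes to the volatile state.
        self.state.update(writes)

    def close(self):
        self._log.close()


def recover(path):
    """Rebuild the state by redoing only transactions with a commit record."""
    pending, state = {}, {}
    with open(path, encoding="utf-8") as log:
        for line in log:
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                break  # torn record at the tail of the log: stop replaying
            if record["op"] == "set":
                pending.setdefault(record["txn"], {})[record["key"]] = record["value"]
            elif record["op"] == "commit":
                state.update(pending.pop(record["txn"], {}))
    return state


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "wal.log")
    wal = TinyWAL(log_path)
    wal.commit(1, {"seat_12A": "booked"})
    wal.close()
    # Simulate a crash during transaction 2: its write reached the log,
    # but its commit record did not.
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(json.dumps({"txn": 2, "op": "set", "key": "seat_12B", "value": "booked"}) + "\n")
    recovered = recover(log_path)
```

In this sketch the volatile state is changed only after the commit record is on disk, so recovery never needs an undo pass. Systems that let uncommitted changes reach the database files also need undo information in the log.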

Media level

At the media level, failure scenarios affect non-volatile storage, like hard disk drives, solid-state drives, and other types of storage hardware components.^[8] To guarantee durability at this level, the database system must rely on stable memory, an idealized memory that is completely failure-resistant. This kind of memory can be approximated with replication mechanisms and robust writing protocols.^[4]

Many tools and technologies are available to provide a logical stable memory, such as the mirroring of disks, and their choice depends on the requirements of the specific applications.^[4] In general, replication and redundancy strategies and architectures that behave like stable memory are available at different levels of the technology stack. In this way, even in case of catastrophic events where the storage hardware is damaged, data loss can be prevented.^[10] At this level, there is a strong bond between durability and system and data recovery, in the sense that the main goal is to preserve the data, not necessarily in online replicas, but also as offline copies.^[4] These last techniques fall into the categories of backup, data loss prevention, and IT disaster recovery.^[11]
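As an illustration of how replication can approximate stable memory, the sketch below mirrors each write to two files and stores a checksum with the data, so that a reader can fall back to the second copy if the first is lost or corrupted. The function names `mirrored_write` and `mirrored_read`, the record layout, and the file names are hypothetical; real disk mirroring is normally done below the database, for example by a RAID controller or the operating system.

```python
import hashlib
import os
import tempfile

DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes


def mirrored_write(paths, payload):
    """Write checksum + payload to every replica and force each to disk."""
    record = hashlib.sha256(payload).digest() + payload
    for path in paths:
        with open(path, "wb") as replica:
            replica.write(record)
            replica.flush()
            os.fsync(replica.fileno())


def mirrored_read(paths):
    """Return the payload of the first replica whose checksum verifies."""
    for path in paths:
        try:
            with open(path, "rb") as replica:
                record = replica.read()
        except OSError:
            continue  # replica missing or unreadable: try the next one
        digest, payload = record[:DIGEST_SIZE], record[DIGEST_SIZE:]
        if hashlib.sha256(payload).digest() == digest:
            return payload
    raise OSError("all replicas are lost or corrupt")


with tempfile.TemporaryDirectory() as tmp:
    replicas = [os.path.join(tmp, "disk_a.bin"), os.path.join(tmp, "disk_b.bin")]
    mirrored_write(replicas, b"committed page")
    # Simulate a media failure that corrupts the first replica.
    with open(replicas[0], "wb") as damaged:
        damaged.write(b"garbage")
    survivor = mirrored_read(replicas)
```

The data survives as long as at least one replica remains intact, which is the property that mirroring contributes to a logical stable memory.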

Therefore, in case of media failure, the durability of transactions is guaranteed by the ability to reconstruct the state of the database from the log files stored in the stable memory, however the stable memory is implemented in the database system.^[8] Several mechanisms exist to store and reconstruct the state of a database system that improve performance, in terms of both space and time, compared to managing all the log files created since the database system was first started. These mechanisms often include incremental dumping, differential files, and checkpoints.^[12]
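The benefit of checkpoints can be shown with a small sketch: recovery starts from the saved snapshot and replays only the log records written after it, instead of the whole log. The functions `take_checkpoint` and `recover`, the use of log sequence numbers (`lsn`), and the tuple layout of the log are assumptions made for this example, and the log is assumed to contain only changes of committed transactions.

```python
def take_checkpoint(state, last_lsn):
    """Snapshot the state together with the last log record it reflects."""
    return {"lsn": last_lsn, "state": dict(state)}


def recover(checkpoint, log):
    """Start from the checkpoint and replay only the newer log records."""
    state = dict(checkpoint["state"])
    replayed = 0
    for lsn, key, value in log:
        if lsn > checkpoint["lsn"]:
            state[key] = value
            replayed += 1
    return state, replayed


# Log of committed changes: (log sequence number, key, new value).
log = [(1, "a", 1), (2, "b", 2), (3, "a", 3)]

# A checkpoint taken after record 2 already contains the first two changes.
checkpoint = take_checkpoint({"a": 1, "b": 2}, last_lsn=2)
state_from_checkpoint, replayed_from_checkpoint = recover(checkpoint, log)

# Without a checkpoint, every record since the beginning must be replayed.
empty = take_checkpoint({}, last_lsn=0)
state_from_scratch, replayed_from_scratch = recover(empty, log)
```

Both recoveries reach the same state, but the one that starts from the checkpoint replays fewer records, and log records older than the checkpoint no longer need to be kept for this purpose.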

Distributed databases

In distributed transactions, ensuring durability requires additional mechanisms to preserve a consistent state sequence across all database nodes. This means, for example, that a single node may not be enough to decide to conclude a transaction by committing it. In fact, the resources used in that transaction may be on other nodes, where other transactions are occurring concurrently. Otherwise, in case of failure, if consistency could not be guaranteed, it would be impossible to acknowledge a safe state of the database for recovery. For this reason, all participating nodes must coordinate before a commit can be acknowledged. This is usually done by a two-phase commit protocol.^[13]
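The coordination step can be sketched as follows. This is a minimal, in-process illustration of the voting logic of two-phase commit: the `Participant` class, the `two_phase_commit` function, and the node names are hypothetical, and the sketch omits the logging, timeouts, and failure handling that a real protocol needs.

```python
class Participant:
    """Toy database node taking part in a distributed transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        # Phase 1: vote. A real node would first force a "prepared"
        # record to its log so that the vote survives a crash.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Commit only if every participant votes yes; otherwise abort everywhere."""
    # Phase 1: collect a vote from every participant.
    votes = [p.prepare() for p in participants]
    # Phase 2: broadcast the global decision.
    if all(votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"


healthy = [Participant("node_a"), Participant("node_b")]
outcome_healthy = two_phase_commit(healthy)

mixed = [Participant("node_a"), Participant("node_b", can_commit=False)]
outcome_mixed = two_phase_commit(mixed)
```

A single negative vote is enough to abort the transaction on every node, which is why no node can decide to commit on its own.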

In addition, in distributed databases, the protocols for logging and recovery must also address the issues of distributed environments, such as deadlocks, that could prevent the resilience and recoverability of transactions and, thus, durability.^[13] A widely adopted family of algorithms that ensures these properties is Algorithms for Recovery and Isolation Exploiting Semantics (ARIES).^[8]

References

[1] Gray, Jim (1981). "The transaction concept: Virtues and limitations" (PDF). VLDB. 81: 144–154.
[2] "ACID Compliance: What It Means and Why You Should Care". MariaDB. 29 July 2018. Retrieved 22 September 2021.
[3] Hadzilacos, Vassos (1988). "A theory of reliability in database systems". Journal of the ACM. 35 (1): 121–145. doi:10.1145/42267.42272. ISSN 0004-5411. S2CID 7052304.
[4] Atzeni, Paolo, ed. (1999). Database systems: concepts, languages & architectures. New York: McGraw-Hill. pp. 311–320. ISBN 978-0-07-709500-0.
[5] Svobodova, L. (1980). "Management of Object Histories in the Swallow Repository". MIT/LCS TR-243. USA.
[6] Petrov, Oleksandr (2019). Database internals: a deep dive into how distributed data systems work (1st ed.). Beijing Boston Farnham Sebastopol Tokyo: O'Reilly. pp. 40–42. ISBN 978-1-4920-4034-7.
[7] Arulraj, Joy; Pavlo, Andrew (2017-05-09). "How to Build a Non-Volatile Memory Database Management System". Proceedings of the 2017 ACM International Conference on Management of Data. SIGMOD '17. New York, NY, USA: Association for Computing Machinery. pp. 1753–1758. doi:10.1145/3035918.3054780. ISBN 978-1-4503-4197-4. S2CID 648876.
[8] Petrov, Oleksandr (2019). Database internals: a deep dive into how distributed data systems work (1st ed.). Beijing Boston Farnham Sebastopol Tokyo: O'Reilly. pp. 185–195. ISBN 978-1-4920-4034-7.
[9] Mohan, C.; Haderle, Don; Lindsay, Bruce; Pirahesh, Hamid; Schwarz, Peter (1992-03-01). "ARIES: a transaction recovery method supporting fine-granularity locking and partial rollbacks using write-ahead logging". ACM Transactions on Database Systems. 17 (1): 94–162. doi:10.1145/128765.128770. ISSN 0362-5915. S2CID 8759704.
[10] Eich, Margaret H. (1987-02-01). "A classification and comparison of main memory database recovery techniques". 1987 IEEE Third International Conference on Data Engineering. IEEE. pp. 332–339. doi:10.1109/ICDE.1987.7272398. ISBN 978-0-8186-0762-2. S2CID 207773738.
[11] Choy, Manhoi; Leong, Hong Va; Wong, Man Hon (2000). "Disaster recovery techniques for database systems". Communications of the ACM. 43 (11es): 6. doi:10.1145/352515.352521. ISSN 0001-0782. S2CID 14781378.
[12] Verhofstad, Joost S. M. (1978-06-01). "Recovery Techniques for Database Systems". ACM Computing Surveys. 10 (2): 167–195. doi:10.1145/356725.356730. S2CID 8847522.
[13] Mohan, C.; Haderle, Don; Lindsay, Bruce; Pirahesh, Hamid; Schwarz, Peter (1992-03-01). "ARIES: a transaction recovery method supporting fine-granularity locking and partial rollbacks using write-ahead logging". ACM Transactions on Database Systems. 17 (1): 94–162. doi:10.1145/128765.128770. ISSN 0362-5915. S2CID 8759704.
