Data consistency


Data consistency refers to whether the same data kept at different places do or do not match.


Point-in-time consistency

Point-in-time consistency is an important property of backup files and a critical objective of software that creates backups. It is also relevant to the design of disk memory systems, specifically relating to what happens when they are unexpectedly shut down.

As a relevant backup example, consider a website with a database, such as the online encyclopedia Wikipedia, which needs to be operational around the clock but must also be backed up regularly to protect against disaster. Portions of Wikipedia are updated every minute of every day. Meanwhile, Wikipedia's database is stored on servers in the form of one or several very large files, which require minutes or hours to back up.

These large files—as with any database—contain numerous data structures which reference each other by location. For example, some structures are indexes which permit the database subsystem to quickly find search results. If the data structures cease to reference each other properly, then the database can be said to be corrupted.

Counter example

The importance of point-in-time consistency can be illustrated with what would happen if a backup were made without it.

Assume Wikipedia's database is a huge file that has an important index located 20% of the way through and stores article data at the 75% mark. Suppose an editor creates a new article while a backup is being performed. The backup is a simple "file copy" that copies the large file(s) from beginning to end without considering data consistency, and at the time of the edit it is 50% complete. The new article is added to the article space (at the 75% mark) and a corresponding index entry is added (at the 20% mark).

Because the backup is already halfway done and the index already copied, the backup will be written with the article data present, but with the index reference missing. As a result of the inconsistency, this file is considered corrupted.

In practice, a database such as Wikipedia's may be edited thousands of times per hour, and references are virtually always spread throughout the file and can number in the millions, billions, or more. A sequential "copy" backup would contain so many small corruptions that it would be unusable without a lengthy repair process, and that process could give no guarantee as to the completeness of what has been recovered.

A backup process which properly accounts for data consistency ensures that the backup is a snapshot of how the entire database looked at a single moment. In the given Wikipedia example, it would ensure that the backup was written without the added article at the 75% mark, so that the article data would be consistent with the index data previously written.
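The difference between the two backup strategies can be sketched with a toy model. The block positions (20 and 75) and the moment of the edit (50% through the copy) come from the example above; the list-of-blocks "file", the function names, and the consistency check are illustrative assumptions, not a description of how any real database or backup tool works.

```python
# Toy model: the "database file" is a list of 100 blocks.
# Block 20 holds the index and block 75 holds article data.
INDEX_BLOCK = 20
ARTICLE_BLOCK = 75


def new_database():
    db = [""] * 100
    db[INDEX_BLOCK] = "index: (empty)"
    db[ARTICLE_BLOCK] = "articles: (empty)"
    return db


def add_article(db):
    """The editor's change: article data plus its index entry."""
    db[ARTICLE_BLOCK] = "articles: NewArticle"
    db[INDEX_BLOCK] = "index: NewArticle -> block 75"


def sequential_copy(db, edit_at_block):
    """Naive file copy; the edit lands when the copy reaches edit_at_block."""
    backup = []
    for position in range(len(db)):
        if position == edit_at_block:
            add_article(db)
        backup.append(db[position])
    return backup


def snapshot_copy(db, edit_at_block):
    """Freeze the database state first, then copy from the frozen image."""
    frozen = list(db)  # point-in-time image taken before copying starts
    backup = []
    for position in range(len(frozen)):
        if position == edit_at_block:
            add_article(db)  # the live database still changes
        backup.append(frozen[position])
    return backup


def is_consistent(backup):
    """The article and its index entry must both be present or both absent."""
    has_article = "NewArticle" in backup[ARTICLE_BLOCK]
    has_index = "NewArticle" in backup[INDEX_BLOCK]
    return has_article == has_index


torn = sequential_copy(new_database(), edit_at_block=50)
snap = snapshot_copy(new_database(), edit_at_block=50)
print(is_consistent(torn))  # False: article present, index entry missing
print(is_consistent(snap))  # True: neither is present
```

In this sketch the snapshot is a full in-memory copy, which is only practical for a toy; real systems obtain the frozen image by other means.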

Disk caching systems

Point-in-time consistency is also relevant to computer disk subsystems.

Specifically, operating systems and file systems are designed with the expectation that the computer system they are running on could lose power, crash, fail, or otherwise cease operating at any time. When properly designed, they ensure that data will not be unrecoverably corrupted if the power is lost. Operating systems and file systems do this by ensuring that data is written to a hard disk in a certain order, and rely on that in order to detect and recover from unexpected shutdowns.

On the other hand, rigorously writing data to disk in the order that maximizes data integrity also impacts performance. A process of write caching is used to consolidate and re-sequence write operations such that they can be done faster by minimizing the time spent moving disk heads.

Data consistency concerns arise when write caching changes the sequence in which writes are carried out, because there exists the possibility of an unexpected shutdown that violates the operating system's expectation that all writes will be committed sequentially.

For example, in order to save a typical document or picture file, an operating system might write the following records to a disk in the following order:

  1. Journal entry saying file XYZ is about to be saved into sector 123.
  2. The actual contents of the file XYZ are written into sector 123.
  3. Sector 123 is now flagged as occupied in the record of free/used space.
  4. Journal entry noting the file completely saved, and its name is XYZ and is located in sector 123.

The operating system relies on the assumption that if item #1 is present (saying the file is about to be saved) but item #4 is missing (confirming success), then the save operation was unsuccessful, and so it should undo any incomplete steps already taken to save it (e.g. marking sector 123 free since it was never properly filled, and removing any record of XYZ from the file directory). It relies on these items being committed to disk in sequential order.
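The recovery rule described above can be sketched as follows. The journal entry layout (`type`, `file`, `sector`), the dictionary-based free space map and directory, and the `recover` function are hypothetical names chosen for illustration; real file systems use their own on-disk formats.

```python
def recover(journal, free_space_map, directory):
    """Undo any save that has a 'begin' journal entry but no 'commit' entry."""
    begun = {}
    committed = set()
    for entry in journal:
        if entry["type"] == "begin":      # item 1: file is about to be saved
            begun[entry["file"]] = entry["sector"]
        elif entry["type"] == "commit":   # item 4: file completely saved
            committed.add(entry["file"])

    rolled_back = []
    for name, sector in begun.items():
        if name not in committed:
            free_space_map[sector] = "free"  # sector was never properly filled
            directory.pop(name, None)        # remove any record of the file
            rolled_back.append(name)
    return rolled_back


# State found on disk after a crash between items 3 and 4.
journal = [{"type": "begin", "file": "XYZ", "sector": 123}]
free_space_map = {123: "used"}
directory = {}

print(recover(journal, free_space_map, directory))  # ['XYZ']
print(free_space_map[123])                          # free
```

The sketch only works because the journal is trusted to reflect the order in which writes reached the disk, which is the assumption the surrounding text goes on to examine.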

Suppose a caching algorithm determines it would be fastest to write these items to disk in the order 4-3-1-2 and starts doing so, but the power is shut off after item 4 is written and before items 3, 1 and 2, so those writes never occur. When the computer is turned back on, the file system would show that it contains a file named XYZ located in sector 123, but this sector does not really contain the file. (Instead, the sector will contain garbage, zeroes, or a random portion of some old file, and that is what will show if the file is opened.)

Further, the file system's free space map will not contain any entry showing that sector 123 is occupied, so later, it will likely assign that sector to the next file to be saved, believing it is available. The file system will then have two files both unexpectedly claiming the same sector (known as a cross-linked file). As a result, a write to one of the files will overwrite part of the other file, invisibly damaging it.

A disk caching subsystem that ensures point-in-time consistency guarantees that in the event of an unexpected shutdown, the four elements would be written in one of only five possible ways: completely (1-2-3-4), partially (1, 1-2, 1-2-3), or not at all.
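The five permitted outcomes are exactly the prefixes of the requested order, which makes the guarantee easy to check mechanically. The short sketch below enumerates them and lists the on-disk states that a reordered schedule could leave behind after a crash; the function names are illustrative.

```python
REQUESTED_ORDER = [1, 2, 3, 4]


def consistent_states(order):
    """All prefixes of the requested order, including the empty one."""
    return [tuple(order[:n]) for n in range(len(order) + 1)]


def states_after_crash(actual_order):
    """Sets of items on disk if power fails after each write in actual_order."""
    return [frozenset(actual_order[:n]) for n in range(len(actual_order) + 1)]


def violations(actual_order):
    """Crash states that are not a prefix of the requested order."""
    allowed = {frozenset(s) for s in consistent_states(REQUESTED_ORDER)}
    return [sorted(s) for s in states_after_crash(actual_order) if s not in allowed]


print(consistent_states(REQUESTED_ORDER))
# [(), (1,), (1, 2), (1, 2, 3), (1, 2, 3, 4)]
print(violations([1, 2, 3, 4]))  # []
print(violations([4, 3, 1, 2]))  # [[4], [3, 4], [1, 3, 4]]
```

With the 4-3-1-2 schedule, every crash point except "before the first write" and "after the last write" leaves the disk in a state the operating system does not expect.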

High-end hardware disk controllers of the type found in servers include a small battery back-up unit on their cache memory so that they may offer the performance gains of write caching while mitigating the risk of unintended shutdowns. The battery back-up unit keeps the memory powered even during a shutdown so that when the computer is powered back up, it can quickly complete any writes it has previously committed to. With such a controller, the operating system may request four writes (1-2-3-4) in that order, but the controller may decide the quickest way to write them is 4-3-1-2. The controller essentially lies to the operating system and reports that the writes have been completed in order (a lie that improves performance at the risk of data corruption if power is lost), and the battery backup hedges against that risk by giving the controller a way to silently repair any damage that could occur as a result.

If the power is shut off after element 4 has been written, the battery-backed memory contains the record of commitment for the other three items and ensures that they are written ("flushed") to the disk at the next available opportunity.
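A toy model of this behaviour is sketched below. The class and method names are hypothetical, and the model ignores everything a real controller has to deal with (battery life, partial sector writes, cache capacity); it only illustrates why acknowledged writes survive the power loss.

```python
class BatteryBackedController:
    """Toy model: 'nvram' survives power loss, 'disk' is the physical medium."""

    def __init__(self):
        self.disk = {}
        self.nvram = {}  # writes acknowledged but not yet on disk

    def write(self, item, data):
        # Acknowledge immediately; the data is held in battery-backed memory.
        self.nvram[item] = data

    def flush_some(self, items):
        # The controller picks its own order, e.g. 4-3-1-2.
        for item in items:
            if item in self.nvram:
                self.disk[item] = self.nvram.pop(item)

    def power_on(self):
        # Replay everything still held in battery-backed memory.
        self.flush_some(list(self.nvram))


controller = BatteryBackedController()
for item in (1, 2, 3, 4):
    controller.write(item, f"record {item}")

controller.flush_some([4])       # power fails after only item 4 reaches disk
print(sorted(controller.disk))   # [4]

controller.power_on()            # the battery kept nvram intact
print(sorted(controller.disk))   # [1, 2, 3, 4]
```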

Transaction consistency

Consistency in the realm of distributed database systems refers to the property of many ACID databases to ensure that the results of a database transaction are visible to all nodes simultaneously. That is, once the transaction has been committed, all parties attempting to access the database can see the results of that transaction simultaneously.

A good example of the importance of transaction consistency is a database that handles the transfer of money. Suppose a money transfer requires two operations: writing a debit in one place, and a credit in another. If the system crashes or shuts down when one operation has completed but the other has not, and there is nothing in place to correct this, the system can be said to lack transaction consistency. With a money transfer, it is desirable that either the entire transaction completes, or none of it completes. Both of these scenarios keep the balance in check.

Transaction consistency ensures exactly that: the system is programmed to detect incomplete transactions when powered on, and to undo (or "roll back") the portion of any incomplete transactions that are found.
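The all-or-nothing behaviour of the money transfer can be sketched with Python's built-in `sqlite3` module. The table layout, account names, and the `fail_midway` flag are assumptions made for illustration, and a raised exception stands in for the crash; recovery after a real power failure is handled by the database engine on restart rather than by application code like this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, source, target, amount, fail_midway=False):
    """Debit one account and credit another as a single transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
            if fail_midway:
                raise RuntimeError("simulated crash between debit and credit")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
    except RuntimeError:
        pass  # the debit has already been rolled back


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


transfer(conn, "alice", "bob", 30, fail_midway=True)
print(balances(conn))  # {'alice': 100, 'bob': 50}

transfer(conn, "alice", "bob", 30)
print(balances(conn))  # {'alice': 70, 'bob': 80}
```

In both outcomes the total across the two accounts stays at 150, which is the sense in which the balance is kept in check.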

Application consistency

Application consistency is similar to transaction consistency but applies on a larger scale. Instead of having the scope of a single transaction, data must be consistent within the confines of many different transaction streams from one or more applications. An application may be made up of many different types of data, various types of files, and data feeds from other applications. Application consistency is the state in which all related files and databases are synchronized and represent the true status of the application.
