Data Integrity Field

Data Integrity Field (DIF) is an approach to protecting computer data storage against data corruption. It was proposed in 2003 by the T10 subcommittee of the International Committee for Information Technology Standards. [1] A similar data-integrity approach was added in 2016 to the NVMe 1.2.1 specification. [2]

Packet-based storage transport protocols have CRC protection on command and data payloads. Interconnect buses have parity protection, memory systems have parity detection/correction schemes, and I/O protocol controllers at the transport/interconnect boundaries have internal data-path protection. Data availability in storage systems is frequently measured simply in terms of the reliability of the hardware components and the effects of redundant hardware, but the reliability of the software, its ability to detect errors, and its ability to correctly report a failure or apply corrective actions all have a significant bearing on overall storage system availability. Data exchange usually takes place between the host CPU and the storage disk, possibly with a storage data controller, such as a RAID controller or a simple storage switch, in between.

DIF extends the disk sector from its traditional 512 bytes to 520 bytes by adding eight protection bytes. [1] This extended sector is defined for Small Computer System Interface (SCSI) devices, and SCSI is in turn used in many enterprise storage technologies such as Fibre Channel. [3] Oracle Corporation included support for DIF in the Linux kernel. [4] [5]
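The eight protection bytes are commonly described as a 2-byte guard tag (a CRC-16 of the sector data), a 2-byte application tag, and a 4-byte reference tag that, in the most common usage, carries the low 32 bits of the logical block address. The following is a minimal C sketch of generating and verifying such a tuple; the structure, helper names, and 512-byte payload size are illustrative rather than the standard's exact encoding, though the polynomial 0x8BB7 is the one associated with the T10 guard tag.

```c
#include <stdint.h>
#include <string.h>
#include <stdio.h>

#define SECTOR_DATA_BYTES 512   /* traditional sector payload                 */
#define DIF_POLY 0x8BB7         /* CRC-16 polynomial used for the guard tag   */

/* The eight protection bytes appended to each sector (illustrative layout). */
struct dif_tuple {
    uint16_t guard_tag;  /* CRC-16 of the sector data                 */
    uint16_t app_tag;    /* owned by the application/initiator        */
    uint32_t ref_tag;    /* typically low 32 bits of the sector's LBA */
};

/* Bitwise CRC-16, polynomial 0x8BB7, MSB first, initial value 0. */
static uint16_t dif_crc16(const uint8_t *data, size_t len)
{
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)data[i] << 8;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ DIF_POLY)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Build the protection bytes for one sector about to be written. */
static struct dif_tuple dif_generate(const uint8_t *sector, uint32_t lba)
{
    struct dif_tuple t;
    t.guard_tag = dif_crc16(sector, SECTOR_DATA_BYTES);
    t.app_tag   = 0;
    t.ref_tag   = lba;
    return t;
}

/* Re-check a sector read back from the device; returns 0 on success. */
static int dif_verify(const uint8_t *sector, uint32_t lba, const struct dif_tuple *t)
{
    if (t->guard_tag != dif_crc16(sector, SECTOR_DATA_BYTES))
        return -1;               /* payload was corrupted along the way          */
    if (t->ref_tag != lba)
        return -2;               /* data written to, or read from, the wrong LBA */
    return 0;
}

int main(void)
{
    uint8_t sector[SECTOR_DATA_BYTES];
    memset(sector, 0xA5, sizeof sector);

    struct dif_tuple pi = dif_generate(sector, 1234);
    printf("guard=0x%04x ok=%d\n", (unsigned)pi.guard_tag,
           dif_verify(sector, 1234, &pi) == 0);

    sector[100] ^= 0x01;         /* flip one bit to simulate silent corruption */
    printf("after bit flip ok=%d\n", dif_verify(sector, 1234, &pi) == 0);
    return 0;
}
```

Because the same tuple can be regenerated and checked at the host adapter, the controller, and the drive, a mismatch indicates not only that corruption occurred but also roughly where along the I/O path it was introduced.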

An evolution of this technology called T10 Protection Information was introduced in 2011. [6] [7]

Related Research Articles

<span class="mw-page-title-main">Hard disk drive</span> Electro-mechanical data storage device

A hard disk drive (HDD), hard disk, hard drive, or fixed disk, is an electro-mechanical data storage device that stores and retrieves digital data using magnetic storage with one or more rigid rapidly rotating platters coated with magnetic material. The platters are paired with magnetic heads, usually arranged on a moving actuator arm, which read and write data to the platter surfaces. Data is accessed in a random-access manner, meaning that individual blocks of data can be stored and retrieved in any order. HDDs are a type of non-volatile storage, retaining stored data when powered off. Modern HDDs are typically in the form of a small rectangular box.

RAID is a data storage virtualization technology that combines multiple physical disk drive components into one or more logical units for the purposes of data redundancy, performance improvement, or both. This is in contrast to the previous concept of highly reliable mainframe disk drives referred to as "single large expensive disk" (SLED).

<span class="mw-page-title-main">Data striping</span>

In computer data storage, data striping is the technique of segmenting logically sequential data, such as a file, so that consecutive segments are stored on different physical storage devices.
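As a rough illustration of the idea, the following C sketch maps a logical block number to a (disk, offset-on-disk) pair under round-robin striping with a fixed chunk size; the parameters and function names are hypothetical, not taken from any particular implementation.

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative striping parameters. */
#define NUM_DISKS        4
#define BLOCKS_PER_CHUNK 128   /* e.g. 64 KiB chunks of 512-byte blocks */

/* Map a logical block to (disk, block-on-disk) under round-robin striping. */
static void stripe_map(uint64_t logical_block, unsigned *disk, uint64_t *block_on_disk)
{
    uint64_t chunk  = logical_block / BLOCKS_PER_CHUNK;  /* which chunk overall    */
    uint64_t offset = logical_block % BLOCKS_PER_CHUNK;  /* offset inside chunk    */
    *disk          = (unsigned)(chunk % NUM_DISKS);      /* round-robin over disks */
    *block_on_disk = (chunk / NUM_DISKS) * BLOCKS_PER_CHUNK + offset;
}

int main(void)
{
    for (uint64_t lb = 0; lb < 1024; lb += 200) {
        unsigned disk;
        uint64_t pos;
        stripe_map(lb, &disk, &pos);
        printf("logical %4llu -> disk %u, block %llu\n",
               (unsigned long long)lb, disk, (unsigned long long)pos);
    }
    return 0;
}
```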

<span class="mw-page-title-main">Self-Monitoring, Analysis and Reporting Technology</span> Monitoring system in computer drives

Self-Monitoring, Analysis, and Reporting Technology is a monitoring system included in computer hard disk drives (HDDs) and solid-state drives (SSDs). Its primary function is to detect and report various indicators of drive reliability with the intent of anticipating imminent hardware failures.

<span class="mw-page-title-main">Data corruption</span> Errors in computer data that introduce unintended changes to the original data

Data corruption refers to errors in computer data that occur during writing, reading, storage, transmission, or processing, which introduce unintended changes to the original data. Computer, transmission, and storage systems use a number of measures to provide end-to-end data integrity, that is, freedom from such errors.

Reliability, availability and serviceability (RAS), also known as reliability, availability, and maintainability (RAM), is a computer hardware engineering term involving reliability engineering, high availability, and serviceability design. The phrase was originally used by International Business Machines (IBM) as a term to describe the robustness of their mainframe computers.

Data scrubbing is an error correction technique that uses a background task to periodically inspect main memory or storage for errors, then corrects detected errors using redundant data in the form of different checksums or copies of data. Data scrubbing reduces the likelihood that single correctable errors will accumulate, leading to reduced risks of uncorrectable errors.
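A minimal sketch of a scrub pass is shown below, assuming each block is stored with a checksum and a mirror copy is available for repair; the checksum function and in-memory layout are purely illustrative stand-ins for what a real implementation would use.

```c
#include <stdint.h>
#include <string.h>
#include <stdio.h>

#define BLOCK_SIZE 512
#define NUM_BLOCKS 4

/* Hypothetical store: primary data blocks, a mirror copy, stored checksums. */
static uint8_t  primary[NUM_BLOCKS][BLOCK_SIZE];
static uint8_t  mirror[NUM_BLOCKS][BLOCK_SIZE];
static uint32_t stored_sum[NUM_BLOCKS];

/* Simple rolling checksum, standing in for a real CRC. */
static uint32_t checksum(const uint8_t *b)
{
    uint32_t s = 0;
    for (int i = 0; i < BLOCK_SIZE; i++)
        s = s * 31 + b[i];
    return s;
}

/* One scrub pass: re-read every block, verify it, repair from the mirror. */
static void scrub(void)
{
    for (int i = 0; i < NUM_BLOCKS; i++) {
        if (checksum(primary[i]) != stored_sum[i]) {
            printf("block %d corrupt, repairing from mirror\n", i);
            memcpy(primary[i], mirror[i], BLOCK_SIZE);
        }
    }
}

int main(void)
{
    for (int i = 0; i < NUM_BLOCKS; i++) {
        memset(primary[i], i + 1, BLOCK_SIZE);
        memcpy(mirror[i], primary[i], BLOCK_SIZE);
        stored_sum[i] = checksum(primary[i]);
    }
    primary[2][10] ^= 0xFF;   /* silently corrupt one block          */
    scrub();                  /* a background task would run this periodically */
    printf("block 2 ok after scrub: %d\n", checksum(primary[2]) == stored_sum[2]);
    return 0;
}
```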

The Massbus is a high-performance computer input/output bus designed in the 1970s by Digital Equipment Corporation (DEC). The architecture development was sponsored by Gordon Bell, and John Levy was the principal architect.

<span class="mw-page-title-main">Solid-state drive</span> Data storage device

A solid-state drive (SSD) is a solid-state storage device that uses integrated circuit assemblies to store data persistently, typically using flash memory, and functions as secondary storage in the hierarchy of computer storage. It is also sometimes called a semiconductor storage device, a solid-state device, or a solid-state disk, even though SSDs lack the physical spinning disks and movable read-write heads used in hard disk drives (HDDs) and floppy disks. SSDs also offer rich internal parallelism for data processing.

In computer storage, the standard RAID levels comprise a basic set of RAID configurations that employ the techniques of striping, mirroring, or parity to create large reliable data stores from multiple general-purpose computer hard disk drives (HDDs). The most common types are RAID 0 (striping), RAID 1 (mirroring) and its variants, RAID 5, and RAID 6. Multiple RAID levels can also be combined or nested, for instance RAID 10 or RAID 01. RAID levels and their associated data formats are standardized by the Storage Networking Industry Association (SNIA) in the Common RAID Disk Drive Format (DDF) standard. The numerical values only serve as identifiers and do not signify performance, reliability, generation, or any other metric.
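To make the parity technique concrete, the short C sketch below computes RAID 5-style parity as the bytewise XOR of the data chunks in a stripe and rebuilds a lost chunk from the survivors; the chunk size and in-memory layout are illustrative only.

```c
#include <stdint.h>
#include <string.h>
#include <stdio.h>

#define CHUNK      16   /* illustrative chunk size                           */
#define DATA_DISKS 3    /* three data chunks plus one parity chunk per stripe */

/* Parity is the XOR of all data chunks in the stripe. */
static void compute_parity(uint8_t data[DATA_DISKS][CHUNK], uint8_t parity[CHUNK])
{
    memset(parity, 0, CHUNK);
    for (int d = 0; d < DATA_DISKS; d++)
        for (int i = 0; i < CHUNK; i++)
            parity[i] ^= data[d][i];
}

/* Rebuild one missing chunk by XOR-ing the parity with the surviving chunks. */
static void rebuild(uint8_t data[DATA_DISKS][CHUNK], const uint8_t parity[CHUNK], int lost)
{
    memcpy(data[lost], parity, CHUNK);
    for (int d = 0; d < DATA_DISKS; d++)
        if (d != lost)
            for (int i = 0; i < CHUNK; i++)
                data[lost][i] ^= data[d][i];
}

int main(void)
{
    uint8_t data[DATA_DISKS][CHUNK], parity[CHUNK], saved[CHUNK];
    for (int d = 0; d < DATA_DISKS; d++)
        memset(data[d], 'A' + d, CHUNK);

    compute_parity(data, parity);
    memcpy(saved, data[1], CHUNK);   /* remember the chunk we will "lose" */
    memset(data[1], 0, CHUNK);       /* simulate a failed disk            */
    rebuild(data, parity, 1);
    printf("rebuilt ok: %d\n", memcmp(saved, data[1], CHUNK) == 0);
    return 0;
}
```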

Nested RAID levels, also known as hybrid RAID, combine two or more of the standard RAID levels to gain performance, additional redundancy or both, as a result of combining properties of different standard RAID layouts.

Although all RAID implementations differ from the specification to some extent, some companies and open-source projects have developed non-standard RAID implementations that differ substantially from the standard. Additionally, there are non-RAID drive architectures, providing configurations of multiple hard drives not referred to by RAID acronyms.

SCSI / ATA Translation (SAT) is a set of standards developed by the T10 subcommittee, defining how to communicate with ATA devices through a SCSI application layer. The standard attempts to be consistent with the SCSI architectural model, the SCSI Primary Commands, and the SCSI Block Commands standards.

A trim command allows an operating system to inform a solid-state drive (SSD) which blocks of data are no longer considered to be 'in use' and therefore can be erased internally.

<span class="mw-page-title-main">Advanced Format</span> Disk format and access using sector sizes larger than 512 bytes

Advanced Format (AF) is any disk sector format used to store data on magnetic disks in hard disk drives (HDDs) that exceeds 528 bytes per sector, frequently 4096, 4112, 4160, or 4224-byte (4 KB) sectors. Larger sectors of an Advanced Format Drive (AFD) enable the integration of stronger error correction algorithms to maintain data integrity at higher storage densities.

NVM Express (NVMe) or Non-Volatile Memory Host Controller Interface Specification (NVMHCIS) is an open, logical-device interface specification for accessing a computer's non-volatile storage media usually attached via the PCI Express bus. The initial NVM stands for non-volatile memory, which is often NAND flash memory that comes in several physical form factors, including solid-state drives (SSDs), PCIe add-in cards, and M.2 cards, the successor to mSATA cards. NVM Express, as a logical-device interface, has been designed to capitalize on the low latency and internal parallelism of solid-state storage devices.

<span class="mw-page-title-main">LIO (SCSI target)</span> Open-source version of SCSI target

In computing, Linux-IO (LIO) Target is an open-source implementation of the SCSI target that has become the standard one included in the Linux kernel. Internally, LIO does not initiate sessions, but instead provides one or more Logical Unit Numbers (LUNs), waits for SCSI commands from a SCSI initiator, and performs required input/output data transfers. LIO supports common storage fabrics, including FCoE, Fibre Channel, IEEE 1394, iSCSI, iSCSI Extensions for RDMA (iSER), SCSI RDMA Protocol (SRP) and USB. It is included in most Linux distributions; native support for LIO in QEMU/KVM, libvirt, and OpenStack makes LIO also a storage option for cloud deployments.

Resilient File System (ReFS), codenamed "Protogon", is a Microsoft proprietary file system introduced with Windows Server 2012 with the intent of becoming the "next generation" file system after NTFS.

Shingled magnetic recording (SMR) is a magnetic storage data recording technology used in hard disk drives (HDDs) to increase storage density and overall per-drive storage capacity. Conventional hard disk drives record data by writing non-overlapping magnetic tracks parallel to each other, while shingled recording writes new tracks that overlap part of the previously written magnetic track, leaving the previous track narrower and allowing higher track density. Thus, the tracks partially overlap similar to roof shingles. This approach was selected because, if the writing head is made too narrow, it cannot provide the very high fields required in the recording layer of the disk.

ZFS is a file system with volume management capabilities. It began as part of the Sun Microsystems Solaris operating system in 2001. Large parts of Solaris – including ZFS – were published under an open source license as OpenSolaris for around 5 years from 2005, before being placed under a closed source license when Oracle Corporation acquired Sun in 2009–2010. Between 2005 and 2010, the open source version of ZFS was ported to Linux, Mac OS X and FreeBSD. In 2010, the illumos project forked a recent version of OpenSolaris, including ZFS, to continue its development as an open source project. In 2013, OpenZFS was founded to coordinate the development of open source ZFS. OpenZFS maintains and manages the core ZFS code, while organizations using ZFS maintain the specific code and validation processes required for ZFS to integrate within their systems. OpenZFS is widely used in Unix-like systems.

References

  1. Keith Holt (July 1, 2003). "End-to-End Data Protection Justification" (PDF). T10 Technical Committee document 03-224r0. Retrieved August 29, 2013.
  2. "NVM Express Revision 1.2.1" (PDF). NVM Express, Inc. June 5, 2016.
  3. "Data Integrity Extension" (PDF). T10 Technical Committee document 03-111r0. May 2, 2003. Retrieved August 29, 2013.[ permanent dead link ]
  4. Martin K. Petersen (2009). "Linux Data Integrity Project" . Retrieved August 29, 2013.
  5. Martin K. Petersen (January 3, 2008). "Proactively Preventing Data Corruption" (PDF). Enterprise Open Source Magazine. Retrieved August 29, 2013.
  6. "Safeguarding Data From Corruption - Technology Paper" (PDF). Seagate. 2011.
  7. EMC Corporation (September 18, 2012). "An Integrated End-to-End Data Integrity Solution to Protect Against Silent Data Corruption" (PDF). White paper. Oracle Corporation. Retrieved August 29, 2013.