Data degradation

Data degradation is the gradual corruption of computer data due to an accumulation of non-critical failures in a data storage device. It is also referred to as data decay, data rot or bit rot. [1] This results in a decline in data quality over time, even when the data is not being used. The term is also used for progressively minimizing data in interconnected processes, where data is used for multiple purposes at different levels of detail: at specific points in the process chain, data is irreversibly reduced to a level that remains sufficient for the successful completion of the following steps. [2]

Primary storage

Data degradation in dynamic random-access memory (DRAM) can occur when the electric charge of a bit in DRAM disperses, possibly altering program code or stored data. DRAM may be altered by cosmic rays [3] or other high-energy particles. Such data degradation is known as a soft error. [4] ECC memory can be used to mitigate this type of data degradation. [5]
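
The way an error-correcting code repairs a single flipped bit can be sketched with a Hamming(7,4) code. This is only an illustration of the principle: real ECC memory typically uses wider codes over larger words, and the function names below are hypothetical.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits (a list of 0/1) into a 7-bit codeword."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over codeword positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over codeword positions 4, 5, 6, 7
    # Codeword layout, positions 1..7: p1 p2 d1 p3 d2 d3 d4
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Return (data_bits, corrected_position); position 0 means no error seen."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3  # 1-based position of a single flipped bit
    if syndrome:
        c[syndrome - 1] ^= 1  # flip the bad bit back
    return [c[2], c[4], c[5], c[6]], syndrome


stored = hamming74_encode([1, 0, 1, 1])
stored[4] ^= 1  # simulate a soft error in codeword position 5
recovered, position = hamming74_decode(stored)
print(recovered, position)
```

A code of this kind corrects one flipped bit per codeword; two or more flips in the same codeword would be miscorrected, which is one reason practical schemes add further parity.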

Secondary storage

Data degradation results from the gradual decay of storage media over the course of years or longer. Causes vary by medium:

Solid-state media
EPROMs, flash memory and other solid-state storage devices store data using electrical charges, which can slowly leak away due to imperfect insulation. Modern flash controller chips account for this leakage by retrying reads at several lower threshold voltages (until ECC passes), prolonging the readable life of the data. Multi-level cells, which have a much smaller distance between voltage levels, cannot be considered stable without this functionality. [6]
The chip itself is not affected by this leakage, so reprogramming it approximately once per decade prevents decay. An undamaged copy of the master data is required for the reprogramming. A checksum can be used to verify that the on-chip data is not yet damaged and that the chip is ready for reprogramming.
SD cards, USB sticks and M.2 NVMe drives all have limited endurance. Powering the device on can usually recover data [citation needed], but error rates will eventually degrade the media until it is unreadable. Writing zeros to a degraded NAND device can restore it to close to new condition for further use. To be sure the device remains readable, refresh cycles should be no longer than 6 months.
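
The checksum check described above can be sketched as follows. This is a minimal illustration, assuming a SHA-256 digest was recorded when the chip was first programmed; the function names and the in-memory "images" are hypothetical stand-ins for a real master file and a real chip readback.

```python
import hashlib


def digest(image: bytes) -> str:
    """SHA-256 digest of an image, as a hex string."""
    return hashlib.sha256(image).hexdigest()


def check_before_refresh(master: bytes, chip_readback: bytes, recorded: str) -> str:
    """Compare the master copy and the chip contents with the recorded digest."""
    if digest(master) != recorded:
        return "master copy damaged"  # do not reprogram from this copy
    if digest(chip_readback) != recorded:
        return "chip data decayed"    # reprogram from the verified master
    return "chip data intact"         # a periodic refresh is still possible


master_image = bytes(range(256)) * 4   # stand-in for the master data
recorded_digest = digest(master_image)  # stored when the chip was programmed

leaked = bytearray(master_image)
leaked[100] ^= 0x10                     # simulate one cell losing its charge
print(check_before_refresh(master_image, bytes(leaked), recorded_digest))
```

Verifying the master copy first matters because reprogramming from a damaged master would overwrite data that might still be good on the chip.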
Magnetic media
Magnetic media, such as hard disk drives, floppy disks and magnetic tapes, may experience data decay as bits lose their magnetic orientation. Higher temperatures speed up the rate of magnetic loss. As with solid-state media, rewriting is useful as long as the medium itself is not damaged (see below). [7] Modern hard drives use giant magnetoresistance and have a longer magnetic lifespan, on the order of decades. They also automatically correct any errors detected by ECC by rewriting the affected data. However, their reliance on a factory-written servo track can complicate data recovery if that track itself becomes unrecoverable.
Floppy disks and tapes are poorly protected against ambient air. In warm, humid conditions, they are prone to physical decomposition of the storage medium. [8] [7]
Optical media
Optical media, such as CD-R, DVD-R and BD-R, may experience data decay from the breakdown of the storage medium. This can be mitigated by storing discs in a dark, cool, low-humidity location. "Archival quality" discs are available with an extended lifetime, but are still not permanent. However, data integrity scanning that measures the rates of various types of errors can predict data decay on optical media well before uncorrectable data loss occurs. [9]
Both the disc dye and the disc backing layer are potentially susceptible to breakdown. Early cyanine-based dyes used in CD-R were notorious for their lack of UV stability. Early CDs also suffered from CD bronzing, which is related to a combination of bad lacquer material and failure of the aluminum reflection layer. [10] Later discs use more stable dyes or forgo them for an inorganic mixture. The aluminum layer is also commonly swapped out for gold or silver alloy.
Paper media
Paper media, such as punched cards and punched tape, may literally rot. Mylar punched tape is another approach that does not rely on electromagnetic stability. Degradation of books and printing paper is primarily driven by acid hydrolysis of glycosidic bonds within the cellulose molecule as well as by oxidation; [11] degradation of paper is accelerated by high relative humidity, high temperature, as well as by exposure to acids, oxygen, light, and various pollutants, including various volatile organic compounds and nitrogen dioxide. [12]
Streaming media
In streaming media acquisition modules, data degradation refers to real-time data quality issues caused by device limitations, which repair algorithms aim to address. This differs from the more general form of data degradation, the gradual decay of storage media over extended periods, which is influenced by factors such as physical wear, environmental conditions, or technological obsolescence. Causes of the latter vary by medium, for example magnetic fields for hard drives, moisture or temperature for tape storage, or electronic failure over time. [13]

Example

Below are several digital images illustrating data degradation, all consisting of 326,272 bits. The original photo is displayed first. In the next image, a single bit was changed from 0 to 1. In the next two images, two and three bits were flipped. On Linux systems, the binary difference between files can be revealed using the cmp command (e.g. cmp -b bitrot-original.jpg bitrot-1bit-changed.jpg).
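
The same kind of experiment can be sketched in Python: flip one bit in a buffer and list the bytes that differ. The buffer below is a zero-filled stand-in of 326,272 bits, not the actual image, and the helper names and the bit-numbering convention (least significant bit first within each byte) are assumptions made for this illustration.

```python
def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted (LSB-first within a byte)."""
    out = bytearray(data)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)


def differing_bytes(a: bytes, b: bytes):
    """List (1-based offset, byte in a, byte in b) for equal-length inputs."""
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


original = bytes(326_272 // 8)        # stand-in for the original file contents
damaged = flip_bit(original, 12_345)  # a single bit changed from 0 to 1
for offset, before, after in differing_bytes(original, damaged):
    print(f"byte {offset}: {before:#04x} -> {after:#04x}")
```

A single flipped bit changes exactly one byte of the file, yet in a compressed format such as JPEG it can visibly alter a large part of the decoded picture.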

Causes

Data degradation can be caused by a variety of factors that affect the reliability and integrity of digital information, including physical factors, software errors, security breaches, human error, obsolete technology, and unauthorized access incidents. [14] [15] [16] [17]

Hardware failures

Most disk, disk controller and higher-level systems are subject to a slight chance of unrecoverable failure. With ever-growing disk capacities and file sizes, and with more data stored on each disk, the likelihood of data decay and other forms of uncorrected and undetected data corruption increases. [18]
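
Why the likelihood grows with capacity can be shown with a short calculation. The sketch below assumes that bit errors are independent and uses an illustrative error rate of one unrecoverable error per 10^14 bits read; both the independence assumption and the rate are assumptions made for this example, not figures from the cited report.

```python
import math


def p_at_least_one_error(bits_read: float, per_bit_error_rate: float) -> float:
    """Probability of one or more errors: 1 - (1 - p)^n, computed stably."""
    return -math.expm1(bits_read * math.log1p(-per_bit_error_rate))


ASSUMED_RATE = 1e-14  # hypothetical per-bit rate, chosen for illustration only

for terabytes in (1, 4, 10):
    bits = terabytes * 8e12  # 1 TB = 8 * 10^12 bits
    probability = p_at_least_one_error(bits, ASSUMED_RATE)
    print(f"{terabytes:>2} TB read in full: {probability:.1%}")
```

Under these assumptions the probability of hitting at least one error rises steadily with the amount of data read, even though the per-bit rate stays fixed.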

Low-level disk controllers typically employ error correction codes (ECC) to correct erroneous data. [19]

Higher-level software systems may be employed to mitigate the risk of such underlying failures by increasing redundancy and implementing integrity checking, error correction codes and self-repairing algorithms. [20] The ZFS file system was designed to address many of these data corruption issues. [21] The Btrfs file system also includes data protection and recovery mechanisms, [22] as does ReFS. [23]
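
The combination of redundancy, integrity checking and self-repair can be sketched as a simple scrub over two mirrored copies with a checksum per block. This is a toy model of the general idea, not a description of how ZFS, Btrfs or ReFS are implemented; the data layout and function names are hypothetical.

```python
import zlib


def make_store(blocks):
    """Keep two mirrored copies of every block plus a CRC-32 per block."""
    return {
        "copy_a": [bytearray(b) for b in blocks],
        "copy_b": [bytearray(b) for b in blocks],
        "crc": [zlib.crc32(b) for b in blocks],
    }


def scrub(store):
    """Verify every block; repair a bad copy from its good mirror.

    Returns (repaired, unrecoverable): a list of (copy name, block index)
    pairs that were repaired, and a list of block indexes where both
    copies failed their checksum.
    """
    repaired, unrecoverable = [], []
    for i, expected in enumerate(store["crc"]):
        ok_a = zlib.crc32(store["copy_a"][i]) == expected
        ok_b = zlib.crc32(store["copy_b"][i]) == expected
        if ok_a and not ok_b:
            store["copy_b"][i] = bytearray(store["copy_a"][i])
            repaired.append(("copy_b", i))
        elif ok_b and not ok_a:
            store["copy_a"][i] = bytearray(store["copy_b"][i])
            repaired.append(("copy_a", i))
        elif not ok_a and not ok_b:
            unrecoverable.append(i)
    return repaired, unrecoverable


store = make_store([b"alpha", b"bravo", b"charlie"])
store["copy_a"][1][0] ^= 0x01  # simulate silent corruption of one copy
print(scrub(store))
```

Running the scrub periodically keeps single errors from accumulating: a block is lost only if both copies are damaged before the next pass.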

References

  1. Rouse, Margaret (25 March 2020). "What is Bit Rot?". Techopedia Dictionary. Retrieved 10 April 2024.
  2. Zaman, Rashid; Hassani, Marwan (July 2020). "On Enabling GDPR Compliance in Business Processes Through Data-Driven Solutions". SN Computer Science. 1 (4). doi:10.1007/s42979-020-00215-x. ISSN 2662-995X.
  3. "The Invisible Neutron Threat | National Security Science Magazine". Los Alamos National Laboratory. Retrieved 2020-03-13.
  4. O'Gorman, T. J.; Ross, J. M.; Taber, A. H.; Ziegler, J. F.; Muhlfeld, H. P.; Montrose, C. J.; Curtis, H. W.; Walsh, J. L. (January 1996). "Field testing for cosmic ray soft errors in semiconductor memories". IBM Journal of Research and Development. 40 (1): 41–50. doi:10.1147/rd.401.0041.
  5. Single Event Upset at Ground Level, Eugene Normand, Member, IEEE, Boeing Defense & Space Group, Seattle, WA 98124-2499
  6. Li, Qianhui; Wang, Qi; Yang, Liu; Yu, Xiaolei; Jiang, Yiyang; He, Jing; Huo, Zongliang (April 2022). "Optimal read voltages decision scheme eliminating read retry operations for 3D NAND flash memories". Microelectronics Reliability. 131: 114509. Bibcode:2022MiRe..13114509L. doi:10.1016/j.microrel.2022.114509.
  7. "Preserving magnetic media". National Archives of Australia. Retrieved 3 November 2020. High temperature and humidity and fluctuations may cause the magnetic and base layers in a reel of tape to separate, or cause adjacent loops to block together. High temperatures may also weaken the magnetic signal, and ultimately de-magnetise the magnetic layer.
  8. Riss, Dan (July 1993). "Conserve O Gram (number 19/8) Preservation Of Magnetic Media" (PDF). nps.gov. Harpers Ferry, West Virginia: National Park Service / Department of the Interior (US). p. 2. The longevity of magnetic media is most seriously affected by processes that attack the binder resin. Moisture from the air is absorbed by the binder and reacts with the resin. The result is a gummy residue that can deposit on tape heads and cause tape layers to stick together. Reaction with moisture also can result in breaks in the long molecular chains of the binder. This weakens the physical properties of the binder and can result in a lack of adhesion to the backing. These reactions are greatly accelerated by the presence of acids. Typical sources would be the usual pollutant gases in the air, such as sulphur dioxide (SO2) and nitrous oxides (NOx), which react with moist air to form acids. Though acid inhibitors are usually built into the binder layer, over time they can lose their effectiveness.
  9. "QPxTool glossary". qpxtool.sourceforge.io. QPxTool. 2008-08-01. Retrieved 22 July 2020.
  10. "Bronzed CD alert!". IASA Information Bulletin no. 22. July 1997. Archived from the original on 22 July 2011. Retrieved 3 August 2007.
  11. Małachowska, Edyta; Pawcenis, Dominika; Dańczak, Jacek; Paczkowska, Joanna; Przybysz, Kamila (26 March 2021). "Paper Ageing: The Effect of Paper Chemical Composition on Hydrolysis and Oxidation". Polymers. 13 (7): 1029. doi:10.3390/polym13071029. PMC 8036582. PMID 33810293.
  12. Menart, Eva; De Bruin, Gerrit; Strlič, Matija (9 September 2011). "Dose–response functions for historic paper" (PDF). Polymer Degradation and Stability. 96 (12): 2029–2039. doi:10.1016/j.polymdegradstab.2011.09.002. Retrieved 5 June 2023.
  13. Yu, Wenwu; Jiang, Jingjing; Zhai, Yue; Xu, Peng (2022-05-20). Rajakani, Kalidoss (ed.). "Perceived Integrity of Distributed Streaming Media Based on AWTC-TT Algorithm Optimization". Wireless Communications and Mobile Computing. 2022: 1–17. doi:10.1155/2022/7522174. ISSN 1530-8677.
  14. Sheng Lance, Li (22 July 2015). "What is data decay?". Tech in Asia. Retrieved 10 April 2024.
  15. "Definition of data degradation". PC Magazine. Retrieved 10 April 2024.
  16. Hakob, Mike (27 December 2023). "Data Decay: What are the Causes?". FormStory. Retrieved 10 April 2024.
  17. Triches, Robert (16 March 2006). "Forskare: Billiga cd-skivor håller bara i två år". Aftonbladet. Retrieved 10 April 2024.
  18. Gray, Jim; van Ingen, Catharine (December 2005). "Empirical Measurements of Disk Failure Rates and Error Rates" (PDF). Microsoft Research Technical Report MSR-TR-2005-166. Retrieved 4 March 2013.
  19. "ECC and Spare Blocks help to keep Kingston SSD data protected from errors". Kingston Technology Company. Retrieved 30 March 2021.
  20. Salter, Jim (15 January 2014). "Bitrot and atomic COWs: Inside "next-gen" filesystems". Ars Technica. Archived from the original on 6 March 2015. Retrieved 15 January 2014.
  21. Bonwick, Jeff. "ZFS: The Last Word in File Systems" (PDF). Storage Networking Industry Association (SNIA). Archived from the original (PDF) on 21 September 2013. Retrieved 4 March 2013.
  22. "btrfs Wiki: Features". The btrfs Project. Retrieved 19 September 2013.
  23. Wlodarz, Derrick (15 January 2014). "Windows Storage Spaces and ReFS: is it time to ditch RAID for good?". Betanews. Retrieved 2014-02-09.