The Gutmann method is an algorithm for securely erasing the contents of computer hard disk drives, such as individual files. Devised by Peter Gutmann and Colin Plumb and presented in the paper Secure Deletion of Data from Magnetic and Solid-State Memory in July 1996, it involves writing a series of 35 patterns over the region to be erased.
The selection of patterns assumes that the user does not know the encoding mechanism used by the drive, so it includes patterns designed specifically for three types of drives. A user who knows which type of encoding the drive uses can choose only those patterns intended for their drive. A drive with a different encoding mechanism would need different patterns.
Most of the patterns in the Gutmann method were designed for older MFM/RLL encoded disks. Gutmann himself has noted that more modern drives no longer use these older encoding techniques, making parts of the method irrelevant. He said "In the time since this paper was published, some people have treated the 35-pass overwrite technique described in it more as a kind of voodoo incantation to banish evil spirits than the result of a technical analysis of drive encoding techniques". [1] [2]
Since about 2001, some ATA (IDE and SATA) hard drive designs have included support for the ATA Secure Erase standard, obviating the need to apply the Gutmann method when erasing an entire drive. [3] The Gutmann method is also not effective on USB flash drives: a 2011 study reported that 71.7% of the data remained recoverable after applying it. On solid-state drives it left 0.8–4.3% of the data recoverable. [4]
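On Linux, ATA Secure Erase is commonly issued through the hdparm utility's documented --security-* options. The sketch below shows one way this is typically scripted; the device path and password are placeholders, and this is an illustration under those assumptions, not a vendor-specific procedure:

```python
import subprocess

DEVICE = "/dev/sdX"   # placeholder: the whole drive to be erased
PASSWORD = "p"        # temporary user password required before erasure

# Set a temporary password, then issue the SECURITY ERASE UNIT command.
# The drive's own firmware then overwrites every user-addressable sector.
subprocess.run(["hdparm", "--user-master", "u",
                "--security-set-pass", PASSWORD, DEVICE], check=True)
subprocess.run(["hdparm", "--user-master", "u",
                "--security-erase", PASSWORD, DEVICE], check=True)
```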
The delete function in most operating systems simply marks the space occupied by the file as reusable (it removes the pointer to the file) without immediately removing any of its contents. At this point the file can be fairly easily recovered by numerous recovery applications. However, once the space is overwritten with other data, there is no known way to recover it with software alone, since the storage device only returns its current contents via its normal interface. Gutmann claims that intelligence agencies have sophisticated tools, including magnetic force microscopes, which together with image analysis can detect the previous values of bits on the affected area of the media (for example, a hard disk). However, this claim appears to be contradicted by the thesis "Data Reconstruction from a Hard Disk Drive using Magnetic Force Microscopy". [5]
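To illustrate the difference between deleting and overwriting, here is a minimal Python sketch that overwrites a file's existing contents with random data before unlinking it. It is an illustration only, not Gutmann's method, and it does not defeat journaling, copy-on-write filesystems, wear levelling, or remapped sectors, all of which can keep old copies of the data elsewhere:

```python
import os

def overwrite_then_delete(path, passes=3, chunk_size=1 << 20):
    """Overwrite a file's contents in place with random data, then unlink it."""
    size = os.path.getsize(path)
    with open(path, "r+b", buffering=0) as f:
        for _ in range(passes):
            f.seek(0)
            remaining = size
            while remaining > 0:
                n = min(chunk_size, remaining)
                f.write(os.urandom(n))   # random data, like Gutmann's lead-in/lead-out passes
                remaining -= n
            os.fsync(f.fileno())         # force the write to the device, not just the page cache
    os.remove(path)                       # only now remove the directory entry
```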
An overwrite session consists of a lead-in of four random write patterns, followed by patterns 5 to 31 (see rows of table below), executed in a random order, and a lead-out of four more random patterns.
Each of patterns 5 to 31 was designed with a specific magnetic-media encoding scheme in mind, which that pattern targets. The drive is written to for all 35 passes, even though the table below only shows the bit patterns for the passes that specifically target each encoding scheme (a scheduling sketch follows the table). The end result should obscure any data on the drive so that only the most advanced physical scanning (e.g., using a magnetic force microscope) is likely to be able to recover any of it.
The series of patterns is as follows:
| Pass | Data written (binary) | Data written (hex) | Pattern for (1,7) RLL | Pattern for (2,7) RLL | Pattern for MFM |
|---|---|---|---|---|---|
| 1 | (Random) | (Random) | | | |
| 2 | (Random) | (Random) | | | |
| 3 | (Random) | (Random) | | | |
| 4 | (Random) | (Random) | | | |
| 5 | 01010101 01010101 01010101 | 55 55 55 | 100... | | 000 1000... |
| 6 | 10101010 10101010 10101010 | AA AA AA | 00 100... | | 0 1000... |
| 7 | 10010010 01001001 00100100 | 92 49 24 | | 00 100000... | 0 100... |
| 8 | 01001001 00100100 10010010 | 49 24 92 | | 0000 100000... | 100 100... |
| 9 | 00100100 10010010 01001001 | 24 92 49 | | 100000... | 00 100... |
| 10 | 00000000 00000000 00000000 | 00 00 00 | 101000... | 1000... | |
| 11 | 00010001 00010001 00010001 | 11 11 11 | 0 100000... | | |
| 12 | 00100010 00100010 00100010 | 22 22 22 | 00000 100000... | | |
| 13 | 00110011 00110011 00110011 | 33 33 33 | 10... | 1000000... | |
| 14 | 01000100 01000100 01000100 | 44 44 44 | 000 100000... | | |
| 15 | 01010101 01010101 01010101 | 55 55 55 | 100... | | 000 1000... |
| 16 | 01100110 01100110 01100110 | 66 66 66 | 0000 100000... | 000000 10000000... | |
| 17 | 01110111 01110111 01110111 | 77 77 77 | 100010... | | |
| 18 | 10001000 10001000 10001000 | 88 88 88 | 00 100000... | | |
| 19 | 10011001 10011001 10011001 | 99 99 99 | 0 100000... | 00 10000000... | |
| 20 | 10101010 10101010 10101010 | AA AA AA | 00 100... | | 0 1000... |
| 21 | 10111011 10111011 10111011 | BB BB BB | 00 101000... | | |
| 22 | 11001100 11001100 11001100 | CC CC CC | 0 10... | 0000 10000000... | |
| 23 | 11011101 11011101 11011101 | DD DD DD | 0 101000... | | |
| 24 | 11101110 11101110 11101110 | EE EE EE | 0 100010... | | |
| 25 | 11111111 11111111 11111111 | FF FF FF | 0 100... | | 000 100000... |
| 26 | 10010010 01001001 00100100 | 92 49 24 | | 00 100000... | 0 100... |
| 27 | 01001001 00100100 10010010 | 49 24 92 | | 0000 100000... | 100 100... |
| 28 | 00100100 10010010 01001001 | 24 92 49 | | 100000... | 00 100... |
| 29 | 01101101 10110110 11011011 | 6D B6 DB | | 0 100... | |
| 30 | 10110110 11011011 01101101 | B6 DB 6D | | 100... | |
| 31 | 11011011 01101101 10110110 | DB 6D B6 | | 00 100... | |
| 32 | (Random) | (Random) | | | |
| 33 | (Random) | (Random) | | | |
| 34 | (Random) | (Random) | | | |
| 35 | (Random) | (Random) | | | |
Encoded bits shown in bold are what should be present in the ideal pattern, although due to the encoding the complementary bit is actually present at the start of the track.
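For illustration, a minimal Python sketch of how an overwrite session could be scheduled from this table: four random lead-in passes, the fixed patterns of passes 5 to 31 in random order (optionally filtered to the encoding scheme the user knows the drive uses), and four random lead-out passes. The function name and data structures are illustrative, not taken from Gutmann's paper:

```python
import random

# Fixed three-byte patterns for passes 5-31, tagged with the encoding
# scheme(s) each one targets, following the table above.
FIXED_PATTERNS = [
    (b"\x55\x55\x55", {"(1,7)RLL", "MFM"}),
    (b"\xAA\xAA\xAA", {"(1,7)RLL", "MFM"}),
    (b"\x92\x49\x24", {"(2,7)RLL", "MFM"}),
    (b"\x49\x24\x92", {"(2,7)RLL", "MFM"}),
    (b"\x24\x92\x49", {"(2,7)RLL", "MFM"}),
    (b"\x00\x00\x00", {"(1,7)RLL", "(2,7)RLL"}),
    (b"\x11\x11\x11", {"(1,7)RLL"}),
    (b"\x22\x22\x22", {"(1,7)RLL"}),
    (b"\x33\x33\x33", {"(1,7)RLL", "(2,7)RLL"}),
    (b"\x44\x44\x44", {"(1,7)RLL"}),
    (b"\x55\x55\x55", {"(1,7)RLL", "MFM"}),
    (b"\x66\x66\x66", {"(1,7)RLL", "(2,7)RLL"}),
    (b"\x77\x77\x77", {"(1,7)RLL"}),
    (b"\x88\x88\x88", {"(1,7)RLL"}),
    (b"\x99\x99\x99", {"(1,7)RLL", "(2,7)RLL"}),
    (b"\xAA\xAA\xAA", {"(1,7)RLL", "MFM"}),
    (b"\xBB\xBB\xBB", {"(1,7)RLL"}),
    (b"\xCC\xCC\xCC", {"(1,7)RLL", "(2,7)RLL"}),
    (b"\xDD\xDD\xDD", {"(1,7)RLL"}),
    (b"\xEE\xEE\xEE", {"(1,7)RLL"}),
    (b"\xFF\xFF\xFF", {"(1,7)RLL", "MFM"}),
    (b"\x92\x49\x24", {"(2,7)RLL", "MFM"}),
    (b"\x49\x24\x92", {"(2,7)RLL", "MFM"}),
    (b"\x24\x92\x49", {"(2,7)RLL", "MFM"}),
    (b"\x6D\xB6\xDB", {"(2,7)RLL"}),
    (b"\xB6\xDB\x6D", {"(2,7)RLL"}),
    (b"\xDB\x6D\xB6", {"(2,7)RLL"}),
]

def gutmann_passes(encoding=None):
    """Return the ordered list of overwrite passes.

    None stands for a pass of fresh random data. If `encoding` is given
    (e.g. "MFM"), only the fixed patterns targeting that scheme are kept,
    as suggested for a drive whose encoding is known.
    """
    fixed = [pattern for pattern, schemes in FIXED_PATTERNS
             if encoding is None or encoding in schemes]
    random.shuffle(fixed)          # passes 5-31 are executed in random order
    lead_in = [None] * 4           # passes 1-4: random data
    lead_out = [None] * 4          # passes 32-35: random data
    return lead_in + fixed + lead_out

print(len(gutmann_passes()))       # 35 passes for an unknown encoding
print(len(gutmann_passes("MFM")))  # 4 + 11 + 4 = 19 passes for a known MFM drive
```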
Daniel Feenberg of the National Bureau of Economic Research, an American private nonprofit research organization, criticized Gutmann's claim that intelligence agencies are likely to be able to read overwritten data, citing a lack of evidence for such claims. He finds that Gutmann cites one non-existent source, and that the sources he does cite do not actually demonstrate recovery, only partially successful observations. Gutmann's definition of "random" also differs from the usual one: he assumes pseudorandom data whose sequences are known to the recovering side, rather than unpredictable data such as the output of a cryptographically secure pseudorandom number generator. [6]
Nevertheless, some published government security procedures consider a disk overwritten once to still be sensitive. [7]
Gutmann himself has responded to some of these criticisms and also criticized how his algorithm has been abused in an epilogue to his original paper, in which he states: [1] [2]
In the time since this paper was published, some people have treated the 35-pass overwrite technique described in it more as a kind of voodoo incantation to banish evil spirits than the result of a technical analysis of drive encoding techniques. As a result, they advocate applying the voodoo to PRML and EPRML drives even though it will have no more effect than a simple scrubbing with random data. In fact performing the full 35-pass overwrite is pointless for any drive since it targets a blend of scenarios involving all types of (normally-used) encoding technology, which covers everything back to 30+-year-old MFM methods (if you don't understand that statement, re-read the paper). If you're using a drive which uses encoding technology X, you only need to perform the passes specific to X, and you never need to perform all 35 passes. For any modern PRML/EPRML drive, a few passes of random scrubbing is the best you can do. As the paper says, "A good scrubbing with random data will do about as well as can be expected". This was true in 1996, and is still true now.
— Peter Gutmann, Secure Deletion of Data from Magnetic and Solid-State Memory, University of Auckland Department of Computer Science
Computer data storage or digital data storage is a technology consisting of computer components and recording media that are used to retain digital data. It is a core function and fundamental component of computers.
A hard disk drive (HDD), hard disk, hard drive, or fixed disk is an electro-mechanical data storage device that stores and retrieves digital data using magnetic storage with one or more rigid rapidly rotating platters coated with magnetic material. The platters are paired with magnetic heads, usually arranged on a moving actuator arm, which read and write data to the platter surfaces. Data is accessed in a random-access manner, meaning that individual blocks of data can be stored and retrieved in any order. HDDs are a type of non-volatile storage, retaining stored data when powered off. Modern HDDs are typically in the form of a small rectangular box.
In cryptography, plaintext usually means unencrypted information pending input into cryptographic algorithms, usually encryption algorithms. This usually refers to data that is transmitted or stored unencrypted.
Disk formatting is the process of preparing a data storage device such as a hard disk drive, solid-state drive, floppy disk, memory card or USB flash drive for initial use. In some cases, the formatting operation may also create one or more new file systems. The first part of the formatting process, which performs basic medium preparation, is often referred to as "low-level formatting". Partitioning is the common term for the second part of the process, dividing the device into several sub-devices and, in some cases, writing information to the device allowing an operating system to be booted from it. The third part of the process, usually termed "high-level formatting", most often refers to the process of generating a new file system. In some operating systems all or parts of these three processes can be combined or repeated at different levels, and the term "format" is understood to mean an operation in which a new disk medium is fully prepared to store files. Some formatting utilities allow distinguishing between a quick format, which does not erase all existing data, and a long option that does erase all existing data.
In information technology, a backup, or data backup, is a copy of computer data taken and stored elsewhere so that it may be used to restore the original after a data loss event. The verb form, referring to the process of doing so, is "back up", whereas the noun and adjective form is "backup". Backups can be used to recover data after its loss from data deletion or corruption, or to recover data from an earlier time. Backups provide a simple form of disaster recovery; however, not all backup systems are able to reconstitute a computer system or other complex configuration such as a computer cluster, active directory server, or database server.
Non-volatile memory (NVM) or non-volatile storage is a type of computer memory that can retain stored information even after power is removed. In contrast, volatile memory needs constant power in order to retain data.
Degaussing is the process of decreasing or eliminating a remnant magnetic field. It is named after the gauss, a unit of magnetism, which in turn was named after Carl Friedrich Gauss. Due to magnetic hysteresis, it is generally not possible to reduce a magnetic field completely to zero, so degaussing typically induces a very small "known" field referred to as bias. Degaussing was originally applied to reduce ships' magnetic signatures during World War II. Degaussing is also used to reduce magnetic fields in cathode ray tube monitors and to destroy data held on magnetic storage.
Data remanence is the residual representation of digital data that remains even after attempts have been made to remove or erase the data. This residue may result from data being left intact by a nominal file deletion operation, by reformatting of storage media that does not remove data previously written to the media, or through physical properties of the storage media that allow previously written data to be recovered. Data remanence may make inadvertent disclosure of sensitive information possible should the storage media be released into an uncontrolled environment.
File deletion is the removal of a file from a computer's file system.
Undeletion is a feature for restoring computer files which have been removed from a file system by file deletion. Deleted data can be recovered on many file systems, but not all file systems provide an undeletion feature. Recovering data without an undeletion facility is usually called data recovery, rather than undeletion. Undeletion can both help prevent users from accidentally losing data, or can pose a computer security risk, since users may not be aware that deleted files remain accessible.
In computing, data recovery is a process of retrieving deleted, inaccessible, lost, corrupted, damaged, or formatted data from secondary storage, removable media or files, when the data stored in them cannot be accessed in a usual way. The data is most often salvaged from storage media such as internal or external hard disk drives (HDDs), solid-state drives (SSDs), USB flash drives, magnetic tapes, CDs, DVDs, RAID subsystems, and other electronic devices. Recovery may be required due to physical damage to the storage devices or logical damage to the file system that prevents it from being mounted by the host operating system (OS).
A backup rotation scheme is a system of backing up data to computer media that minimizes, by re-use, the number of media used. The scheme determines how and when each piece of removable storage is used for a backup job and how long it is retained once it has backup data stored on it. Different techniques have evolved over time to balance data retention and restoration needs with the cost of extra data storage media. Such a scheme can be quite complicated if it takes incremental backups, multiple retention periods, and off-site storage into consideration.
Peter Claus Gutmann is a computer scientist in the Department of Computer Science at the University of Auckland, Auckland, New Zealand. He has a Ph.D. in computer science from the University of Auckland. His Ph.D. thesis and a book based on the thesis were about a cryptographic security architecture. He is interested in computer security issues, including security architecture, security usability, and hardware security; he has discovered several flaws in publicly released cryptosystems and protocols. He is the developer of the cryptlib open source software security library and contributed to PGP version 2. In 1994 he developed the Secure FileSystem (SFS). He is also known for his analysis of data deletion on electronic memory media, magnetic and otherwise, and devised the Gutmann method for erasing data from a hard drive more or less securely. Having lived in New Zealand for some time, he has written on such subjects as weta and the Auckland power crisis of 1998, during which the electrical power system in the central city failed completely for five weeks, an event he has blogged about. He has also written on his career as an "arms courier" for New Zealand, detailing the difficulties faced in complying with customs control regulations with respect to cryptographic products, which were once classed as "munitions" by various jurisdictions including the United States.
Hardware-based full disk encryption (FDE) is available from many hard disk drive (HDD/SSD) vendors, including: Hitachi, Integral Memory, iStorage Limited, Micron, Seagate Technology, Samsung, Toshiba, Viasat UK, Western Digital. The symmetric encryption key is maintained independently from the computer's CPU, thus allowing the complete data store to be encrypted and removing computer memory as a potential attack vector.
Data erasure is a software-based method of data sanitization that aims to completely destroy all electronic data residing on a hard disk drive or other digital media by overwriting data onto all sectors of the device in an irreversible process. By overwriting the data on the storage device, the data is rendered irrecoverable.
shred is a command on Unix-like operating systems that can be used to securely delete files and devices so that it is extremely difficult to recover them, even with specialized hardware and technology; assuming recovery is possible at all, which is not always the case. It is a part of GNU Core Utilities. Being based on the Gutmann method paper, it suffers from the same criticisms and possible shortcomings.
A trim command allows an operating system to inform a solid-state drive (SSD) which blocks of data are no longer considered to be "in use" and therefore can be erased internally.
HMG Infosec Standard 5, or IS5, is a data destruction standard used by the British government.
nwipe is a Linux computer program used to securely erase data. It is maintained by Martijn van Brummelen and is free software, released under the GNU General Public License version 2.0. The program is a fork of the dwipe program that was previously incorporated in the DBAN secure erase disk.
Data sanitization involves the secure and permanent erasure of sensitive data from datasets and media to guarantee that no residual data can be recovered even through extensive forensic analysis. Data sanitization has a wide range of applications but is mainly used for clearing out end-of-life electronic devices or for the sharing and use of large datasets that contain sensitive information. The main strategies for erasing personal data from devices are physical destruction, cryptographic erasure, and data erasure. While the term data sanitization may lead some to believe that it only includes data on electronic media, the term also broadly covers physical media, such as paper copies. These data types are termed soft for electronic files and hard for physical media paper copies. Data sanitization methods are also applied for the cleaning of sensitive data, such as through heuristic-based methods, machine-learning based methods, and k-source anonymity.