A hard disk drive failure occurs when a hard disk drive malfunctions and the stored information cannot be accessed with a properly configured computer.
A hard disk failure may occur in the course of normal operation, or due to an external factor such as exposure to fire or water or high magnetic fields, or suffering a sharp impact or environmental contamination, which can lead to a head crash.
The stored information on a hard drive may also be rendered inaccessible as a result of data corruption, disruption or destruction of the hard drive's master boot record, or by malware deliberately destroying the disk's contents.
There are a number of causes for hard drives to fail including: human error, hardware failure, firmware corruption, media damage, heat, water damage, power issues and mishaps. [1] Drive manufacturers typically specify a mean time between failures (MTBF) or an annualized failure rate (AFR) which are population statistics that can't predict the behavior of an individual unit. [2] These are calculated by constantly running samples of the drive for a short period of time, analyzing the resultant wear and tear upon the physical components of the drive, and extrapolating to provide a reasonable estimate of its lifespan. Hard disk drive failures tend to follow the concept of the bathtub curve. [3] Drives typically fail within a short time if there is a defect present from manufacturing. If a drive proves reliable for a period of a few months after installation, the drive has a significantly greater chance of remaining reliable. Therefore, even if a drive is subjected to several years of heavy daily use, it may not show any notable signs of wear unless closely inspected. On the other hand, a drive can fail at any time in many different situations.
The most notorious cause of drive failure is a head crash, where the internal read-and-write head of the device, usually just hovering above the surface, touches a platter, or scratches the magnetic data-storage surface. A head crash usually incurs severe data loss, and data recovery attempts may cause further damage if not done by a specialist with proper equipment. Drive platters are coated with an extremely thin layer of non-electrostatic lubricant, so that the read-and-write head will likely simply glance off the surface of the platter should a collision occur. However, this head hovers mere nanometers from the platter's surface which makes a collision an acknowledged risk.
Another cause of failure is a faulty air filter. The air filters on today's drives equalize the atmospheric pressure and moisture between the drive enclosure and its outside environment. If the filter fails to capture a dust particle, the particle can land on the platter, causing a head crash if the head happens to sweep over it. After a head crash, particles from the damaged platter and head media can cause one or more bad sectors. These, in addition to platter damage, will quickly render a drive useless.
A drive also includes controller electronics, which occasionally fail. In such cases, it may be possible to recover all data by replacing the controller board.
Failure of a hard disk drive can be catastrophic or gradual. The former typically presents as a drive that can no longer be detected by CMOS setup, or that fails to pass BIOS POST so that the operating system never sees it. Gradual hard-drive failure can be harder to diagnose, because its symptoms, such as corrupted data and slowing down of the PC (caused by gradually failing areas of the hard drive requiring repeated read attempts before successful access), can be caused by many other computer issues, such as malware. A rising number of bad sectors can be a sign of a failing hard drive, but because the hard drive automatically adds them to its own growth defect table, [4] they may not become evident to utilities such as ScanDisk unless the utility can catch them before the hard drive's defect management system does, or the backup sectors held in reserve by the internal hard-drive defect management system run out (by which point the drive is on the point of failing outright). A cyclical repetitive pattern of seek activity such as rapid or slower seek-to-end noises (click of death) can be indicative of hard drive problems. [5]
During normal operation, heads in HDDs fly above the data recorded on the disks. Modern HDDs prevent power interruptions or other malfunctions from landing its heads in the data zone by either physically moving (parking) the heads to a special landing zone on the platters that is not used for data storage, or by physically locking the heads in a suspended (unloaded) position raised off the platters. Some early PC HDDs did not park the heads automatically when power was prematurely disconnected and the heads would land on data. In some other early units the user would run a program to manually park the heads.
A landing zone is an area of the platter usually near its inner diameter (ID), where no data is stored. This area is called the Contact Start/Stop (CSS) zone, or the landing zone. Disks are designed such that either a spring or, more recently, rotational inertia in the platters is used to park the heads in the case of unexpected power loss. In this case, the spindle motor temporarily acts as a generator, providing power to the actuator.
Spring tension from the head mounting constantly pushes the heads towards the platter. While the disk is spinning, the heads are supported by an air bearing and experience no physical contact or wear. In CSS drives the sliders carrying the head sensors (often also just called heads) are designed to survive a number of landings and takeoffs from the media surface, though wear and tear on these microscopic components eventually takes its toll. Most manufacturers design the sliders to survive 50,000 contact cycles before the chance of damage on startup rises above 50%. However, the decay rate is not linear: when a disk is younger and has had fewer start-stop cycles, it has a better chance of surviving the next startup than an older, higher-mileage disk (as the head literally drags along the disk's surface until the air bearing is established). For example, the Seagate Barracuda 7200.10 series of desktop hard disk drives are rated to 50,000 start–stop cycles; in other words, no failures attributed to the head–platter interface were seen before at least 50,000 start–stop cycles during testing. [6]
Around 1995 IBM pioneered a technology where a landing zone on the disk is made by a precision laser process (Laser Zone Texture = LZT) producing an array of smooth nanometer-scale "bumps" in a landing zone, [7] thus vastly improving stiction and wear performance. This technology is still in use today, predominantly in lower-capacity Seagate desktop drives, [8] but has been phased out in 2.5" drives, as well as higher-capacity desktop, NAS, and enterprise drives in favor of load/unload ramps. In general, CSS technology can be prone to increased stiction (the tendency for the heads to stick to the platter surface), e.g. as a consequence of increased humidity. Excessive stiction can cause physical damage to the platter and slider or spindle motor.
Load/unload technology relies on the heads being lifted off the platters into a safe location, thus eliminating the risks of wear and stiction altogether. The first HDD RAMAC and most early disk drives used complex mechanisms to load and unload the heads. Nearly all modern HDDs use ramp loading, first introduced by Memorex in 1967, [9] to load/unload onto plastic "ramps" near the outer disk edge. Laptop drives adopted this due to the need for increased shock resistance, and then ultimately it was adopted on most desktop drives.
Addressing shock robustness, IBM also created a technology for their ThinkPad line of laptop computers called the Active Protection System. When a sudden, sharp movement is detected by the built-in accelerometer in the ThinkPad, internal hard disk heads automatically unload themselves to reduce the risk of any potential data loss or scratch defects. Apple later also utilized this technology in their PowerBook, iBook, MacBook Pro, and MacBook line, known as the Sudden Motion Sensor. Sony, [10] HP with their HP 3D DriveGuard, [11] and Toshiba [12] have released similar technology in their notebook computers.
Hard drives may fail in a number of ways. Failure may be immediate and total, progressive, or limited. Data may be totally destroyed, or partially or totally recoverable.
Earlier drives had a tendency toward developing bad sectors with use and wear; these bad sectors could be "mapped out" so they were not used and did not affect operation of a drive, and this was considered normal unless many bad sectors developed in a short period of time. Some early drives even had a table attached to a drive's case on which bad sectors were to be listed as they appeared. [13] Later drives map out bad sectors automatically, in a way invisible to the user; a drive with remapped sectors may continue to be used, though performance may decrease as the drive must physically move the heads to the remapped sector. Statistics and logs available through S.M.A.R.T. (Self-Monitoring, Analysis, and Reporting Technology) provide information about the remapping. In modern HDDs, each drive ships with zero user-visible bad sectors, and any bad/reallocated sectors may predict the impending failure of a drive.
Other failures, which may be either progressive or limited, are usually considered to be a reason to replace a drive; the value of data potentially at risk usually far outweighs the cost saved by continuing to use a drive which may be failing. Repeated but recoverable read or write errors, unusual noises, excessive and unusual heating, and other abnormalities, are warning signs.
Most major hard disk and motherboard vendors support S.M.A.R.T, which measures drive characteristics such as operating temperature, spin-up time, data error rates, etc. Certain trends and sudden changes in these parameters are thought to be associated with increased likelihood of drive failure and data loss. However, S.M.A.R.T. parameters alone may not be useful for predicting individual drive failures. [16] While several S.M.A.R.T. parameters affect failure probability, a large fraction of failed drives do not produce predictive S.M.A.R.T. parameters. [16] Unpredictable breakdown may occur at any time in normal use, with potential loss of all data. Recovery of some or even all data from a damaged drive is sometimes, but not always possible, and is normally costly.
A 2007 study published by Google suggested very little correlation between failure rates and either high temperature or activity level. Indeed, the Google study indicated that "one of our key findings has been the lack of a consistent pattern of higher failure rates for higher temperature drives or for those drives at higher utilization levels.". [17] Hard drives with S.M.A.R.T.-reported average temperatures below 27 °C (81 °F) had higher failure rates than hard drives with the highest reported average temperature of 50 °C (122 °F), failure rates at least twice as high as the optimum S.M.A.R.T.-reported temperature range of 36 °C (97 °F) to 47 °C (117 °F). [16] The correlation between manufacturers, models and the failure rate was relatively strong. Statistics in this matter are kept highly secret by most entities; Google did not relate manufacturers' names with failure rates, [16] though it has been revealed that Google uses Hitachi Deskstar drives in some of its servers. [18]
Google's 2007 study found, based on a large field sample of drives, that actual annualized failure rates (AFRs) for individual drives ranged from 1.7% for first year drives to over 8.6% for three-year-old drives. [19] A similar 2007 study at CMU on enterprise drives showed that measured MTBF was 3–4 times lower than the manufacturer's specification, with an estimated 3% mean AFR over 1–5 years based on replacement logs for a large sample of drives, and that hard drive failures were highly correlated in time. [20]
A 2007 study of latent sector errors (as opposed to the above studies of complete disk failures) showed that 3.45% of 1.5 million disks developed latent sector errors over 32 months (3.15% of nearline disks and 1.46% of enterprise class disks developed at least one latent sector error within twelve months of their ship date), with the annual sector error rate increasing between the first and second years. Enterprise drives showed less sector errors than consumer drives. Background scrubbing was found to be effective in correcting these errors. [21]
SCSI, SAS, and FC drives are more expensive than consumer-grade SATA drives, and usually used in servers and disk arrays, where SATA drives were sold to the home computer and desktop and near-line storage market and were perceived to be less reliable. This distinction is now becoming blurred.
The mean time between failures (MTBF) of SATA drives is usually specified to be about 1 million hours. Some drives such as Western Digital Raptor have rated 1.4 million hours MTBF, [22] while SAS/FC drives are rated for upwards of 1.6 million hours. [23] Modern helium-filled drives are completely sealed without a breather port, thus eliminating the risk of debris ingression, resulting in a typical MTBF of 2.5 million hours. However, independent research indicates that MTBF is not a reliable estimate of a drive's longevity (service life). [24] MTBF is conducted in laboratory environments in test chambers and is an important metric to determine the quality of a disk drive, but is designed to only measure the relatively constant failure rate over the service life of the drive (the middle of the "bathtub curve") before final wear-out phase. [20] [25] [26] A more interpretable, but equivalent, metric to MTBF is annualized failure rate (AFR). AFR is the percentage of drive failures expected per year. Both AFR and MTBF tend to measure reliability only in the initial part of the life of a hard disk drive thereby understating the real probability of failure of a used drive. [27] Server and industrial drives usually have higher MTBF and lower AFR.
The cloud storage company Backblaze produces an annual report into hard drive reliability. However, the company states that it mainly uses commodity consumer drives, which are deployed in enterprise conditions, rather than in their representative conditions and for their intended use. Consumer drives are also not tested to work with enterprise RAID cards of the kind used in a datacenter, and may not respond in the time a RAID controller expects; such cards will be identified as having failed when they have not. [28] The result of tests of this kind may be relevant or irrelevant to different users, since they accurately represent the performance of consumer drives in the enterprise or under extreme stress, but may not accurately represent their performance in normal or intended use. [29]
In order to avoid the loss of data due to disk failure, common solutions include:
Data from a failed drive can sometimes be partially or totally recovered if the platters' magnetic coating is not totally destroyed. Specialized companies carry out data recovery, at significant cost. It may be possible to recover data by opening the drives in a clean room and using appropriate equipment to replace or revitalize failed components. [35] If the electronics have failed, it is sometimes possible to replace the electronics board, though often drives of nominally exactly the same model manufactured at different times have different circuit boards that are incompatible. Moreover, electronics boards of modern drives usually contain drive-specific adaptation data required for accessing their system areas, so the related componentry needs to be either reprogrammed (if possible) or unsoldered and transferred between two electronics boards. [36] [37] [38]
Sometimes operation can be restored for long enough to recover data, perhaps requiring reconstruction techniques such as file carving. Risky techniques may be justifiable if the drive is otherwise dead. If a drive is started up once it may continue to run for a shorter or longer time but never start again, so as much data as possible is recovered as soon as the drive starts.
Disk storage is a data storage mechanism based on a rotating disk. The recording employs various electronic, magnetic, optical, or mechanical changes to the disk's surface layer. A disk drive is a device implementing such a storage mechanism. Notable types are hard disk drives (HDD), containing one or more non-removable rigid platters; the floppy disk drive (FDD) and its removable floppy disk; and various optical disc drives (ODD) and associated optical disc media.
A hard disk drive (HDD), hard disk, hard drive, or fixed disk is an electro-mechanical data storage device that stores and retrieves digital data using magnetic storage with one or more rigid rapidly rotating platters coated with magnetic material. The platters are paired with magnetic heads, usually arranged on a moving actuator arm, which read and write data to the platter surfaces. Data is accessed in a random-access manner, meaning that individual blocks of data can be stored and retrieved in any order. HDDs are a type of non-volatile storage, retaining stored data when powered off. Modern HDDs are typically in the form of a small rectangular box.
The ST-506 and ST-412 were early hard disk drive products introduced by Seagate in 1980 and 1981 respectively, that later became construed as hard disk drive interfaces: the ST-506 disk interface and the ST-412 disk interface. Compared to the ST-506 precursor, the ST-412 implemented a refinement to the seek speed, and increased the drive capacity from 5 MB to 10 MB, but was otherwise highly similar.
Seagate Technology Holdings plc is an American data storage company. It was incorporated in 1978 as Shugart Technology and commenced business in 1979. Since 2010, the company has been incorporated in Dublin, Ireland, with operational headquarters in Fremont, California, United States.
Self-Monitoring, Analysis, and Reporting Technology is a monitoring system included in computer hard disk drives (HDDs) and solid-state drives (SSDs). Its a primary function is to detect and report various indicators of drive reliability, or how long a drive can function while anticipating imminent hardware failures.
Annualized failure rate (AFR) gives the estimated probability that a device or component will fail during a full year of use. It is a relation between the mean time between failure (MTBF) and the hours that a number of devices are run per year. AFR is estimated from a sample of like components—AFR and MTBF as given by vendors are population statistics that can not predict the behaviour of an individual unit.
A head crash is a hard-disk failure that occurs when a read–write head of a hard disk drive makes contact with its rotating platter, slashing its surface and permanently damaging its magnetic media. It is most often caused by a sudden severe motion of the disk, for example the jolt caused by dropping a laptop to the ground while it is operating or physically shocking a computer.
Density is a measure of the quantity of information bits that can be stored on a given physical space of a computer storage medium. There are three types of density: length of track, area of the surface, or in a given volume.
The Microdrive is a type of miniature, 1-inch hard disk produced by IBM and Hitachi. These rotational media storage devices were designed to fit in CompactFlash (CF) Type II slots.
Perpendicular recording, also known as conventional magnetic recording (CMR), is a technology for data recording on magnetic media, particularly hard disks. It was first proven advantageous in 1976 by Shun-ichi Iwasaki, then professor of the Tohoku University in Japan, and first commercially implemented in 2005. The first industry-standard demonstration showing unprecedented advantage of PMR over longitudinal magnetic recording (LMR) at nanoscale dimensions was made in 1998 at IBM Almaden Research Center in collaboration with researchers of Data Storage Systems Center (DSSC) – a National Science Foundation (NSF) Engineering Research Center (ERCs) at Carnegie Mellon University (CMU).
In computing, data recovery is a process of retrieving deleted, inaccessible, lost, corrupted, damaged, or formatted data from secondary storage, removable media or files, when the data stored in them cannot be accessed in a usual way. The data is most often salvaged from storage media such as internal or external hard disk drives (HDDs), solid-state drives (SSDs), USB flash drives, magnetic tapes, CDs, DVDs, RAID subsystems, and other electronic devices. Recovery may be required due to physical damage to the storage devices or logical damage to the file system that prevents it from being mounted by the host operating system (OS).
Heat-assisted magnetic recording (HAMR) is a magnetic storage technology for greatly increasing the amount of data that can be stored on a magnetic device such as a hard disk drive by temporarily heating the disk material during writing, which makes it much more receptive to magnetic effects and allows writing to much smaller regions.
Deskstar was the name of a product line of computer hard disk drives. It was originally announced by IBM in October 1994. The line was continued by Hitachi, when in 2003 it bought IBM's hard disk drive division and renamed it Hitachi Global Storage Technologies. In 2012, Hitachi sold the division to Western Digital, who continued the drive product line brand as HGST Deskstar. In 2018, Western Digital began winding down the HGST brand, and as of 2020 it is defunct.
In 1953, IBM recognized the immediate application for what it termed a "Random Access File" having high capacity and rapid random access at a relatively low cost. After considering technologies such as wire matrices, rod arrays, drums, drum arrays, etc., the engineers at IBM's San Jose California laboratory invented the hard disk drive. The disk drive created a new level in the computer data hierarchy, then termed Random Access Storage but today known as secondary storage, less expensive and slower than main memory but faster and more expensive than tape drives.
A solid-state drive (SSD) is a type of solid-state storage device that uses integrated circuits to store data persistently. It is sometimes called semiconductor storage device, solid-state device, and solid-state disk.
Fixed-block architecture (FBA) is an IBM term for the hard disk drive (HDD) layout in which each addressable block on the disk has the same size, utilizing 4 byte block numbers and a new set of command codes. FBA as a term was created and used by IBM for its 3310 and 3370 HDDs beginning in 1979 to distinguish such drives as IBM transitioned away from their variable record size format used on IBM's mainframe hard disk drives beginning in 1964 with its System/360.
The Seagate Barracuda is a series of hard disk drives and later solid state drives produced by Seagate Technology that was first introduced in 1993.
Higher performance in hard disk drives comes from devices which have better performance characteristics. These performance characteristics can be grouped into two categories: access time and data transfer time .
The ST3000DM001 is a hard disk drive released by Seagate Technology in 2011 as part of the Seagate Barracuda series. It has a capacity of 3 terabytes (TB) and a spindle speed of 7200 RPM. This particular drive model was reported to have unusually high failure rates, due to a parking ramp that was made from different materials. The failure rates were approximately 5.7 times higher in comparison to other 3 TB drives, for which Seagate faced a class-action lawsuit.
So surely, the data they provide is invaluable to average consumers… right? Well, maybe not.