Disk buffer

On this hard disk drive, the controller board contains a RAM integrated circuit used for the disk buffer.
A 500 GB Western Digital hard disk drive with a 16 MB buffer

In computer storage, a disk buffer (often ambiguously called a disk cache or cache buffer) is the embedded memory in a hard disk drive (HDD) or solid-state drive (SSD) that acts as a buffer between the rest of the computer and the physical hard disk platter or flash memory used for storage. [1] Modern hard disk drives come with 8 to 256 MiB of such memory, and solid-state drives come with up to 4 GB of cache memory. [2]

Since the late 1980s, nearly all disks sold have had embedded microcontrollers and an ATA, Serial ATA, SCSI, or Fibre Channel interface. The drive circuitry usually has a small amount of memory, used to store the data going to and coming from the disk platters.

The disk buffer is physically distinct from, and used differently from, the page cache typically kept by the operating system in the computer's main memory. The disk buffer is controlled by the microcontroller in the hard disk drive, and the page cache is controlled by the computer to which that disk is attached. The disk buffer is usually quite small, ranging from 8 MB to 4 GB, and the page cache is generally all unused main memory. While data in the page cache is reused multiple times, the data in the disk buffer is rarely reused. [3] In this sense, the terms disk cache and cache buffer are misnomers; the embedded controller's memory is more appropriately called a disk buffer.

Note that disk array controllers, as opposed to disk controllers, usually have normal cache memory of around 0.5 to 8 GiB.

Uses

Read-ahead/read-behind

When a disk's controller executes a physical read, the actuator moves the read/write head to (or near to) the correct cylinder. After some settling and possibly fine actuation, the read head begins to pick up track data, and all that is left to do is wait until platter rotation brings the requested data under the head.

The data read ahead of the request during this wait is unrequested but free, so it is typically saved in the disk buffer in case it is requested later.

Similarly, data located behind the requested data can be read for free if the head can stay on the track, either because there is no other read to execute or because the next actuation can start later and still complete in time. [4]

If several requested reads are on the same track (or close by on a spiral track), most unrequested data between them will be both read ahead and behind.
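The behaviour described above can be illustrated with a toy model. The following is a minimal sketch, not a description of any real firmware: the class name `TrackBufferDisk`, its fields, and the single-track layout are all hypothetical, and seek and settle times are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class TrackBufferDisk:
    """Toy single-track disk that buffers every sector passing under the head."""
    track: list                      # sector contents; index = sector number
    head_pos: int = 0                # sector currently under the head
    buffer: dict = field(default_factory=dict)
    media_reads: int = 0             # sectors actually picked up from the platter

    def _pick_up_current_sector(self):
        # Store the sector under the head, then let the platter rotate by one.
        current = self.head_pos
        self.buffer[current] = self.track[current]
        self.media_reads += 1
        self.head_pos = (self.head_pos + 1) % len(self.track)
        return current

    def read(self, sector):
        if sector in self.buffer:    # already buffered: no platter access
            return self.buffer[sector]
        # Read-ahead: keep everything seen while waiting for the target sector.
        while self._pick_up_current_sector() != sector:
            pass
        return self.buffer[sector]

    def read_behind(self, count):
        # Read-behind: stay on track and buffer sectors after the request.
        for _ in range(count):
            self._pick_up_current_sector()


disk = TrackBufferDisk(track=[f"s{i}" for i in range(8)], head_pos=2)
disk.read(5)         # sectors 2, 3 and 4 are buffered on the way to sector 5
disk.read_behind(2)  # sectors 6 and 7 are buffered while the head is idle
disk.read(3)         # served from the buffer without touching the platter
```

In this model a later request for any sector between 2 and 7 costs no additional rotation, which is the sense in which the unrequested data is "free".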

Speed matching

The speed of the disk's I/O interface to the computer almost never matches the speed at which the bits are transferred to and from the hard disk platter. The disk buffer is used so that both the I/O interface and the disk read/write head can operate at full speed.
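As a rough illustration of the mismatch, the time for which a buffer can absorb a burst from the faster side can be estimated from the two rates. The function name and the figures below (a 16 MiB buffer, a 600 MB/s interface, a 150 MB/s sustained media rate) are assumptions chosen only for the example.

```python
def buffer_fill_time(buffer_bytes, interface_rate, media_rate):
    """Seconds until the buffer is full when data arrives faster than it drains.

    Rates are in bytes per second. If the media keeps up with the interface,
    the buffer never fills and infinity is returned.
    """
    if interface_rate <= media_rate:
        return float("inf")
    return buffer_bytes / (interface_rate - media_rate)


# Hypothetical figures: 16 MiB buffer, 600 MB/s interface, 150 MB/s media rate.
seconds = buffer_fill_time(16 * 2**20, 600e6, 150e6)
print(f"buffer absorbs the burst for about {seconds * 1000:.0f} ms")
```

Under these assumed figures the interface can run at full speed for only a few tens of milliseconds before it must wait for the platter, which is why the buffer smooths short bursts rather than sustained transfers.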

Write acceleration

The disk's embedded microcontroller may signal the main computer that a disk write is complete immediately after receiving the write data, before the data is actually written to the platter. This early signal allows the main computer to continue working even though the data has not actually been written yet. This can be somewhat dangerous, because if power is lost before the data is permanently fixed in the magnetic media, the data will be lost from the disk buffer, and the file system on the disk may be left in an inconsistent state.

On some disks, this vulnerable period between signaling the write complete and fixing the data can be arbitrarily long, as the write can be deferred indefinitely by newly arriving requests. For this reason, the use of write acceleration can be controversial. Consistency can be maintained, however, by using a battery-backed memory system for caching data, although this is typically only found in high-end RAID controllers.

Alternatively, the caching can simply be turned off when the integrity of data is deemed more important than write performance. Another option is to send data to disk in a carefully managed order and to issue "cache flush" commands in the right places, which is usually referred to as the implementation of write barriers.
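From an application's point of view, the ordering idea can be sketched with the standard `os.fsync` call, as below. This is a minimal sketch under stated assumptions: the function name `journaled_append` and the commit marker are hypothetical, and whether `os.fsync` results in a cache flush command reaching the drive depends on the operating system, the file system, and the drive.

```python
import os


def journaled_append(path, payload, commit_marker=b"COMMIT\n"):
    """Append payload, then a commit marker, forcing the order with fsync.

    The first fsync acts as a barrier: the commit marker is only written
    after the payload has been handed to stable storage by the OS.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        os.write(fd, payload)        # assumes a short payload written in one call
        os.fsync(fd)                 # barrier: payload before commit record
        os.write(fd, commit_marker)
        os.fsync(fd)                 # make the commit record durable as well
    finally:
        os.close(fd)
```

After a crash, a reader of such a file would treat a payload as valid only if the commit marker that follows it is present.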

Command queuing

Newer SATA and most SCSI disks can accept multiple commands while any one command is in operation through "command queuing" (see NCQ and TCQ). These commands are stored by the disk's embedded controller until they are completed. One benefit is that the commands can be re-ordered to be processed more efficiently, so that commands affecting the same area of a disk are grouped together. Should a read reference the data at the destination of a queued write, the to-be-written data will be returned.
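The two properties described above, reordering and returning to-be-written data to a read, can be sketched with a toy queue. The class `CommandQueue` and its one-directional sweep ordering are assumptions made for illustration; real firmware also takes rotational position and other factors into account.

```python
class CommandQueue:
    """Toy model of queued writes held by a drive's embedded controller."""

    def __init__(self, media):
        self.media = media           # dict: LBA -> data already on the platter
        self.pending_writes = {}     # dict: LBA -> data queued but not written

    def queue_write(self, lba, data):
        self.pending_writes[lba] = data

    def read(self, lba):
        # A read that targets a queued write returns the to-be-written data.
        if lba in self.pending_writes:
            return self.pending_writes[lba]
        return self.media.get(lba)

    def drain(self, head_lba=0):
        # Reorder: serve LBAs at or beyond the head first, in ascending order,
        # then wrap around to the lower ones (a simple one-directional sweep).
        order = sorted(self.pending_writes, key=lambda lba: (lba < head_lba, lba))
        for lba in order:
            self.media[lba] = self.pending_writes.pop(lba)
        return order


queue = CommandQueue(media={10: "old"})
queue.queue_write(90, "c")
queue.queue_write(10, "new")
queue.queue_write(50, "b")
queue.read(10)               # returns "new" although the platter still holds "old"
queue.drain(head_lba=40)     # writes are carried out in the order 50, 90, 10
```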

NCQ is usually used in combination with enabled write buffering. For a read/write FPDMA command with the Force Unit Access (FUA) bit set to 0 and write buffering enabled, the operating system may see the write operation as finished before the data is physically written to the media. With the FUA bit set to 1 and write buffering enabled, the write operation returns only after the data is physically written to the media.

Cache control from the host

Cache flushing

Data accepted into the write cache of a disk device will eventually be written to the disk platters, provided that no starvation condition occurs as a result of a firmware flaw and that the disk's power supply is not interrupted before the cached writes are forced to the platters. To control the write cache, the ATA specification includes the FLUSH CACHE (E7h) and FLUSH CACHE EXT (EAh) commands. These commands cause the disk to finish writing the data in its cache, and the disk returns good status only after that data has been written to the media. In addition, when a drive receives the STANDBY IMMEDIATE command, it parks the head (on disk media) or saves the FTL mapping table (on flash media). [5]

An operating system sends the FLUSH CACHE and STANDBY IMMEDIATE commands to hard disk drives during the shutdown process.

Mandatory cache flushing is used in Linux to implement write barriers in some filesystems (for example, ext4), together with the Force Unit Access write command for journal commit blocks. [6]

Force Unit Access (FUA)

Force Unit Access (FUA) is an I/O write command option that forces written data all the way to stable storage. [7] FUA write commands (WRITE DMA FUA EXT, 3Dh; WRITE DMA QUEUED FUA EXT, 3Eh; WRITE MULTIPLE FUA EXT, CEh), in contrast to the corresponding commands without FUA, write data directly to the media regardless of whether write caching in the device is enabled. An FUA write command does not return until the data is written to the media, so data written by a completed FUA write command is on permanent media even if the device is powered off before a FLUSH CACHE command is issued. [8] [9]

FUA appeared in the SCSI command set, and was later adopted by SATA with NCQ. FUA is more fine-grained as it allows a single write operation to be forced to stable media and thus has smaller overall performance impact when compared to commands that flush the entire disk cache, such as the ATA FLUSH CACHE family of commands. [9] [10]
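The difference in granularity between an FUA write and a full cache flush can be shown with a toy model of a drive that has a volatile write cache. The class `WriteCachedDrive` and its methods are hypothetical and do not correspond to any real driver or firmware interface.

```python
class WriteCachedDrive:
    """Toy drive with a volatile write cache and persistent media."""

    def __init__(self):
        self.cache = {}          # volatile: lost on power failure
        self.media = {}          # persistent: survives power failure
        self.media_writes = 0    # number of writes that reached the media

    def _commit(self, lba, data):
        self.media[lba] = data
        self.media_writes += 1

    def write(self, lba, data, fua=False):
        if fua:
            # FUA: bypass the cache and return only once the data is on media.
            self.cache.pop(lba, None)    # drop any stale cached copy
            self._commit(lba, data)
        else:
            # Cached write: reported complete while still volatile.
            self.cache[lba] = data

    def flush_cache(self):
        # A flush forces every cached write to the media, not just one.
        for lba in sorted(self.cache):
            self._commit(lba, self.cache[lba])
        self.cache.clear()

    def power_loss(self):
        self.cache.clear()


drive = WriteCachedDrive()
drive.write(1, "data-a")
drive.write(2, "data-b")
drive.write(3, "journal-commit", fua=True)   # only this write hits the media
drive.power_loss()                           # cached writes 1 and 2 are lost
```

In this model the FUA write costs one media write, whereas calling `flush_cache()` before the commit would have forced both cached blocks out as well, which is the performance difference the text describes.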

Windows (Vista and later) supports FUA as part of Transactional NTFS, but only for SCSI or Fibre Channel disks, where support for FUA is common. [11] It is not known whether a SATA drive that supports FUA write commands will actually honor the command and write data to the disk platters as instructed; [citation needed] thus, Windows 8 and Windows Server 2012 instead send commands to flush the disk write cache after certain write operations. [12]

Although the Linux kernel gained support for NCQ around 2007, SATA FUA remains disabled by default because of regressions that were found in 2012 when the kernel's support for FUA was tested. [13] [14] The Linux kernel supports FUA at the block layer level. [15]

References

  1. Mark Kyrnin. "What to Look for in a Hard Drive". about.com. Archived from the original on 2015-04-04. Retrieved 2014-12-20. A drive's buffer is an amount of RAM on the drive to store frequently accessed data from the drive.
  2. "Samsung SSD 860 PRO | Samsung V-NAND Consumer SSD | Samsung Semiconductor Global Website". Samsung. Archived from the original on April 6, 2018. Retrieved July 16, 2018. CACHE MEMORY: 4 GB Low Power DDR4 (4,096 GB)
  3. Charles M. Kozierok (2001-04-17). "Internal Cache (Buffer) Size". pcguide.com. Retrieved 2014-12-20.
  4. Disks for Data Centers.
  5. Hitachi (2006). Deskstar 7K80 Disk Drive Specification, 4th Edition (Revision 1.6)(12 September 2006) Final. Hitachi Global Storage Technologies. pp. 99, 130, 131.
  6. Christoph Hellwig; Theodore Ts'o. "Does ext4 send FUA to flush disk cache". spinics.net. Retrieved 2014-03-18.
  7. Jonathan Corbet (2010-08-18). "The end of block barriers". LWN.net. Retrieved 2015-06-27.
  8. "Information technology - AT Attachment 8 - ATA/ATAPI Command Set (ATA8-ACS)" (PDF). T13/1699-D, revision 6a, September 6, 2008. American National Standards Institute, Inc. Archived from the original (PDF) on 6 August 2020. Retrieved 14 December 2020.
  9. Gregory Smith (2010). PostgreSQL 9.0: High Performance. Packt Publishing Ltd. p. 78. ISBN 978-1-84951-031-8.
  10. Bruce Jacob; Spencer Ng; David Wang (2010). Memory Systems: Cache, DRAM, Disk. Morgan Kaufmann. p. 734. ISBN 978-0-08-055384-9.
  11. "Deploying Transactional NTFS (Windows)". Msdn.microsoft.com. 2013-12-05. Retrieved 2014-01-24.
  12. "Forced Unit Access | Working Hard In IT". Workinghardinit.wordpress.com. 2012-10-12. Archived from the original on 2014-01-12. Retrieved 2014-01-24.
  13. "Storage related regression in linux-next 20120824". 2012-09-12. Retrieved 2015-06-27.
  14. "Revert "libata: enable SATA disk fua detection on default"". GitHub. 2012-09-13. Retrieved 2015-06-27.
  15. "Linux kernel documentation: Documentation/block/writeback_cache_control.txt". kernel.org. 2013-08-12. Retrieved 2014-01-24.