Developer(s) | Kent Overstreet and others |
---|---|
Initial release | June 30, 2013 (Linux 3.10) |
Written in | C |
Operating system | Linux |
Type | Linux kernel feature |
License | GNU GPL |
Website | bcache |
bcache (abbreviated from block cache) is a cache mechanism in the Linux kernel's block layer, which is used for accessing secondary storage devices. It allows one or more fast storage devices, such as flash-based solid-state drives (SSDs), to act as a cache for one or more slower storage devices, such as hard disk drives (HDDs); this effectively creates hybrid volumes and provides performance improvements.
Designed around the nature and performance characteristics of SSDs, bcache also minimizes write amplification by avoiding random writes and turning them into sequential writes instead. This merging of I/O operations is performed for both the cache and the primary storage, helping to extend the lifetime of flash-based devices used as caches and to improve the performance of write-sensitive primary storage, such as RAID 5 sets.
bcache is licensed under the GNU General Public License (GPL), and Kent Overstreet is its primary developer. Overstreet considers bcache a "prototype" for the development of bcachefs, a file system with significant improvements over bcache. [1]
Using bcache makes it possible to place SSDs as another level of indirection within the data storage access path, improving overall performance by using fast flash-based SSDs as caches for slower hard disk drives (HDDs) based on rotational magnetic media. The gap between SSDs and HDDs can thus be bridged: the speed of costly SSDs is combined with the cheap storage capacity of traditional HDDs. [2]
Caching is implemented by using SSDs to store data associated with random reads and random writes, exploiting the near-zero seek times that are the most prominent feature of SSDs. Sequential I/O is not cached, so that large sequential operations, which HDDs already handle well, do not rapidly invalidate the SSD cache; bypassing the cache for large sequential writes is known as the write-around policy. Not caching sequential I/O also helps to extend the lifetime of the SSDs used as caches. [3] Write amplification is avoided by never performing random writes to the SSDs; instead, all random writes to the SSD cache are combined into block-level writes, so that only complete erase blocks on the SSDs are rewritten. [4] [5]
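For illustration only, the following toy C sketch (not bcache's actual implementation; the bucket size and data structures are hypothetical) shows the general technique of coalescing small random writes in memory and flushing them to the cache device as a single sequential, erase-block-sized write:

```c
/* Illustrative sketch only, not bcache's actual code: small random
 * writes are appended to an in-memory bucket and flushed to the cache
 * device as one sequential, erase-block-sized write. */
#include <stdio.h>
#include <string.h>

#define BUCKET_SIZE (128 * 1024)   /* hypothetical erase-block size */

struct bucket {
    unsigned char data[BUCKET_SIZE];
    size_t used;
};

/* Append one random write (len must be <= BUCKET_SIZE); flush the
 * bucket as a single sequential write when it is full. */
static void cache_write(struct bucket *b, FILE *ssd,
                        const void *buf, size_t len)
{
    if (b->used + len > BUCKET_SIZE) {        /* bucket full */
        fwrite(b->data, 1, b->used, ssd);     /* one big sequential write */
        b->used = 0;
    }
    memcpy(b->data + b->used, buf, len);      /* coalesce in memory */
    b->used += len;
}

int main(void)
{
    struct bucket b = { .used = 0 };
    FILE *ssd = fopen("cache.img", "wb");     /* stand-in for the SSD */
    if (!ssd) { perror("cache.img"); return 1; }

    char payload[512] = "x";
    for (int i = 0; i < 1024; i++)            /* 1024 small "random" writes */
        cache_write(&b, ssd, payload, sizeof payload);

    fwrite(b.data, 1, b.used, ssd);           /* flush the tail */
    fclose(ssd);
    return 0;
}
```

Because the SSD never sees a partial erase block, its flash translation layer does not have to perform internal read-modify-write cycles, which is the source of write amplification.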
Both write-back and write-through (the default) policies are supported for caching write operations. With the write-back policy, written data is stored in the SSD cache first and propagated to the HDDs later in batches, using seek-friendly operations, which also makes bcache act as an I/O scheduler. With the write-through policy, which ensures that no write operation is marked as finished until the data has reached both the SSD and the HDD, performance improvements are smaller because written data is only cached, not accelerated. [4] [5]
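The active policy can be changed at runtime through sysfs. A minimal C sketch, assuming a registered bcache device at /dev/bcache0 and the cache_mode attribute as documented in the kernel's bcache administration guide (root privileges required):

```c
/* Minimal sketch: select bcache's caching policy at runtime by writing
 * to the device's sysfs attribute (path per the kernel's bcache
 * administration guide; assumes a registered /dev/bcache0). */
#include <stdio.h>

int main(void)
{
    const char *attr = "/sys/block/bcache0/bcache/cache_mode";
    FILE *f = fopen(attr, "w");
    if (!f) { perror(attr); return 1; }
    /* Documented values: writethrough (default), writeback,
     * writearound, none. */
    fputs("writeback", f);
    return fclose(f) == 0 ? 0 : 1;
}
```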
The write-back policy, with its batched writes to the HDDs, provides additional benefits for write-sensitive redundant array of independent disks (RAID) layouts such as RAID 5 and RAID 6, which perform actual write operations as atomic read-modify-write sequences. The performance penalties [6] of small random writes on such RAID layouts are reduced or avoided by grouping the writes together and performing them as batched sequential writes. [4] [5]
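The penalty stems from parity maintenance: replacing a single data block D with D' requires reading D and the parity P, then writing D' together with the new parity P' = P XOR D XOR D', i.e. four disk operations per small write. The following self-contained C example demonstrates this read-modify-write identity:

```c
/* Demonstrates the RAID 5 read-modify-write parity update:
 * P' = P XOR D_old XOR D_new, so one small write costs two reads
 * (old data, old parity) plus two writes (new data, new parity). */
#include <assert.h>
#include <stdio.h>

int main(void)
{
    unsigned char d1 = 0xA5, d2 = 0x3C;   /* data blocks (1 byte each) */
    unsigned char p  = d1 ^ d2;           /* initial parity            */

    unsigned char d1_new = 0x7E;          /* small random write to d1  */
    p ^= d1 ^ d1_new;                     /* read-modify-write of P    */
    d1 = d1_new;

    assert(p == (d1 ^ d2));               /* parity still consistent   */
    printf("parity ok: 0x%02X\n", p);
    return 0;
}
```

Write-back batching amortizes these four operations across many writes by turning them into full-stripe sequential updates.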
Caching performed by bcache operates at the block device level, making it file-system-agnostic as long as the file system provides an embedded universally unique identifier (UUID); this requirement is satisfied by virtually all standard Linux file systems, as well as by swap partitions. The logical blocks that bcache uses internally as caching extents can be as small as a single HDD sector. [7]
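Backing devices are associated with their caches by UUID as well. A hedged C sketch of attaching a cache set through sysfs (interface per the kernel's bcache administration guide; the UUID shown is a placeholder):

```c
/* Sketch: attach the backing device behind /dev/bcache0 to a cache set
 * by writing the cache set's UUID to sysfs (interface per the kernel's
 * bcache administration guide; the UUID below is a placeholder). */
#include <stdio.h>

int main(void)
{
    const char *attr = "/sys/block/bcache0/bcache/attach";
    const char *cset_uuid = "00000000-0000-0000-0000-000000000000";
    FILE *f = fopen(attr, "w");
    if (!f) { perror(attr); return 1; }
    fputs(cset_uuid, f);
    return fclose(f) == 0 ? 0 : 1;
}
```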
bcache was first announced by Kent Overstreet in July 2010 as a fully working Linux kernel module, albeit at an early beta stage. [8] Development continued for almost two years, until May 2012, at which point bcache reached its production-ready state. [5]
It was merged into the Linux kernel mainline in kernel version 3.10, released on June 30, 2013. [9] [10] Overstreet has since been developing the bcachefs file system, based on ideas first developed in bcache that he said began "evolving ... into a full blown, general-purpose POSIX filesystem". [11] He describes bcache as a "prototype" for the ideas that became bcachefs and intends bcachefs to replace bcache. [12] He officially announced bcachefs in 2015, and it was merged into the mainline Linux kernel in October 2023. [13]
As of version 3.10 of the Linux kernel, the following features are provided by bcache: [4]
As of February 2014, the following new features are planned for future releases of bcache: [10]
Computer data storage or digital data storage is a technology consisting of computer components and recording media that are used to retain digital data. It is a core function and fundamental component of computers.
In Linux, Logical Volume Manager (LVM) is a device mapper framework that provides logical volume management for the Linux kernel. Most modern Linux distributions are LVM-aware to the point of being able to have their root file systems on a logical volume.
A hybrid drive is a logical or physical computer storage device that combines a faster storage medium, such as a solid-state drive (SSD), with a higher-capacity hard disk drive (HDD). The intent is to add some of the speed of SSDs to the cost-effective storage capacity of traditional HDDs. The purpose of the SSD in a hybrid drive is to act as a cache for the data stored on the HDD, improving the overall performance by keeping copies of the most frequently used data on the faster SSD.
The device mapper is a framework provided by the Linux kernel for mapping physical block devices onto higher-level virtual block devices. It forms the foundation of the logical volume manager (LVM), software RAIDs and dm-crypt disk encryption, and offers additional features such as file system snapshots.
A solid-state drive (SSD) is a type of solid-state storage device that uses integrated circuits to store data persistently. It is sometimes called a semiconductor storage device, a solid-state device, or a solid-state disk.
In computer storage, the standard RAID levels comprise a basic set of RAID configurations that employ the techniques of striping, mirroring, or parity to create large reliable data stores from multiple general-purpose computer hard disk drives (HDDs). The most common types are RAID 0 (striping), RAID 1 (mirroring) and its variants, RAID 5, and RAID 6. Multiple RAID levels can also be combined or nested, for instance RAID 10 or RAID 01. RAID levels and their associated data formats are standardized by the Storage Networking Industry Association (SNIA) in the Common RAID Disk Drive Format (DDF) standard. The numerical values only serve as identifiers and do not signify performance, reliability, generation, hierarchy, or any other metric.
In computer storage, a disk buffer is the embedded memory in a hard disk drive (HDD) or solid-state drive (SSD) acting as a buffer between the rest of the computer and the physical hard disk platter or flash memory that is used for storage. Modern hard disk drives come with 8 to 256 MiB of such memory, and solid-state drives come with up to 4 GB of cache memory.
Input/output (I/O) scheduling is the method that computer operating systems use to decide in which order I/O operations will be submitted to storage volumes. I/O scheduling is sometimes called disk scheduling.
Btrfs is a computer storage format that combines a file system based on the copy-on-write (COW) principle with a logical volume manager, developed together. It was created by Chris Mason in 2007 for use in Linux, and since November 2013, the file system's on-disk format has been declared stable in the Linux kernel.
zram, formerly called compcache, is a Linux kernel module for creating a compressed block device in RAM, i.e. a RAM disk with on-the-fly disk compression. The block device created with zram can then be used for swap or as a general-purpose RAM disk. The two most common uses for zram are the storage of temporary files and use as a swap device. Initially, zram had only the latter function, hence the original name "compcache". Unlike swap, zram uses only 0.1% of its maximum size when not in use.
F2FS is a flash file system initially developed by Samsung Electronics for the Linux kernel.
Flashcache is a disk cache component for the Linux kernel, initially developed by Facebook starting in April 2010 and released as open source in 2011. Since January 2013 there has been a fork of Flashcache, named EnhanceIO and developed by sTec, Inc. In 2015 that fork became unmaintained; it was subsequently forked again and is maintained by individuals.
Shingled magnetic recording (SMR) is a magnetic storage data recording technology used in hard disk drives (HDDs) to increase storage density and overall per-drive storage capacity. Conventional hard disk drives record data by writing non-overlapping concentric magnetic tracks, while shingled recording writes new tracks that overlap part of the previously written magnetic track, leaving the previous track narrower and allowing higher track density. Thus, the tracks partially overlap similar to roof shingles. This approach was selected because, if the writing head is made too narrow, it cannot provide the very high fields required in the recording layer of the disk.
dm-cache is a component of the Linux kernel's device mapper, which is a framework for mapping block devices onto higher-level virtual block devices. It allows one or more fast storage devices, such as flash-based solid-state drives (SSDs), to act as a cache for one or more slower storage devices such as hard disk drives (HDDs); this effectively creates hybrid volumes and provides secondary storage performance improvements.
zswap is a Linux kernel feature that provides a compressed write-back cache for swapped pages, as a form of virtual memory compression. Instead of moving memory pages to a swap device when they are to be swapped out, zswap performs their compression and then stores them into a memory pool dynamically allocated in the system RAM. Later writeback to the actual swap device is deferred or even completely avoided, resulting in a significantly reduced I/O for Linux systems that require swapping; the tradeoff is the need for additional CPU cycles to perform the compression.
An open-channel solid state drive is a solid-state drive which does not have a firmware Flash Translation Layer implemented on the device, but instead leaves the management of the physical solid-state storage to the computer's operating system. The Linux 4.4 kernel is an example of an operating system kernel that supports open-channel SSDs which follow the NVM Express specification. The interface used by the operating system to access open-channel solid state drives is called LightNVM.
Bcachefs is a copy-on-write (COW) file system for Linux-based operating systems. Its primary developer, Kent Overstreet, first announced it in 2015, and it was added to the Linux kernel beginning with version 6.7. It is intended to compete with the modern features of ZFS or Btrfs, and the speed and performance of ext4 or XFS.
EROFS is a lightweight read-only file system initially developed by Huawei, originally for the Linux kernel and now maintained by an open-source community from all over the world.
io_uring is a Linux kernel system call interface for storage device asynchronous I/O operations, addressing performance issues with similar interfaces provided by functions like read/write or aio_read/aio_write for operations on data accessed by file descriptors.