Bcache

bcache
Developer(s) Kent Overstreet and others
Initial release June 30, 2013 (Linux 3.10)
Written in C
Operating system Linux
Type Linux kernel features
License GNU GPL
Website bcache.evilpiepirate.org

bcache (abbreviated from block cache) is a cache in the Linux kernel's block layer, which is used for accessing secondary storage devices. It allows one or more fast storage devices, such as flash-based solid-state drives (SSDs), to act as a cache for one or more slower storage devices, such as hard disk drives (HDDs); this effectively creates hybrid volumes and provides performance improvements.

Designed around the nature and performance characteristics of SSDs, bcache also minimizes write amplification by avoiding random writes and turning them into sequential writes instead. This merging of I/O operations is performed for both the cache and the primary storage, helping to extend the lifetime of flash-based devices used as caches and to improve the performance of write-sensitive primary storage, such as RAID 5 arrays.

bcache is licensed under the GNU General Public License (GPL), and Kent Overstreet is its primary developer. Overstreet regards bcache as a "prototype" for the development of bcachefs, a file system with significant improvements over bcache. [1]

Overview

Using bcache makes it possible to have SSDs as another level of indirection within the data storage access paths, resulting in improved overall performance by using fast flash-based SSDs as caches for slower mechanical hard disk drives (HDDs) with rotational magnetic media. That way, the gap between SSDs and HDDs can be bridged: the speed of costly SSDs is combined with the inexpensive storage capacity of traditional HDDs. [2]

Caching is implemented by storing on the SSDs the data associated with random reads and random writes, taking advantage of the near-zero seek times that are the most prominent feature of SSDs. Sequential I/O is not cached, to avoid rapidly invalidating the SSD cache with operations that HDDs already handle well; bypassing the cache for large sequential writes is known as the write-around policy. Not caching sequential I/O also helps extend the lifetime of the SSDs used as caches. [3] Write amplification is avoided by not performing random writes to the SSDs; instead, random writes to the SSD cache are combined into larger block-level writes, so that only complete erase blocks are rewritten. [4] [5]
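The sketch below illustrates, in simplified form, how such a write-around decision can be made: requests that continue a sequential stream are tracked, and once the stream grows beyond a cutoff, further requests bypass the SSD cache. The structure, function names and the 4 MiB cutoff are assumptions chosen for this example, not bcache's actual code; bcache exposes a comparable tunable (sequential_cutoff), but its real implementation is considerably more involved.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Bypass the cache once a sequential stream grows beyond 4 MiB
 * (an assumed cutoff, chosen for this example). */
#define SEQUENTIAL_CUTOFF (4u * 1024 * 1024)

struct io_stream {
    uint64_t next_offset;  /* byte offset expected if the stream stays sequential */
    uint64_t bytes_seen;   /* bytes accumulated in the current sequential run */
};

/* Returns true if a request at 'offset' of 'len' bytes should bypass the SSD
 * cache and go straight to the backing HDD (write-around). */
static bool should_bypass_cache(struct io_stream *s, uint64_t offset, uint32_t len)
{
    if (offset == s->next_offset)
        s->bytes_seen += len;      /* request continues the previous one */
    else
        s->bytes_seen = len;       /* random access: start a new run */

    s->next_offset = offset + len;
    return s->bytes_seen > SEQUENTIAL_CUTOFF;
}

int main(void)
{
    struct io_stream s = { 0 };
    uint64_t offset = 0;

    /* Feed the tracker a long sequential write, 1 MiB at a time. */
    for (int i = 0; i < 8; i++) {
        bool bypass = should_bypass_cache(&s, offset, 1024 * 1024);
        printf("request %d at %8llu KiB: %s\n", i,
               (unsigned long long)(offset / 1024),
               bypass ? "bypass cache" : "cache on SSD");
        offset += 1024 * 1024;
    }
    return 0;
}
```

In this toy run, the first few 1 MiB requests are cached; once the run exceeds the cutoff, the remaining requests go directly to the backing device.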

Both write-back and write-through (the default) policies are supported for caching write operations. With the write-back policy, written data is stored in the SSD cache first and propagated to the HDDs later in batches, using seek-friendly operations, which makes bcache also act as an I/O scheduler. With the write-through policy, which ensures that no write operation is marked as finished until the data has reached both the SSDs and the HDDs, performance improvements are smaller because the written data is effectively only cached. [4] [5]
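As a usage illustration, the caching policy of a registered bcache device can be changed at runtime through sysfs, as described in the kernel's bcache documentation. The minimal sketch below switches a device to the write-back policy; it assumes the backing device was registered as bcache0 and requires root privileges.

```c
#include <stdio.h>

int main(void)
{
    /* Path assumes the bcache device registered as bcache0. */
    const char *attr = "/sys/block/bcache0/bcache/cache_mode";

    FILE *f = fopen(attr, "w");
    if (!f) {
        perror(attr);
        return 1;
    }

    /* Documented values are writethrough (the default), writeback,
     * writearound and none; this is equivalent to
     * "echo writeback > /sys/block/bcache0/bcache/cache_mode". */
    if (fputs("writeback\n", f) == EOF)
        perror("write");

    fclose(f);
    return 0;
}
```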

The write-back policy with batched writes to the HDDs provides additional benefits to write-sensitive redundant array of independent disks (RAID) layouts such as RAID 5 and RAID 6, which perform each small write as an atomic read-modify-write sequence. The performance penalties [6] of small random writes on such RAID layouts are thereby reduced or avoided, by grouping them together and performing them as batched sequential writes. [4] [5]
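The following simplified example shows why a small write on RAID 5 is expensive: the old data block and the old parity block must be read, a new parity computed with XOR, and both the new data and the new parity written back, i.e. four I/Os for a single logical write. The byte-sized "blocks" are a deliberate simplification for illustration.

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* One stripe member and its parity, shrunk to single bytes for clarity. */
    uint8_t old_data   = 0x5A;  /* block being overwritten: I/O 1 (read)  */
    uint8_t old_parity = 0xC3;  /* parity of the stripe:    I/O 2 (read)  */
    uint8_t new_data   = 0x7E;  /* incoming small write                   */

    /* new parity = old parity XOR old data XOR new data */
    uint8_t new_parity = old_parity ^ old_data ^ new_data;

    /* I/O 3 and 4: write new_data and new_parity back to their disks. */
    printf("new data:   0x%02X\n", new_data);
    printf("new parity: 0x%02X\n", new_parity);
    return 0;
}
```

Grouping many such small writes into larger sequential batches, as the write-back policy does, amortizes this four-I/O cost across the whole batch.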

Caching performed by bcache operates at the block device level, which makes it file system agnostic as long as the file system provides an embedded universally unique identifier (UUID); this requirement is satisfied by virtually all standard Linux file systems, as well as by swap partitions. The logical blocks that bcache uses internally as caching extents can be as small as a single HDD sector. [7]
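Registered cache sets are themselves identified by UUIDs and exposed as directories under /sys/fs/bcache, per the kernel's bcache documentation. The small sketch below lists those UUIDs by reading the directory names; treating every 36-character entry as a cache-set UUID is a simplification made for this example.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    /* Each registered cache set appears as a directory named after its UUID. */
    DIR *d = opendir("/sys/fs/bcache");
    if (!d) {
        perror("/sys/fs/bcache");
        return 1;
    }

    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        /* Skip ".", ".." and control files such as "register"; a plain
         * 36-character name is taken to be a cache-set UUID
         * (a simplification for this example). */
        if (strlen(e->d_name) == 36)
            printf("cache set: %s\n", e->d_name);
    }

    closedir(d);
    return 0;
}
```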

History

bcache was first announced by Kent Overstreet in July 2010, as a completely working Linux kernel module, though still at an early beta stage. [8] Development continued for almost two years, until May 2012, at which point bcache reached a production-ready state. [5]

It was merged into the Linux kernel mainline in kernel version 3.10, released on June 30, 2013. [9] [10] Overstreet has since been developing the file system bcachefs, based on ideas first developed in bcache that he said began "evolving ... into a full blown, general-purpose POSIX filesystem". [11] He describes bcache as a "prototype" for the ideas that became bcachefs and intends bcachefs to replace bcache. [12] He officially announced bcachefs in 2015, and it was merged into the mainline Linux kernel in October 2023. [13]

Features

As of version 3.10 of the Linux kernel, the following features are provided by bcache: [4]

 - a single cache device can cache an arbitrary number of backing devices, and backing devices can be attached and detached at runtime
 - recovery from unclean shutdowns, with writes not completed until the cache is consistent with respect to the backing device
 - correct handling of barriers and cache flushes
 - write-through, write-back and write-around caching policies
 - detection and bypass of sequential I/O, with a configurable threshold
 - throttling of traffic to the SSD if it becomes congested
 - a write-back implementation that writes out dirty data in sorted order

Improvements

As of February 2014, further features were planned for future releases of bcache. [10]

See also

 - bcachefs
 - dm-cache
 - Flashcache
 - Device mapper
 - Hybrid drive
 - Solid-state drive
 - Standard RAID levels
 - I/O scheduling

References

  1. "bcache FAQ". bcache.evilpiepirate.org. Retrieved May 7, 2021.
  2. Petros Koutoupis (November 25, 2013). "Advanced Hard Drive Caching Techniques". Linux Journal . Retrieved December 2, 2013.
  3. 1 2 3 "Linux kernel documentation: Documentation/bcache.txt". kernel.org. August 12, 2013. Retrieved January 24, 2014.
  4. 1 2 3 4 Kent Overstreet. "bcache: Linux kernel block layer cache". bcache.evilpiepirate.org. Retrieved December 2, 2013.
  5. 1 2 3 4 Jonathan Corbet (May 12, 2012). "A bcache update". LWN.net . Retrieved October 4, 2013.
  6. "Basic RAID Organizations". ecs.umass.edu. Retrieved October 4, 2013.
  7. William Stearns; Kent Overstreet (July 2, 2010). "bcache: Caching beyond just RAM". LWN.net . Retrieved October 4, 2013.
  8. Kent Overstreet (July 4, 2010). "bcache: Version 6". LWN.net . Retrieved October 4, 2013.
  9. "Linux kernel 3.10, Section 1.2. bcache, a block layer cache for SSD caching". kernelnewbies.org. June 30, 2013. Retrieved October 4, 2013.
  10. 1 2 Libby Clark (June 11, 2013). "All About the Linux Kernel: bcache". linux.com. Archived from the original on September 29, 2013. Retrieved October 9, 2013.
  11. Larabel, Michael (August 21, 2015). "A New Linux File-System Aims For Speed While Having ZFS/Btrfs-Like Features". Phoronix . Retrieved November 22, 2018.
  12. Edge, Jake (May 23, 2018). "An update on bcachefs". LWN.net . Retrieved November 22, 2018.
  13. Larabel, Michael (October 31, 2023). "Bcachefs Merged Into The Linux 6.7 Kernel". Phoronix . Retrieved November 20, 2023.