Dm-cache

dm-cache
Developer(s) Joe Thornber, Heinz Mauelshagen, Mike Snitzer and others
Initial release April 28, 2013 (Linux 3.9)
Written in C
Operating system Linux
Type Linux kernel feature
License GNU GPL
Website kernel.org

dm-cache is a component (more specifically, a target) of the Linux kernel's device mapper, which is a framework for mapping block devices onto higher-level virtual block devices. It allows one or more fast storage devices, such as flash-based solid-state drives (SSDs), to act as a cache for one or more slower storage devices such as hard disk drives (HDDs); this effectively creates hybrid volumes and provides secondary storage performance improvements.

The design of dm-cache requires three physical storage devices for the creation of a single hybrid volume; dm-cache uses those storage devices to separately store actual data, cache data, and required metadata. Configurable operating modes and cache policies, with the latter in the form of separate modules, determine the way data caching is actually performed.

dm-cache is licensed under the terms of GNU General Public License (GPL), with Joe Thornber, Heinz Mauelshagen and Mike Snitzer as its primary developers.

Overview

dm-cache uses solid-state drives (SSDs) as an additional level of indirection while accessing hard disk drives (HDDs), improving the overall performance by using fast flash-based SSDs as caches for the slower mechanical HDDs based on rotational magnetic media. As a result, the speed of the more expensive SSDs is combined with the storage capacity offered by slower but less expensive HDDs. [1] Moreover, in the case of storage area networks (SANs) used in cloud environments as shared storage systems for virtual machines, dm-cache can also improve overall performance and reduce the load of SANs by providing data caching using client-side local storage. [2] [3] [4]

dm-cache is implemented as a component of the Linux kernel's device mapper, which is a volume management framework that allows various mappings to be created between physical and virtual block devices. The way a mapping between devices is created determines how the virtual blocks are translated into underlying physical blocks, with the specific translation types referred to as targets. [5] Acting as a mapping target, dm-cache makes it possible for SSD-based caching to be part of the created virtual block device, while the configurable operating modes and cache policies determine how dm-cache works internally. The operating mode selects the way in which the data is kept in sync between an HDD and an SSD, while the cache policy, selectable from separate modules that implement each of the policies, provides the algorithm for determining which blocks are promoted (moved from an HDD to an SSD), demoted (moved from an SSD to an HDD), cleaned, etc. [6]

When configured to use the multiqueue (mq) or stochastic multiqueue (smq) cache policy, with the latter being the default, dm-cache uses SSDs to store the data associated with performed random reads and writes, capitalizing on near-zero seek times of SSDs and avoiding such I/O operations as typical HDD performance bottlenecks. The data associated with sequential reads and writes is not cached on SSDs, avoiding undesirable cache invalidation during such operations; performance-wise, this is beneficial because the sequential I/O operations are suitable for HDDs due to their mechanical nature. Not caching the sequential I/O also helps in extending the lifetime of SSDs used as caches. [7]
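The routing of sequential I/O around the cache can be illustrated with a minimal sketch. This is a simplified toy model and not the kernel's algorithm: the `SequentialDetector` class, its `threshold` parameter, and the `classify` helper are hypothetical names introduced here, and the sketch assumes that a stream counts as sequential once a fixed number of contiguous requests has been seen.

```python
from dataclasses import dataclass


@dataclass
class SequentialDetector:
    """Toy detector: a stream is sequential after `threshold` contiguous requests."""

    threshold: int = 4      # hypothetical value, not a kernel default
    next_sector: int = -1   # sector expected if the current run continues
    run_length: int = 0     # contiguous requests seen in the current run

    def should_bypass_cache(self, sector: int, length: int) -> bool:
        """Record one request and report whether it should skip the cache."""
        if sector == self.next_sector:
            self.run_length += 1   # request continues the current run
        else:
            self.run_length = 1    # a seek starts a new run
        self.next_sector = sector + length
        return self.run_length >= self.threshold


def classify(requests, threshold=4):
    """Return one bypass decision per (start_sector, length) request."""
    detector = SequentialDetector(threshold=threshold)
    return [detector.should_bypass_cache(s, n) for s, n in requests]
```

In this model, a long contiguous scan is sent to the origin device once the run reaches the threshold, while isolated random requests remain candidates for caching.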

History

Another dm-cache project with similar goals was announced by Eric Van Hensbergen and Ming Zhao in 2006, as the result of internship work at IBM. [8]

Later, Joe Thornber, Heinz Mauelshagen and Mike Snitzer provided their own implementation of the concept, which resulted in the inclusion of dm-cache into the Linux kernel. dm-cache was merged into the Linux kernel mainline in kernel version 3.9, which was released on April 28, 2013. [6] [9]

Design

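In dm-cache, creating a mapped virtual block device that acts as a hybrid volume requires three physical storage devices: [6]

origin device
Provides the slow primary storage (usually an HDD) that holds the actual data.
cache device
Provides the fast cache (usually an SSD).
metadata device
Records the placement of blocks and their dirty flags, together with other internal data required by a cache policy.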

Internally, dm-cache refers to each origin device through a number of fixed-size blocks; the size of these blocks, equal to the size of a caching extent, is configurable only during the creation of a hybrid volume. The size of a caching extent must range between 32 KB and 1 GB, and it must be a multiple of 32 KB; typically, the size of a caching extent is between 256 and 1024 KB. Choosing caching extents bigger than disk sectors acts as a compromise between the size of metadata and the possibility of wasting cache space. Caching extents that are too small increase the size of metadata, both on the metadata device and in kernel memory, while caching extents that are too large increase the amount of wasted cache space, because whole extents are cached even when only some of their parts have high hit rates. [6] [10]
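The size constraints above, and their effect on the number of blocks that the metadata must track, can be checked with a short sketch. The helper names are hypothetical, and the device paths in the usage note are placeholders. The table line layout follows the format described in the kernel's cache.txt documentation as understood here (metadata device, cache device, origin device, block size in 512-byte sectors, feature arguments, policy, policy arguments), so it should be treated as an assumption and verified against the documentation for the kernel in use.

```python
SECTOR_BYTES = 512
MIN_BLOCK_BYTES = 32 * 1024   # 32 KB lower bound stated in the text
MAX_BLOCK_BYTES = 1024 ** 3   # 1 GB upper bound stated in the text


def validate_block_size(block_bytes: int) -> int:
    """Return the block size in 512-byte sectors, or raise ValueError."""
    if not MIN_BLOCK_BYTES <= block_bytes <= MAX_BLOCK_BYTES:
        raise ValueError("block size must be between 32 KB and 1 GB")
    if block_bytes % MIN_BLOCK_BYTES != 0:
        raise ValueError("block size must be a multiple of 32 KB")
    return block_bytes // SECTOR_BYTES


def cache_block_count(cache_bytes: int, block_bytes: int) -> int:
    """Number of whole cache blocks, each needing its own metadata entry."""
    validate_block_size(block_bytes)
    return cache_bytes // block_bytes


def cache_table_line(origin_sectors: int, metadata_dev: str, cache_dev: str,
                     origin_dev: str, block_bytes: int,
                     mode: str = "writeback", policy: str = "default") -> str:
    """Build a device-mapper table line for a cache target (assumed layout)."""
    block_sectors = validate_block_size(block_bytes)
    # One feature argument (the operating mode) and zero policy arguments.
    return (f"0 {origin_sectors} cache {metadata_dev} {cache_dev} {origin_dev} "
            f"{block_sectors} 1 {mode} {policy} 0")
```

For a 64 GiB cache device, `cache_block_count` gives 262,144 blocks with 256 KB extents but 2,097,152 blocks with 32 KB extents, which shows how smaller extents enlarge the metadata. A line produced by `cache_table_line` could be passed to `dmsetup create` through its `--table` option.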

Operating modes supported by dm-cache are write-back, which is the default, write-through, and pass-through. In the write-back operating mode, writes to cached blocks go only to the cache device, while the blocks on the origin device are only marked as dirty in the metadata. For the write-through operating mode, write requests are not returned as completed until the data reaches both the origin and cache devices, with no clean blocks becoming marked as dirty. In the pass-through operating mode, all reads are performed directly from the origin device, avoiding the cache, while all writes go directly to the origin device; any cache write hits also cause invalidation of the cached blocks. The pass-through mode allows a hybrid volume to be activated when the state of a cache device is not known to be consistent with the origin device. [6] [11]
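The difference between the three operating modes can be illustrated with a toy in-memory model. This is only a sketch of the behavior described above, not of the kernel implementation: the `ToyCache` class and its methods are hypothetical, blocks are modeled as dictionary entries, and promotion is triggered by hand rather than by a cache policy.

```python
class ToyCache:
    """In-memory model of the write paths of the three operating modes."""

    MODES = ("writeback", "writethrough", "passthrough")

    def __init__(self, mode: str = "writeback"):
        if mode not in self.MODES:
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self.origin = {}     # block number -> data on the slow device
        self.cache = {}      # block number -> data on the fast device
        self.dirty = set()   # cached blocks that are newer than the origin

    def promote(self, block: int) -> None:
        """Copy a block from the origin to the cache (normally a policy decision)."""
        self.cache[block] = self.origin.get(block)

    def write(self, block: int, data) -> None:
        if self.mode == "passthrough":
            self.origin[block] = data
            self.cache.pop(block, None)    # a write hit invalidates the cached copy
        elif self.mode == "writethrough":
            self.origin[block] = data      # data reaches both devices
            if block in self.cache:
                self.cache[block] = data   # cached copy stays clean
        else:  # writeback
            if block in self.cache:
                self.cache[block] = data   # only the cache is updated
                self.dirty.add(block)      # origin copy is now stale
            else:
                self.origin[block] = data

    def read(self, block: int):
        if self.mode != "passthrough" and block in self.cache:
            return self.cache[block]
        return self.origin.get(block)      # pass-through always reads the origin
```

After a write hit in this model, only the write-back instance has a stale origin copy and a non-empty `dirty` set, which is why that mode depends on dirty blocks being written back before the cache device can be detached.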

The rate of data migration that dm-cache performs in both directions (i.e., data promotions and demotions) can be throttled down to a configured speed so regular I/O to the origin and cache devices can be preserved. Decommissioning a hybrid volume or shrinking a cache device requires use of the cleaner policy, which effectively flushes all blocks marked in metadata as dirty from the cache device to the origin device. [6] [7]

Cache policies

As of August 2015 and version 4.2 of the Linux kernel, [12] the following three cache policies are distributed with the Linux kernel mainline, out of which dm-cache by default uses the stochastic multiqueue policy: [6] [7]

multiqueue (mq)
The multiqueue (mq) policy has three sets of 16 queues, using the first set for entries waiting for the cache and the remaining two sets for entries already in the cache, with the latter separated so the clean and dirty entries belong to each of the two sets. The age of cache entries in the queues is based on their associated logical time. The selection of entries going into the cache (i.e., becoming promoted) is based on variable thresholds, and queue selection is based on the hit count of an entry. This policy aims to take different cache miss costs into account, and to make automatic adjustments to different load patterns.
This policy internally tracks sequential I/O operations so they can be routed around the cache, with different configurable thresholds for the differentiation between random I/O and sequential I/O operations. As a result, large contiguous I/O operations are left to be performed by the origin device because such data access patterns are suitable for HDDs, and because they avoid undesirable cache invalidation.
stochastic multiqueue (smq)
The stochastic multiqueue (smq) policy works similarly to the multiqueue policy, but requires fewer resources to operate; in particular, it uses substantially smaller amounts of main memory to track cached blocks. It also replaces the hit counting from the multiqueue policy with a "hotspot" queue, and decides on data promotion and demotion on a least recently used (LRU) basis. As a result, this policy provides better performance compared to the multiqueue policy, adjusts better automatically to different load patterns, and eliminates the configuration of various thresholds.
cleaner
The cleaner policy writes back to the origin device all blocks that are marked as dirty in the metadata. After the completion of this operation, a hybrid volume can be decommissioned or the size of a cache device can be reduced.
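The effect of the cleaner policy can be sketched in a few lines. This is a conceptual illustration under simplifying assumptions, not the kernel's implementation: the cache and origin devices are modeled as dictionaries, the dirty flags as a set, and the function names are hypothetical.

```python
def clean(cache: dict, origin: dict, dirty: set) -> list:
    """Write every dirty block back to the origin; return the flushed blocks."""
    flushed = sorted(dirty)
    for block in flushed:
        origin[block] = cache[block]   # origin copy is brought up to date
    dirty.clear()                      # all cached blocks are now clean
    return flushed


def safe_to_decommission(dirty: set) -> bool:
    """The cache device can be detached or shrunk once nothing is dirty."""
    return not dirty
```

Once `clean` has run, the origin device alone holds a complete and current copy of the data, which is the precondition for decommissioning the hybrid volume or reducing the cache size.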

Use with LVM

Logical Volume Manager includes lvmcache, which provides a wrapper for dm-cache integrated with LVM. [13]

References

  1. Petros Koutoupis (November 25, 2013). "Advanced Hard Drive Caching Techniques". Linux Journal. Retrieved December 2, 2013.
  2. "dm-cache: Dynamic Block-level Storage Caching". visa.cs.fiu.edu. Archived from the original on July 18, 2014. Retrieved July 24, 2014.
  3. Dulcardo Arteaga; Douglas Otstott; Ming Zhao (May 16, 2012). "Dynamic Block-level Cache Management for Cloud Computing Systems". visa.cs.fiu.edu. Archived from the original (PDF) on December 3, 2013. Retrieved December 2, 2013.
  4. Dulcardo Arteaga; Ming Zhao (June 21, 2014). "Client-side Flash Caching for Cloud Systems". visa.cs.fiu.edu. ACM. Archived from the original (PDF) on September 6, 2015. Retrieved August 31, 2015.
  5. "Red Hat Enterprise Linux 6 Documentation, Appendix A. The Device Mapper". Red Hat. October 8, 2014. Retrieved December 23, 2014.
  6. Joe Thornber; Heinz Mauelshagen; Mike Snitzer (July 20, 2015). "Linux kernel documentation: Documentation/device-mapper/cache.txt". kernel.org. Retrieved August 31, 2015.
  7. Joe Thornber; Heinz Mauelshagen; Mike Snitzer (June 29, 2015). "Linux kernel documentation: Documentation/device-mapper/cache-policies.txt". kernel.org. Retrieved August 31, 2015.
  8. Eric Van Hensbergen; Ming Zhao (November 28, 2006). "Dynamic Policy Disk Caching for Storage Networking" (PDF). IBM Research Report. IBM. Retrieved December 2, 2013.
  9. "Linux kernel 3.9, Section 1.3. SSD cache devices". kernelnewbies.org. April 28, 2013. Retrieved October 7, 2013.
  10. Jake Edge (May 1, 2013). "LSFMM: Caching dm-cache and bcache". LWN.net. Retrieved October 7, 2013.
  11. Joe Thornber (November 11, 2013). "Linux kernel source tree: kernel/git/torvalds/linux.git: dm cache: add passthrough mode". kernel.org. Retrieved February 6, 2014.
  12. Jonathan Corbet (July 1, 2015). "4.2 Merge window part 2". LWN.net. Retrieved August 31, 2015.
  13. Red Hat, Inc. "lvmcache — LVM caching". Debian Manpages. A read and write hot-spot cache, using the dm-cache kernel module.