Logical volume management

In computer storage, logical volume management or LVM provides a method of allocating space on mass-storage devices that is more flexible than conventional partitioning schemes. In particular, a volume manager can concatenate, stripe together, or otherwise combine partitions (or block devices in general) into larger virtual partitions that administrators can re-size or move, potentially without interrupting system use.

Volume management represents just one of many forms of storage virtualization; it is implemented as a layer in the device-driver stack of an operating system (OS), as opposed to within storage devices or in a network.

Design

[Figure: Linux Logical Volume Manager (LVM) v1]

Most volume-manager implementations share the same basic design. They start with physical volumes (PVs), which can be hard disks, hard disk partitions, or Logical Unit Numbers (LUNs) of an external storage device. Volume management treats each PV as being composed of a sequence of chunks called physical extents (PEs). Some volume managers (such as those in HP-UX and Linux) have PEs of a uniform size; others (such as that in Veritas) have variably sized PEs that can be split and merged at will.

Normally, PEs simply map one-to-one to logical extents (LEs). With mirroring, multiple PEs map to each LE. These PEs are drawn from a physical volume group (PVG), a set of same-sized PVs which act similarly to hard disks in a RAID1 array. PVGs are usually laid out so that they reside on different disks or data buses for maximum redundancy.

The system pools LEs into a volume group (VG). The pooled LEs can then be concatenated together into virtual disk partitions called logical volumes or LVs. Systems can use LVs as raw block devices just like disk partitions: creating mountable file systems on them, or using them as swap storage.
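A minimal sketch in Python of the layering just described, assuming uniform extent sizes (as in HP-UX and Linux LVM); the class and helper names are illustrative, not any real volume manager's API:

```python
# Toy model: fixed-size PEs on PVs are pooled into a VG, and an LV draws its
# (not necessarily contiguous) extents from that pool.
from dataclasses import dataclass, field

EXTENT_SIZE = 4 * 1024 * 1024  # 4 MiB extents, the Linux LVM default

@dataclass
class PhysicalVolume:
    name: str
    size: int                                  # capacity in bytes
    def extent_count(self):
        return self.size // EXTENT_SIZE        # number of PEs on this PV

@dataclass
class LogicalVolume:
    name: str
    extents: list                              # LE index -> (pv, pe_index)

@dataclass
class VolumeGroup:
    name: str
    pvs: list = field(default_factory=list)
    free: list = field(default_factory=list)   # unallocated (pv, pe_index) pairs

    def add_pv(self, pv):
        self.pvs.append(pv)
        self.free += [(pv, i) for i in range(pv.extent_count())]

    def create_lv(self, name, size):
        need = -(-size // EXTENT_SIZE)         # ceiling division: LEs required
        if need > len(self.free):
            raise ValueError("not enough free extents in VG")
        return LogicalVolume(name, [self.free.pop(0) for _ in range(need)])

vg = VolumeGroup("vg0")
vg.add_pv(PhysicalVolume("sda1", 100 * 2**20))
vg.add_pv(PhysicalVolume("sdb1", 100 * 2**20))
lv = vg.create_lv("data", 64 * 2**20)
print(len(lv.extents), "LEs, first on", lv.extents[0][0].name)  # 16 LEs, first on sda1
```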

Striped LVs allocate each successive LE from a different PV; depending on the size of the LE, this can improve performance on large sequential reads by bringing to bear the combined read-throughput of multiple PVs.
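Continuing the toy model above, striping can be sketched by drawing successive LEs round-robin from per-PV pools of free extents; `create_striped_lv` is a hypothetical helper, and it assumes each chosen PV still has enough free extents:

```python
def create_striped_lv(vg, name, size, stripes):
    """Allocate each successive LE from a different PV, round-robin."""
    need = -(-size // EXTENT_SIZE)
    by_pv = {}                                 # group free extents by their PV
    for pv, pe in vg.free:
        by_pv.setdefault(pv.name, []).append((pv, pe))
    pools = list(by_pv.values())[:stripes]     # assumes at least `stripes` PVs
    extents = [pools[i % stripes].pop(0) for i in range(need)]
    vg.free = [e for pool in by_pv.values() for e in pool]  # what's left over
    return LogicalVolume(name, extents)

striped = create_striped_lv(vg, "fast", 32 * 2**20, stripes=2)
print([pv.name for pv, _ in striped.extents])  # alternates sda1, sdb1
```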

Administrators can grow LVs (by concatenating more LEs) or shrink them (by returning LEs to the pool). The concatenated LEs do not have to be contiguous, which allows LVs to grow without already-allocated LEs having to be moved. Some volume managers allow the re-sizing of LVs in either direction while online. Changing the size of the LV does not necessarily change the size of a file system on it; it merely changes the size of its containing space. A file system that can be resized online is recommended, as it allows the system to adjust its storage on the fly without interrupting applications.
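In the toy model above, resizing is just a matter of moving extents between the VG's free pool and the LV's extent map; on a real Linux LVM system this corresponds to lvextend/lvreduce, with the file-system resize (e.g. resize2fs) as a separate step:

```python
def resize_lv(vg, lv, new_size):
    need = -(-new_size // EXTENT_SIZE)
    while len(lv.extents) < need:            # grow: concatenate any free PEs,
        lv.extents.append(vg.free.pop(0))    # contiguous or not
    while len(lv.extents) > need:            # shrink: return PEs to the pool
        vg.free.append(lv.extents.pop())

resize_lv(vg, lv, 80 * 2**20)                # grow "data" from 64 MiB to 80 MiB
print(len(lv.extents), "LEs after resize")   # 20
```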

PVs and LVs cannot be shared between or span different VGs (although some volume managers may allow moving them at will between VGs on the same host). This allows administrators to conveniently bring VGs online, take them offline, or move them between host systems as a single administrative unit.

VGs can grow their storage pool by absorbing new PVs or shrink by retracting from PVs. This may involve moving already-allocated LEs out of the PV. Most volume managers can perform this movement online; if the underlying hardware is hot-pluggable this allows engineers to upgrade or replace storage without system downtime.
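A sketch of such a migration in the toy model: every LE mapped to the departing PV is remapped onto a free extent elsewhere (roughly what pvmove does online on Linux LVM), after which the PV can leave the VG. `evacuate_pv` is a hypothetical helper and assumes the remaining PVs hold enough free extents:

```python
def evacuate_pv(vg, lvs, victim):
    for lv in lvs:
        for i, (pv, pe) in enumerate(lv.extents):
            if pv is victim:
                # find a free extent on any other PV and remap the LE there
                # (a real volume manager would also copy the data across)
                j = next(k for k, (f, _) in enumerate(vg.free) if f is not victim)
                lv.extents[i] = vg.free.pop(j)
                vg.free.append((pv, pe))
    vg.free = [(pv, pe) for pv, pe in vg.free if pv is not victim]
    vg.pvs.remove(victim)                    # the PV is now safe to detach
```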

Concepts

Hybrid volume

A hybrid volume is any volume that intentionally and opaquely makes use of two separate physical volumes. For instance, when a workload involves many random seeks, an SSD may be used to permanently store frequently used or recently written data, while higher-capacity rotational magnetic media are used for long-term storage of rarely needed data. On Linux, bcache or dm-cache may be used for this purpose, while Fusion Drive may be used on OS X. ZFS also implements this functionality at the file-system level, by allowing administrators to configure multi-level read/write caching.
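A minimal sketch of the hybrid-volume idea, assuming a block-level read path: a small LRU cache stands in for the SSD in front of a slow backing device. Real implementations such as bcache and dm-cache operate on kernel block devices and also handle writes; all names here are illustrative:

```python
from collections import OrderedDict

class HybridVolume:
    def __init__(self, slow_read, cache_blocks=1024):
        self.slow_read = slow_read        # function: block number -> data (HDD)
        self.cache = OrderedDict()        # LRU map standing in for the SSD
        self.cache_blocks = cache_blocks

    def read(self, block):
        if block in self.cache:           # hot block: serve from fast storage
            self.cache.move_to_end(block)
            return self.cache[block]
        data = self.slow_read(block)      # cold block: fetch from slow storage
        self.cache[block] = data
        if len(self.cache) > self.cache_blocks:
            self.cache.popitem(last=False)  # evict the least-recently-used block
        return data
```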

Hybrid volumes are similar in concept to hybrid drives, which also combine solid-state storage and rotational magnetic media.

Snapshots

Some volume managers also implement snapshots by applying copy-on-write to each LE. In this scheme, the volume manager copies the LE to a copy-on-write table just before it is written to. This preserves an old version of the LV, the snapshot, which may later be reconstructed by overlaying the copy-on-write table atop the current LV. Unless the volume manager supports both thin provisioning and discard, once an LE in the origin volume is written to, it is permanently stored in the snapshot volume. If the snapshot volume was made smaller than its origin, which is a common practice, this may render the snapshot inoperable.
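A sketch of this scheme in Python, with a dict standing in for the copy-on-write table; the names are illustrative, and the fixed-size origin ignores the snapshot-overflow problem just mentioned:

```python
class SnapshottedVolume:
    def __init__(self, extents):
        self.extents = extents            # origin LV: LE index -> data
        self.cow = {}                     # the snapshot's copy-on-write table

    def snapshot(self):
        self.cow = {}                     # empty table: snapshot == origin

    def write(self, le, data):
        if le not in self.cow:            # first write to this LE since snapshot:
            self.cow[le] = self.extents[le]   # preserve the old version first
        self.extents[le] = data

    def read_snapshot(self, le):
        # reconstruct the snapshot by overlaying the CoW table on the origin
        return self.cow.get(le, self.extents[le])

vol = SnapshottedVolume(["a", "b", "c"])
vol.snapshot()
vol.write(1, "B")
print(vol.extents)                               # ['a', 'B', 'c']
print([vol.read_snapshot(i) for i in range(3)])  # ['a', 'b', 'c']
```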

Snapshots can be useful for backing up self-consistent versions of volatile data, such as the table files of a busy database, or for rolling back large changes (such as an operating system upgrade) in a single operation. They have an effect similar to quiescing storage, and are comparable to the shadow copy (VSS) service in Microsoft Windows.

Some Linux-based Live CDs also use snapshots to simulate read-write access to a read-only optical disc.

Implementations

Vendor | Introduced in | Volume manager | Allocate anywhere [a] | Snapshots | RAID 0 | RAID 1 | RAID 5 | RAID 10 | Thin provisioning | Notes
IBM | AIX 3.0 (1989) | Logical Volume Manager | Yes | Yes [b] | Yes | Yes | No | Yes [c] | – | Refers to PEs as PPs (physical partitions) and to LEs as LPs (logical partitions). Has no copy-on-write snapshot mechanism; creates snapshots by freezing one volume of a mirror pair.
Hewlett-Packard | HP-UX 9.0 | HP Logical Volume Manager | Yes | Yes | Yes | Yes | No | Yes | – | –
FreeBSD Foundation | FreeBSD | Vinum Volume Manager | Yes | Yes [d] | Yes | Yes | Yes | Yes | – | The FreeBSD fast file system (UFS) supports snapshots.
FreeBSD Foundation | FreeBSD | ZFS | Yes | Yes | Yes | Yes | Yes | Yes | Yes | A file system with integrated volume management.
The NetBSD Foundation, Inc. | NetBSD | Logical Volume Manager | Yes | No | Yes | Yes | No | No | – | NetBSD 6.0 and later support a re-implementation of Linux LVM, based on a BSD-licensed device-mapper driver with a port of the Linux lvm tools as the user-space part. RAID 5 support in LVM is considered unnecessary because of NetBSD's RAIDframe subsystem.
The NetBSD Foundation, Inc. | NetBSD | ZFS | Yes | Yes | Yes | Yes | Yes | Yes | Yes | A file system with integrated volume management.
The NetBSD Foundation, Inc. | NetBSD 5.0 (2009) | bioctl arcmsr [1] | No | No | Yes [2] | Yes [2] | Yes [2] | Yes [2] | – | bioctl on NetBSD can be used for both maintenance and initialisation of hardware RAID, although initialisation (through the BIOCVOLOPS ioctl) is supported by only a single driver, arcmsr(4), as of 2019 [1][2]; software RAID is supported separately through RAIDframe [3][4] and ZFS.
The OpenBSD Project | OpenBSD 4.2 (2007) | bioctl softraid [5] | Yes | No | Yes | Yes | Yes | Yes | – | bioctl on OpenBSD can be used for maintenance of hardware RAID, as well as for both initialisation and maintenance of software RAID.
Sistina | Linux 2.2 | Logical Volume Manager version 1 | Yes | Yes | Yes | Yes | No | No | – | –
IBM | Linux 2.4 | Enterprise Volume Management System | Yes | Yes | Yes | Yes | Yes | No | – | –
Sistina | Linux 2.6 and above | Logical Volume Manager version 2 | Yes | Yes | Yes | Yes | Yes | Yes | Yes | –
Oracle | Linux 2.6 and above | Btrfs | Yes | Yes | Yes | Yes | Yes (not stable) | Yes | – | A file system with integrated volume management.
Silicon Graphics | IRIX or Linux | XVM Volume Manager | Yes | Yes | Yes | Yes | Yes | – | – | –
Sun Microsystems | SunOS | Solaris Volume Manager (formerly Solstice DiskSuite) | No | No | Yes | Yes | Yes | Yes | – | Refers to PVs as volumes (which can be combined with RAID 0, RAID 1 or RAID 5 primitives into larger volumes), to LVs as soft partitions (contiguous extents placeable anywhere on volumes, but unable to span multiple volumes), and to VGs as disk sets.
Sun Microsystems | Solaris 10 | ZFS | Yes | Yes | Yes | Yes | Yes | Yes | Yes | A file system with integrated volume management.
– | illumos | ZFS | Yes | Yes | Yes | Yes | Yes | Yes | Yes | A file system with integrated volume management.
Veritas [e] | Cross-OS | Veritas Volume Manager (VxVM) | Yes | Yes | Yes | Yes | Yes | Yes | – | Refers to LVs as volumes and to VGs as disk groups; has variably sized PEs called subdisks and LEs called plexes.
Microsoft | Windows 2000 and later NT-based operating systems | Logical Disk Manager | Yes | Yes [f] | Yes | Yes | Yes | No | No | Has no concept of PEs or LEs; can only stripe (RAID 0), mirror (RAID 1), RAID 5, or concatenate disk partitions into larger volumes; file systems must span whole volumes.
Microsoft | Windows 8 | Storage Spaces [6] | Yes | Yes | No | Yes | Yes | No | Yes | Higher-level logic than RAID 1 and RAID 5: multiple storage spaces span multiple disks of different sizes, storage spaces survive physical failure with either mirroring (at least 2 disks) or striped parity (at least 3 disks), and disk management and data recovery are fully automatic.
Microsoft | Windows 10 | Storage Spaces | Yes | Yes | Yes | Yes | Yes | Yes | Yes | RAID 10 is called disk mirroring.
Red Hat | Linux 4.14 and above | Stratis [7] | Yes | Yes | No | No | No | No | Yes | RAID support planned for version 2.0 [8].
Apple | Mac OS X Lion | Core Storage | Yes [9] | No | No | No | No | No | No | Used in Lion's implementation of FileVault to provide full-disk encryption, and in Fusion Drive, which is merely a multi-PV LVG. Snapshots are handled by Time Machine and software RAID by AppleRAID, both separate from Core Storage.

Disadvantages

Logical volumes can suffer from external fragmentation when the underlying storage devices do not allocate their PEs contiguously. This can reduce I/O performance on slow-seeking media such as magnetic disks and other rotational media. Volume managers that use fixed-size PEs, however, typically make PEs relatively large (for example, Linux LVM uses 4 MB by default) in order to amortize the cost of these seeks.
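A back-of-the-envelope illustration of the amortization argument, using assumed figures of roughly 150 MB/s sequential throughput and 8 ms average seek time for a magnetic disk:

```python
throughput = 150e6                       # bytes/s sequential, assumed
seek = 0.008                             # seconds per seek, assumed
for extent in (64 * 2**10, 4 * 2**20):   # 64 KiB vs the 4 MiB LVM default
    transfer = extent / throughput       # time to read one whole extent
    overhead = seek / (seek + transfer)  # worst case: one seek per extent
    print(f"{extent >> 10:>5} KiB extents: {overhead:.0%} of I/O time seeking")
# prints ~95% for 64 KiB extents, but only ~22% for 4 MiB extents
```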

With implementations that provide only volume management, such as Core Storage and Linux LVM, separating and abstracting volume management away from the file system loses the ability to easily make storage decisions for particular files or directories. For example, if a certain directory (but not the entire file system) is to be permanently moved to faster storage, both the file system layout and the underlying volume management layer need to be traversed: on Linux, one would need to manually determine the offset of a file's contents within the file system and then manually pvmove the extents (along with data not related to that file) to the faster storage, as sketched below. Having volume and file management implemented within the same subsystem, instead of as separate subsystems, makes the overall process theoretically simpler.
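A rough sketch of the Linux procedure just described, assuming an ext4 file system sitting directly on an LVM2 logical volume with the default 4 MiB extents. filefrag (from e2fsprogs), lvdisplay --maps, and pvmove are real tools; the parsing here is simplified and `logical_extents_of` is a hypothetical helper:

```python
import re
import subprocess

def logical_extents_of(path, extent_size=4 * 1024 * 1024):
    """Return the LV extent indices that hold the file's data blocks."""
    out = subprocess.run(["filefrag", "-v", path],
                         capture_output=True, text=True, check=True).stdout
    # "File size of F is N (B blocks of S bytes)" carries the fs block size
    block_size = int(re.search(r"of (\d+) bytes", out).group(1))
    extents = set()
    # data rows look like: "  0:  0..  8191:  34816..  43007:  8192: eof"
    for m in re.finditer(r"^\s*\d+:\s*\d+\.\.\s*\d+:\s*(\d+)\.\.\s*(\d+):", out, re.M):
        start = int(m.group(1)) * block_size
        end = (int(m.group(2)) + 1) * block_size
        extents.update(range(start // extent_size, (end - 1) // extent_size + 1))
    return sorted(extents)

# The logical extents printed here still have to be translated to physical
# extents with `lvdisplay --maps` before moving them, e.g.:
#   pvmove /dev/slow_pv:1000-1015 /dev/fast_pv
print(logical_extents_of("/srv/db/table.ibd"))
```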

Notes

  a. Denotes whether the volume manager allows LVs to grow and span onto any PV in the VG.
  b. JFS2 snapshots.
  c. AIX 5.1.
  d. UFS snapshots.
  e. Third-party product, available for Windows and many Unix-like OSes.
  f. Windows Server 2003 and later.

See also

XFS
RAID
Disk partitioning
fdisk
Data striping
Snapshot (computer storage)
Logical Volume Manager (Linux)
GUID Partition Table
Device mapper
iostat
Logical Disk Manager
Linux Unified Key Setup
dm-crypt
GEOM
gpart
Btrfs
Device file
Non-RAID drive architectures
Core Storage
Stratis

References

  1. Juan Romero Pardines (2007/2008); David Gwynne (2006). "arcmsr — Areca Technology Corporation SATA/SAS RAID controller". NetBSD Kernel Interfaces Manual. NetBSD.
  2. Juan Romero Pardines (2007/2008); David Gwynne (2006). "arcmsr.c § arc_bio_volops". BSD Cross Reference. NetBSD.
  3. The NetBSD Foundation, Inc. (1998); Carnegie-Mellon University (1995). "raid — RAIDframe disk driver". NetBSD Kernel Interfaces Manual. NetBSD.
  4. The NetBSD Foundation, Inc. (1998); Carnegie-Mellon University (1995). "raidctl — configuration utility for the RAIDframe disk driver". NetBSD System Manager's Manual. NetBSD.
  5. Marco Peereboom; Todd T. Fries (2007). "softraid — software RAID". Device Drivers Manual. OpenBSD.
  6. "Building Windows 8: Virtualizing Storage for Scale, Resiliency, and Efficiency". MSDN Blogs. Blogs.MSDN.com.
  7. "Stratis Storage". Stratis-storage.github.io. Retrieved 2019-08-05.
  8. "Stratis Software Design: Version 1.0.0" (PDF). September 27, 2018. Retrieved 2019-08-05.
  9. "diskutil(8) man page". ManPagez.com. Retrieved 2011-10-06.
