Logical Volume Manager (Linux)

Logical Volume Manager
Original author(s) Heinz Mauelshagen [1]
Stable release
2.03.21 [2] / 21 April 2023
Repository sourceware.org/git/?p=lvm2.git
Written in C
Operating system Linux, NetBSD
License GPLv2
Website sourceware.org/lvm2/

In Linux, Logical Volume Manager (LVM) is a device mapper framework that provides logical volume management for the Linux kernel. Most modern Linux distributions are LVM-aware to the point of being able to have their root file systems on a logical volume. [3] [4] [5]

History

Heinz Mauelshagen wrote the original LVM code in 1998, while working at Sistina Software, taking its primary design guidelines from the HP-UX volume manager. [1]

Uses

LVM is used for the following purposes:

LVM can be thought of as a thin software layer on top of hard disks and partitions that creates an abstraction of continuity and ease of use for managing hard drive replacement, repartitioning, and backup.

Features

Various elements of the LVM

Basic functionality

Advanced functionality

RAID

High availability

LVM also works in a shared-storage cluster, in which the disks holding the PVs are shared between multiple host computers, but it can require an additional daemon to mediate metadata access through some form of locking.

CLVM
A distributed lock manager is used to broker concurrent access to LVM metadata. Whenever a cluster node needs to modify the LVM metadata, it must secure permission from its local clvmd, which is in constant contact with the other clvmd daemons in the cluster and can communicate a desire to get a lock on a particular set of objects.
HA-LVM
Cluster-awareness is left to the application providing the high availability function. For its part, HA-LVM can use CLVM as a locking mechanism, or it can continue to use the default file locking and reduce "collisions" by restricting access to only those LVM objects that carry appropriate tags. Because this simpler solution avoids contention rather than mitigating it, concurrent access is not allowed, so HA-LVM is considered useful only in active-passive configurations.
lvmlockd
As of 2017, a stable LVM component designed to replace clvmd by making the locking of LVM objects transparent to the rest of LVM, without relying on a distributed lock manager. [14] It saw extensive development during 2016. [15]
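The CLVM behaviour described above, where a node must obtain a lock on a set of LVM objects before touching metadata, can be illustrated with a toy model. This is a minimal Python sketch assuming a single shared lock table that stands in for the state the clvmd daemons agree on; the class names `LockTable` and `LocalDaemon` are hypothetical, and nothing here reproduces clvmd's protocol or a real distributed lock manager API.

```python
class LockTable:
    """Hypothetical stand-in for the lock state shared by all clvmd daemons.

    A real distributed lock manager keeps this state across nodes; here it
    is a single dict so that the grant logic is easy to follow.
    """

    def __init__(self):
        self.owners = {}  # LVM object name -> node currently holding its lock


class LocalDaemon:
    """Hypothetical per-node daemon that brokers metadata locks."""

    def __init__(self, node, table):
        self.node = node
        self.table = table

    def request(self, objects):
        """Lock every object in `objects` for this node, or none of them."""
        for obj in objects:
            holder = self.table.owners.get(obj)
            if holder is not None and holder != self.node:
                return False  # another node holds one of the objects
        for obj in objects:
            self.table.owners[obj] = self.node
        return True

    def release(self, objects):
        """Drop this node's locks on `objects`; other nodes' locks are kept."""
        for obj in objects:
            if self.table.owners.get(obj) == self.node:
                del self.table.owners[obj]


table = LockTable()
node1 = LocalDaemon("node1", table)
node2 = LocalDaemon("node2", table)
print(node1.request({"vg0"}))         # True: node1 may now modify vg0 metadata
print(node2.request({"vg0", "vg1"}))  # False: vg0 is held by node1
node1.release({"vg0"})
print(node2.request({"vg0", "vg1"}))  # True: granted once node1 lets go
```

The all-or-nothing grant reflects the idea of locking "a particular set of objects" in one request; how real clvmd daemons negotiate such a request with their peers is not modelled.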

The mechanisms described above only resolve the issues with LVM's access to the storage. The file system placed on top of such LVs must either support clustering itself (such as GFS2 or VxFS) or be mounted by only a single cluster node at any time (such as in an active-passive configuration).

Volume group allocation policy

Each LVM VG has a default allocation policy for new volumes created from it. This can later be changed for each LV using the lvchange --alloc command, or on the VG itself via vgchange --alloc. To minimize fragmentation, LVM attempts the strictest policy (contiguous) first and then progresses toward the most liberal policy defined for the LVM object until allocation finally succeeds.
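The escalation from the strictest policy toward the most liberal one can be illustrated with a toy allocator. This is a minimal Python sketch under simplifying assumptions: a linear LV only; `contiguous` meaning "directly after the LV's last extent on the same PV" (or any single free run for an empty LV); `cling` meaning "only PVs the LV already uses"; and `normal` and `anywhere` behaving identically because there are no parallel legs to keep apart. The `cling_by_tags` and `inherit` policies are left out, and the classes and functions (`PV`, `LV`, `extend` and so on) are hypothetical, not LVM's own allocator.

```python
from dataclasses import dataclass, field

# Strictest first; this toy omits cling_by_tags and inherit.
POLICY_ORDER = ["contiguous", "cling", "normal", "anywhere"]


@dataclass
class PV:
    name: str
    free: list  # free runs as (first_pe, length) tuples


@dataclass
class LV:
    policy: str  # the most liberal policy this LV may use
    segments: list = field(default_factory=list)  # (pv_name, first_pe, length)


def candidate_runs(policy, lv, pvs):
    """Free runs that `policy` may draw from (toy semantics)."""
    if policy == "contiguous":
        if not lv.segments:
            return [(pv.name, s, n) for pv in pvs for s, n in pv.free]
        last_pv, first, length = lv.segments[-1]
        end = first + length
        return [(pv.name, s, n) for pv in pvs if pv.name == last_pv
                for s, n in pv.free if s == end]
    if policy == "cling":
        used = {seg[0] for seg in lv.segments}
        return [(pv.name, s, n) for pv in pvs if pv.name in used
                for s, n in pv.free]
    # "normal" and "anywhere": any free run on any PV.
    return [(pv.name, s, n) for pv in pvs for s, n in pv.free]


def try_policy(policy, lv, pvs, extents):
    """Return a list of (pv, first_pe, count) pieces, or None if impossible."""
    runs = candidate_runs(policy, lv, pvs)
    if policy == "contiguous":
        # A contiguous allocation must fit into a single run.
        for name, s, n in runs:
            if n >= extents:
                return [(name, s, extents)]
        return None
    plan, need = [], extents
    for name, s, n in runs:
        take = min(n, need)
        plan.append((name, s, take))
        need -= take
        if need == 0:
            return plan
    return None


def extend(lv, pvs, extents):
    """Try each policy from strictest up to lv.policy; the first success wins."""
    limit = POLICY_ORDER.index(lv.policy)
    for policy in POLICY_ORDER[: limit + 1]:
        plan = try_policy(policy, lv, pvs, extents)
        if plan is not None:
            return policy, plan
    raise RuntimeError(f"cannot allocate {extents} extents under {lv.policy!r}")


pvs = [PV("pv1", [(8, 2), (20, 10)]), PV("pv2", [(0, 50)])]
print(extend(LV("cling", [("pv1", 0, 8)]), pvs, 4))
# ('cling', [('pv1', 8, 2), ('pv1', 20, 2)])
```

Here the two free extents right after the LV are not enough for a contiguous allocation, so the sketch falls back to cling and stays on pv1. With 20 extents requested, a cling-limited LV fails even though pv2 has room, while an LV whose policy is normal would spill onto pv2. The sketch only plans an allocation; it does not update the free lists.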

In RAID configurations, almost all policies are applied to each leg in isolation. For example, even if an LV has a policy of cling, extending the LV will not result in LVM using a PV that is already used by one of the other legs in the RAID setup. LVs with RAID functionality put each leg on different PVs, making each leg's PVs unavailable to the other legs. If this were the only option available, expansion of the LV would fail. In this sense, the logic behind cling only applies to expanding each of the individual legs of the array.
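The per-leg rule can be sketched as a filter over candidate PVs. This is a minimal Python sketch assuming that each leg is described only by the set of PV names it occupies and that free space is a plain count of extents per PV; "normal" here stands for any policy other than cling, and the functions `leg_candidates` and `can_extend_leg` are hypothetical illustrations rather than LVM's allocation code.

```python
def leg_candidates(leg, legs, free, policy):
    """PVs that RAID leg number `leg` may use when it is extended (toy model).

    legs   -- list of sets; legs[i] holds the PV names already used by leg i
    free   -- dict mapping PV name to its number of free extents
    policy -- "cling" or "normal" in this sketch
    """
    # PVs used by any other leg are off limits, whatever the policy.
    taken_by_others = set().union(*(pvs for i, pvs in enumerate(legs) if i != leg))
    allowed = [pv for pv in free if pv not in taken_by_others and free[pv] > 0]
    if policy == "cling":
        # cling is judged per leg: stay on PVs this leg already uses.
        allowed = [pv for pv in allowed if pv in legs[leg]]
    return allowed


def can_extend_leg(leg, legs, free, extents, policy):
    """True if the leg's permitted PVs hold at least `extents` free extents."""
    return sum(free[pv] for pv in leg_candidates(leg, legs, free, policy)) >= extents


legs = [{"pv1"}, {"pv2"}]  # a two-leg RAID1 LV
free = {"pv1": 0, "pv2": 100, "pv3": 0}
print(can_extend_leg(0, legs, free, 10, "normal"))  # False: pv2 belongs to leg 1
```

The example shows the failure case from the text: plenty of space exists on pv2, but it is reserved for the other leg, so leg 0 cannot grow.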

Available allocation policies are:

Implementation

Basic example of an LVM head
Inner workings of version 1 of LVM. In this diagram, PE stands for physical extent.

Typically, the first megabyte of each physical volume contains a mostly ASCII-encoded structure referred to as an "LVM header" or "LVM head". Originally, the LVM head was written to both the first and the last megabyte of each PV for redundancy (in case of a partial hardware failure); this was later changed to the first megabyte only. Each PV's header is a complete copy of the entire volume group's layout, including the UUIDs of all other PVs and of the LVs, and the allocation map of PEs to LEs. This simplifies data recovery if a PV is lost.
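Because every PV carries the whole layout, any surviving header is enough to tell which logical extents were lost with a failed PV. This is a minimal Python sketch of that idea using an in-memory dictionary; the real on-disk metadata is a text format with its own field names, so the keys, UUID strings, LV names and function names below are hypothetical.

```python
import copy


def write_headers(layout):
    """Give every PV its own complete copy of the VG layout (toy model)."""
    return {pv: copy.deepcopy(layout) for pv in layout["pvs"]}


def extents_on_lost_pv(headers, lost_pv):
    """Use any surviving header to list, per LV, the LEs that lived on lost_pv."""
    survivor = next(h for pv, h in headers.items() if pv != lost_pv)
    return {
        lv: [le for le, (pv, _pe) in enumerate(le_map) if pv == lost_pv]
        for lv, le_map in survivor["lvs"].items()
    }


# Hypothetical VG: each LV maps logical extent index -> (PV UUID, PE number).
layout = {
    "vg": "vg0",
    "pvs": ["uuid-A", "uuid-B"],
    "lvs": {
        "home": [("uuid-A", 0), ("uuid-A", 1), ("uuid-B", 0)],
        "swap": [("uuid-A", 2)],
    },
}
headers = write_headers(layout)
print(extents_on_lost_pv(headers, "uuid-B"))  # {'home': [2], 'swap': []}
```

After losing uuid-B, the copy on uuid-A still reports that only logical extent 2 of `home` is gone and that `swap` is intact.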

In the 2.6 series of the Linux kernel, LVM is implemented in terms of the device mapper, a simple block-level scheme for creating virtual block devices and mapping their contents onto other block devices. This minimizes the amount of relatively hard-to-debug kernel code needed to implement LVM. It also allows its I/O redirection services to be shared with other volume managers (such as EVMS). All LVM-specific code is pushed out into its user-space tools, which merely manipulate these mappings and reconstruct their state from on-disk metadata upon each invocation.
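The device mapper's "linear" target is driven by a table in which each line maps a range of the virtual device onto a range of another block device, counted in 512-byte sectors (`<logical start> <length> linear <device> <offset>`). This is a sketch of how user-space tooling might turn an LV's extent map into such a table; the 4 MiB extent size and the 1 MiB data-area offset are assumptions (both are configurable in LVM), and `linear_table` is a hypothetical helper, not part of the LVM tools.

```python
def linear_table(segments, extent_sectors=8192, pe_start=2048):
    """Build device-mapper "linear" table lines for a list of LV segments.

    segments       -- (pv_device, first_pe, pe_count) tuples, in LV order
    extent_sectors -- extent size in 512-byte sectors (8192 = 4 MiB, assumed)
    pe_start       -- sector where the PV's data area begins (1 MiB, assumed)
    Each line reads: <logical start> <length> linear <device> <offset>.
    """
    lines, logical = [], 0
    for device, first_pe, count in segments:
        length = count * extent_sectors
        offset = pe_start + first_pe * extent_sectors
        lines.append(f"{logical} {length} linear {device} {offset}")
        logical += length  # the next segment continues where this one ends
    return lines


for line in linear_table([("/dev/sdb1", 0, 2), ("/dev/sdc1", 10, 1)]):
    print(line)
# 0 16384 linear /dev/sdb1 2048
# 16384 8192 linear /dev/sdc1 83968
```

A table of this shape is what `dmsetup table` prints for a simple linear LV, although the real output identifies the underlying devices by major:minor numbers rather than by path.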

To bring a volume group online, the "vgchange" tool:

  1. Searches for PVs in all available block devices.
  2. Parses the metadata header in each PV found.
  3. Computes the layouts of all visible volume groups.
  4. Loops over each logical volume in the volume group to be brought online and:
    1. Checks if the logical volume to be brought online has all its PVs visible.
    2. Creates a new, empty device mapping.
    3. Maps it (with the "linear" target) onto the data areas of the PVs the logical volume belongs to.
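The activation steps above can be walked through with a toy model. This is a minimal Python sketch that assumes the headers have already been parsed into dictionaries (the keys `vg`, `pv_uuid`, `layout` and `lvs` are hypothetical), uses the same assumed 4 MiB extents and 1 MiB data-area offset, and simply reports an LV as `None` when one of its PVs is missing; the real vgchange's more nuanced handling of missing PVs is not modelled.

```python
EXTENT_SECTORS = 8192  # assumed 4 MiB extents, in 512-byte sectors
PE_START = 2048        # assumed start of each PV's data area (1 MiB)


def activate_vg(devices, vg_name):
    """Toy walk-through of the vgchange activation steps.

    `devices` maps each block device path to its already-parsed header
    (a hypothetical dict), or to None when the device is not a PV.
    Returns {lv_name: table_lines}, with None for LVs that cannot be activated.
    """
    # Steps 1-2: keep only the devices that carry an LVM header.
    pvs = {dev: hdr for dev, hdr in devices.items() if hdr is not None}

    # Step 3: each header holds the whole VG layout; note which PVs are visible.
    layout, uuid_to_dev = None, {}
    for dev, hdr in pvs.items():
        if hdr["vg"] == vg_name:
            layout = hdr["layout"]
            uuid_to_dev[hdr["pv_uuid"]] = dev
    if layout is None:
        raise LookupError(f"volume group {vg_name!r} not found")

    result = {}
    # Step 4: handle each logical volume in turn.
    for lv, segments in layout["lvs"].items():
        # 4.1: skip the LV if any of its PVs is missing.
        if any(pv not in uuid_to_dev for pv, _first, _count in segments):
            result[lv] = None
            continue
        # 4.2-4.3: build a fresh mapping out of "linear" segments.
        table, logical = [], 0
        for pv, first_pe, count in segments:
            length = count * EXTENT_SECTORS
            offset = PE_START + first_pe * EXTENT_SECTORS
            table.append(f"{logical} {length} linear {uuid_to_dev[pv]} {offset}")
            logical += length
        result[lv] = table
    return result


layout = {"lvs": {"root": [("A", 0, 1)], "data": [("A", 1, 1), ("B", 0, 1)]}}
devices = {
    "/dev/sda1": {"vg": "vg0", "pv_uuid": "A", "layout": layout},
    "/dev/sda2": None,  # not a PV
}
print(activate_vg(devices, "vg0"))
# {'root': ['0 8192 linear /dev/sda1 2048'], 'data': None}
```

Only `root` gets a mapping here, because `data` also needs the PV with UUID "B", which is not among the scanned devices.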

To move an online logical volume between PVs in the same volume group, the "pvmove" tool:

  1. Creates a new, empty device mapping for the destination.
  2. Applies the "mirror" target to the original and destination maps. The kernel will start the mirror in "degraded" mode and begin copying data from the original to the destination to bring it into sync.
  3. Replaces the original mapping with the destination when the mirror comes into sync, then destroys the original.
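The pvmove sequence can be sketched as a small state machine over a list of blocks. This is a toy Python model, assuming block-level copying and writes that go to both legs while mirrored; the real work is done by the kernel's mirror target on extents, its read and write handling during resync may differ, and the class `PvMove` and its methods are hypothetical.

```python
class PvMove:
    """Toy model of the pvmove sequence over a list of blocks (hypothetical)."""

    def __init__(self, original):
        self.original = original
        # Step 1: a new, empty mapping for the destination.
        self.destination = [None] * len(original)
        # Step 2: mirror the two; nothing is copied yet, so it starts degraded.
        self.mirrored = True
        self.synced = 0  # blocks copied to the destination so far

    def read(self, i):
        # The original stays authoritative until the swap.
        return self.original[i] if self.mirrored else self.destination[i]

    def write(self, i, value):
        # While mirrored, writes go to both legs, so copied blocks stay current.
        if self.mirrored:
            self.original[i] = value
        self.destination[i] = value

    def resync_step(self):
        """Copy one block; swap the mappings once the mirror is in sync."""
        if self.synced < len(self.destination):
            self.destination[self.synced] = self.original[self.synced]
            self.synced += 1
        if self.mirrored and self.synced == len(self.destination):
            # Step 3: the destination replaces the original, which is dropped.
            self.mirrored = False
            self.original = None


move = PvMove(list("abcd"))
move.resync_step()
move.write(3, "Z")  # an application keeps writing during the move
while move.mirrored:
    move.resync_step()
print(move.destination, move.read(3))  # ['a', 'b', 'c', 'Z'] Z
```

The write issued mid-move reaches both copies, so nothing is lost when the destination takes over, which is why applications do not notice the relocation.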

These device mapper operations take place transparently, without applications or file systems being aware that their underlying storage is moving.

Caveats

See also

References

  1. "LVM README". 2003-11-17. Retrieved 2014-06-25.
  2. "[lvm-devel] v2_03_21 annotated tag has been created". 2023-04-21. Retrieved 2023-04-22.
  3. "7.1.2 LVM Configuration with YaST". SUSE. 2011-07-12. Archived from the original on 2015-07-25. Retrieved 2015-05-22.
  4. "HowTo: Set up Ubuntu Desktop with LVM Partitions". Ubuntu. 2014-06-01. Archived from the original on 2016-03-04. Retrieved 2015-05-22.
  5. "9.15.4 Create LVM Logical Volume". Red Hat. 2014-10-08. Retrieved 2015-05-22.
  6. "BTRFS performance compared to LVM+EXT4 with regards to database workloads". 2018-05-29.
  7. "Tagging LVM2 Storage Objects". Micro Focus International. Retrieved 2015-05-21.
  8. "The Metadata Daemon". Red Hat Inc. Retrieved 2015-05-22.
  9. "Using LVM's new cache feature". 2014-05-22. Retrieved 2014-07-11.
  10. "2.3.5. Thinly-Provisioned Logical Volumes (Thin Volumes)". Access.redhat.com. Retrieved 2014-06-20.
  11. "4.101.3. RHBA-2012:0161 — lvm2 bug fix and enhancement update". Retrieved 2014-06-08.
  12. "5.4.16. RAID Logical Volumes". Access.redhat.com. Retrieved 2017-02-07.
  13. "Controlling I/O Operations on a RAID1 Logical Volume". redhat.com. Retrieved 2014-06-16.
  14. "Re: LVM snapshot with Clustered VG [SOLVED]". 2013-03-15. Retrieved 2015-06-08.
  15. "vmlockd.c git history". Archived from the original on 2024-01-04.
  16. "Bug 9554 – write barriers over device mapper are not supported". 2009-07-01. Retrieved 2010-01-24.
  17. "Barriers and journaling filesystems". LWN. 2008-05-22. Retrieved 2008-05-28.
  18. "will pvmove'ing (an LV at a time) defragment?". 2010-04-29. Retrieved 2015-05-22.
  19. "Gotchas". btrfs Wiki. Archived from the original on 2024-01-04. Retrieved 2017-04-24.

Further reading