mdadm

Original author(s): Neil Brown
Developer(s): Jes Sorensen
Initial release: 2001
Stable release: 4.3 (February 15, 2024) [1]
Repository: github.com/md-raid-utilities/mdadm/
Written in: C
Operating system: Linux
Available in: English
Type: Disk utility
License: GNU GPL
Website: raid.wiki.kernel.org

mdadm is a Linux utility used to manage and monitor software RAID devices. It is used in modern Linux distributions in place of older software RAID utilities such as raidtools2 or raidtools. [2] [3] [4]

mdadm is free software, originally maintained by, and copyrighted to, Neil Brown of SUSE, [5] and licensed under the terms of version 2 or later of the GNU General Public License.

Name

The name is derived from the md (multiple device) device nodes it administers or manages, and it replaced a previous utility mdctl.[citation needed] The original name was "Mirror Disk", but was changed as more functions were added.[citation needed] The name is now understood to be short for Multiple Disk and Device Management. [2]

Overview

Linux software RAID configurations can include anything presented to the Linux kernel as a block device. This includes whole hard drives (for example, /dev/sda), and their partitions (for example, /dev/sda1).
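
For illustration, the following sketch creates arrays from both partitions and whole drives (all device names here are examples, not prescriptions):

  # Mirror two partitions into a RAID 1 array
  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
  # Whole drives work just as well as partitions
  mdadm --create /dev/md1 --level=0 --raid-devices=2 /dev/sdc /dev/sdd
  # Inspect the result
  mdadm --detail /dev/md0
  cat /proc/mdstat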

RAID configurations

RAID 10 is distinct from RAID 0+1, which consists of a top-level RAID 1 mirror composed of high-performance RAID 0 stripes directly across the physical hard disks. A single-drive failure in a RAID 10 configuration results in one of the lower-level mirrors entering degraded mode, but the top-level stripe performing normally (except for the performance hit). A single-drive failure in a RAID 0+1 configuration results in one of the lower-level stripes completely failing, and the top-level mirror entering degraded mode. Which of the two setups is preferable depends on the details of the application in question, such as whether or not spare disks are available, and how they should be spun up.
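
As a sketch of how this plays out with mdadm (device names are examples), a RAID 10 array can be created and a single-drive failure simulated; the array continues to run in degraded mode until the disk is replaced:

  # Four disks, default near-2 layout
  mdadm --create /dev/md0 --level=10 --raid-devices=4 /dev/sd[abcd]1
  # Mark one member as failed and remove it; the array stays available
  mdadm /dev/md0 --fail /dev/sda1
  mdadm /dev/md0 --remove /dev/sda1
  # Add a replacement; md resynchronizes it automatically
  mdadm /dev/md0 --add /dev/sde1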

Non-RAID configurations

mdadm also manages several non-RAID arrangements of block devices: Linear, which concatenates devices end to end (deprecated and later removed from the mainline kernel [6]); Multipath, which provides several transport paths to the same device; Faulty, a single-device mode that injects faults for testing; and Container, a grouping of devices used for arrays with external metadata.

Features

The original (standard) form of names for md devices is /dev/md<n>, where <n> is a number between 0 and 99. More recent kernels have support for names such as /dev/md/Home. Under 2.4.x kernels and earlier these two were the only options. Both of them are non-partitionable.

With 2.6.x kernels, a new type of MD device was introduced: the partitionable array. The device names were modified by changing md to md_d, and partitions were identified by appending p<n>, where <n> is the partition number; for example, /dev/md/md_d2p3. Since version 2.6.28 of the Linux kernel mainline, non-partitionable arrays can be partitioned as well, with the partitions referred to in the same way as for partitionable arrays; for example, /dev/md/md1p2.
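
A minimal sketch of partitioning a regular md array on a kernel newer than 2.6.28 (device names and sizes are examples):

  mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda /dev/sdb
  # Partition the array like any other block device
  parted -s /dev/md1 mklabel gpt
  parted -s /dev/md1 mkpart primary ext4 0% 100%
  # The new partition appears as /dev/md1p1
  mkfs.ext4 /dev/md1p1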

Since version 3.7 of the Linux kernel mainline, md supports TRIM operations for the underlying solid-state drives (SSDs), for linear, RAID 0, RAID 1, RAID 5 and RAID 10 layouts. [7]
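
TRIM is normally exercised through the filesystem sitting on top of the array, for example with a periodic fstrim run (the mount point is an example):

  # Discard unused blocks on a filesystem backed by an md array of SSDs
  fstrim -v /mnt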

Booting

Since support for MD is found in the kernel, it is not available before the kernel is running, so the boot loader must be able to read the /boot filesystem on its own. (e)LILO and GRUB Legacy have no MD support, and even GRUB 2, which normally does, may lack it in some configurations. To work around this, the /boot filesystem must either avoid md entirely or use RAID 1. In the latter case the system boots by treating one RAID 1 member as a normal filesystem; once the system is running, the device can be remounted as md and the second disk added to it. This triggers a catch-up resynchronization, but /boot filesystems are usually small.
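
One common way to keep /boot readable by boot loaders without MD support is to use a version 0.90 or 1.0 superblock, which is stored at the end of the device, so each member looks like a plain filesystem. A minimal sketch (device names are examples):

  # Superblock at the end of the device; boot loaders see a normal filesystem
  mdadm --create /dev/md0 --level=1 --metadata=1.0 --raid-devices=2 /dev/sda1 /dev/sdb1
  mkfs.ext4 /dev/md0
  # Mount as /boot and install the boot loader on both disks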

With more recent boot loaders it is possible to load the MD support as a kernel module through the initramfs mechanism. This approach allows the /boot filesystem to be inside any RAID system without the need for complex manual configuration.
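
The details vary by distribution; as a sketch, regenerating the initramfs after installing or reconfiguring mdadm typically looks like this:

  # Debian and derivatives
  update-initramfs -u
  # Fedora, RHEL and derivatives
  dracut --force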

External metadata

Besides its own metadata formats for RAID volumes, Linux software RAID also supports external metadata formats, since version 2.6.27 of the Linux kernel and version 3.0 of the mdadm userspace utility. This allows Linux to use various firmware- or driver-based RAID volumes, also known as "fake RAID". [8]

As of October 2013, two formats of external metadata are supported:

  - DDF (Disk Data Format), an industry-standard format defined by SNIA [9]
  - Intel Matrix Storage Manager (IMSM) metadata, used by Intel Rapid Storage Technology firmware RAID
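
As a sketch of working with external metadata (the mdadm(8) manual page documents this workflow; device names are examples), an IMSM array is built as a container plus volumes inside it:

  # Create an IMSM container spanning two disks (use --metadata=ddf for DDF)
  mdadm --create /dev/md/imsm0 --metadata=imsm --raid-devices=2 /dev/sda /dev/sdb
  # Create a RAID 1 volume inside the container
  mdadm --create /dev/md/vol0 --level=1 --raid-devices=2 /dev/md/imsm0
  # Existing firmware RAID sets can be detected and assembled automatically
  mdadm --assemble --scan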

mdmpd

mdmpd was a daemon used for monitoring MD multipath devices up to Linux kernel 2.6.10-rc1, [10] developed by Red Hat as part of the mdadm package. [11] It monitored multipath (RAID) devices; it was usually started at boot time as a service and afterwards ran as a daemon.

Enterprise storage requirements often include the desire to have more than one way to talk to a single disk drive, so that in the event of a failure to reach a disk drive via one controller, the system can automatically switch to another controller and keep going. This is called multipath disk access. The Linux kernel implements multipath disk access via the software RAID stack known as the md (Multiple Devices) driver. The kernel portion of the md multipath driver only handles routing I/O requests to the proper device and handling failures on the active path; it does not try to find out whether a path that has previously failed might be working again. That was this daemon's job. Upon startup, it read the current state of the md RAID arrays, saved that state, and then waited for the kernel to tell it that something interesting had happened. It then woke up, checked whether any paths on a multipath device had failed, and if so began to poll the failed path once every 15 seconds until it started working again. Once the path was working again, the daemon added it back into the multipath md device it was originally part of, as a new spare path.

Where the /proc filesystem is mounted, /proc/mdstat lists all active md devices with information about them. mdmpd required it in order to find arrays whose paths should be monitored, to get notification of interesting events, and to monitor array reconstruction in monitor mode. [12]
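
The format looks roughly like this (an illustrative example, not output captured from a real system):

  Personalities : [raid1] [multipath]
  md0 : active raid1 sdb1[1] sda1[0]
        1048512 blocks [2/2] [UU]

  unused devices: <none>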

Technical details: RAID 1

The data on a RAID 1 volume created with a version 0.90 or 1.0 superblock is laid out exactly as on a normal partition, because the RAID metadata is stored in the last 128 KB of the partition. (The version 1.2 superblock, the default in current versions of mdadm, is instead placed near the start of the device.) This means that such a RAID 1 volume can be converted to a normal data partition by decreasing the partition size by 128 KB and changing the partition type ID from fd to 83 (Linux).
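
A sketch of the conversion, assuming a RAID 1 array /dev/md0 with a 0.90 superblock whose member is /dev/sda1 (all names are examples):

  # Confirm the superblock version and location first
  mdadm --examine /dev/sda1
  # Stop the array so the member can be used directly
  mdadm --stop /dev/md0
  # Change the partition type from fd to 83 (e.g. with fdisk), then:
  fsck /dev/sda1
  mount /dev/sda1 /mnt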

References

  1. "Release mdadm-4.3". 2024-02-15.
  2. 1 2 Bresnahan, Christine; Blum, Richard (2016). LPIC-2: Linux Professional Institute Certification Study Guide. John Wiley & Sons. pp. 206–221. ISBN   9781119150817.
  3. Vadala, Derek (2003). Managing RAID on Linux . O'Reilly Media, Inc. p.  140. ISBN   9781565927308. mdadm linux.
  4. Nemeth, Evi (2011). UNIX and Linux System Administration Handbook. Pearson Education. pp. 242–245. ISBN   9780131480056.
  5. "Mdadm". Archived from the original on 2013-05-03. Retrieved 2007-08-25.
  6. "md: Remove deprecated CONFIG_MD_LINEAR". 2023-12-19. Retrieved 2024-04-30.
  7. "Linux kernel 3.7, Section 5. Block". kernelnewbies.org. 2012-12-10. Retrieved 2014-09-21.
  8. 1 2 "External Metadata". RAID Setup. kernel.org. 2013-10-05. Retrieved 2014-01-01.
  9. "DDF Fake RAID". RAID Setup. kernel.org. 2013-09-12. Retrieved 2014-01-01.
  10. "117498 – md code missing event interface".
  11. "Updated mdadm package includes multi-path device enhancements". RHEA-2003:397-06. Redhat. 2004-01-16.
  12. "Mdadm(8): Manage MD devices aka Software RAID - Linux man page".