Disk mirroring

Last updated
RAID 1 layout Raid1 Version for Wiki.jpg
RAID 1 layout

In data storage, disk mirroring is the replication of logical disk volumes onto separate physical hard disks in real time to ensure continuous availability. It is most commonly used in RAID 1. A mirrored volume is a complete logical representation of separate volume copies.

Contents

In a disaster recovery context, mirroring data over long distance is referred to as storage replication. Depending on the technologies used, replication can be performed synchronously, asynchronously, semi-synchronously, or point-in-time. Replication is enabled via microcode on the disk array controller or via server software. It is typically a proprietary solution, not compatible between various data storage device vendors.

Mirroring is typically only synchronous. Synchronous writing typically achieves a recovery point objective (RPO) of zero lost data. Asynchronous replication can achieve an RPO of just a few seconds while the remaining methodologies provide an RPO of a few minutes to perhaps several hours.

Disk mirroring differs from file shadowing that operates on the file level, and disk snapshots where data images are never re-synced with their origins.

Overview

Typically, mirroring is provided in either hardware solutions such as disk arrays, or in software within the operating system (such as Linux mdadm and device mapper). [1] [2] Additionally, file systems like Btrfs or ZFS provide integrated data mirroring. [3] [4] There are additional benefits from Btrfs and ZFS, which maintain both data and metadata integrity checksums, making themselves capable of detecting bad copies of blocks, and using mirrored data to pull up data from correct blocks. [5]

There are several scenarios for what happens when a disk fails. In a hot swap system, in the event of a disk failure, the system itself typically diagnoses a disk failure and signals a failure. Sophisticated systems may automatically activate a hot standby disk and use the remaining active disk to copy live data onto this disk. Alternatively, a new disk is installed and the data is copied to it. In less sophisticated systems, the system is operated on the remaining disk until a spare disk can be installed.

The copying of data from one side of a mirror pair to another is called rebuilding or, less commonly, resilvering. [6]

Mirroring can be performed site to site either by rapid data links, for example fibre optic links, which over distances of 500 m or so can maintain adequate performance to support real-time mirroring. Longer distances or slower links maintain mirrors using an asynchronous copying system. For remote disaster recovery systems, this mirroring may not be done by integrated systems but simply by additional applications on primary and secondary machines.

Additional benefits

In addition to providing an additional copy of the data for the purpose of redundancy in case of hardware failure, disk mirroring can allow each disk to be accessed separately for reading purposes. Under certain circumstances, this can significantly improve performance as the system can choose for each read which disk can seek most quickly to the required data. This is especially significant where there are several tasks competing for data on the same disk, and thrashing (where the switching between tasks takes up more time than the task itself) can be reduced. This is an important consideration in hardware configurations that frequently access the data on the disk.

See also

Related Research Articles

RAID is a data storage virtualization technology that combines multiple physical disk drive components into one or more logical units for the purposes of data redundancy, performance improvement, or both. This is in contrast to the previous concept of highly reliable mainframe disk drives referred to as "single large expensive disk" (SLED).

Data degradation is the gradual corruption of computer data due to an accumulation of non-critical failures in a data storage device. The phenomenon is also known as data decay, data rot or bit rot.

<span class="mw-page-title-main">Data striping</span>

In computer data storage, data striping is the technique of segmenting logically sequential data, such as a file, so that consecutive segments are stored on different physical storage devices.

<span class="mw-page-title-main">Data corruption</span> Errors in computer data that introduce unintended changes to the original data

Data corruption refers to errors in computer data that occur during writing, reading, storage, transmission, or processing, which introduce unintended changes to the original data. Computer, transmission, and storage systems use a number of measures to provide end-to-end data integrity, or lack of errors.

In computing, data recovery is a process of retrieving deleted, inaccessible, lost, corrupted, damaged, or formatted data from secondary storage, removable media or files, when the data stored in them cannot be accessed in a usual way. The data is most often salvaged from storage media such as internal or external hard disk drives (HDDs), solid-state drives (SSDs), USB flash drives, magnetic tapes, CDs, DVDs, RAID subsystems, and other electronic devices. Recovery may be required due to physical damage to the storage devices or logical damage to the file system that prevents it from being mounted by the host operating system (OS).

In Linux, Logical Volume Manager (LVM) is a device mapper framework that provides logical volume management for the Linux kernel. Most modern Linux distributions are LVM-aware to the point of being able to have their root file systems on a logical volume.

Replication in computing involves sharing information so as to ensure consistency between redundant resources, such as software or hardware components, to improve reliability, fault-tolerance, or accessibility.

The IBM SAN Volume Controller (SVC) is a block storage virtualization appliance that belongs to the IBM System Storage product family. SVC implements an indirection, or "virtualization", layer in a Fibre Channel storage area network (SAN).

Data scrubbing is an error correction technique that uses a background task to periodically inspect main memory or storage for errors, then corrects detected errors using redundant data in the form of different checksums or copies of data. Data scrubbing reduces the likelihood that single correctable errors will accumulate, leading to reduced risks of uncorrectable errors.

<span class="mw-page-title-main">Distributed Replicated Block Device</span> Distributed replicated storage system for Linux

DRBD is a distributed replicated storage system for the Linux platform. It is implemented as a kernel driver, several userspace management applications, and some shell scripts. DRBD is traditionally used in high availability (HA) computer clusters, but beginning with DRBD version 9, it can also be used to create larger software defined storage pools with a focus on cloud integration.

In computer science, storage virtualization is "the process of presenting a logical view of the physical storage resources to" a host computer system, "treating all storage media in the enterprise as a single pool of storage."

In computer main memory, auxiliary storage and computer buses, data redundancy is the existence of data that is additional to the actual data and permits correction of errors in stored or transmitted data. The additional data can simply be a complete copy of the actual data, or only select pieces of data that allow detection of errors and reconstruction of lost or damaged data up to a certain level.

Although all RAID implementations differ from the specification to some extent, some companies and open-source projects have developed non-standard RAID implementations that differ substantially from the standard. Additionally, there are non-RAID drive architectures, providing configurations of multiple hard drives not referred to by RAID acronyms.

mdadm is a Linux utility used to manage and monitor software RAID devices. It is used in modern Linux distributions in place of older software RAID utilities such as raidtools2 or raidtools.

Btrfs is a computer storage format that combines a file system based on the copy-on-write (COW) principle with a logical volume manager, developed together. It was initially designed at Oracle Corporation in 2007 for use in Linux, and since November 2013, the file system's on-disk format has been declared stable in the Linux kernel. According to Oracle, Btrfs "is not a true acronym".

A journaling file system is a file system that keeps track of changes not yet committed to the file system's main part by recording the goal of such changes in a data structure known as a "journal", which is usually a circular log. In the event of a system crash or power failure, such file systems can be brought back online more quickly with a lower likelihood of becoming corrupted.

The most widespread standard for configuring multiple hard disk drives is RAID, which comes in a number of standard configurations and non-standard configurations. Non-RAID drive architectures also exist, and are referred to by acronyms with tongue-in-cheek similarity to RAID:

Resilient File System (ReFS), codenamed "Protogon", is a Microsoft proprietary file system introduced with Windows Server 2012 with the intent of becoming the "next generation" file system after NTFS.

<span class="mw-page-title-main">ZFS</span> File system

ZFS is a file system with volume management capabilities. It began as part of the Sun Microsystems Solaris operating system in 2001. Large parts of Solaris – including ZFS – were published under an open source license as OpenSolaris for around 5 years from 2005, before being placed under a closed source license when Oracle Corporation acquired Sun in 2009–2010. During 2005 to 2010, the open source version of ZFS was ported to Linux, Mac OS X and FreeBSD. In 2010, the illumos project forked a recent version of OpenSolaris, to continue its development as an open source project, including ZFS. In 2013, OpenZFS was founded to coordinate the development of open source ZFS. OpenZFS maintains and manages the core ZFS code, while organizations using ZFS maintain the specific code and validation processes required for ZFS to integrate within their systems. OpenZFS is widely used in Unix-like systems.

References

  1. "ANNOUNCE: mdadm 3.3 - A tools for managing md Soft RAID under Linux". gmane.org. 2013-09-03. Archived from the original on 2014-08-21. Retrieved 2013-11-20.
  2. "Logical Volume Manager Administration". Appendix A. The Device Mapper. Red Hat . Retrieved 2013-09-29.
  3. "Using Btrfs with Multiple Devices". kernel.org. 2013-11-07. Retrieved 2013-11-20.
  4. "Actually it's a n-way mirror". c0t0d0s0.org. 2013-09-04. Archived from the original on 2013-09-14. Retrieved 2013-11-20.
  5. McPherson, Amanda (22 June 2009). "A Conversation with Chris Mason on BTRfs: the next generation file system for Linux". Linux Foundation. Archived from the original on 27 June 2012. Retrieved 2013-11-22.
  6. "Why Is It Called 'Resilvering'?". The Lone SysAdmin. Retrieved 2013-09-19.