Degraded mode

Last updated

When a RAID array experiences the failure of one or more disks, it can enter degraded mode [1] , a fallback mode that generally allows the continued usage of the array, but either loses the performance boosts of the RAID technique (such as a RAID-1 mirror across two disks when one of them fails; performance will fall back to that of a normal, single drive) or experiences severe performance penalties due to the necessity to reconstruct the damaged data from error correction data.

RAID is a data storage virtualization technology that combines multiple physical disk drive components into one or more logical units for the purposes of data redundancy, performance improvement, or both. This was in contrast to the previous concept of highly reliable mainframe disk drives referred to as "single large expensive disk" (SLED).

Depending on the severity of the problem, the array may be placed into a read-only mode, either automatically or by the system administrator, until it can be corrected. Such corrections may or may not be possible to do on-line (as opposed to an "off-line" repair, during which the system is unavailable to users). Many RAID configurations feature spare disks that are already installed and can be automatically added to the array as needed; when this happens, the array may or may not go into degraded mode until the spare is rebuilt, but in any case should not be in degraded mode after the spare is rebuilt. If no spares are available, the array will remain in degraded mode until a spare disk is added, or the array is reconfigured (if that is possible for the configuration in question).

System administrator person who maintains and operates a computer system and/or network

A system administrator, or sysadmin, is a person who is responsible for the upkeep, configuration, and reliable operation of computer systems; especially multi-user computers, such as servers. The system administrator seeks to ensure that the uptime, performance, resources, and security of the computers they manage meet the needs of the users, without exceeding a set budget when doing so.

A typical case where a RAID enters degraded mode is a simple two-drive mirror after a power failure – it is unlikely the drives are in sync. Every time blocks are written to the storage elements (physical drives, in this case), certain accounting information is updated after the write. The RAID controller will notice that the storage elements are not in sync, will place the array in degraded mode, and – generally – will start a background resync (rebuild) operation. Simple mirroring solutions will resynchronize the entire array, block by block, across both drives, which can be quite time-consuming; this time can be greatly reduced by the usage of a write intent bitmap [2] .

Related Research Articles

XFS is a high-performance 64-bit journaling file system created by Silicon Graphics, Inc (SGI) in 1993. It was the default file system in SGI's IRIX operating system starting with its version 5.3. XFS was ported to the Linux kernel in 2001; as of June 2014, XFS is supported by most Linux distributions, some of which use it as the default file system.

In computer data storage, data striping is the technique of segmenting logically sequential data, such as a file, so that consecutive segments are stored on different physical storage devices.

ZFS is a combined file system and logical volume manager designed by Sun Microsystems. ZFS is scalable, and includes extensive protection against data corruption, support for high storage capacities, efficient data compression, integration of the concepts of filesystem and volume management, snapshots and copy-on-write clones, continuous integrity checking and automatic repair, RAID-Z, native NFSv4 ACLs, and can be very precisely configured. The two main implementations, by Oracle and by the OpenZFS project, are extremely similar, making ZFS widely available within Unix-like systems.

The Write Anywhere File Layout (WAFL) is a file layout that supports large, high-performance RAID arrays, quick restarts without lengthy consistency checks in the event of a crash or power failure, and growing the filesystems size quickly. It was designed by NetApp for use in its storage appliances like NetApp FAS, AFF, Cloud Volumes ONTAP and ONTAP Select.

In Linux, Logical Volume Manager (LVM) is a device mapper target that provides logical volume management for the Linux kernel. Most modern Linux distributions are LVM-aware to the point of being able to have their root file systems on a logical volume.

Disk mirroring replication of logical disk volumes onto separate physical hard disks in real time to ensure continuous availability

In data storage, disk mirroring is the replication of logical disk volumes onto separate physical hard disks in real time to ensure continuous availability. It is most commonly used in RAID 1. A mirrored volume is a complete logical representation of separate volume copies.

Is a computer storage product by NetApp running ONTAP operation system; terms ONTAP, AFF, FAS often used as synonyms, also filer is a synonyms though it is not an official name. There are two types of FAS systems, Hybrid and All-Flash:

  1. NetApp proprietary custom-build hardware appliances with HDD or SSD drives called hybrid Fabric-Attached Storage
  2. NetApp proprietary custom-build hardware appliances with only SSD drives and optimized ONTAP for low latency called ALL-Flash FAS.

Data scrubbing is an error correction technique that uses a background task to periodically inspect main memory or storage for errors, then correct detected errors using redundant data in the form of different checksums or copies of data. Data scrubbing reduces the likelihood that single correctable errors will accumulate, leading to reduced risks of uncorrectable errors.

In computing, error recovery control (ERC) is a feature of hard disks which allow a system administrator to configure the amount of time a drive's firmware is allowed to spend recovering from a read or write error. Limiting the recovery time allows for improved error handling in hardware or software RAID environments. In some cases, there is a conflict as to whether error handling should be undertaken by the hard drive or by the RAID implementation, which leads to drives being marked as unusable and significant performance degradation, when this could otherwise have been avoided.

In computer storage, the standard RAID levels comprise a basic set of RAID configurations that employ the techniques of striping, mirroring, or parity to create large reliable data stores from multiple general-purpose computer hard disk drives (HDDs). The most common types are RAID 0 (striping), RAID 1 and its variants (mirroring), RAID 5, and RAID 6. RAID levels and their associated data formats are standardized by the Storage Networking Industry Association (SNIA) in the Common RAID Disk Drive Format (DDF) standard.

Nested RAID levels, also known as hybrid RAID, combine two or more of the standard RAID levels to gain performance, additional redundancy or both, as a result of combining properties of different standard RAID layouts.

Although all RAID implementations differ from the specification to some extent, some companies and open-source projects have developed non-standard RAID implementations that differ substantially from the standard. Additionally, there are non-RAID drive architectures, providing configurations of multiple hard drives not referred to by RAID acronyms.

mdadm is a Linux utility used to manage and monitor software RAID devices. It is used in modern GNU/Linux distributions in place of older software RAID utilities such as raidtools2 or raidtools.

Btrfs is a file system based on the copy-on-write (COW) principle, initially designed at Oracle Corporation for use in Linux. The development of Btrfs began in 2007, and since November 2013 the file system's on-disk format has been marked as stable.

A trim command allows an operating system to inform a solid-state drive (SSD) which blocks of data are no longer considered in use and can be wiped internally.

The most widespread standard for configuring multiple hard disk drives is RAID, which comes in a number of standard configurations and non-standard configurations. Non-RAID drive architectures also exist, and are referred to by acronyms with similarity to RAID, several tongue-in-cheek

bcache is a cache in the Linux kernel's block layer, which is used for accessing secondary storage devices. It allows one or more fast storage devices, such as flash-based solid-state drives (SSDs), to act as a cache for one or more slower storage devices, such as hard disk drives (HDDs); this effectively creates hybrid volumes and provides performance improvements.

dm-cache is a component of the Linux kernel's device mapper, which is a framework for mapping block devices onto higher-level virtual block devices. It allows one or more fast storage devices, such as flash-based solid-state drives (SSDs), to act as a cache for one or more slower storage devices such as hard disk drives (HDDs); this effectively creates hybrid volumes and provides secondary storage performance improvements.

ONTAP or Data ONTAP or Clustered Data ONTAP (cDOT) or Data ONTAP 7-Mode is NetApp's proprietary operating system used in storage disk arrays such as NetApp FAS and AFF, ONTAP Select and Cloud Volumes ONTAP. With the release of version 9.0, NetApp decided to simplify the Data ONTAP name and removed word "Data" from it and remove 7-Mode image, therefore, ONTAP 9 is successor from Clustered Data ONTAP 8.

References

  1. Vadala, Derek. Managing RAID on Linux, "O'Reilly Media, Inc.", 2003, pp. 31-32
  2. Linux RAID Wiki, "Write-intent bitmap", "https://raid.wiki.kernel.org/index.php/Write-intent_bitmap", published March 21, 2011, accessed April 17, 2018