Backup rotation scheme

A backup rotation scheme is a system of backing up data to computer media (such as tapes) that minimizes, by re-use, the number of media used. The scheme determines how and when each piece of removable storage is used for a backup job and how long it is retained once it has backup data stored on it. Different techniques have evolved over time to balance data retention and restoration needs with the cost of extra data storage media. Such a scheme can be quite complicated if it takes incremental backups, multiple retention periods, and off-site storage into consideration.

Schemes

First in, first out

A first in, first out (FIFO) backup scheme saves new or modified files onto the "oldest" media in the set, i.e. the media that contain the oldest and thus least useful previously backed up data. [1] With a daily backup onto a set of 14 media, the backup depth is 14 days. Each day, the oldest medium is inserted for that day's backup. This is the simplest rotation scheme and is usually the first to come to mind.
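The FIFO rule can be sketched in a few lines of Python. This is an illustration only: the function names and the 0-based day and media numbering are assumptions of this sketch, not part of any backup product.

```python
def fifo_medium(day: int, media_count: int) -> int:
    """Return the 0-based index of the medium to overwrite on a given day.

    Assumption of this sketch: days are counted from 0 and one backup is
    taken per day, so the oldest medium is simply the next one in rotation.
    """
    if media_count <= 0:
        raise ValueError("media_count must be positive")
    return day % media_count


def oldest_recoverable_day(day: int, media_count: int) -> int:
    """Return the earliest day still restorable right after the backup on `day`."""
    return max(0, day - media_count + 1)
```

With 14 media, `fifo_medium(14, 14)` reuses medium 0, and after the backup on day 20 the oldest restorable day is day 7, which gives the 14-day depth described above.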

This scheme has the advantage that it retains the longest possible tail of daily backups. It can be used when archived data is unimportant (or is retained separately from the short-term backup data) and data before the rotation period is irrelevant.

However, this scheme risks data loss: suppose an error is introduced into the data but is not identified until several generations of backups and revisions have taken place. By the time the error is detected, all the backup files contain it. It would then be useful to have at least one older version of the data that predates the error.

Grandfather-father-son

Grandfather-father-son backup (GFS) is a common rotation scheme for backup media, [1] in which there are three or more backup cycles, such as daily, weekly, and monthly. The daily backups are rotated on a three-month basis using a FIFO system as above. The weekly backups are similarly rotated on a bi-yearly basis, and the monthly backups on a yearly basis. In addition, quarterly, half-yearly, and/or annual backups can be retained separately. Often some of these backups are moved off-site for safekeeping and disaster recovery.
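A minimal sketch of a GFS classification follows. The text does not say which days count as weekly or monthly backups, so this sketch assumes a hypothetical convention: the backup on the first day of a month is the monthly one, Sunday backups are weekly, and all others are daily. The function names and the `keep` counts are likewise illustrative.

```python
from datetime import date


def gfs_tier(day: date) -> str:
    """Classify a backup date as 'monthly', 'weekly' or 'daily'.

    Assumed convention (not from the text): first of the month is monthly,
    Sundays are weekly, everything else is daily.
    """
    if day.day == 1:
        return "monthly"
    if day.weekday() == 6:  # Monday is 0, Sunday is 6
        return "weekly"
    return "daily"


def select_retained(days, keep):
    """Keep the newest keep[tier] backups of each tier (FIFO within a tier)."""
    counts = {"daily": 0, "weekly": 0, "monthly": 0}
    retained = []
    for day in sorted(days, reverse=True):  # newest first
        tier = gfs_tier(day)
        if counts[tier] < keep[tier]:
            counts[tier] += 1
            retained.append(day)
    return sorted(retained)
```

For example, applying `select_retained` to every day of January 2024 with `keep = {"daily": 6, "weekly": 4, "monthly": 12}` retains 1 January, the four Sundays, and the six most recent other days.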

Tower of Hanoi

The Tower of Hanoi rotation method is more complex. It is based on the mathematics of the Tower of Hanoi puzzle, using a recursive method to optimize the backup cycle. Every tape corresponds to a disk in the puzzle, and every movement of a disk to a different peg corresponds to a backup to that tape. So the first tape is used every other day (1, 3, 5, 7, 9, ...), the second tape is used every fourth day (2, 6, 10, ...), and the third tape is used every eighth day (4, 12, 20, ...). [2]

A set of n tapes (or other media) will allow backups for 2^(n−1) days before the last set is recycled. So, 3 tapes will give 4 days' worth of backups, and on the 5th day Set C will be overwritten; 4 tapes will give 8 days, and Set D is overwritten on the 9th day; 5 tapes will give 16 days, etc. Files can be restored from 1, 2, 4, 8, 16, ..., 2^(n−1) days ago. [3]

The following tables show which tapes are used on which days of various cycles. A disadvantage of the method is that half the backups are overwritten after only two days.

Three-tape Hanoi schedule

Day of the cycle  01 02 03 04 05 06 07 08
Set A             A     A     A     A
Set B                B           B
Set C                      C           C

Four-tape Hanoi schedule

Day of the cycle  01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16
Set A             A     A     A     A     A     A     A     A
Set B                B           B           B           B
Set C                      C                       C
Set D                                  D                       D

Five-tape Hanoi schedule

Day of the cycle  01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
Set A             A     A     A     A     A     A     A     A     A     A     A     A     A     A     A     A
Set B                B           B           B           B           B           B           B           B
Set C                      C                       C                       C                       C
Set D                                  D                                               D
Set E                                                          E                                               E
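The schedules above follow a simple bit pattern: the tape for a given day is determined by the number of trailing zero bits in the day number, capped at the last tape. The following Python sketch reproduces them; the function names and the use of letters for tapes are assumptions made for illustration.

```python
def hanoi_tape(day: int, tape_count: int) -> str:
    """Return the tape letter ('A', 'B', ...) used on a 1-based day."""
    if day < 1:
        raise ValueError("day must be 1 or greater")
    if not 1 <= tape_count <= 26:
        raise ValueError("tape_count must be between 1 and 26")
    # Number of trailing zero bits in the day number: 0 for odd days,
    # 1 for days 2, 6, 10, ..., 2 for days 4, 12, 20, ... and so on.
    trailing_zeros = (day & -day).bit_length() - 1
    # The last tape absorbs every day that would need a higher tape.
    index = min(trailing_zeros, tape_count - 1)
    return chr(ord("A") + index)


def hanoi_schedule(tape_count: int, days: int) -> str:
    """Return the tape letters for days 1..days as one string."""
    return "".join(hanoi_tape(day, tape_count) for day in range(1, days + 1))
```

Calling `hanoi_schedule(3, 8)` yields `ABACABAC`, matching the three-tape table, and `hanoi_schedule(4, 16)` yields `ABACABADABACABAD`.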

Extensions and example

Many variations are possible, and the concepts are readily extended to disc-based directories containing backups. Here are some options:

  • Save a base backup as set zero.
  • Save as many of the most recent backups as desired.
  • Save more than one of each set number, for greater coverage.

Coverage automatically gets sparser the further back in time one goes, which approximates the likelihood of needing to do restores from past backups.

Tower of Hanoi also frees implementers from having to manage separate hourly, daily, weekly, monthly, quarterly, or annual strategies.

In general, backup set number set is used at seq = 2^(set−1) + j × 2^set, for j = 0, 1, 2, 3, 4, ..., where seq is the sequence or serial number of a backup (also the Tower of Hanoi move number).

Here is an example showing coverage, including set 0, keeping at least the last 4 days, and recycling:

  • precious.20140515.seq.0 set 0
  • precious.20150205.seq.256 set 9
  • precious.20151026.seq.512 set 10
  • precious.20160311.seq.640 set 8
  • precious.20160516.seq.704 set 7
  • precious.20160601.seq.720 set 5
  • precious.20160609.seq.728 set 4
  • precious.20160617.seq.736 set 6
  • precious.20160618.seq.737.recycle set 1
  • precious.20160619.seq.738 set 2
  • precious.20160620.seq.739 set 1
  • precious.20160621.seq.740 set 3
  • precious.20160622.seq.741 set 1
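The listing above can be checked with a short Python sketch. It assumes, as one reading of the example, that a new backup replaces the previous backup with the same set number, that set 0 is reserved for sequence number 0, and that the most recent four backups are always kept. The function names are hypothetical.

```python
def set_for_seq(seq: int) -> int:
    """Return the set number used at sequence number `seq`.

    Inverts seq = 2^(set-1) + j * 2^set: the set number is one more than
    the count of trailing zero bits in seq. Sequence 0 is the base backup.
    """
    if seq < 0:
        raise ValueError("seq must be non-negative")
    if seq == 0:
        return 0
    return (seq & -seq).bit_length()


def retained_seqs(latest_seq: int, keep_recent: int = 4) -> list:
    """Return the sequence numbers still on hand after backup `latest_seq`."""
    newest_per_set = {}
    for seq in range(latest_seq + 1):
        # A later backup with the same set number replaces the earlier one.
        newest_per_set[set_for_seq(seq)] = seq
    recent = range(max(0, latest_seq - keep_recent + 1), latest_seq + 1)
    return sorted(set(newest_per_set.values()) | set(recent))
```

Under these assumptions, `retained_seqs(741)` returns 0, 256, 512, 640, 704, 720, 728, 736, 738, 739, 740 and 741, which are the entries in the listing apart from sequence 737, the one marked for recycling.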

Weighted random distribution

An alternative arrangement keeps generations distributed across all points in time by deleting (or overwriting) past generations, other than the oldest and the n most recent, in a weighted-random fashion whenever space is needed. For each deletion, the weight assigned to each deletable generation corresponds to the probability of it being deleted.

One acceptable weight is the multiplicative inverse of the duration (for example, in days) between a generation and the generation preceding it, raised to a constant exponent (possibly squared). Using a larger exponent leads to a more uniform distribution of generations, whereas a smaller exponent leads to a distribution with more recent and fewer older generations. The technique makes it probable, though not certain, that past generations stay distributed across all points in time, as desired.
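The weighting can be sketched in Python as follows. This is an illustration under stated assumptions: generations are given as distinct dates sorted oldest first, the gap is measured in whole days, and the function names and the `keep_recent` parameter are hypothetical.

```python
import random
from datetime import date


def deletion_weights(generations, keep_recent, exponent=2.0):
    """Map the index of each deletable generation to its deletion weight.

    `generations` must be distinct dates sorted oldest first. The oldest
    generation and the `keep_recent` newest ones are never deletable.
    """
    weights = {}
    first_protected = len(generations) - keep_recent
    for i in range(1, first_protected):
        gap_days = (generations[i] - generations[i - 1]).days
        # Inverse of the gap to the preceding generation, raised to a power.
        weights[i] = (1.0 / gap_days) ** exponent
    return weights


def pick_generation_to_delete(generations, keep_recent, exponent=2.0, rng=None):
    """Pick one deletable generation at random, in proportion to its weight."""
    rng = rng or random.Random()
    weights = deletion_weights(generations, keep_recent, exponent)
    if not weights:
        return None  # nothing may be deleted
    indices = list(weights)
    chosen = rng.choices(indices, weights=[weights[i] for i in indices], k=1)[0]
    return generations[chosen]
```

A generation taken one day after its predecessor gets weight 1, while one taken ten days after its predecessor gets weight 0.01 with the default exponent, so closely spaced generations are far more likely to be removed.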

The weighted random method has an advantage over a more systematic approach only when backups are irregular or missed.

Incremented media method

This method has many variations and names. A set of numbered media is used until the end of the cycle. Then the cycle is repeated using media numbered the same as the previous cycle, but incremented by one. The lowest numbered tape from the previous cycle is retired and kept permanently. Thus one has access to every backup for one cycle and to one backup per cycle before that. This method has the advantage of ensuring even media wear, but requires a schedule to be precalculated.
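One reading of this method can be sketched in Python. The sketch assumes that media are numbered from 1 and that each cycle uses a fixed number of media; both the numbering and the function names are assumptions, since the method has many variations.

```python
def media_for_cycle(cycle: int, media_per_cycle: int) -> range:
    """Return the media numbers used in a 0-based cycle.

    Cycle 0 uses media 1..n, cycle 1 uses media 2..n+1, and so on.
    """
    if cycle < 0 or media_per_cycle < 1:
        raise ValueError("cycle must be >= 0 and media_per_cycle >= 1")
    first = cycle + 1
    return range(first, first + media_per_cycle)


def retired_media(cycle: int) -> range:
    """Return the media already retired permanently when `cycle` starts."""
    if cycle < 0:
        raise ValueError("cycle must be >= 0")
    return range(1, cycle + 1)
```

With five media per cycle, cycle 2 uses media 3 to 7, and media 1 and 2 have been retired as permanent copies from the two earlier cycles.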

References

  1. Kissell, Joe (February 2007). Take Control of Mac OS X Backups (PDF) (Version 2.0 ed.). Ithaca, NY: TidBITS Electronic Publishing. pp. 18-20 (The Archive), 24 (client-server), 82-83 (archive file), 112-114 (Off-site storage backup rotation scheme), 126-141 (old Retrospect terminology and GUI, still used in Windows variant), 165 (client-server), 128 (subvolume, later renamed Favorite Folder in Macintosh variant). ISBN 0-9759503-0-4. Archived from the original (PDF) on 2018-10-15.
  2. San Francisco Computer Repair (2008-01-13). "Backup Methods". Retrieved 2008-02-21.
  3. Alvechurch Data Ltd (2007-11-27). "Tower of Hanoi pattern for backup". Retrieved 2008-03-12.