Incremental backup

An incremental backup is one in which successive copies of the data contain only the portion that has changed since the preceding backup copy was made. [1] [2] [3] [4] When a full recovery is needed, the restoration process requires the last full backup plus all the incremental backups made up to the point of restoration. [5] Incremental backups are often desirable as they reduce storage space usage, and are quicker to perform than differential backups. [6]

Variants

Incremental

The most basic form of incremental backup consists of identifying, recording, and thus preserving only those files that have changed since the last backup. Since the volume of changes is typically low, incremental backups are much smaller and quicker than full backups. For instance, following a full backup on Friday, a Monday backup will contain only those files that changed since Friday. A Tuesday backup contains only those files that changed since Monday, and so on. A full restoration of data will naturally be slower, since the full backup and every increment after it must be restored in order. Should any one of the copies in that chain fail, including the first (full) one, restoration will be incomplete. [7]

A Unix example would be:

rsync -e ssh -va --link-dest=$dst/hourly.1 $remoteserver:$remotepath $dst/hourly.0

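The use of rsync's --link-dest option is what makes this command an example of incremental backup: files that are unchanged relative to the directory given to --link-dest are hard-linked into the new destination instead of being copied again, so only changed files are transferred and stored.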
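The selection step of a basic incremental backup can be sketched in Python. This is a minimal illustration, not how any particular tool works: it assumes that a file's modification time is a reliable change signal, and the names changed_since and demo are hypothetical.

```python
import os
import tempfile
import time


def changed_since(root, last_backup_time):
    """Return relative paths of files under root modified after last_backup_time."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > last_backup_time:
                changed.append(os.path.relpath(path, root))
    return sorted(changed)


def demo():
    """Build a throwaway tree with one old and one new file; return the increment."""
    with tempfile.TemporaryDirectory() as root:
        for name in ("old.txt", "new.txt"):
            with open(os.path.join(root, name), "w") as f:
                f.write(name)
        last_backup = time.time() - 3600  # pretend the last backup ran an hour ago
        # Backdate old.txt to two hours ago, so it predates the last backup.
        stamp = last_backup - 3600
        os.utime(os.path.join(root, "old.txt"), (stamp, stamp))
        return changed_since(root, last_backup)
```

Only the files returned by changed_since would be copied into the increment. Real backup software may rely on other signals, such as checksums or file system change journals.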

Multilevel incremental

A more sophisticated incremental backup scheme involves multiple numbered backup levels. A full backup is level 0. A level n backup will back up everything that has changed since the most recent level n-1 backup. Suppose for instance that a level 0 backup was taken on a Sunday. A level 1 backup taken on Monday would include only changes made since Sunday. A level 2 backup taken on Tuesday would include only changes made since Monday. A level 3 backup taken on Wednesday would include only changes made since Tuesday. If a level 2 backup was taken on Thursday, it would include all changes made since Monday because Monday was the most recent level n-1 backup.
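The level rule can be made concrete with a short Python sketch. It follows the rule exactly as stated above (a level n backup is taken against the most recent level n-1 backup); the function name base_backup and the (day, level) tuples are assumptions made for illustration.

```python
def base_backup(history, level):
    """Return the (day, level) entry that a new backup of `level` is taken against.

    history is a list of (day, level) tuples, oldest first.
    A level 0 backup is full, so it has no base.
    """
    if level == 0:
        return None
    for day, past_level in reversed(history):
        if past_level == level - 1:
            return (day, past_level)
    raise ValueError(f"no level {level - 1} backup to build on")


# The week described above, before Thursday's backup is taken.
history = [("Sunday", 0), ("Monday", 1), ("Tuesday", 2), ("Wednesday", 3)]
thursday_base = base_backup(history, 2)  # ("Monday", 1)
```

A level 2 backup on Thursday therefore captures everything changed since Monday, which includes the changes already stored in the Tuesday and Wednesday backups.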

Reverse incremental

An incremental backup of the changes made between two instances of a mirror can be forward or reverse.

If the oldest version of the mirror is treated as the base and the newest version as the revised version, the incremental produced is a forward incremental.

If the newest version of the mirror is treated as the base and the oldest version as the revised / changed version, the incremental produced is a reverse incremental.

When backups are made using reverse incrementals, each new reverse incremental is applied (in reverse) to the previous full (synthetic) backup, so the current full (synthetic) backup is always a backup of the most recent state of the system. This is in contrast to forward incremental backups, where the current full backup is a backup of the oldest version of the system; to obtain the most recent state of the system, all of the forward incremental backups have to be applied to that oldest version successively.

By applying a reverse incremental to a mirror, the result will be a previous version of the mirror. This gives a means to revert to any previous version of the mirror.

In other words, after the initial full backup, each successive incremental backup applies the changes to the previous full, creating a new synthetic full backup every time, while maintaining the ability to revert to previous versions.

The main advantage of this type of backup is a more efficient recovery process, since the most recent version of the data (which is the most frequently restored version) is a (synthetic) full backup, and no incrementals need to be applied to it during its restoration. Reverse incremental backup works for both tapes and disks, but in practice tends to work better with disks.
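The mechanism can be sketched in Python by modelling a mirror as a dictionary from path to content. This is a simplified illustration under that assumption; the names take_reverse_incremental, revert and MISSING are hypothetical and do not come from any product.

```python
MISSING = object()  # marks a path that does not exist in a given version


def take_reverse_incremental(full, current):
    """Update the full backup to `current` and return the reverse increment.

    full and current map path -> content. The returned dict records, for every
    path that differs, the value it had in the old full backup.
    """
    reverse = {}
    for path in set(full) | set(current):
        old = full.get(path, MISSING)
        new = current.get(path, MISSING)
        if old != new:
            reverse[path] = old
    return dict(current), reverse


def revert(full, reverse):
    """Apply a reverse increment to a full backup, yielding the previous version."""
    previous = dict(full)
    for path, old in reverse.items():
        if old is MISSING:
            previous.pop(path, None)
        else:
            previous[path] = old
    return previous


monday = {"a.txt": "v1", "b.txt": "v1"}
tuesday = {"a.txt": "v2", "c.txt": "v1"}  # a changed, b deleted, c created

full, reverse = take_reverse_incremental(monday, tuesday)
```

After the call, full holds Tuesday's state and can be restored directly, while revert(full, reverse) reconstructs Monday's state.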

Companies using the reverse incremental backup method include Intronis and Zetta.net.

Incremental forever

This style is similar to the synthetic backup concept. After an initial full backup, only the incremental backups are sent to a centralized backup system. This server keeps track of all the increments and sends the proper data back to the client during restores. This can be implemented by sending each incremental directly to tape as it is taken and then refactoring the tapes as necessary. If enough disk space is available, an online mirror can be maintained along with previous incremental changes so that the current or older versions of the systems being backed up can be restored. This is a suitable method in the case of banking systems. [citation needed]

In modern cloud architectures, or disk to disk backup scenarios, this is much simpler. Data is broken into chunks and placed on a cloud storage system. Metadata about the chunks is stored in a persistent system, which allows the system to assemble a point in time backup from these chunks at restore time. There is no need to refactor tape.
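The chunk-and-metadata approach can be sketched in Python as follows. This is an in-memory illustration only: the ChunkStore class, the fixed 4-byte chunk size, and the use of SHA-256 digests as chunk keys are all assumptions chosen to keep the example small.

```python
import hashlib

CHUNK_SIZE = 4  # tiny on purpose, so the example is easy to follow


class ChunkStore:
    """Stores each distinct chunk once, keyed by its SHA-256 digest."""

    def __init__(self):
        self.chunks = {}     # digest -> chunk bytes
        self.manifests = {}  # backup name -> ordered list of digests

    def backup(self, name, data):
        """Record a point-in-time backup; return how many new chunks were stored."""
        digests, added = [], 0
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:
                self.chunks[digest] = chunk
                added += 1
            digests.append(digest)
        self.manifests[name] = digests
        return added

    def restore(self, name):
        """Reassemble a backup from its manifest."""
        return b"".join(self.chunks[d] for d in self.manifests[name])


store = ChunkStore()
first = store.backup("day1", b"AAAABBBBCCCC")   # stores 3 chunks
second = store.backup("day2", b"AAAAXXXXCCCC")  # stores only the 1 changed chunk
```

Every backup has its own manifest, so any point in time can be restored without replaying a chain of increments.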

Block-level incremental

This method backs up only the blocks within the file that changed. This requires a higher level of integration between the sender and receiver.
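A minimal Python sketch of block-level change detection is shown below. It assumes fixed-size blocks compared at the same offsets in the old and new versions of a file; the names changed_blocks and apply_blocks and the 4-byte block size are hypothetical.

```python
def changed_blocks(old, new, block_size=4):
    """Return (new_length, {block_index: block_bytes}) for blocks that differ."""
    delta = {}
    for start in range(0, len(new), block_size):
        block = new[start:start + block_size]
        if old[start:start + block_size] != block:
            delta[start // block_size] = block
    return len(new), delta


def apply_blocks(old, new_length, delta, block_size=4):
    """Rebuild the new version from the old version plus the changed blocks."""
    # Start from the old data, truncated or zero-padded to the new length.
    out = bytearray(old[:new_length].ljust(new_length, b"\0"))
    for index, block in delta.items():
        start = index * block_size
        out[start:start + len(block)] = block
    return bytes(out)
```

The receiver needs the previous version and the block size to apply the delta, which is one reason the sender and receiver must be closely integrated.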

Byte-level incremental

These backup technologies are similar to the "block-level incremental" backup method; however, the byte (or binary) incremental backup method is based on the binary differences between a file and its previous backup. While block-based technologies work with coarse units of change (blocks of 8K, 4K or 1K), byte-based technologies work with the smallest possible unit, saving space when recording a change to a file. [8] Another important difference is that they work independently of the file system. At the moment, these are the technologies that achieve the highest relative compression of the data, which is a great advantage for backups carried out over the Internet. [citation needed]
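A byte-level delta can be illustrated in Python with the standard library's difflib.SequenceMatcher. This is only a stand-in for the binary delta algorithms that real products use, which the source does not name; the functions make_delta and apply_delta are hypothetical.

```python
from difflib import SequenceMatcher


def make_delta(old, new):
    """Encode `new` as copy-from-old ranges plus literal bytes."""
    delta = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))      # reuse old[i1:i2]
        elif tag in ("replace", "insert"):
            delta.append(("data", new[j1:j2]))  # ship only the new bytes
        # "delete" needs no entry: those bytes are simply not copied
    return delta


def apply_delta(old, delta):
    """Rebuild the new version from the old version and a delta."""
    parts = []
    for op in delta:
        if op[0] == "copy":
            parts.append(old[op[1]:op[2]])
        else:
            parts.append(op[1])
    return b"".join(parts)
```

Only the "data" entries carry file content, so a small edit to a large file yields a small delta regardless of where block boundaries would have fallen.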

Other backup types

Synthetic full backup

A synthetic backup is an alternative method of creating full backups. Instead of reading and backing up data directly from the disk, it synthesizes the data from the previous full backup (either a regular full backup for the first backup, or the previous synthetic full backup) and the periodic incremental backups. As only the incremental backups read data from the disk, these are the only files that need to be transferred during offsite replication. This greatly reduces the bandwidth needed for offsite replication. Synthetic backup does not always work with the same efficiency: the ratio of data uploaded from the target machine to data synchronized on the storage varies depending on disk fragmentation. [9]
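The synthesis step can be sketched in Python by merging the previous full backup with the later incrementals, without touching the source disk. Backups are modelled here as dictionaries from path to content, and using None to mark a removed file is an assumption of this sketch; synthesize_full is a hypothetical name.

```python
DELETED = None  # marker an incremental uses for a file that was removed


def synthesize_full(previous_full, incrementals):
    """Merge a full backup with later incrementals (oldest first) into a new full."""
    merged = dict(previous_full)
    for increment in incrementals:
        for path, content in increment.items():
            if content is DELETED:
                merged.pop(path, None)
            else:
                merged[path] = content
    return merged
```

The result can serve as the "previous full" for the next synthesis, which matches the description of building each synthetic full from the one before it.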

Differential

A differential backup is a cumulative backup of all changes made since the last full or normal backup, i.e., the differences since the last full backup. The advantage to this is the quicker recovery time, requiring only a full backup and the last differential backup to restore the system. The disadvantage is that for each day elapsed since the last full backup, more data needs to be backed up, especially if a significant proportion of the data has changed.
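The difference in restore effort can be shown with a small Python sketch that lists which backups are needed to restore the latest state. It assumes a backup cycle uses either incrementals or differentials after a full backup, not a mix; restore_chain is a hypothetical helper.

```python
def restore_chain(history):
    """Return indexes of the backups needed to restore the latest state.

    history is a list of "full", "incremental", or "differential" kinds,
    oldest first. The sketch assumes a cycle does not mix incrementals and
    differentials.
    """
    last_full = max(i for i, kind in enumerate(history) if kind == "full")
    later = list(range(last_full + 1, len(history)))
    if later and history[later[-1]] == "differential":
        return [last_full, later[-1]]  # full plus the newest differential
    return [last_full] + later         # full plus every incremental since
```

For a full backup followed by three differentials, only two backups are read at restore time; with three incrementals, all four are needed.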

Forward incremental-forever

A forward incremental-forever backup [10] allows the synthetic operation to create a new full backup at a cost limited to the size of the incremental file, rather than the complete size of a full backup file, as would be the case in a “forward mode with synthetic fulls”. The overall I/O consumed is the same as with reverse incremental, but during the backup activity itself only 1 write I/O is used, and the snapshot of the VM is kept open for less time than with reverse incremental; the remaining 2 I/O are used to update the full backup file.

References

  1. "Description of Full, Incremental, and Differential Backups". Microsoft Support. Retrieved 21 August 2012.
  2. "3.3.2 Making an Incremental Backup". MySQL Enterprise Backup User's Guide (Version 3.7.1). MySQL. Retrieved 21 August 2012.
  3. "CA ARCserve Backup for Windows: Agent for Microsoft Exchange Server Guide, r16". CA Technologies Technical Support. p. 52. Retrieved 21 August 2012.
  4. "What are the differences between Differential and Incremental backups?". Symantec Enterprise Technical Support. Article: TECH7665. Created: 2000-01-27; Updated: 2012-05-12. Archived 2012-09-04 at the Wayback Machine. Retrieved 21 August 2012.
  5. Rojas, Carlos. "SQL Server differential backups". EMC Community Network. EMC Corporation. 2 March 2011. Retrieved 21 August 2012.
  6. Keiper, Charles. "NetApp SnapMirror Block Level Incremental Backup to Tape with NetVault Backup". Quest Software. 1 August 2012. Archived 2013-07-11 at archive.today. Retrieved 21 August 2012.
  7. Zacker, Craig (2006). Network+ Certification, Fourth Edition. Redmond, WA: Microsoft Press. p. 455.
  8. "What is an incremental backup?". IONOS Digitalguide. Retrieved 2022-08-15.
  9. Gugick, David. "Synthetic Full Backup Explained". CloudBerry Lab. Retrieved 20 December 2018.
  10. "New Forward Incremental-Forever Backup". Virtualtothecore. 13 October 2014.

Further reading