Continuous data protection


Continuous data protection (CDP), also called continuous backup or real-time backup, is the backup of computer data by automatically saving a copy of every change made to that data, essentially capturing every version of the data that the user saves. In its true form it allows the user or administrator to restore data to any point in time. [1] The technique was patented by British entrepreneur Pete Malcolm in 1989 as "a backup system in which a copy of every change made to a storage medium is recorded as the change occurs." [2]


In an ideal case of continuous data protection, the recovery point objective, "the maximum targeted period in which data (transactions) might be lost from an IT service due to a major incident", is zero, even though the recovery time objective, "the targeted duration of time and a service level within which a business process must be restored after a disaster (or disruption) in order to avoid unacceptable consequences associated with a break in business continuity", is not zero. [3] An example of a period in which data transactions might be lost is the outage of close to two hours in June 2019 during which card readers at the checkout counters of a major discount chain shut down at multiple locations.

CDP runs as a service that captures changes to data and writes them to a separate storage location. There are multiple methods for capturing continuous live data changes, involving different technologies that serve different needs. True CDP-based solutions can provide fine granularities of restorable objects ranging from crash-consistent images to logical objects such as files, mailboxes, messages, and database files and logs. [4] This is not necessarily true of near-CDP solutions.

Differences from traditional backup

True continuous data protection is different from traditional backup in that it is not necessary to specify the point in time to recover from until ready to restore. [5] Traditional backups only restore data from the time the backup was made. True continuous data protection, in contrast to "snapshots", has no backup schedules. [5] When data is written to disk, it is also asynchronously written to a second location, either another computer over the network [6] or an appliance. [7] This introduces some overhead to disk-write operations but eliminates the need for scheduled backups.
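The write path described above can be sketched as a timestamped change journal. The following Python is a minimal illustration under stated assumptions, not any vendor's implementation: the `ChangeJournal` class, its method names, and the in-memory list standing in for the second storage location are all hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class ChangeJournal:
    """Hypothetical CDP journal: every write is appended with its timestamp."""

    # Each entry is (timestamp, block_index, block_bytes), kept in time order.
    entries: list = field(default_factory=list)

    def record_write(self, timestamp: float, block_index: int, data: bytes) -> None:
        # Called for every write to the protected volume, as the write occurs.
        if self.entries and timestamp < self.entries[-1][0]:
            raise ValueError("writes must be recorded in time order")
        self.entries.append((timestamp, block_index, data))

    def restore(self, point_in_time: float) -> dict:
        """Rebuild the volume as it stood at any chosen moment."""
        timestamps = [entry[0] for entry in self.entries]
        cutoff = bisect_right(timestamps, point_in_time)
        volume = {}
        for _, block_index, data in self.entries[:cutoff]:
            volume[block_index] = data  # later writes overwrite earlier ones
        return volume


journal = ChangeJournal()
journal.record_write(1.0, 0, b"aaaa")
journal.record_write(2.0, 1, b"bbbb")
journal.record_write(3.0, 0, b"cccc")

# The recovery point is chosen only at restore time, not when the data is saved.
before_last_write = journal.restore(2.5)
```

Because no restore point is fixed in advance, `restore` accepts any timestamp; a real product would also have to bound journal growth, which this sketch ignores.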

Because it allows data to be restored to any point in time, "CDP is the gold standard—the most comprehensive and advanced data protection. But 'near CDP' technologies can deliver enough protection for many companies with less complexity and cost. For example, snapshots ["near-CDP" clarification in the section below] can provide a reasonable near-CDP-level of protection for file shares, letting users directly access data on the file share at regular intervals—say, every half hour or 15 minutes. That's certainly a higher level of protection than tape-based or disk-based nightly backups and may be all you need." [1] Because "near-CDP does this [copying] at pre-set time intervals", [8] it is essentially incremental backup initiated by timer instead of script, separately for each source machine.

Continuous vs near continuous

Because in true CDP the "backup write operations are executed at the level of the basic input/output system (BIOS) of the microcomputer in such a manner that normal use of the computer is unaffected", [2] true CDP backup must in practice be run in conjunction with a virtual machine [6] [9] or equivalent, [10] which rules it out for ordinary personal backup applications. It is therefore discussed in the "Enterprise client-server backup" article, rather than in the "Backup" article.

Some solutions marketed as continuous data protection may only allow restores at fixed intervals such as 15 minutes, one hour, or 24 hours, because they automatically take incremental backups at those intervals. Such "near-CDP" schemes (short for near-continuous data protection) are not universally recognized as true continuous data protection, as they do not provide the ability to restore to any point in time. When the interval is shorter than one hour, [11] "near-CDP" solutions, for example Arq Backup, [12] are typically based on periodic "snapshots"; "to avoid downtime, high-availability systems may instead perform the backup on ... a read-only copy of the data set frozen at a point in time—and allow applications to continue writing to their data".

There is debate in the industry as to whether the granularity of backup must be "every write" to be CDP, or whether a "near-CDP" solution that captures the data every few minutes is good enough. The latter is sometimes called near continuous backup. The debate hinges on the use of the term continuous: whether only the backup process must be continuously automatically scheduled, which is often sufficient to achieve the benefits cited above, or whether the ability to restore from the backup also must be continuous. The Storage Networking Industry Association (SNIA) uses the "every write" definition. [5]
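To make the granularity difference concrete, the sketch below compares what is lost after a failure when restore points exist only at fixed intervals with what is lost when every write is a restore point. The function name, the integer-second timestamps, the sample workload, and the 15-minute interval are illustrative assumptions, and the model ignores any lag in copying data to the backup target.

```python
def lost_writes(write_times, restore_points, failure_time):
    """Writes up to failure_time that are newer than the latest usable restore point."""
    usable = [p for p in restore_points if p <= failure_time]
    latest = max(usable) if usable else None
    return [
        t for t in write_times
        if t <= failure_time and (latest is None or t > latest)
    ]


write_times = [100, 950, 1700, 1799]       # seconds; hypothetical workload
snapshot_times = list(range(0, 3600, 900)) # near-CDP: a restore point every 15 minutes
journal_times = write_times                # "every write" CDP: a restore point per write

near_cdp_loss = lost_writes(write_times, snapshot_times, failure_time=1799)
true_cdp_loss = lost_writes(write_times, journal_times, failure_time=1799)
```

In this example the interval-based scheme can only return to the snapshot taken at 900 seconds, so the three later writes are lost, while the "every write" scheme loses none.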

There is a briefer sub-sub-section in the "Backup" article about this, now renamed to "Near-CDP" to avoid confusion.

Differences from RAID, replication or mirroring

Continuous data protection differs from RAID, replication, or mirroring in that these technologies only protect one copy of the data (the most recent). If data becomes corrupted in a way that is not immediately detected, these technologies simply protect the corrupted data with no way to restore an uncorrupted version.

Continuous data protection protects against some effects of data corruption by allowing restoration of a previous, uncorrupted version of the data. Transactions that took place between the corrupting event and the restoration are lost, however. They could be recovered through other means, such as journaling.
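The contrast with mirroring can be shown with a toy write history. This is a sketch under simple assumptions: the `replay` helper, the record keys, and the corruption marker are hypothetical, and the writes are assumed to be listed in timestamp order.

```python
def replay(writes, up_to=None):
    """Apply (timestamp, key, value) writes in order, optionally stopping at a timestamp."""
    state = {}
    for timestamp, key, value in writes:
        if up_to is not None and timestamp > up_to:
            break
        state[key] = value
    return state


writes = [
    (1, "invoice-17", "paid"),
    (2, "invoice-18", "open"),
    (3, "invoice-17", "<corrupted>"),  # corruption that goes unnoticed at first
    (4, "invoice-19", "open"),         # legitimate transaction made afterwards
]

# A mirror or replica holds only the most recent state, corruption included.
mirror_copy = replay(writes)

# A CDP journal can be replayed up to the moment just before the corrupting write.
restored = replay(writes, up_to=2)
```

The restored state has the uncorrupted `invoice-17`, but `invoice-19`, written after the corrupting event, is missing and would have to be recovered by other means.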

Backup disk size

In some situations, continuous data protection requires less space on backup media (usually disk) than traditional backup. Most continuous data protection solutions save byte- or block-level differences rather than file-level differences. This means that if one byte of a 100 GB file is modified, only the changed byte or block is backed up. Traditional incremental and differential backups make copies of entire files; however, starting around 2013, enterprise client-server backup applications have implemented a capability for block-level incremental backup, designed for large files such as databases.
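The space saving from block-level differences can be illustrated with a short comparison of two versions of a file. The `changed_blocks` function and the 4096-byte block size are assumptions chosen for the example; actual products track changed blocks at write time rather than by comparing full copies.

```python
def changed_blocks(old: bytes, new: bytes, block_size: int = 4096) -> dict:
    """Return {block_index: new_block_bytes} for every block that differs."""
    changes = {}
    length = max(len(old), len(new))
    for offset in range(0, length, block_size):
        old_block = old[offset:offset + block_size]
        new_block = new[offset:offset + block_size]
        if old_block != new_block:
            changes[offset // block_size] = new_block
    return changes


original = bytes(1_000_000)      # a 1 MB file of zero bytes (stand-in for a large file)
modified = bytearray(original)
modified[123_456] = 0xFF         # change a single byte

delta = changed_blocks(original, bytes(modified))
bytes_to_back_up = sum(len(block) for block in delta.values())
```

Here a one-byte edit produces a delta of a single 4096-byte block, whereas a file-level incremental backup would copy the whole file again.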

Risks and disadvantages

When real-time edits—especially in multimedia and CAD design environments—are backed up offsite over the upstream channel of the installation's broadband network, [13] network bandwidth throttling [14] may be needed to reduce the impact of true CDP. [13] An alternative approach is to back up to a separate Fibre-Channel-connected SAN appliance. [7]
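One common way to throttle upstream traffic is a token bucket, sketched below. This is an illustrative assumption about how throttling could be done, not a description of any cited product; the `TokenBucket` class, its parameters, and the caller-supplied clock value are hypothetical.

```python
class TokenBucket:
    """Allow at most rate_bytes_per_s on average, with bursts up to capacity_bytes."""

    def __init__(self, rate_bytes_per_s: float, capacity_bytes: float):
        self.rate = rate_bytes_per_s
        self.capacity = capacity_bytes
        self.tokens = capacity_bytes  # start with a full bucket
        self.last = 0.0               # time of the previous call, in seconds

    def try_send(self, nbytes: int, now: float) -> bool:
        # Refill according to the time elapsed since the previous call.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True   # the backup chunk may be sent now
        return False      # the caller should queue the chunk and retry later


bucket = TokenBucket(rate_bytes_per_s=1000, capacity_bytes=2000)
first = bucket.try_send(1500, now=0.0)   # fits within the initial burst allowance
second = bucket.try_send(1000, now=0.1)  # too soon: not enough tokens refilled
third = bucket.try_send(1000, now=1.0)   # enough time has passed to refill
```

Passing the clock value in explicitly keeps the sketch deterministic; a real sender would read a monotonic clock and delay queued changes, which means the backup copy briefly lags behind the live data.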


Related Research Articles

In information technology, a backup, or data backup is a copy of computer data taken and stored elsewhere so that it may be used to restore the original after a data loss event. The verb form, referring to the process of doing so, is "back up", whereas the noun and adjective form is "backup". Backups can be used to recover data after its loss from data deletion or corruption, or to recover data from an earlier time. Backups provide a simple form of disaster recovery; however not all backup systems are able to reconstitute a computer system or other complex configuration such as a computer cluster, active directory server, or database server.

Disk cloning is the process of duplicating all data on a digital storage drive, such as a hard disk or solid state drive, using hardware or software techniques. Unlike file copying, disk cloning also duplicates the filesystems, partitions, drive metadata and slack space on the drive. Common reasons for cloning a drive include data backup and recovery, duplicating a computer's configuration for mass deployment, and preserving data for digital forensics purposes. Drive cloning can be used in conjunction with drive imaging, where the cloned data is saved to one or more files on another drive rather than copied directly to another drive.

Disaster recovery is the process of maintaining or reestablishing vital infrastructure and systems following a natural or human-induced disaster, such as a storm or battle. It employs policies, tools, and procedures. Disaster recovery focuses on information technology (IT) or technology systems supporting critical business functions as opposed to business continuity. This involves keeping all essential aspects of a business functioning despite significant disruptive events; it can therefore be considered a subset of business continuity. Disaster recovery assumes that the primary site is not immediately recoverable and restores data and services to a secondary site.

NetApp, Inc. is an intelligent data infrastructure company that provides unified data storage, integrated data services, and cloud operations (CloudOps) solutions to enterprise customers. The company is based in San Jose, California. It has ranked in the Fortune 500 from 2012 to 2021. Founded in 1992 with an initial public offering in 1995, NetApp offers cloud data services for management of applications and data both online and physically.

A versioning file system is any computer file system which allows a computer file to exist in several versions at the same time. Thus it is a form of revision control. Most common versioning file systems keep a number of old copies of the file. Some limit the number of changes per minute or per hour to avoid storing large numbers of trivial changes. Others instead take periodic snapshots whose contents can be accessed using methods similar to those for normal file access.

Shadow Copy: Microsoft technology for storage snapshots

Shadow Copy is a technology included in Microsoft Windows that can create backup copies or snapshots of computer files or volumes, even when they are in use. It is implemented as a Windows service called the Volume Shadow Copy service. A software VSS provider service is also included as part of Windows to be used by Windows applications. Shadow Copy technology requires either the Windows NTFS or ReFS filesystems in order to create and store shadow copies. Shadow Copies can be created on local and external volumes by any Windows component that uses this technology, such as when creating a scheduled Windows Backup or automatic System Restore point.

A remote, online, or managed backup service, sometimes marketed as cloud backup or backup-as-a-service, is a service that provides users with a system for the backup, storage, and recovery of computer files. Online backup providers are companies that provide this type of service to end users. Such backup services are considered a form of cloud computing.

Snapshot (computer storage): Recorded state of a computer storage system at a particular point in time

In computer systems, a snapshot is the state of a system at a particular point in time. The term was coined as an analogy to that in photography.

A virtual tape library (VTL) is a data storage virtualization technology used typically for backup and recovery purposes. A VTL presents a storage component as tape libraries or tape drives for use with existing backup software.

IBM Storage Protect is a data protection platform that gives enterprises a single point of control and administration for backup and recovery. It is the flagship product in the IBM Spectrum Protect family.

Replication in computing involves sharing information so as to ensure consistency between redundant resources, such as software or hardware components, to improve reliability, fault-tolerance, or accessibility.

EMC NetWorker is an enterprise-level data protection software product from Dell EMC that unifies and automates backup to tape, disk-based, and flash-based storage media across physical and virtual environments for granular and disaster recovery.

The IBM SAN Volume Controller (SVC) is a block storage virtualization appliance that belongs to the IBM System Storage product family. SVC implements an indirection, or "virtualization", layer in a Fibre Channel storage area network (SAN).

Time Machine (macOS): Backup software application developed by Apple and distributed as part of macOS

Time Machine is the backup mechanism of macOS, the desktop operating system developed by Apple. The software is designed to work with both local storage devices and network-attached disks, and is most commonly used with external disk drives connected using either USB or Thunderbolt. It was first introduced in Mac OS X 10.5 Leopard, which appeared in October 2007, and has been incrementally refined in subsequent releases of macOS. Time Machine was revamped in macOS 11 Big Sur to support APFS, thereby enabling "faster, more compact, and more reliable backups" than were possible previously.

Lasso Logic was a company formed in 2003 that pioneered continuous data protection (CDP) and an onsite–offsite backup technology for the small and medium enterprise market (SME) before it was acquired by SonicWALL in November 2005 for approximately $20 million. Lasso Logic became the third of three business divisions at SonicWALL, now called the Business Continuity Unit. Lasso CDP is now called SonicWALL CDP.

InMage was a computer software company based in the US and India. It marketed a product line called Scout that used continuous data protection (CDP) for backup and replication. Scout consisted of two product lines: the host-offload line, which uses a software agent on the protected servers, and the fabric line, which uses an agent on the Fibre Channel switch fabric. The software protects at the volume or block level, tracking all write changes. It allows for local or remote protection policies. The first version of the product was released in 2002.

NetVault is a set of data protection software developed and supported by Quest Software. NetVault Backup is a backup and recovery software product. It can be used to protect data and software applications in physical and virtual environments from one central management interface. It supports many servers, application platforms, and protocols such as UNIX, Linux, Microsoft Windows, VMware, Microsoft Hyper-V, Oracle, Sybase, Microsoft SQL Server, NDMP, Oracle ACSLS, IBM DAS/ACI, Microsoft Exchange Server, DB2, and Teradata.

RecoverPoint is a continuous data protection product offered by Dell EMC which supports asynchronous and synchronous data replication of block-based storage. RecoverPoint was originally created by a company called Kashya, which was bought by EMC in 2006.

Zerto

Zerto provides disaster recovery, ransomware resilience and workload mobility software for virtualized infrastructures and cloud environments. Zerto is a subsidiary of Hewlett Packard Enterprise, which is headquartered in Spring, Texas, USA.

Veeam Backup & Replication: Backup and disaster recovery software

Veeam Backup & Replication is a proprietary backup app developed by Veeam for virtual environments built on VMware vSphere, Nutanix AHV, and Microsoft Hyper-V hypervisors. The software provides backup, restore and replication functionality for virtual machines, physical servers and workstations as well as cloud-based workload.

References

  1. Behzad Behtash (2010-05-06). "Why Continuous Data Protection's Getting More Practical". Disaster recovery/business continuity. InformationWeek. Retrieved 2011-11-12. A true CDP approach should capture all data writes, thus continuously backing up data and eliminating backup windows.... CDP is the gold standard—the most comprehensive and advanced data protection. But "near CDP" technologies can deliver enough protection for many companies with less complexity and cost. For example, snapshots can provide a reasonable near-CDP-level of protection for file shares, letting users directly access data on the file share at regular intervals--say, every half hour or 15 minutes. That's certainly a higher level of protection than tape-based or disk-based nightly backups and may be all you need.
  2. Peter B. Malcolm (13 November 1989). "US Patent 5086502: Method of operating a data processing system". Google Patents. Retrieved 29 November 2016. Filing date Nov 13, 1989 ... a backup system in which a copy of every change made to a storage medium is recorded as the change occurs ... backup write operations are executed at the level of the basic input/output system (BIOS) ...
  3. Richard May (November 2012). "Finding RPO and RTO". Archived from the original on 2016-03-03.
  4. Pat Hanavan (2007). "An Overview of Continuous Data Protection". Infosectoday.com. What is Continuous Data Protection?, Can CDP be leveraged for backing up and recovering email?. Archived from the original on 2019-06-17. Retrieved 2011-11-12. ... may be block, file-, or application-based and can provide fine granularities of restorable objects to infinitely variable points in time.... New granular recovery technologies have emerged that enable mail messages, mailboxes, and folders to be restored individually without having to restore an entire email database, and without separate and redundant mailbox backups.
  5. "Data Protection Best Practices" (PDF). SNIA. Storage Networking Industry Association. 23 October 2017. 2.1.4 Continuous Data Protection (CDP). Retrieved 27 June 2019. ...pros to the use of snapshots:[new paragraph]Allows for the recovery of files from a specific point in time (based on snapshot schedule)....CDP can provide the ability to restore to any previous point in time, since the backups are taking place near-instantaneously; therefore, the potential for data loss is very small.
  6. Wu, Victor (4 March 2017). "EMC RecoverPoint for Virtual Machine Overview". Victor Virtual. WuChiKin. Retrieved 22 June 2019. The splitter splits out the Write IOs to the VMDK/RDM of a VM and sends a copy to the production VMDK and also to the RecoverPoint for VMs cluster.
  7. Wendt, Jerome M. (21 September 2009). "Symantec Brings RealTime CDP into NetBackup Data Management Fold". DCIG. DCIG LLC. Retrieved 5 August 2019. NetBackup RealTime is an appliance-based CDP solution intended for the protection of multiple hosts. Residing in corporate FC-SANs as a side-band appliance, it sits outside of the data path between application servers and their assigned storage to eliminate any possibilities of application disruption.
  8. "Continuous data protection (CDP) explained: True CDP vs near-CDP". ComputerWeekly.com. TechTarget. July 2010. Retrieved 22 June 2019. ... copies data from a source to a target. True CDP does this every time a change is made, while so-called near-CDP does this at pre-set time intervals. Near-CDP is effectively the same as snapshotting....True CDP systems record every write and copy them to the target where all changes are stored in a log. [new paragraph] By contrast, near-CDP/snapshot systems copy files in a straightforward manner but require applications to be quiesced and made ready for backup, either via the application's backup mode or using, for example, Microsoft's Volume Shadow Copy Services (VSS).
  9. "Zerto or Veeam?". RES-Q Services. March 2017. Retrieved 7 July 2019. Zerto doesn't use snapshot technology like Veeam. Instead, Zerto deploys small virtual machines on its physical hosts. These Zerto VMs capture the data as it is written to the host and then send a copy of that data to the replication site.....However, Veeam has the advantage of being able to more efficiently capture and store data for long-term retention needs. There is also a significant pricing difference, with Veeam being cheaper than Zerto.
  10. "Agent Related". CloudEndure.com. 2019. What does the CloudEndure Agent do?. Retrieved 3 July 2019. The CloudEndure Agent performs an initial block-level read of the content of any volume attached to the server and replicates it to the Replication Server. The Agent then acts as an OS-level read filter to capture writes and synchronizes any block level modifications to the CloudEndure Replication Server, ensuring near-zero RPO.
  11. Pond, James (25 May 2013). "FAQ 13. How are [Time Machine] backups scheduled (and can I change that)?". Apple OSX and Time Machine Tips. Baligu.com (as mirrored after James Pond died in 2013). Retrieved 4 July 2019. Time Machine was designed and optimized to do backups hourly.... You cannot change the schedule within Time Machine. You must use a 3rd-party app, or manually alter some system files.
  12. Reitshamer, Stefan (5 July 2017). "Troubleshooting backing up open/locked files on Windows". Arq Blog. Haystack Software LLC. Retrieved 25 June 2019. Arq uses Windows Volume Shadow Copy Service (VSS) to back up files that are open/locked. [Reitshamer is the principal developer of Arq Backup]
  13. Carter, Nick (5 August 2010). "Off-Site Backup – The Bandwidth Hog". Accel Networks. Archived from the original on 2011-07-07. In a true CDP environment, whenever large files are saved – images, audio, video, CAD or 3D models – the data is transmitted over the same broadband connection that feeds users' email and internet, not to mention back-end business-critical processes. Moreover, these transmissions rely on the scarcer of the two channels, the upstream channel. The result for many companies is an erratic broadband performance, and even server slow-down.
  14. David Pogue (4 January 2007). "Fewer Excuses for Not Doing a PC Backup". The New York Times. ... options like "Enable Bandwidth Throttle" and "Don't back up if the CPU is over this % busy."