Snapshot (computer storage)

Last updated
Example of snapshots of a Btrfs filesystem, managed with snapper Snapper root list screenshot.png
Example of snapshots of a Btrfs filesystem, managed with snapper

In computer systems, a snapshot is the state of a system at a particular point in time. The term was coined as an analogy to that in photography.

Contents

Rationale

A full backup of a large data set may take a long time to complete. On multi-tasking or multi-user systems, there may be writes to that data while it is being backed up. This prevents the backup from being atomic and introduces a version skew that may result in data corruption. For example, if a user moves a file into a directory that has already been backed up, then that file would be completely missing on the backup media, since the backup operation had already taken place before the addition of the file. Version skew may also cause corruption with files which change their size or contents underfoot while being read.

One approach to safely backing up live data is to temporarily disable write access to data during the backup, either by stopping the accessing applications or by using the locking API provided by the operating system to enforce exclusive read access. This is tolerable for low-availability systems (on desktop computers and small workgroup servers, on which regular downtime is acceptable). High-availability 24/7 systems, however, cannot bear service stoppages.

To avoid downtime, high-availability systems may instead perform the backup on a snapshot—a read-only copy of the data set frozen at a point in time—and allow applications to continue writing to their data. Most snapshot implementations are efficient and can create snapshots in O(1) . In other words, the time and I/O needed to create the snapshot does not increase with the size of the data set; by contrast, the time and I/O required for a direct backup is proportional to the size of the data set. In some systems once the initial snapshot is taken of a data set, subsequent snapshots copy the changed data only, and use a system of pointers to reference the initial snapshot. This method of pointer-based snapshots consumes less disk capacity than if the data set was repeatedly cloned.

Implementations

Volume managers

Some Unix systems have snapshot-capable logical volume managers. These implement copy-on-write on entire block devices by copying changed blocksjust before they are to be overwritten within "parent" volumesto other storage, thus preserving a self-consistent past image of the block device. Filesystems on such snapshot images can later be mounted as if they were on a read-only media.

Some volume managers also allow creation of writable snapshots, extending the copy-on-write approach by disassociating any blocks modified within the snapshot from their "parent" blocks in the original volume. Such a scheme could be also described as performing additional copy-on-write operations triggered by the writes to snapshots.

On Linux, Logical Volume Manager (LVM) allows creation of both read-only and read-write snapshots. Writable snapshots were introduced with the LVM version 2 (LVM2). [1]

File systems

Some file systems, such as WAFL, [lower-alpha 1] fossil for Plan 9 from Bell Labs, and ODS-5,[ citation needed ] internally track old versions of files and make snapshots available through a special namespace. Others, like UFS2, provide an operating system API for accessing file histories. In NTFS, access to snapshots is provided by the Volume Shadow-copying Service (VSS) in Windows XP and Windows Server 2003 and Shadow Copy in Windows Vista. Melio FS provides snapshots via the same VSS interface for shared storage. [2] Snapshots have also been available in the NSS (Novell Storage Services) file system on NetWare since version 4.11, and more recently on Linux platforms in the Open Enterprise Server product.

EMC's Isilon OneFS clustered storage platform implements a single scalable file system that supports read-only snapshots at the file or directory level. Any file or directory within the file system can be snapshotted and the system will implement a copy-on-write or point-in-time snapshot dynamically based on which method is determined to be optimal for the system.

On Linux, the Btrfs and OCFS2 file systems support creating snapshots (cloning) of individual files. Additionally, Btrfs also supports the creation of snapshots of subvolumes. On AIX, JFS2 also support snapshots.

See also

Notes

  1. WAFL is not a file system. WAFL is a file layout that provides mechanisms that enable a variety of file systems and technologies that want to access disk blocks.

Related Research Articles

XFS is a high-performance 64-bit journaling file system created by Silicon Graphics, Inc (SGI) in 1993. It was the default file system in SGI's IRIX operating system starting with its version 5.3. XFS was ported to the Linux kernel in 2001; as of June 2014, XFS is supported by most Linux distributions; Red Hat Enterprise Linux uses it as its default file system.

ext3, or third extended filesystem, is a journaled file system that is commonly used by the Linux kernel. It used to be the default file system for many popular Linux distributions. Stephen Tweedie first revealed that he was working on extending ext2 in Journaling the Linux ext2fs Filesystem in a 1998 paper, and later in a February 1999 kernel mailing list posting. The filesystem was merged with the mainline Linux kernel in November 2001 from 2.4.15 onward. Its main advantage over ext2 is journaling, which improves reliability and eliminates the need to check the file system after an unclean shutdown. Its successor is ext4.

Copy-on-write (COW), sometimes referred to as implicit sharing or shadowing, is a resource-management technique used in computer programming to efficiently implement a "duplicate" or "copy" operation on modifiable resources.

In computer storage, logical volume management or LVM provides a method of allocating space on mass-storage devices that is more flexible than conventional partitioning schemes to store volumes. In particular, a volume manager can concatenate, stripe together or otherwise combine partitions into larger virtual partitions that administrators can re-size or move, potentially without interrupting system use.

<span class="mw-page-title-main">Data striping</span> Data segmentation technique

In computer data storage, data striping is the technique of segmenting logically sequential data, such as a file, so that consecutive segments are stored on different physical storage devices.

In information technology, a backup, or data backup is a copy of computer data taken and stored elsewhere so that it may be used to restore the original after a data loss event. The verb form, referring to the process of doing so, is "back up", whereas the noun and adjective form is "backup". Backups can be used to recover data after its loss from data deletion or corruption, or to recover data from an earlier time. Backups provide a simple form of disaster recovery; however not all backup systems are able to reconstitute a computer system or other complex configuration such as a computer cluster, active directory server, or database server.

Disk cloning is the process of duplicating all data on a digital storage drive, such as a hard disk or solid state drive, using hardware or software techniques. Unlike file copying, disk cloning also duplicates the filesystems, partitions, drive meta data and slack space on the drive. Common reasons for cloning a drive include; data backup and recovery; duplicating a computer's configuration for mass deployment and for preserving data for digital forensics purposes. Drive cloning can be used in conjunction with drive imaging where the cloned data is saved to one or more files on another drive rather than copied directly to another drive.

The Write Anywhere File Layout (WAFL) is a proprietary file system that supports large, high-performance RAID arrays, quick restarts without lengthy consistency checks in the event of a crash or power failure, and growing the filesystems size quickly. It was designed by NetApp for use in its storage appliances like NetApp FAS, AFF, Cloud Volumes ONTAP and ONTAP Select.

A versioning file system is any computer file system which allows a computer file to exist in several versions at the same time. Thus it is a form of revision control. Most common versioning file systems keep a number of old copies of the file. Some limit the number of changes per minute or per hour to avoid storing large numbers of trivial changes. Others instead take periodic snapshots whose contents can be accessed using methods similar as those for normal file access.

<span class="mw-page-title-main">Shadow Copy</span> Microsoft technology for storage snapshots

Shadow Copy is a technology included in Microsoft Windows that can create backup copies or snapshots of computer files or volumes, even when they are in use. It is implemented as a Windows service called the Volume Shadow Copy service. A software VSS provider service is also included as part of Windows to be used by Windows applications. Shadow Copy technology requires either the Windows NTFS or ReFS filesystems in order to create and store shadow copies. Shadow Copies can be created on local and external volumes by any Windows component that uses this technology, such as when creating a scheduled Windows Backup or automatic System Restore point.

In Linux, Logical Volume Manager (LVM) is a device mapper framework that provides logical volume management for the Linux kernel. Most modern Linux distributions are LVM-aware to the point of being able to have their root file systems on a logical volume.

Continuous data protection (CDP), also called continuous backup or real-time backup, refers to backup of computer data by automatically saving a copy of every change made to that data, essentially capturing every version of the data that the user saves. In its true form it allows the user or administrator to restore data to any point in time. The technique was patented by British entrepreneur Pete Malcolm in 1989 as "a backup system in which a copy [editor's emphasis] of every change made to a storage medium is recorded as the change occurs [editor's emphasis]."

NILFS or NILFS2 is a log-structured file system implementation for the Linux kernel. It was developed by Nippon Telegraph and Telephone Corporation (NTT) CyberSpace Laboratories and a community from all over the world. NILFS was released under the terms of the GNU General Public License (GPL).

A NetApp FAS is a computer storage product by NetApp running the ONTAP operating system; the terms ONTAP, AFF, ASA, FAS are often used as synonyms. "Filer" is also used as a synonym although this is not an official name. There are three types of FAS systems: Hybrid, All-Flash, and All SAN Array:

  1. NetApp proprietary custom-build hardware appliances with HDD or SSD drives called hybrid Fabric-Attached Storage
  2. NetApp proprietary custom-build hardware appliances with only SSD drives and optimized ONTAP for low latency called ALL-Flash FAS
  3. All SAN Array build on top of AFF platform, and provide only SAN-based data protocol connectivity.
<span class="mw-page-title-main">Distributed Replicated Block Device</span> Distributed replicated storage system for Linux

Distributed Replicated Block Device (DRBD) is a distributed replicated storage system for the Linux platform. It is implemented as a kernel driver, several userspace management applications, and some shell scripts. DRBD is traditionally used in high availability (HA) computer clusters, but beginning with DRBD version 9, it can also be used to create larger software defined storage pools with a focus on cloud integration.

Btrfs is a computer storage format that combines a file system based on the copy-on-write (COW) principle with a logical volume manager, developed together. It was founded by Chris Mason in 2007 for use in Linux, and since November 2013, the file system's on-disk format has been declared stable in the Linux kernel.

The most widespread standard for configuring multiple hard disk drives is RAID, which comes in a number of standard configurations and non-standard configurations. Non-RAID drive architectures also exist, and are referred to by acronyms with tongue-in-cheek similarity to RAID:

Next3 is a journaling file system for Linux based on ext3 which adds snapshots support, yet retains compatibility to the ext3 on-disk format. Next3 is implemented as open-source software, licensed under the GPL license.

Resilient File System (ReFS), codenamed "Protogon", is a Microsoft proprietary file system introduced with Windows Server 2012 with the intent of becoming the "next generation" file system after NTFS.

References

  1. "LVM HOWTO". 3.8. Snapshots. tldp.org. Retrieved 2013-09-29.
  2. "Optimized Storage Solution for Enterprise Scale Hyper-V Deployments" (PDF). Microsoft. March 2010. p. 15. Retrieved 25 October 2012.