Single-instance storage

Last updated

Single-instance storage (SIS) is a system's ability to take multiple copies of content and replace them by a single shared copy. It is a means to eliminate data duplication and to increase efficiency. SIS is frequently implemented in file systems, e-mail server software, data backup, and other storage-related computer software. Single-instance storage is a simple variant of data deduplication. While data deduplication may work at a segment or sub-block level, single-instance storage works at the whole-file level and eliminates redundant copies of entire files or e-mail messages. [1]

Contents

Concept

In the case of an e-mail server, single-instance storage would mean that a single copy of a message is held within its database while individual mailboxes access the content through a reference pointer. However, there is a common misconception that the primary benefit of single-instance storage in mail servers is a reduction in disk space requirements. The truth is that its primary benefit is to greatly enhance delivery efficiency of messages sent to large distribution lists. In a mail server scenario disk space savings from single-instance storage are transient and drop off very quickly over time.[ citation needed ]

When used in conjunction with backup software, single-instance storage can reduce the quantity of archive media required since it avoids storing duplicate copies of the same file. Often identical files are installed on multiple computers, for example operating system files. With single-instance storage, only one copy of a file is written to the backup media therefore reducing space. This becomes more important when the storage is offsite and on cloud storage such as Amazon S3. In such cases, it has been reported that deduplication can help reduce the costs of storage, costs of bandwidth and backup windows by up to 10:1. [2]

Novell GroupWise was built on single-instance storage, which accounts for its large capacity.

ISO CD/DVD image files can be optimized to use SIS to reduce the size of a CD/DVD compilation (if there are enough duplicated files) to make it fit into smaller media.

SIS is related to system wide file duplication search and multiple file instance detection tools such as the P2P application BearShare (5.n Versions and below) but differs in that SIS reduces storage utilization automatically and creates and retains symbolic linkages, whereas Bearshare allows for manual deletion of duplicates and associated user-level file system, Windows Explorer type of icon links.

Microsoft

SIS was introduced with the Remote Installation Services feature of Windows 2000 Server. A typical server might hold ten or more unique installation configurations (perhaps with different device drivers or software suites) but perhaps only 20% of the data may be unique between configurations. [3] Microsoft states that "SIS works by searching a hard disk volume to identify duplicate files. When SIS finds identical files, it saves one copy of the file to a central repository, called the SIS Common Store, and replaces other copies with pointers to the stored versions." [4] Files are compared solely by their hash functions; files with different names or dates can be consolidated so long as the data itself is identical. [3] Windows Server 2003 Standard Edition has SIS capabilities but is limited to OEM OS system installs.[ citation needed ]

The file-based Windows Imaging Format introduced in Windows Vista also supported single-instance storage. Single-instance storage was a feature of Microsoft Exchange Server since version 4.0 and is also present in Microsoft's Windows Home Server. It is deduplicating attachments only in Exchange 2007 and was dropped completely in Microsoft Exchange Server 2010. [5] Microsoft announced Windows Storage Server 2008 (WSS2008) [6] with Single Instance Storage on June 1, 2009, and states this feature is not available on Windows Server 2008. [6]

The feature is officially deprecated since Windows Server 2012, when a new, more powerful chunk-based data deduplication mechanism was introduced. It allows files with similar content to be deduplicated as long as they have stretches of identical data. This mechanism is more powerful than SIS. [7] Since Windows Server 2019, the feature is fully supported on ReFS. [8]

See also

Related Research Articles

A disk image is a snapshot of a storage device's structure and data typically stored in one or more computer files on another storage device. Traditionally, disk images were bit-by-bit copies of every sector on a hard disk often created for digital forensic purposes, but it is now common to only copy allocated data to reduce storage space. Compression and deduplication are commonly used to reduce the size of the image file set. Disk imaging is done for a variety of purposes including digital forensics, cloud computing, system administration, as part of a backup strategy, and legacy emulation as part of a digital preservation strategy. Disk images can be made in a variety of formats depending on the purpose. Virtual disk images are intended to be used for cloud computing, ISO images are intended to emulate optical media and raw disk images are used for forensic purposes. Proprietary formats are typically used by disk imaging software. Despite the benefits of disk imaging the storage costs can be high, management can be difficult and they can be time consuming to create.

In information technology, a backup, or data backup is a copy of computer data taken and stored elsewhere so that it may be used to restore the original after a data loss event. The verb form, referring to the process of doing so, is "back up", whereas the noun and adjective form is "backup". Backups can be used to recover data after its loss from data deletion or corruption, or to recover data from an earlier time. Backups provide a simple form of disaster recovery; however not all backup systems are able to reconstitute a computer system or other complex configuration such as a computer cluster, active directory server, or database server.

Filesystem in Userspace (FUSE) is a software interface for Unix and Unix-like computer operating systems that lets non-privileged users create their own file systems without editing kernel code. This is achieved by running file system code in user space while the FUSE module provides only a bridge to the actual kernel interfaces.

The Write Anywhere File Layout (WAFL) is a proprietary file system that supports large, high-performance RAID arrays, quick restarts without lengthy consistency checks in the event of a crash or power failure, and growing the filesystems size quickly. It was designed by NetApp for use in its storage appliances like NetApp FAS, AFF, Cloud Volumes ONTAP and ONTAP Select.

<span class="mw-page-title-main">Shadow Copy</span> Microsoft technology for storage snapshots

Shadow Copy is a technology included in Microsoft Windows that can create backup copies or snapshots of computer files or volumes, even when they are in use. It is implemented as a Windows service called the Volume Shadow Copy service. A software VSS provider service is also included as part of Windows to be used by Windows applications. Shadow Copy technology requires either the Windows NTFS or ReFS filesystems in order to create and store shadow copies. Shadow Copies can be created on local and external volumes by any Windows component that uses this technology, such as when creating a scheduled Windows Backup or automatic System Restore point.

A virtual tape library (VTL) is a data storage virtualization technology used typically for backup and recovery purposes. A VTL presents a storage component as tape libraries or tape drives for use with existing backup software.

IBM Storage Protect is a data protection platform that gives enterprises a single point of control and administration for backup and recovery. It is the flagship product in the IBM Spectrum Protect family.

EMC NetWorker is an enterprise-level data protection software product from Dell EMC that unifies and automates backup to tape, disk-based, and flash-based storage media across physical and virtual environments for granular and disaster recovery.

Veritas Backup Exec is a data protection software product designed for customers with mixed physical and virtual environments, and who are moving to public cloud services. Supported platforms include VMware and Hyper-V virtualization, Windows and Linux operating systems, Amazon S3, Microsoft Azure and Google Cloud Storage, among others. All management and configuration operations are performed with a single user interface. Backup Exec also provides integrated deduplication, replication, and disaster recovery capabilities and helps to manage multiple backup servers or multi-drive tape loaders.

<span class="mw-page-title-main">BackupPC</span>

BackupPC is a free disk-to-disk backup software suite with a web-based frontend. The cross-platform server will run on any Linux, Solaris, or UNIX-based server. No client is necessary, as the server is itself a client for several protocols that are handled by other services native to the client OS. In 2007, BackupPC was mentioned as one of the three most well known open-source backup software, even though it is one of the tools that are "so amazing, but unfortunately, if no one ever talks about them, many folks never hear of them".

An NTFS reparse point is a type of NTFS file system object. It is available with the NTFS v3.0 found in Windows 2000 or later versions. Reparse points provide a way to extend the NTFS filesystem. A reparse point contains a reparse tag and data that are interpreted by a filesystem filter driver identified by the tag. Microsoft includes several default tags including NTFS symbolic links, directory junction points, volume mount points and Unix domain sockets. Also, reparse points are used as placeholders for files moved by Windows 2000's Remote Storage Hierarchical Storage System. They also can act as hard links, but are not limited to pointing to files on the same volume: they can point to directories on any local volume. The feature is inherited to ReFS.

<span class="mw-page-title-main">Windows Home Server</span> Home server operating system by Microsoft released in 2007

Windows Home Server is a home server operating system from Microsoft. It was announced on 7 January 2007 at the Consumer Electronics Show by Bill Gates, released to manufacturing on 16 July 2007 and officially released on 4 November 2007.

NTBackup is the first built-in backup utility of the Windows NT family. It was introduced with Windows NT 3.51. NTBackup comprises a GUI (wizard-style) and a command-line utility to create, customize, and manage backups. It takes advantage of Shadow Copy and Task Scheduler. NTBackup stores backups in the BKF file format on external sources, e.g., floppy disks, hard drives, tape drives, and Zip drives. When used with tape drives, NTBackup uses the Microsoft Tape Format (MTF), which is also used by BackupAssist, Backup Exec, and Veeam Backup & Replication and is compatible with BKF.

In computing, data deduplication is a technique for eliminating duplicate copies of repeating data. Successful implementation of the technique can improve storage utilization, which may in turn lower capital expenditure by reducing the overall amount of storage media required to meet storage capacity needs. It can also be applied to network data transfers to reduce the number of bytes that must be sent.

NetVault is a set of data protection software developed and supported by Quest Software. NetVault Backup is a backup and recovery software product. It can be used to protect data and software applications in physical and virtual environments from one central management interface. It supports many servers, application platforms, and protocols such as UNIX, Linux, Microsoft Windows, VMware, Microsoft Hyper-V, Oracle, Sybase, Microsoft SQL Server, NDMP, Oracle ACSLS, IBM DAS/ACI, Microsoft Exchange Server, DB2, and Teradata.

Resilient File System (ReFS), codenamed "Protogon", is a Microsoft proprietary file system introduced with Windows Server 2012 with the intent of becoming the "next generation" file system after NTFS.

<span class="mw-page-title-main">Veeam Backup & Replication</span> Backup and disaster recovery software

Veeam Backup & Replication is a proprietary backup app developed by Veeam for virtual environments built on VMware vSphere, Nutanix AHV, and Microsoft Hyper-V hypervisors. The software provides backup, restore and replication functionality for virtual machines, physical servers and workstations as well as cloud-based workload.

<span class="mw-page-title-main">MSP360</span> Application service provider

MSP360, formerly CloudBerry Lab, is a software and application service provider company that develops online backup, remote desktop and file management products integrated with more than 20 cloud storage providers.

ONTAP or Data ONTAP or Clustered Data ONTAP (cDOT) or Data ONTAP 7-Mode is NetApp's proprietary operating system used in storage disk arrays such as NetApp FAS and AFF, ONTAP Select, and Cloud Volumes ONTAP. With the release of version 9.0, NetApp decided to simplify the Data ONTAP name and removed the word "Data" from it, removed the 7-Mode image, therefore, ONTAP 9 is the successor of Clustered Data ONTAP 8.

References

  1. Explaining deduplication rates and single-instance storage to clients. George Crump, Storage Switzerland
  2. Deduplication + Amazon S3 will save you time and money. White Paper: Published June 2008
  3. 1 2 Douceur, John (JD); Goebel, David; Corbin, Scott; Bolosky, Bill (August 2000). "Single Instance Storage in Windows 2000" (PDF). Microsoft Research. Microsoft Research and Balder Technology Group.
  4. Single Instance Storage in Microsoft Windows Storage Server 2003 R2 Archived 2007-01-04 at the Wayback Machine : Technical White Paper: Published May 2006
  5. The Exchange Team Blog, Microsoft Corp.
  6. 1 2 Windows Storage Server 2008 at Microsoft
  7. FileCAB-Team (10 April 2019). "Introduction to Data Deduplication in Windows Server 2012". Microsoft Tech Community.
  8. "Data Deduplication interoperability". docs.microsoft.com. 29 March 2022.