Capacity optimization

Capacity optimization is a general term for technologies used to improve storage utilization by shrinking stored data. The primary technologies used for capacity optimization are data deduplication and data compression. These are delivered as software or hardware, integrated with storage systems or delivered as standalone products. Deduplication algorithms look for redundancy in sequences of bytes across comparison windows. Typically using cryptographic hash functions as identifiers of unique sequences, they compare each sequence against the history of sequences already seen and, where possible, store a reference to the first uniquely stored version of a sequence rather than storing it again. Methods for selecting comparison windows range from fixed-size 4 KB blocks to whole-file comparison, the latter known as single-instance storage (SIS).
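
As a rough sketch of this approach, the following Python example deduplicates a byte stream using fixed-size 4 KB comparison windows and SHA-256 digests as identifiers of unique sequences; the names and parameters are illustrative, not drawn from any particular product.

    import hashlib

    BLOCK_SIZE = 4096  # 4 KB comparison window

    def deduplicate(data: bytes):
        """Store each unique 4 KB block once; represent the stream as
        an ordered list of block digests referencing the store."""
        store = {}    # digest -> first uniquely stored block
        recipe = []   # ordered digests that reconstruct the stream
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in store:   # first occurrence: store it
                store[digest] = block
            recipe.append(digest)     # duplicates become references
        return store, recipe

    def reconstruct(store, recipe) -> bytes:
        return b"".join(store[d] for d in recipe)

    payload = b"A" * 8192 + b"B" * 4096 + b"A" * 4096  # redundant stream
    store, recipe = deduplicate(payload)
    assert reconstruct(store, recipe) == payload
    print(len(recipe), "blocks referenced,", len(store), "stored")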

Capacity optimization generally refers to the use of this kind of technology in a storage system. An example of such a system is the Venti file system [1] in the open-source Plan 9 operating system. There are also implementations in networking (especially wide-area networking), where they are sometimes called bandwidth optimization or WAN optimization. [2] [3]

Commercial implementations of capacity optimization are most often found in backup/recovery storage, where storing successive versions of backups from day to day creates large amounts of redundancy and thus an opportunity for space reduction using this approach. The term first came into wide use in 2005. [4]

Related Research Articles

A disk image, in computing, is a computer file containing the contents and structure of a disk volume or of an entire data storage device, such as a hard disk drive, tape drive, floppy disk, optical disc, or USB flash drive. A disk image is usually made by creating a sector-by-sector copy of the source medium, thereby perfectly replicating the structure and contents of a storage device independent of the file system. Depending on the disk image format, a disk image may span one or more computer files.

Quantum Corporation stores and manages video and video-like data. The company offers high streaming performance for video and rich-media applications, along with low-cost, high-density, massive-scale data protection and archive systems. Quantum enables customers to capture, create and share digital data and to preserve and protect it for decades. The company works with a network of distributors, VARs, DMRs, OEMs and other suppliers.

In information technology, a backup, or data backup, is a copy of computer data taken and stored elsewhere so that it may be used to restore the original after a data loss event. The verb form, referring to the process of doing so, is "back up", whereas the noun and adjective form is "backup". Backups can be used to recover data after its loss from data deletion or corruption, or to recover data from an earlier time. Backups provide a simple form of disaster recovery; however, not all backup systems are able to reconstitute a computer system or other complex configuration such as a computer cluster, Active Directory server, or database server.

In computing, a file system or filesystem controls how data is stored and retrieved. Without a file system, data placed on a storage medium would be one large body of data with no way to tell where one piece of data stops and the next begins. By separating the data into pieces and giving each piece a name, the data is easily isolated and identified. Taking its name from paper-based data management systems, each group of data is called a "file". The structure and logic rules used to manage the groups of data and their names are called a "file system".

A versioning file system is any computer file system which allows a computer file to exist in several versions at the same time. Thus it is a form of revision control. Most common versioning file systems keep a number of old copies of the file. Some limit the number of changes per minute or per hour to avoid storing large numbers of trivial changes. Others instead take periodic snapshots whose contents can be accessed with similar semantics to normal file access.
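
A minimal sketch of the keep-N-old-copies behavior in Python (the class and its retention limit are illustrative assumptions, not any particular file system's design):

    class VersionedFile:
        """Keep up to `max_versions` old copies of a file's contents;
        writes push the current contents onto a bounded history."""

        def __init__(self, contents: bytes = b"", max_versions: int = 8):
            self.contents = contents
            self.history = []           # oldest version first
            self.max_versions = max_versions

        def write(self, new_contents: bytes):
            self.history.append(self.contents)
            if len(self.history) > self.max_versions:
                self.history.pop(0)     # discard the oldest version
            self.contents = new_contents

        def version(self, n: int) -> bytes:
            return self.history[n]      # 0 is the oldest retained copy

    f = VersionedFile(b"v1")
    f.write(b"v2")
    f.write(b"v3")
    assert f.version(-1) == b"v2" and f.contents == b"v3"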

Hierarchical storage management (HSM) is a data storage technique that automatically moves data between high-cost and low-cost storage media. HSM systems exist because high-speed storage devices, such as solid state drive arrays, are more expensive than slower devices, such as hard disk drives, optical discs and magnetic tape drives. While it would be ideal to have all data available on high-speed devices all the time, this is prohibitively expensive for many organizations. Instead, HSM systems store the bulk of the enterprise's data on slower devices, and then copy data to faster disk drives when needed. In effect, HSM turns the fast disk drives into caches for the slower mass storage devices. The HSM system monitors the way data is used and makes best guesses as to which data can safely be moved to slower devices and which data should stay on the fast devices.
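
As a minimal sketch of such a policy (the file layout, tier names and 30-day threshold below are assumptions for illustration, not taken from any real HSM product), a tiering decision might look like this in Python:

    import time

    DAY = 86400  # seconds

    def plan_migrations(files, cold_after_days=30, now=None):
        """Toy HSM policy: demote files untouched for cold_after_days
        to the slow tier; promote recently used files to the fast tier.
        `files` maps name -> (tier, last_access_epoch)."""
        now = now if now is not None else time.time()
        cutoff = now - cold_after_days * DAY
        moves = []
        for name, (tier, last_access) in files.items():
            if tier == "fast" and last_access < cutoff:
                moves.append((name, "fast", "slow"))  # demote cold data
            elif tier == "slow" and last_access >= cutoff:
                moves.append((name, "slow", "fast"))  # promote hot data
        return moves

    now = time.time()
    catalog = {
        "old_archive.tar": ("fast", now - 90 * DAY),  # cold: demote
        "active.db":       ("slow", now - 1 * DAY),   # hot: promote
    }
    print(plan_migrations(catalog, now=now))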

A virtual tape library (VTL) is a data storage virtualization technology used typically for backup and recovery purposes. A VTL presents a storage component as tape libraries or tape drives for use with existing backup software.

Venti is a network storage system that permanently stores data blocks. A 160-bit SHA-1 hash of the data acts as the address of the data. This enforces a write-once policy since no other data block can be found with the same address: the addresses of multiple writes of the same data are identical, so duplicate data is easily identified and the data block is stored only once. Data blocks cannot be removed, making it ideal for permanent or backup storage. Venti is typically used with Fossil to provide a file system with permanent snapshots.
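
A minimal sketch of this content-addressed, write-once idea in Python (an in-memory toy, whereas the real Venti is a network server that persists blocks on disk):

    import hashlib

    class ContentAddressedStore:
        """Toy Venti-style store: the SHA-1 digest of a block is its
        address, so duplicate writes collapse to one stored copy."""

        def __init__(self):
            self._blocks = {}

        def write(self, block: bytes) -> str:
            address = hashlib.sha1(block).hexdigest()
            # A rewrite of identical data yields the identical address,
            # so nothing new is stored: an effective write-once policy.
            self._blocks.setdefault(address, block)
            return address

        def read(self, address: str) -> bytes:
            return self._blocks[address]

    store = ContentAddressedStore()
    first = store.write(b"some block of data")
    second = store.write(b"some block of data")  # duplicate write
    assert first == second  # same content, same address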

Veritas Backup Exec is a data protection software product designed for customers who have mixed physical and virtual environments, and who are moving to public cloud services. Supported platforms include VMware and Hyper-V virtualization, Windows and Linux operating systems, Amazon S3, Microsoft Azure and Google cloud storage, among others. All management and configuration operations are performed with a single user interface. Backup Exec also provides integrated deduplication, replication, and disaster recovery capabilities and helps to manage multiple backup servers or multi-drive tape loaders.

The IBM Storage product portfolio includes disk, flash, tape, NAS storage products, storage software and services. IBM’s approach is to focus on data management.

Single-instance storage (SIS) is a system's ability to take multiple copies of content and replace them with a single shared copy. It is a means to eliminate data duplication and to increase efficiency. SIS is frequently implemented in file systems, e-mail server software, data backup and other storage-related computer software. Single-instance storage is a simple variant of data deduplication. While data deduplication may work at a segment or sub-block level, single-instance storage works at the whole-file level and eliminates redundant copies of entire files or e-mail messages.
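
A minimal sketch of whole-file single-instancing in Python, grouping files by a digest of their entire contents (the function name and use of SHA-256 are illustrative assumptions):

    import hashlib
    from pathlib import Path

    def duplicate_file_groups(root: str):
        """Group regular files under `root` by the SHA-256 digest of
        their full contents: the whole-file granularity of SIS."""
        groups = {}
        for path in Path(root).rglob("*"):
            if path.is_file():
                digest = hashlib.sha256(path.read_bytes()).hexdigest()
                groups.setdefault(digest, []).append(path)
        # Each group with more than one member could be collapsed to a
        # single shared instance plus lightweight references to it.
        return [paths for paths in groups.values() if len(paths) > 1]

    for group in duplicate_file_groups("."):
        print("identical contents:", *group)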

ExaGrid Systems, Inc. is a disk-based backup hardware company that was founded in 2002 and is headquartered in Westborough, Massachusetts, with several satellite offices throughout the United States, Europe and Asia.

In computing, data deduplication is a technique for eliminating duplicate copies of repeating data. A related and somewhat synonymous term is single-instance (data) storage. This technique is used to improve storage utilization and can also be applied to network data transfers to reduce the number of bytes that must be sent. In the deduplication process, unique chunks of data, or byte patterns, are identified and stored during a process of analysis. As the analysis continues, other chunks are compared to the stored copy and whenever a match occurs, the redundant chunk is replaced with a small reference that points to the stored chunk. Given that the same byte pattern may occur dozens, hundreds, or even thousands of times, the amount of data that must be stored or transferred can be greatly reduced.
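
Chunk boundaries need not be fixed. One common building block for sub-block deduplication (not named in this article, so treat the parameters as assumptions) is content-defined chunking with a gear-style rolling hash, sketched below in Python. Because the hash depends only on recent bytes, chunk boundaries stay stable when data is inserted or deleted elsewhere in the stream:

    import random

    random.seed(0)
    GEAR = [random.getrandbits(32) for _ in range(256)]  # fixed random table

    def chunks(data: bytes, mask: int = (1 << 13) - 1):
        """Cut a chunk wherever the rolling hash of recent bytes hits
        the mask pattern; average chunk size is roughly 8 KB here."""
        out, start, h = [], 0, 0
        for i, byte in enumerate(data):
            # Shifting ages old bytes out of the 32-bit hash window.
            h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
            if (h & mask) == 0:              # boundary condition met
                out.append(data[start:i + 1])
                start, h = i + 1, 0
        if start < len(data):
            out.append(data[start:])         # trailing partial chunk
        return out

    pieces = chunks(bytes(random.getrandbits(8) for _ in range(100_000)))
    print(len(pieces), "chunks; average size about", 100_000 // len(pieces))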

Ocarina Networks

Ocarina Networks was a technology company, later a subsidiary of Dell, that sold a hardware/software solution designed to reduce data footprints through file-aware storage optimization. Its flagship product, the Ocarina Appliance/Reader, released in April 2008, used patented data compression techniques incorporating methods such as record linkage and context-based lossless data compression. The product comprised a hardware-appliance-based compressor, the Ocarina Optimizer, and a real-time software-based decompressor, the Ocarina Reader.

Storage efficiency is the ability to store and manage data so that it consumes the least amount of space with little to no impact on performance, resulting in a lower total operational cost. Efficiency addresses the real-world demands of managing costs, reducing complexity and limiting risk. The Storage Networking Industry Association (SNIA) provides a formal definition of storage efficiency in the SNIA Dictionary.

HP Data Protector software is automated backup and recovery software for single-server to enterprise environments, supporting disk storage or tape storage targets. It provides cross-platform, online backup of data for Microsoft Windows, Unix, and Linux operating systems.

Wanova, Inc., headquartered in San Jose, California, provides software to help IT organizations manage, support and protect data on desktop and laptop computers. Wanova's primary product, Wanova Mirage, was designed as an alternative to server-hosted desktop virtualization technologies, combining the centralization and management capabilities of virtual desktop infrastructure (VDI) with features that allow the system to work for laptops and other WAN-connected desktops. Mirage enables IT organizations to store the complete contents of each personal computer (PC) in the data center for centralized management and data protection. End users execute a locally cached copy of their centrally stored PC, which makes it possible for them to use their PC whether or not they are connected to the network. The software includes additional features that optimize the system to work over a wide area network (WAN). Wanova Mirage software was designed for information technology organizations supporting distributed enterprises and has three primary components: the Mirage Client, the Mirage Server and capabilities that optimize network and storage efficiency.

NetVault is a set of data protection software developed and supported by Quest Software. NetVault Backup is a backup and recovery software product. It can be used to protect data and software applications in physical and virtual environments from one central management interface. It supports many servers, application platforms, and protocols such as UNIX, Linux, Microsoft Windows, VMware, Microsoft Hyper-V, Oracle, Sybase, Microsoft SQL Server, NDMP, Oracle ACSLS, IBM DAS/ACI, Microsoft Exchange Server, DB2, and Teradata.

Resilient File System (ReFS), codenamed "Protogon", is a Microsoft proprietary file system introduced with Windows Server 2012 with the intent of becoming the "next generation" file system after NTFS.

Dell EMC Data Domain is Dell EMC’s data deduplication storage system. Development began with the founding of Data Domain, and has continued since that company’s acquisition by EMC Corporation.

References

  1. Venti file system. Archived January 4, 2006, at the Wayback Machine.
  2. Spring, N.; Wetherall, D. "A Protocol-Independent Technique for Eliminating Redundant Network Traffic" (PDF). Archived from the original (PDF) on 2005-03-04. Retrieved 2005-08-05.
  3. Foukalas, F.; et al. "Capacity optimization through sensing threshold adaptation for cognitive radio networks". doi:10.1007/s11590-011-0345-8.
  4. "Capacity optimization" as defined by SearchStorage.com.
