Storage efficiency

Storage efficiency is the ability to store and manage data so that it consumes the least amount of space with little to no impact on performance, resulting in a lower total operational cost. Efficiency addresses the real-world demands of managing costs, reducing complexity and limiting risk. The Storage Networking Industry Association (SNIA) also defines storage efficiency in its SNIA Dictionary.

The efficiency of an empty enterprise-level system is commonly in the 40 to 70% range, depending on what combination of RAID, mirroring and other data protection technologies is deployed, and may be even lower for highly redundant, remotely mirrored systems. As data is stored on the system, technologies such as deduplication and compression may store data at a greater than 1-to-1 ratio of data size to space consumed, and efficiency rises, often to over 100% for primary data and to thousands of percent for backup data.
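The arithmetic behind these percentages can be sketched as follows. This is a minimal illustration that assumes efficiency is measured as the amount of logical data a system can hold divided by its raw capacity; the function name `efficiency_percent`, the capacities and the reduction ratios are hypothetical values chosen for the example, not figures from any vendor.

```python
def efficiency_percent(logical_tb: float, raw_tb: float) -> float:
    """Return logical data held per unit of raw capacity, as a percentage.

    Assumption for this sketch: efficiency = logical data / raw capacity.
    """
    if raw_tb <= 0:
        raise ValueError("raw capacity must be positive")
    return 100.0 * logical_tb / raw_tb


# Hypothetical system: 100 TB of raw disk, 60 TB usable after RAID,
# spares and mirroring have taken their share.
RAW_TB = 100.0
USABLE_TB = 60.0

# Empty system: at best, one logical TB fits in each usable TB.
empty = efficiency_percent(USABLE_TB, RAW_TB)

# Assumed 2:1 reduction on primary data (deduplication plus compression).
primary = efficiency_percent(USABLE_TB * 2, RAW_TB)

# Assumed 20:1 reduction on backup data, where copies are nearly identical.
backup = efficiency_percent(USABLE_TB * 20, RAW_TB)

print(f"empty: {empty:.0f}%  primary: {primary:.0f}%  backup: {backup:.0f}%")
```

With these assumed inputs, the empty system falls inside the 40 to 70% range, primary data exceeds 100%, and backup data exceeds 1000%.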

Technologies

Different technologies exist at different and sometimes multiple levels:

Snapshot technology, known formally as "delta snapshot technology", gives the ability to use the same dataset multiple times for multiple reasons, while storing only the changes between each dataset. Some storage vendors integrate their snapshot capabilities at the operating system and/or application level, enabling access to the data the snapshots are holding at the system and/or application management layers. Terminology around snapshots and "clones" is currently confusing, and care must be taken when evaluating vendor claims. In particular, some vendors call full point-in-time copies "snapshots" or "clones", while others use the same terms to refer to shared-block "delta" snapshots or clones. Some implementations can only create read-only snapshots, while others can provide writable ones as well.
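The idea of storing only the changes between datasets can be sketched with a toy model. The class below is a hypothetical illustration, not any vendor's implementation: it assumes a volume is a stack of block maps ("layers"), that taking a snapshot freezes the current layers and directs new writes to a fresh layer, and that snapshots are read-only.

```python
class DeltaVolume:
    """Toy block volume with read-only delta snapshots (hypothetical model)."""

    def __init__(self):
        # Oldest layer first; only the last layer receives new writes.
        self._layers = [{}]

    def write(self, block_no, data):
        """Write a block into the current (writable) layer."""
        self._layers[-1][block_no] = data

    def snapshot(self):
        """Freeze the current state and return a snapshot id.

        The id is the number of layers visible to the snapshot. Later
        writes land in a new, empty layer, so only changed blocks are stored.
        """
        snap_id = len(self._layers)
        self._layers.append({})
        return snap_id

    def read(self, block_no, snap_id=None):
        """Read from the live volume, or from a snapshot if snap_id is given."""
        depth = len(self._layers) if snap_id is None else snap_id
        for layer in reversed(self._layers[:depth]):
            if block_no in layer:
                return layer[block_no]
        return None  # block was never written

    def stored_blocks(self):
        """Count the blocks physically held across all layers."""
        return sum(len(layer) for layer in self._layers)


vol = DeltaVolume()
for n in range(3):
    vol.write(n, f"v1-block{n}")
snap = vol.snapshot()
vol.write(1, "v2-block1")  # only this changed block is stored again

print(vol.read(1), vol.read(1, snap), vol.stored_blocks())
```

In this example the live volume and the snapshot present three blocks each, but only four blocks are stored in total, because the two unchanged blocks are shared. A full point-in-time copy would have stored six.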

Data deduplication technology can be used to efficiently track and remove duplicate blocks of data inside a storage unit. There are many implementations, each with its own advantages and disadvantages. Deduplication is most efficient at the shared storage layer, although implementations in software and even in databases exist. The most suitable candidates for deduplication are backup and platform virtualization, because both applications typically produce or use many almost identical copies. However, some vendors now offer in-place deduplication, which deduplicates primary storage.
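A minimal sketch of block-level deduplication follows. It assumes fixed-size blocks identified by their SHA-256 digest, which is one common approach among many; the `DedupStore` class, its method names and the tiny block size are hypothetical and chosen only to keep the example readable.

```python
import hashlib


class DedupStore:
    """Toy content-addressed store that keeps each distinct block once."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self._blocks = {}  # digest -> block bytes (stored once)
        self._files = {}   # file name -> ordered list of digests

    def put(self, name, data):
        """Split data into fixed-size blocks and store only unseen blocks."""
        digests = []
        for offset in range(0, len(data), self.block_size):
            chunk = data[offset:offset + self.block_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self._blocks.setdefault(digest, chunk)
            digests.append(digest)
        self._files[name] = digests

    def get(self, name):
        """Reassemble a file from its block references."""
        return b"".join(self._blocks[d] for d in self._files[name])

    def logical_bytes(self):
        """Bytes as seen by clients, counting every reference."""
        return sum(len(self._blocks[d])
                   for digests in self._files.values() for d in digests)

    def physical_bytes(self):
        """Bytes actually stored after deduplication."""
        return sum(len(block) for block in self._blocks.values())


store = DedupStore(block_size=4)
store.put("monday.bak", b"AAAABBBBAAAA")
store.put("tuesday.bak", b"AAAABBBBCCCC")
print(store.logical_bytes(), store.physical_bytes())
```

The two hypothetical backup files share most of their blocks, so the store holds three distinct blocks while presenting six to its clients. This is the pattern that makes backup and virtualization workloads good candidates.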

Thin provisioning is a technique to prevent under-utilization by sharing capacity that has been allocated but not yet used. A good example is Gmail, where every account has a large amount of allocated capacity. Because most Gmail users use only a fraction of that capacity, the remaining "free space" is "shared" among all Gmail users.
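The sharing of allocated but unused capacity can be illustrated with a small model. This is a hypothetical sketch rather than a description of how any product works: it assumes a pool with a fixed number of physical blocks, volumes whose provisioned sizes may add up to more than the pool holds, and physical space that is consumed only when a block is first written.

```python
class ThinPool:
    """Toy thin-provisioned pool: space is consumed on first write only."""

    def __init__(self, physical_blocks):
        self.physical_blocks = physical_blocks
        self._provisioned = {}  # volume name -> promised size in blocks
        self._written = {}      # volume name -> set of written block numbers

    def create_volume(self, name, size_blocks):
        """Promise capacity to a volume without reserving physical space."""
        self._provisioned[name] = size_blocks
        self._written[name] = set()

    def write(self, name, block_no):
        """Mark a block as written, consuming pool space if it is new."""
        if not 0 <= block_no < self._provisioned[name]:
            raise IndexError("block outside provisioned size")
        is_new = block_no not in self._written[name]
        if is_new and self.used() >= self.physical_blocks:
            raise RuntimeError("pool is out of physical space")
        self._written[name].add(block_no)

    def used(self):
        """Physical blocks consumed by all volumes."""
        return sum(len(blocks) for blocks in self._written.values())

    def provisioned(self):
        """Total capacity promised to all volumes."""
        return sum(self._provisioned.values())


pool = ThinPool(physical_blocks=10)
pool.create_volume("alice", 8)
pool.create_volume("bob", 8)  # 16 blocks promised against 10 physical
for n in range(3):
    pool.write("alice", n)
print(pool.provisioned(), pool.used(), pool.physical_blocks)
```

The model also shows the risk that comes with the technique: if the volumes together write more distinct blocks than the pool physically holds, a write fails, so over-committed pools need capacity monitoring.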

Major advantages

Actively increasing storage efficiency using these techniques has the following advantages:

Backup and restore. Using snapshots, the time needed for both backup and restore can be minimized, which improves the recovery time objective (RTO). This can greatly reduce cost and cut downtime from hours to seconds. Snapshots also allow for better recovery point objective (RPO) values.

Reducing floorspace. When less storage is required to store a given amount of data, less data center floorspace is required.

Reducing energy use. When fewer spindles are required to store a given amount of data, less power is required.

Provisioning efficiency. Writable delta snapshot technology allows for very fast provisioning of writable data copies. This reduces waiting time in processes that require that data. Examples are data mining, test data, etc. Snapshot integration at the OS and/or application level also leads to faster provisioning, because system and/or application managers are able to manage their own snapshots without having to wait for storage managers and/or provisioning procedures.

Major commercial players

All major vendors are implementing one or more of these technologies, because storage efficiency is becoming more and more popular. Customers are facing storage requirements that are growing exponentially and a strong demand for cost-cutting. The major vendors are NetApp, EMC, HDS, IBM and HP.

