Active Archive Alliance

Formation: 2010
Type: Industry trade group
Purpose: Promoting active archive
Website: activearchive.com

The Active Archive Alliance is a trade association that promotes a method of tiered storage. This method gives users access to data across a virtual file system that migrates data between multiple storage systems and media types, including solid-state drives/flash, hard disk drives, magnetic tape, optical discs, and cloud storage. The result of an active archive implementation is that data can be stored on the most appropriate media type for the given retention and restoration requirements of that data. This allows less time-sensitive or infrequently accessed data to be stored on less expensive media and eliminates the need for an administrator to manually migrate data between storage systems. Additionally, since storage systems such as tape libraries have low power consumption, the operational expense of storing data in an active archive is significantly reduced.


Active archives provide organizations with a persistent view of the data in their archives and make it easy to access files whenever needed. Active archives take advantage of metadata to keep track of where primary, secondary, and tertiary copies of data reside within the system so as to maintain online accessibility to any given file in a file system, regardless of the storage medium being utilized. The impetus for active archive applications, or the software involved in an active archive, was the growing amount of unstructured data in the typical data center and the need to be able to manage and efficiently store that data. As a result, active archive applications tend to be focused on file systems and unstructured data, rather than all collective data; however, many have features and functions that address traditional backup needs as well.
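The catalog that enables this is essentially a metadata index mapping each logical file to the copies held on the various media. Below is a minimal sketch in Python, with entirely hypothetical class and field names, of how an active archive application might represent that mapping and resolve a file to its fastest available copy.

    from dataclasses import dataclass, field
    from enum import Enum

    class Tier(Enum):
        FLASH = "flash"
        DISK = "disk"
        CLOUD = "cloud"
        TAPE = "tape"

    @dataclass
    class CopyLocation:
        tier: Tier      # medium holding this copy
        locator: str    # volume/object identifier on that medium

    @dataclass
    class FileRecord:
        path: str                                     # logical path seen by users
        size_bytes: int
        copies: list[CopyLocation] = field(default_factory=list)

        def resolve(self) -> CopyLocation:
            # Prefer the fastest medium holding a copy; the file stays
            # "online" no matter which medium that turns out to be.
            order = [Tier.FLASH, Tier.DISK, Tier.CLOUD, Tier.TAPE]
            return min(self.copies, key=lambda c: order.index(c.tier))

    # Example: a file with a secondary copy on disk and a tertiary copy on tape.
    record = FileRecord("/projects/video/take42.mov", 12_884_901_888,
                        [CopyLocation(Tier.DISK, "nas01:/vol3/take42.mov"),
                         CopyLocation(Tier.TAPE, "LTO:A00042L8#0001734")])
    print(record.resolve().tier)   # -> Tier.DISK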

Active archives provide online access, searchability and retrieval of long-term data and enable virtually unlimited scalability to accommodate future growth. In addition, active archives enhance the business value of the data by enabling users to directly access the data online, search it and use it for their business purposes.

Description

Since an active archive is built around a cost-performance trade-off, the performance characteristics of these systems vary significantly between implementations. Within an active archive, the quantities and types of media used are determined by the retention and access requirements of the varying types of data. This gives a company the flexibility to determine its own tolerance levels for accessing any given type of data. In general, however, active archive systems can recall data to a user in times ranging from milliseconds to a couple of minutes, depending on the type of media on which the data resides.
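The figures below are purely illustrative assumptions, not vendor specifications; the sketch only shows how recall time in such a system becomes a function of the medium a copy happens to reside on.

    # Assumed, order-of-magnitude recall latencies per medium; real values
    # depend on the implementation, library robotics, and network conditions.
    TYPICAL_RECALL_SECONDS = {
        "flash": 0.001,
        "disk": 0.010,
        "cloud": 5.0,     # object-storage round trip
        "tape": 90.0,     # mount, seek, and stream from a tape library
    }

    def estimated_recall(medium: str) -> float:
        """Return the assumed recall time, in seconds, for data on the given medium."""
        return TYPICAL_RECALL_SECONDS[medium]

    for medium in ("flash", "disk", "cloud", "tape"):
        print(f"{medium:>5}: ~{estimated_recall(medium)} s")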

Because an active archive is used to store primary, secondary, and tertiary copies of data, several factors beyond the simple ability to move and access data become necessary for its implementation: data integrity, media monitoring, energy efficiency, and interoperability are all important components of an active archive. Many active archive components include features such as self-healing data within the software, versioning, encryption, and media health monitoring. Since an active archive also serves as an archive, features such as automatic migration between storage devices and technologies, vendor-neutral formatting, and information lifecycle management (ILM) are important components as well. Many of these requirements are driven by specific industry compliance regimes such as HIPAA, SOX, and PCI DSS. [1]
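As a rough illustration of the data-integrity aspect, the sketch below assumes a simple checksum catalog: every stored copy is periodically verified against a recorded digest, and a copy that fails verification is rewritten from one that passes. The function names and repair policy are assumptions for illustration, not any member product's implementation.

    import hashlib

    def digest(data: bytes) -> str:
        return hashlib.sha256(data).hexdigest()

    def verify_and_heal(copies: dict[str, bytes], expected: str) -> dict[str, bytes]:
        """Check every copy against the recorded digest and rebuild bad ones."""
        good = {loc: blob for loc, blob in copies.items() if digest(blob) == expected}
        if not good:
            raise RuntimeError("all copies failed verification; restore from backup")
        source = next(iter(good.values()))
        # Rewrite any corrupted copy from a verified one ("self-healing").
        return {loc: (blob if loc in good else source) for loc, blob in copies.items()}

    payload = b"archived experiment results"
    checksum = digest(payload)
    copies = {"disk:/vol1/obj9": payload,
              "tape:A00042#17": b"archived experiment resul\x00s"}   # bit rot on tape copy
    healed = verify_and_heal(copies, checksum)
    assert all(digest(blob) == checksum for blob in healed.values())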

Comparison to hierarchical storage management

While active archiving is often compared to hierarchical storage management (HSM), the two methods have very different implementations. Unlike an HSM, data in an active archive remains online regardless of its age or usage. The access pattern in an active archive also differs from a traditional HSM in that the data is not automatically restored to the "higher tier" storage system when requested, but rather is accessed directly from the storage device on which it resides. This makes every storage device in an active archive both primary storage and archival storage.

An active archive is an archive in the sense that it manages the data within it throughout the lifecycle of that data, according to each company's particular ILM policies and procedures. This means that the active archive serves as the primary storage pool and, at the same time, as the final storage location for a file.

The alliance

The Active Archive Alliance is a trade organization promoting active archives for simplified, online access to all data. It was formed in April 2010 by Compellent (later acquired by Dell), FileTek, QStar Technologies, and Spectra Logic. The alliance is open to providers of active archive technologies, including file systems, active archive applications, cloud storage, and high-density tape and disk storage, as well as individuals and end users. Current members and sponsors include Fujifilm, IBM, Iron Mountain, Quantum Corporation, Spectra Logic, and Western Digital, among others.

Related Research Articles

Computer data storage: storage of digital data readable by computers

Computer data storage is a technology consisting of computer components and recording media that are used to retain digital data. It is a core function and fundamental component of computers.

In computing, a file server is a computer attached to a network that provides a location for shared disk access, i.e. storage of computer files that can be accessed by the workstations that are able to reach the computer that shares the access through a computer network. The term server highlights the role of the machine in the traditional client–server scheme, where the clients are the workstations using the storage. A file server does not normally perform computational tasks or run programs on behalf of its client workstations.

A disk image, in computing, is a computer file containing the contents and structure of a disk volume or of an entire data storage device, such as a hard disk drive, tape drive, floppy disk, optical disc, or USB flash drive. A disk image is usually made by creating a sector-by-sector copy of the source medium, thereby perfectly replicating the structure and contents of a storage device independent of the file system. Depending on the disk image format, a disk image may span one or more computer files.

Quantum Corporation is a data storage and management company headquartered in San Jose, California. The company works with a network of distributors, VARs, DMRs, OEMs and other suppliers. From its founding in 1980 until 2001, it was also a major disk storage manufacturer, and was based in Milpitas, California. Quantum sold its hard disk drive business to Maxtor in 2001 and now focuses on integrated storage systems.

In information technology, a backup, or data backup, is a copy of computer data taken and stored elsewhere so that it may be used to restore the original after a data loss event. The verb form, referring to the process of doing so, is "back up", whereas the noun and adjective form is "backup". Backups can be used to recover data after its loss from data deletion or corruption, or to recover data from an earlier time. Backups provide a simple form of disaster recovery; however, not all backup systems are able to reconstitute a computer system or other complex configuration such as a computer cluster, active directory server, or database server.

File system: format or program for storing files and directories

In computing, file system or filesystem is a method and data structure that the operating system uses to control how data is stored and retrieved. Without a file system, data placed in a storage medium would be one large body of data with no way to tell where one piece of data stopped and the next began, or where any piece of data was located when it was time to retrieve it. By separating the data into pieces and giving each piece a name, the data is easily isolated and identified. Taking its name from the way a paper-based data management system is named, each group of data is called a "file." The structure and logic rules used to manage the groups of data and their names is called a "file system."

Enterprise content management (ECM) extends the concept of content management by adding a timeline for each content item and, possibly, enforcing processes for its creation, approval and distribution. Systems using ECM generally provide a secure repository for managed items, analog or digital. They also include one or more methods for importing content to bring new items under management, and several presentation methods to make items available for use. Although ECM content may be protected by digital rights management (DRM), it is not required. ECM is distinguished from general content management by its cognizance of the processes and procedures of the enterprise for which it is created.

Hierarchical storage management (HSM) is a data storage technique that automatically moves data between high-cost and low-cost storage media. HSM systems exist because high-speed storage devices, such as solid state drive arrays, are more expensive than slower devices, such as hard disk drives, optical discs and magnetic tape drives. While it would be ideal to have all data available on high-speed devices all the time, this is prohibitively expensive for many organizations. Instead, HSM systems store the bulk of the enterprise's data on slower devices, and then copy data to faster disk drives when needed. In effect, HSM turns the fast disk drives into caches for the slower mass storage devices. The HSM system monitors the way data is used and makes best guesses as to which data can safely be moved to slower devices and which data should stay on the fast devices.
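A toy sketch of that caching behaviour, assuming just two tiers and an arbitrary 30-day idle threshold: data is recalled to the fast tier when read and demoted back once it has gone cold. The names and threshold are assumptions for illustration only.

    import time

    FAST, SLOW = {}, {}                 # the fast tier acts as a cache over the slow tier
    LAST_ACCESS: dict[str, float] = {}
    IDLE_SECONDS = 30 * 24 * 3600       # assumed demotion threshold: 30 days idle

    def read(name: str) -> bytes:
        """Recall data to the fast tier on access, HSM-style."""
        if name not in FAST:
            FAST[name] = SLOW.pop(name)          # migrate from slow to fast storage
        LAST_ACCESS[name] = time.time()
        return FAST[name]

    def demote_idle(now: float) -> None:
        """Move data that has gone cold back to the cheaper, slower tier."""
        for name in [n for n, t in LAST_ACCESS.items()
                     if now - t > IDLE_SECONDS and n in FAST]:
            SLOW[name] = FAST.pop(name)

    SLOW["report.pdf"] = b"%PDF-1.7 ..."
    read("report.pdf")                           # recalled to the fast tier
    demote_idle(time.time() + 31 * 24 * 3600)    # much later: demoted again
    print("report.pdf" in SLOW)                  # -> True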

IBM Spectrum Protect is a data protection platform that gives enterprises a single point of control and administration for backup and recovery. It is the flagship product in the IBM Spectrum Protect family.

In computer science, storage virtualization is "the process of presenting a logical view of the physical storage resources to" a host computer system, "treating all storage media in the enterprise as a single pool of storage."

Automated tiered storage is the automated progression or demotion of data across different tiers (types) of storage devices and media. The movement of data takes place in an automated way with the help of a software or embedded firmware and is assigned to the related media according to performance and capacity requirements. More advanced implementations include the ability to define rules and policies that dictate if and when data can be moved between the tiers, and in many cases provides the ability to pin data to tiers permanently or for specific periods of time. Implementations vary, but are classed into two broad categories: pure software based implementations that run on general purpose processors supporting most forms of general purpose storage media and embedded automated tiered storage controlled by firmware as part of a closed embedded storage system such as a SAN disk array. Software Defined Storage architectures commonly include a component of tiered storage as part of their primary functions.
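A small sketch of what such rules and pinning might look like in a software-based implementation; the rule fields and thresholds here are assumptions, not any particular product's policy schema.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Rule:
        min_idle_days: int    # apply once data has been idle at least this long
        target_tier: str

    POLICY = [Rule(0, "flash"), Rule(30, "disk"), Rule(180, "tape")]

    def place(idle_days: int, pinned_to: Optional[str] = None) -> str:
        """Pick the tier for a piece of data; pinned data is never moved."""
        if pinned_to is not None:
            return pinned_to
        eligible = [r for r in POLICY if idle_days >= r.min_idle_days]
        return max(eligible, key=lambda r: r.min_idle_days).target_tier

    print(place(2))                      # -> flash
    print(place(400))                    # -> tape
    print(place(400, pinned_to="disk"))  # -> disk (pinned, e.g. for compliance)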

IBM storage

The IBM Storage product portfolio includes disk, flash, tape, NAS storage products, storage software and services. IBM's approach is to focus on data management.

Data proliferation refers to the prodigious amount of data, structured and unstructured, that businesses and governments continue to generate at an unprecedented rate and the usability problems that result from attempting to store and manage that data. While originally pertaining to problems associated with paper documentation, data proliferation has become a major problem in primary and secondary data storage on computers.

StorNext File System (SNFS), colloquially referred to as "StorNext" is a shared disk file system made by Quantum Corporation. StorNext enables multiple Windows, Linux and Apple workstations to access shared block storage over a Fibre Channel network. With the StorNext file system installed, these computers can read and write to the same storage volume at the same time enabling what is known as a "file-locking SAN." StorNext is used in environments where large files must be shared, and accessed simultaneously by users without network delays, or where a file must be available for access by multiple readers starting at different times. Common use cases include multiple video editor environments in feature film, television and general video post production.

Spectra Logic Corporation is a computer data storage company based in Boulder, Colorado in the United States. The company builds backup and archive technology for secondary storage to protect data after it migrates from primary disk. Spectra Logic's primary product is tape libraries. The company was founded in 1979, and is a privately held company.

A stub file is a computer file that appears to the user to be on disk and immediately available for use, but is actually held either in part or entirely on a different storage medium. When a stub file is accessed, device driver software intercepts the access, retrieves the data from its actual location and writes it to the file, then allows the user's access to proceed. Typically, users are unaware that the file's data is stored on a different medium, though they may experience a slight delay when accessing such a file.
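A simplified sketch of that interception, with a plain Python object standing in for the real device-driver layer: the stub keeps only a locator for the data's true location and fetches the contents transparently on first read. All names here are illustrative.

    class StubFile:
        """Appears to be an ordinary file, but the data lives elsewhere until read."""

        def __init__(self, path: str, remote_locator: str, fetch):
            self.path = path
            self.remote_locator = remote_locator
            self._fetch = fetch        # callable that recalls data from its real medium
            self._data = None          # filled in on first read

        def read(self) -> bytes:
            if self._data is None:     # first access: recall transparently
                self._data = self._fetch(self.remote_locator)
            return self._data

    # Illustrative recall function standing in for a tape or cloud retrieval.
    archive = {"tape:B00107#4": b"quarterly results"}
    stub = StubFile("/finance/q3.xlsx", "tape:B00107#4", archive.__getitem__)
    print(stub.read())   # data is recalled on first read and cached afterwards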

Content storage management (CSM) is a technique for the evolution of traditional media archive technology used by media companies and content owners to store and protect valuable file-based media assets. CSM solutions focus on active management of content and media assets regardless of format, type and source, and interface between proprietary content source/destination devices and any format and type of commodity IT-centric storage technology. These digital media files most often contain video but in rarer cases may be still pictures or sound. A CSM system may be directed manually but is more often directed by upper-level systems, which may include media asset management (MAM), automation, or traffic.

Data at rest: data stored on a device or backup medium

Data at rest in information technology means data that is housed physically on computer data storage in any digital form. Data at rest includes both structured and unstructured data. This type of data is subject to threats from hackers and other malicious actors seeking to access the data digitally or to steal the data storage media physically. To prevent this data from being accessed, modified or stolen, organizations will often employ security protection measures such as password protection, data encryption, or a combination of both. The security options used for this type of data are broadly referred to as data at rest protection (DARP).

Data defined storage is a marketing term for managing, protecting, and realizing value from data by uniting application, information and storage tiers. This is achieved through a process of unification, where users, applications and devices gain access to a repository of captured metadata that empowers organizations to access, query and manipulate the critical components of the data to transform it into information, while providing a flexible and scalable platform for storage of the underlying data. The technology abstracts the data entirely from the storage, allowing full transparent access to users.

Nirvana was virtual object storage software developed and maintained by General Atomics.

References

  1. "A Tutorial on Self-Healing Data Storage Systems".