NEC HYDRAstor

Stable release: HS8-5000 / 2016
Written in: C++
Platform: Cross-platform
Type: Backup and archive
License: Proprietary
Website: www.necam.com/hydrastor/

NEC HYDRAstor is a disk-based grid storage system with data deduplication for backup and archiving, developed by NEC Corporation. A HYDRAstor system can consist of anywhere from one node to more than 100 nodes. Each node contains standard hardware, including disk drives, CPU, memory and network interfaces, and the HYDRAstor software integrates the nodes into a single storage pool. The software incorporates several features of distributed storage systems: content-addressable storage, global data deduplication, variable block size, Rabin fingerprinting, erasure codes, data encryption and load balancing.
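
How variable block size, fingerprint-based chunking and content-addressable storage combine to give deduplication can be illustrated with a minimal Python sketch. It is not HYDRAstor's implementation: a simple polynomial rolling hash stands in for Rabin fingerprinting, SHA-256 is assumed as the content hash, and the names `chunk` and `ContentStore` as well as all size parameters are hypothetical and chosen only for illustration.

```python
import hashlib

# Illustrative parameters (assumptions, not HYDRAstor's actual values).
WINDOW = 16                       # bytes covered by the rolling hash
BASE = 257
MOD = (1 << 31) - 1
MASK = (1 << 6) - 1               # cut when the low 6 bits of the hash are zero
MIN_CHUNK = 32
MAX_CHUNK = 1024
TOP = pow(BASE, WINDOW - 1, MOD)  # weight of the byte that leaves the window


def chunk(data: bytes):
    """Split data into variable-size chunks at content-defined boundaries."""
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        if i - start >= WINDOW:
            # Drop the oldest byte so the hash covers only the last WINDOW bytes.
            h = (h - data[i - WINDOW] * TOP) % MOD
        h = (h * BASE + b) % MOD
        length = i - start + 1
        if (length >= MIN_CHUNK and (h & MASK) == 0) or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


class ContentStore:
    """Toy content-addressable store: each chunk is keyed by its own hash."""

    def __init__(self):
        self.blocks = {}

    def write(self, data: bytes):
        recipe = []
        for c in chunk(data):
            addr = hashlib.sha256(c).hexdigest()
            self.blocks.setdefault(addr, c)  # an already stored chunk is not stored again
            recipe.append(addr)
        return recipe

    def read(self, recipe):
        return b"".join(self.blocks[a] for a in recipe)


store = ContentStore()
backup = b"example payload " * 500
first = store.write(backup)
blocks_after_first = len(store.blocks)
second = store.write(backup)
print(len(store.blocks) == blocks_after_first)  # True: the second backup added no blocks
print(store.read(second) == backup)             # True
```

Because a chunk's address is derived from its content, writing the same data a second time maps to addresses that already exist and consumes no additional block storage in this model.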

History

The HYDRAstor project was started in 2002 by Cezary Dubnicki and Cristian Ungureanu at NEC Research in Princeton, NJ.[citation needed] A prototype was implemented and evaluated in 2004. [1] After another three years of development, the first version of HYDRAstor was brought to market in the US and Japan. Subsequent versions with improved software and hardware were released in the following years; the latest version, HS8-5000, provides 72 TB of raw storage per node and up to 11.88 PB of raw capacity in its maximum configuration. [2]

Main features

HYDRAstor can be scaled from a single node to 165 nodes in a multi-rack grid appliance. [3] Capacity and bandwidth can be scaled independently by using different types of nodes.

HYDRAstor supports online expansion with automatic data migration and no downtime. [4] In its standard configuration, HYDRAstor tolerates up to three concurrent disk or node failures. [5] Failures are detected automatically and the affected data is reconstructed automatically, so the system can withstand any number of failures as long as the time between them is long enough for reconstruction to complete. [4]
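
The scaling and resiliency figures quoted in this article can be checked with a small Python model. It is only a sketch under stated assumptions: the functions `raw_capacity_pb` and `survives` are hypothetical, capacity uses decimal units (1 PB = 1000 TB), and the failure model simply counts components that are missing at the same time rather than reproducing HYDRAstor's actual erasure-coding and reconstruction logic.

```python
NODE_RAW_TB = 72        # raw capacity per HS8-5000 node, from the article
MAX_NODES = 165         # maximum grid size, from the article
TOLERATED_FAILURES = 3  # concurrent failures tolerated, from the article


def raw_capacity_pb(nodes: int) -> float:
    """Raw capacity in decimal petabytes (assumes 1 PB = 1000 TB)."""
    return nodes * NODE_RAW_TB / 1000


def survives(events: str) -> bool:
    """Replay 'F' (a component fails) and 'R' (one lost component is rebuilt).

    Hypothetical model: data is lost only when more than
    TOLERATED_FAILURES components are missing at the same time.
    """
    missing = 0
    for event in events:
        if event == "F":
            missing += 1
            if missing > TOLERATED_FAILURES:
                return False
        elif event == "R" and missing > 0:
            missing -= 1
    return True


print(raw_capacity_pb(MAX_NODES))  # 11.88
print(survives("FFF"))             # True: three concurrent failures
print(survives("FFFF"))            # False: a fourth failure before any rebuild
print(survives("FFFRFRFRF"))       # True: six failures with rebuilds in between
```

The last case shows why the total number of failures is not bounded in this model: each completed reconstruction restores the margin, so only the number of failures outstanding at one time matters.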


References

  1. Cezary Dubnicki; Cristian Ungureanu; Wojciech Kilian (2004). "FPN: a distributed hash table for commercial applications". Proceedings of the 13th IEEE International Symposium on High Performance Distributed Computing (HPDC '04). IEEE. pp. 120–128. doi:10.1109/HPDC.2004.1323509. ISBN 0-7695-2175-4.
  2. "HYDRAstor HS8-5000: Scale-out Global Deduplication Storage for Backup and Archive". NEC America. Retrieved 2017-01-11.
  3. "HYDRAstor HS8-5000: Scale-out Global Deduplication Storage for Backup and Archive". NEC America. Retrieved 2017-01-11.
  4. Cezary Dubnicki; Leszek Gryz; Łukasz Heldt; Michał Kaczmarczyk; Wojciech Kilian; Przemysław Strzelczak; Jerzy Szczepkowski; Cristian Ungureanu; Michał Wełnicki (February 24–27, 2009). "HYDRAstor: a Scalable Secondary Storage". Proceedings of the 7th USENIX Conference on File and Storage Technologies. San Francisco, California: USENIX. Retrieved 2015-04-29.
  5. "HYDRAstor - FAQ". NEC America. Retrieved 2015-04-29.