Disaggregated storage

Disaggregated storage is a type of data storage within computer data centers. It allows compute resources within a computer server to be separated from storage resources without modifying any physical connections. [1]

A form of composable disaggregated infrastructure, disaggregated storage connects resources over a network fabric, which provides flexibility when upgrading, replacing, or adding individual resources. It also allows servers to be built for future growth, offering greater storage efficiency, scale, and performance than traditional data storage without compromising throughput or latency. [2]

Background

In the past, data center storage existed in two forms: direct-attached storage (DAS) and storage area networks (SAN).

Direct-attached storage has one critical advantage: it offers high performance for any workload running on that server. However, it comes with two critical disadvantages. First, overall performance across the network is low, because storage cannot be shared over the network without a performance impact. Second, capacity utilization is low, because a server's spare disk capacity cannot be used directly by other servers. [5]
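The capacity utilization point can be illustrated with a small worked calculation. The following is a minimal sketch with made-up numbers (five servers, 10 TB of disk each, and an assumed per-server demand); none of the figures come from the cited sources. It compares how much demand can be served when each server may use only its own disk versus when the same disks are treated as one shared pool.

```python
# Hypothetical figures chosen only for illustration.
DISK_PER_SERVER_TB = 10
demand_tb = [2, 9, 4, 14, 1]  # storage each server's workload needs


def das_result(demand, disk_tb):
    """Direct-attached: each server can use only its own disk."""
    served = sum(min(d, disk_tb) for d in demand)
    unserved = sum(max(d - disk_tb, 0) for d in demand)
    utilization = served / (disk_tb * len(demand))
    return served, unserved, utilization


def pooled_result(demand, disk_tb):
    """Shared pool: free space anywhere can serve any server."""
    capacity = disk_tb * len(demand)
    served = min(sum(demand), capacity)
    unserved = sum(demand) - served
    return served, unserved, served / capacity


das = das_result(demand_tb, DISK_PER_SERVER_TB)
pooled = pooled_result(demand_tb, DISK_PER_SERVER_TB)

print(f"DAS:    served={das[0]} TB, unserved={das[1]} TB, utilization={das[2]:.0%}")
print(f"Pooled: served={pooled[0]} TB, unserved={pooled[1]} TB, utilization={pooled[2]:.0%}")
```

With these assumed numbers, the direct-attached layout leaves 4 TB of demand unserved on the fourth server even though the other servers have idle capacity, while the pooled layout serves all 30 TB from the same 50 TB of disk.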

Storage area networks allocate storage to dozens or possibly hundreds of servers, which increases capacity utilization, but they rely on specialized network hardware and/or protocols that can come with disadvantages. [6] Conventional storage networking does not provide the high throughput or low latency that many applications need, and it lacks the bandwidth to use the full performance of new flash technologies. [7]

Disaggregated storage overview

Disaggregated storage is a form of scale-out storage, built from a number of storage devices that function as a logical pool of storage that can be allocated to any server on the network over a very high performance network fabric. It addresses the limitations of both storage area networks and direct-attached storage. Disaggregated storage is dynamically reconfigurable: physical resources can be reassigned to maximize performance and limit latency. [8] It aims to provide the performance of local storage with the flexibility of storage area networks.
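The idea of a logical pool can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any particular product works: the `Device` and `StoragePool` classes, the device names, and the "emptiest device first" placement rule are all hypothetical. It shows only that a server's allocation is drawn from the pool as a whole and may span several physical devices.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """One physical storage device in the pool (hypothetical model)."""
    name: str
    capacity_gb: int
    used_gb: int = 0

    @property
    def free_gb(self):
        return self.capacity_gb - self.used_gb


@dataclass
class StoragePool:
    """Presents many devices as one logical pool of capacity."""
    devices: list
    # server name -> list of (device name, gigabytes) extents
    allocations: dict = field(default_factory=dict)

    @property
    def free_gb(self):
        return sum(d.free_gb for d in self.devices)

    def allocate(self, server, size_gb):
        """Carve size_gb out of the pool for a server, spilling across devices."""
        if size_gb > self.free_gb:
            raise ValueError("pool has insufficient free capacity")
        remaining = size_gb
        extents = []
        # Assumed placement rule: take from the emptiest device first.
        for dev in sorted(self.devices, key=lambda d: d.free_gb, reverse=True):
            if remaining == 0:
                break
            take = min(dev.free_gb, remaining)
            if take == 0:
                continue
            dev.used_gb += take
            extents.append((dev.name, take))
            remaining -= take
        self.allocations.setdefault(server, []).extend(extents)
        return extents


pool = StoragePool([Device("nvme0", 1000), Device("nvme1", 1000), Device("nvme2", 500)])
extents_a = pool.allocate("server-a", 1200)  # larger than any single device
extents_b = pool.allocate("server-b", 900)
print(extents_a, extents_b, pool.free_gb)
```

Here `server-a` receives 1200 GB even though no single device is that large, which is the property that distinguishes a pooled design from per-server disks.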

A number of technology improvements are combining to make storage disaggregation a reality. These include very high bandwidth network connections and protocols such as NVMe over Fabrics (NVMe-oF), which take full advantage of these network improvements, removing bottlenecks, boosting performance, and reducing latency.

Different levels of storage disaggregation functionality exist. The most flexible, full disaggregation, [11] enables storage capacity and/or performance to be provisioned from any storage device to any server on the network, then expanded, shrunk, or reprovisioned as new requirements emerge.
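The provision, expand, shrink, and reprovision lifecycle can be sketched as a simple capacity ledger. This is an illustrative sketch only: the `Fabric` class, its method names, and the volume and server names are hypothetical, and real systems also have to move or remap data, which is not modeled here.

```python
class Fabric:
    """Toy capacity ledger for storage reachable by every server (hypothetical)."""

    def __init__(self, total_gb):
        self.total_gb = total_gb
        self.volumes = {}  # volume name -> {"server": str, "size_gb": int}

    @property
    def free_gb(self):
        return self.total_gb - sum(v["size_gb"] for v in self.volumes.values())

    def provision(self, name, server, size_gb):
        """Create a volume for a server out of the shared capacity."""
        if name in self.volumes:
            raise KeyError(f"volume {name!r} already exists")
        if size_gb <= 0 or size_gb > self.free_gb:
            raise ValueError("invalid size or not enough free capacity")
        self.volumes[name] = {"server": server, "size_gb": size_gb}

    def resize(self, name, new_size_gb):
        """Expand or shrink a volume; freed space returns to the shared pool."""
        vol = self.volumes[name]
        growth = new_size_gb - vol["size_gb"]
        if new_size_gb <= 0 or growth > self.free_gb:
            raise ValueError("invalid size or not enough free capacity")
        vol["size_gb"] = new_size_gb

    def reprovision(self, name, new_server):
        """Hand an existing volume to a different server without recabling."""
        self.volumes[name]["server"] = new_server


fabric = Fabric(total_gb=1000)
fabric.provision("vol1", "server-a", 400)
fabric.resize("vol1", 700)                 # expand as requirements grow
free_after_expand = fabric.free_gb
fabric.resize("vol1", 200)                 # shrink when demand drops
free_after_shrink = fabric.free_gb
fabric.reprovision("vol1", "server-b")     # reassign to another server
print(free_after_expand, free_after_shrink, fabric.volumes["vol1"])
```

The design choice worth noting is that every operation is a bookkeeping change against shared capacity rather than a change to a specific server's hardware.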

References

  1. "Storage Disaggregation in the Data Center". Data Center Knowledge. 2013-10-18. Retrieved 2019-11-30.
  2. Tomsho, Greg (2017-07-26). MCSA Guide to Installation, Storage, and Compute with Microsoft Windows Server 2016, Exam 70-740. Cengage Learning. ISBN 978-1-337-40066-4.
  3. "What is direct-attached storage (DAS)? - Definition from WhatIs.com". SearchStorage. Retrieved 2019-11-30.
  4. "What Is a Storage Area Network (SAN)? | SNIA". www.snia.org. Retrieved 2019-11-30.
  5. "Direct Attached Storage". www.enterprisestorageforum.com. 2019-05-15. Retrieved 2019-11-30.
  6. "Fibre Channel Really Is Dead". IT Infrastructure Advice, Discussion, Community - Network Computing. 2015-04-23. Retrieved 2019-11-30.
  7. "Advantages and disadvantages of storage area network (SAN)". IT Release. 2019-05-02. Retrieved 2019-11-30.
  8. Zervas, G.; Yuan, H.; Saljoghei, A.; Chen, Q.; Mishra, V. (February 2018). "Optically disaggregated data centers with minimal remote memory latency: Technologies, architectures, and resource allocation [Invited]". IEEE/OSA Journal of Optical Communications and Networking. 10 (2): A270–A285. doi:10.1364/JOCN.10.00A270. ISSN 1943-0639. S2CID 3397054.
  9. "Industry Outlook: NVMe and NVMe-oF For Storage". InterOperability Laboratory. 2018-02-08. Retrieved 2019-11-30.
  10. Kim, John (Chair, SNIA Ethernet Storage Forum). "Servers and storage rapidly adopting 25GbE and 100 GbE networking" (PDF). snia.org. Retrieved 2019-11-30.
  11. Legtchenko, Sergey; Williams, Hugh; Razavi, Kaveh; Donnelly, Austin; Black, Richard; Douglas, Andrew; Cheriere, Nathanael; Fryer, Daniel; Mast, Kai; Brown, Angela Demke; Klimovic, Ana; Slowey, Andy; Rowstron, Antony. "Understanding Rack-Scale Disaggregated Storage" (PDF). usenix.org. Retrieved 2019-11-30.