SCSI RDMA Protocol

In computing, the SCSI RDMA Protocol (SRP) is a protocol that allows one computer to access SCSI devices attached to another computer via remote direct memory access (RDMA). [1] [2] SRP is also known as the SCSI Remote Protocol. The use of RDMA makes higher throughput and lower latency possible than are generally achievable through, for example, the TCP/IP communication protocol.

Though the SRP protocol has been designed to use RDMA networks efficiently, it is also possible to implement the SRP protocol over networks that do not support RDMA.

History

SRP was published as an ANSI standard (ANSI INCITS 365-2002) in 2002 and renewed in 2007 and 2019. [3] [4]

As with the iSCSI Extensions for RDMA (iSER) communication protocol, there is the notion of a target (a system that stores the data) and an initiator (a client accessing the target), with the target initiating data transfers. In other words, when an initiator writes data to a target, the target executes an RDMA read to fetch the data from the initiator, and when an initiator issues a SCSI read command, the target executes an RDMA write to push the data to the initiator.
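How the two RDMA operations map onto the two SCSI data directions can be made concrete with a short sketch against the libibverbs API that RDMA-capable hardware exposes on Linux. This is a minimal, hypothetical fragment of target-side logic rather than code from any real SRP target: the queue pair, local memory key and remote buffer description are assumed to have been exchanged during SRP login, and all names are illustrative.

    #include <infiniband/verbs.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical target-side helper: perform the data phase of one SRP
     * command. The queue pair, lkey and the initiator's buffer address
     * and rkey are assumed to come from the SRP login exchange. */
    static int srp_target_move_data(struct ibv_qp *qp,
                                    void *local_buf, uint32_t lkey,
                                    uint32_t len,
                                    uint64_t remote_addr, uint32_t rkey,
                                    int initiator_is_writing)
    {
        struct ibv_sge sge = {
            .addr   = (uintptr_t)local_buf,
            .length = len,
            .lkey   = lkey,
        };
        struct ibv_send_wr wr, *bad_wr = NULL;

        memset(&wr, 0, sizeof(wr));
        wr.sg_list = &sge;
        wr.num_sge = 1;
        wr.send_flags = IBV_SEND_SIGNALED;
        wr.wr.rdma.remote_addr = remote_addr;  /* buffer on the initiator */
        wr.wr.rdma.rkey        = rkey;

        /* SCSI WRITE: the target reads the data from the initiator.
         * SCSI READ:  the target writes the data to the initiator. */
        wr.opcode = initiator_is_writing ? IBV_WR_RDMA_READ
                                         : IBV_WR_RDMA_WRITE;

        return ibv_post_send(qp, &wr, &bad_wr);
    }

The only point of the sketch is the choice of opcode: in SRP it is always the target that posts the RDMA work request, in the direction opposite to the SCSI command.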

While the SRP protocol is easier to implement than the iSER protocol, iSER offers more management functionality, for example the target discovery infrastructure provided by the iSCSI protocol.

Performance

Bandwidth and latency of storage targets supporting the SRP or the iSER protocol should be similar. On Linux, there are two storage target implementations that run inside the kernel and support both SRP and iSER (SCST [5] and LIO), and an iSER storage target implementation that runs in user space (STGT). Measurements have shown that the SCST SRP target has a lower latency and a higher bandwidth than the STGT iSER target. This is probably because the RDMA communication overhead is lower for a component implemented in the Linux kernel than for a user-space Linux process, and not because of protocol differences. [6]

Implementations

Using the SRP protocol requires an SRP initiator implementation, an SRP target implementation, and networking hardware supported by both. The following software SRP initiator implementations exist:

The following SRP target implementations exist:
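To show what wiring these pieces together looks like in practice, the sketch below uses the Linux kernel's ib_srp initiator, which is told to log in to a target by writing a parameter string to a sysfs add_target file. This is a minimal sketch under stated assumptions: the ib_srp module is loaded, the HCA/port name in the path matches the local hardware, and every target identifier shown is a placeholder that would normally come from fabric discovery.

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* Minimal sketch: ask the Linux ib_srp initiator to connect to one
     * SRP target. All identifiers below are placeholders; real values
     * are obtained from the fabric (for example via srp_daemon). */
    int main(void)
    {
        const char *path =
            "/sys/class/infiniband_srp/srp-mlx5_0-1/add_target";
        const char *login =
            "id_ext=200100a0b8112233,"
            "ioc_guid=00a0b8112233,"
            "dgid=fe800000000000000002c90300a0b811,"
            "pkey=ffff,"
            "service_id=200100a0b8112233";

        int fd = open(path, O_WRONLY);
        if (fd < 0) {
            perror("open add_target");
            return 1;
        }
        if (write(fd, login, strlen(login)) < 0) {
            perror("write login string");
            close(fd);
            return 1;
        }
        close(fd);
        /* On success the kernel creates a new SCSI host whose logical
         * units appear as ordinary block devices (e.g. /dev/sdX). */
        return 0;
    }

In everyday use the srp_daemon utility from rdma-core automates this discovery and login step; the sketch only makes the underlying kernel interface visible.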

Related Research Articles

In computing, a device driver is a computer program that operates or controls a particular type of device that is attached to a computer or automaton. A driver provides a software interface to hardware devices, enabling operating systems and other computer programs to access hardware functions without needing to know precise details about the hardware being used.

<span class="mw-page-title-main">SCSI</span> Set of computer and peripheral connection standards

Small Computer System Interface is a set of standards for physically connecting and transferring data between computers and peripheral devices. The SCSI standards define commands, protocols, electrical, optical and logical interfaces. The SCSI standard defines command sets for specific peripheral device types; the presence of "unknown" as one of these types means that in theory it can be used as an interface to almost any device, but the standard is highly pragmatic and addressed toward commercial requirements. The initial Parallel SCSI was most commonly used for hard disk drives and tape drives, but it can connect a wide range of other devices, including scanners and CD drives, although not all controllers can handle all devices.

Internet Small Computer Systems Interface or iSCSI is an Internet Protocol-based storage networking standard for linking data storage facilities. iSCSI provides block-level access to storage devices by carrying SCSI commands over a TCP/IP network. iSCSI facilitates data transfers over intranets and the management of storage over long distances. It can be used to transmit data over local area networks (LANs), wide area networks (WANs), or the Internet and can enable location-independent data storage and retrieval.

<span class="mw-page-title-main">InfiniBand</span> High-speed, low-latency computer networking bus used in supercomputing

InfiniBand (IB) is a computer networking communications standard used in high-performance computing that features very high throughput and very low latency. It is used for data interconnect both among and within computers. InfiniBand is also used as either a direct or switched interconnect between servers and storage systems, as well as an interconnect between storage systems. It is designed to be scalable and uses a switched fabric network topology. By 2014 it was the most commonly used interconnect in the TOP500 list of supercomputers, a position it held until about 2016.

In computing, remote direct memory access (RDMA) is a direct memory access from the memory of one computer into that of another without involving either one's operating system. This permits high-throughput, low-latency networking, which is especially useful in massively parallel computer clusters.

The following is a timeline of virtualization development. In computing, virtualization is the use of a computer to simulate another computer. Through virtualization, a host simulates a guest by exposing virtual hardware devices, which may be done through software or by allowing access to a physical device connected to the machine.

iWARP is a computer networking protocol that implements remote direct memory access (RDMA) for efficient data transfer over Internet Protocol networks. Contrary to some accounts, iWARP is not an acronym.

<span class="mw-page-title-main">Kernel-based Virtual Machine</span> Virtualization module in the Linux kernel

Kernel-based Virtual Machine (KVM) is a virtualization module in the Linux kernel that allows the kernel to function as a hypervisor. It was merged into the mainline Linux kernel in version 2.6.20, which was released on February 5, 2007. KVM requires a processor with hardware virtualization extensions, such as Intel VT or AMD-V. KVM has also been ported to other operating systems such as FreeBSD and illumos in the form of loadable kernel modules.

The iSCSI Extensions for RDMA (iSER) is a computer network protocol that extends the Internet Small Computer System Interface (iSCSI) protocol to use Remote Direct Memory Access (RDMA). RDMA can be provided by iWARP, which layers RDMA services over TCP and runs on existing Ethernet infrastructure, so no large hardware investment is needed; by RoCE, which omits the TCP layer and therefore provides lower latency; or by InfiniBand. iSER permits data to be transferred directly into and out of SCSI computer memory buffers without intermediate data copies and with little CPU intervention.

Ceph is an open-source software-defined storage platform that implements object storage on a single distributed computer cluster and provides 3-in-1 interfaces for object-, block- and file-level storage. Ceph aims primarily for completely distributed operation without a single point of failure, scalability to the exabyte level, and free availability. Since version 12, Ceph does not rely on other filesystems: it can directly manage HDDs and SSDs with its own storage backend, BlueStore, and can expose a POSIX filesystem entirely on its own.

<span class="mw-page-title-main">OpenFabrics Alliance</span>

The OpenFabrics Alliance is a non-profit organization that promotes remote direct memory access (RDMA) switched fabric technologies for server and storage connectivity. These high-speed data-transport technologies are used in high-performance computing facilities, in research, and in various industries.

<span class="mw-page-title-main">Network block device</span>

On Linux, network block device (NBD) is a network protocol that can be used to forward a block device from one machine to a second machine. As an example, a local machine can access a hard disk drive that is attached to another computer.

<span class="mw-page-title-main">Fibre Channel over Ethernet</span> Computer network technology

Fibre Channel over Ethernet (FCoE) is a computer network technology that encapsulates Fibre Channel frames over Ethernet networks. This allows Fibre Channel to use 10 Gigabit Ethernet networks while preserving the Fibre Channel protocol. The specification was part of the International Committee for Information Technology Standards T11 FC-BB-5 standard published in 2009.

A storage area network (SAN) or storage network is a computer network which provides access to consolidated, block-level data storage. SANs are primarily used to access data storage devices, such as disk arrays and tape libraries, from servers so that the devices appear to the operating system as direct-attached storage. A SAN is typically a dedicated network of storage devices not accessible through the local area network (LAN).

NVM Express (NVMe) or Non-Volatile Memory Host Controller Interface Specification (NVMHCIS) is an open, logical-device interface specification for accessing a computer's non-volatile storage media, usually attached via the PCI Express (PCIe) bus. The initialism NVM stands for non-volatile memory, which is often NAND flash memory that comes in several physical form factors, including solid-state drives (SSDs), PCIe add-in cards, and M.2 cards, the successor to mSATA cards. NVM Express, as a logical-device interface, has been designed to capitalize on the low latency and internal parallelism of solid-state storage devices.

<span class="mw-page-title-main">LIO (SCSI target)</span> Open-source version of SCSI target

In computing, Linux-IO (LIO) Target is an open-source implementation of the SCSI target that has become the standard one included in the Linux kernel. Internally, LIO does not initiate sessions, but instead provides one or more Logical Unit Numbers (LUNs), waits for SCSI commands from a SCSI initiator, and performs required input/output data transfers. LIO supports common storage fabrics, including FCoE, Fibre Channel, IEEE 1394, iSCSI, iSCSI Extensions for RDMA (iSER), SCSI RDMA Protocol (SRP) and USB. It is included in most Linux distributions; native support for LIO in QEMU/KVM, libvirt, and OpenStack makes LIO also a storage option for cloud deployments.

RDMA over Converged Ethernet (RoCE) or InfiniBand over Ethernet (IBoE) is a network protocol that allows remote direct memory access (RDMA) over an Ethernet network. It does this by encapsulating an InfiniBand (IB) transport packet over Ethernet. There are two RoCE versions, RoCE v1 and RoCE v2. RoCE v1 is an Ethernet link layer protocol and hence allows communication between any two hosts in the same Ethernet broadcast domain. RoCE v2 is an internet layer protocol which means that RoCE v2 packets can be routed. Although the RoCE protocol benefits from the characteristics of a converged Ethernet network, the protocol can also be used on a traditional or non-converged Ethernet network.

<span class="mw-page-title-main">SCST</span>

SCST is a GPL-licensed SCSI target software stack. The design goals of this software stack are high performance, high reliability, strict conformance to existing SCSI standards, and being easy to extend and easy to use. SCST supports not only multiple SCSI protocols but also multiple local storage interfaces, as well as storage drivers implemented in user space via the scst_user driver.

<span class="mw-page-title-main">Nexenta Systems</span> Software company

Nexenta Systems, Inc. is a company that markets computer software for data storage and backup, headquartered in San Jose, California. Nexenta develops the products NexentaStor, NexentaCloud, NexentaFusion, and NexentaEdge.

Enterprise Storage OS, also known as ESOS, is a Linux distribution that serves as a block-level storage server in a storage area network (SAN). ESOS is composed of the open-source software projects required for a Linux distribution plus several proprietary build- and install-time options. The SCST project is the core component of ESOS; it provides the back-end storage functionality.

References

  1. ANSI T10 SRPr16a, www.t10.org.
  2. ANSI T10 SRPr16a, web.archive.org
  3. ANSI webstore for purchasing standards - ANSI INCITS 365-2002
  4. "SCSI RDMA Protocol - 2 (SRP-2)" (PDF). ANSI T10. 7 May 2019.
  5. The SCST Project, an open-source SCSI target implementation for Linux that includes an SRP target implementation.
  6. Performance of SCST versus STGT.
  7. OpenFabrics Enterprise Distribution for Windows.
  8. Mellanox OFED Drivers for VMware Infrastructure 3 and vSphere 4.
  9. Sun's download page.
  10. "Configuring SRP Devices With COMSTAR" . Retrieved 4 February 2013.
  11. Linux kernel version 2.6.24 change log.
  12. D. Boutcher and D. Engebretsen, "Linux Virtualization on IBM POWER5 Systems", Proceedings of the Linux Symposium, Vol. 1, July 2004, pp. 113-120.
  13. IBM Systems Hardware Information Center, Virtual SCSI.
  14. OFED 1.5.4.1 Release Notes, OpenFabrics website, January 2012.
  15. "SCSI RDMA Protocol". linux-iscsi.org.
  16. Linus Torvalds (2012-01-18). "InfiniBand/SRP merge". lkml.org.
  17. "DDN SFA10000 User Guide" (PDF). ddn.com. 2012-01-18.
  18. "DDN Corporate Overview, IB Storage 101 section" (PDF). ddn.com. 2012-01-18.
  19. IBM (10 March 2014). "IBM FlashSystem Integration Guide".
  20. Moellenkamp, Joerg. "PSARC/2009/111: SRP Target in Comstar". Retrieved 4 February 2013.