ISCSI Extensions for RDMA

Last updated

The iSCSI Extensions for RDMA (iSER) is a computer network protocol that extends the Internet Small Computer System Interface (iSCSI) protocol to use Remote Direct Memory Access (RDMA). RDMA can be provided by the Transmission Control Protocol (TCP) with RDMA services (iWARP), which uses an existing Ethernet setup and therefore has lower hardware costs, RoCE (RDMA over Converged Ethernet), which does not need the TCP layer and therefore provides lower latency, or InfiniBand. iSER permits data to be transferred directly into and out of SCSI computer memory buffers (those which connect computers and storage devices) without intermediate data copies and with minimal CPU involvement.

Contents

History

An RDMA consortium was announced on May 31, 2002, with a goal of product implementations by 2003. [1] The consortium released their proposal in July, 2003. [2] The protocol specifications were published as drafts in September 2004 in the Internet Engineering Task Force and issued as RFCs in October 2007. [3] [4] The OpenIB Alliance was renamed in 2007 to be the OpenFabrics Alliance, and then released an open source software package. [5]

Description

The motivation for iSER is to use RDMA to avoid unnecessary data copying on the target and initiator. The Datamover Architecture (DA) defines an abstract model in which the movement of data between iSCSI end nodes is logically separated from the rest of the iSCSI protocol; iSER is one Datamover protocol. The interface between the iSCSI and a Datamover protocol, iSER in this case, is called Datamover Interface (DI).

The main difference between the standard iSCSI and iSCSI over iSER is the execution of SCSI read/write commands. With iSER the target drives all data transfer (with the exception of iSCSI unsolicited data) by issuing RDMA write/read operations, respectively. When the iSCSI layer issues an iSCSI command PDU, it calls the Send_Control primitive, which is part of the DI. The Send_Control primitive sends the STag with the PDU. The iSER layer in the target side notifies the target that the PDU was received with the Control_Notify primitive (which is part of the DI). The target calls the Put_Data or Get_Data primitives (which are part of the DI) to perform an RDMA write/read operation respectively. Then, the target calls the Send_Control primitive to send a response to the initiator. An example is shown in the figures (time progresses from top to bottom).

READ command execution with iSER Iscsi-iser-read-steps-1.JPG
READ command execution with iSER
WRITE command execution with iSER Iscsi-iser-write-steps-1.JPG
WRITE command execution with iSER

All iSCSI control-type PDUs contain an iSER header, which allows the initiator to advertise the STags that were generated during buffer registration. The target will use the STags later for RDMA read/write operations.

See also

Related Research Articles

<span class="mw-page-title-main">SCSI</span> Set of computer and peripheral connection standards

Small Computer System Interface is a set of standards for physically connecting and transferring data between computers and peripheral devices, best known for its use with storage devices such as hard disk drives. SCSI was introduced in the 1980s and has seen widespread use on servers and high-end workstations, with new SCSI standards being published as recently as SAS-4 in 2017.

The Transmission Control Protocol (TCP) is one of the main protocols of the Internet protocol suite. It originated in the initial network implementation in which it complemented the Internet Protocol (IP). Therefore, the entire suite is commonly referred to as TCP/IP. TCP provides reliable, ordered, and error-checked delivery of a stream of octets (bytes) between applications running on hosts communicating via an IP network. Major internet applications such as the World Wide Web, email, remote administration, and file transfer rely on TCP, which is part of the Transport layer of the TCP/IP suite. SSL/TLS often runs on top of TCP.

Telnet is a client/server application protocol that provides access to virtual terminals of remote systems on local area networks or the Internet. It is a protocol for bidirectional 8-bit communications. Its main goal was to connect terminal devices and terminal-oriented processes.

<span class="mw-page-title-main">Protocol data unit</span> Unit of information transmitted over a computer network

In telecommunications, a protocol data unit (PDU) is a single unit of information transmitted among peer entities of a computer network. It is composed of protocol-specific control information and user data. In the layered architectures of communication protocol stacks, each layer implements protocols tailored to the specific type or mode of data exchange.

Simple Network Management Protocol (SNMP) is an Internet Standard protocol for collecting and organizing information about managed devices on IP networks and for modifying that information to change device behavior. Devices that typically support SNMP include cable modems, routers, switches, servers, workstations, printers, and more.

Network File System (NFS) is a distributed file system protocol originally developed by Sun Microsystems (Sun) in 1984, allowing a user on a client computer to access files over a computer network much like local storage is accessed. NFS, like many other protocols, builds on the Open Network Computing Remote Procedure Call system. NFS is an open IETF standard defined in a Request for Comments (RFC), allowing anyone to implement the protocol.

The File Transfer Protocol (FTP) is a standard communication protocol used for the transfer of computer files from a server to a client on a computer network. FTP is built on a client–server model architecture using separate control and data connections between the client and the server. FTP users may authenticate themselves with a plain-text sign-in protocol, normally in the form of a username and password, but can connect anonymously if the server is configured to allow it. For secure transmission that protects the username and password, and encrypts the content, FTP is often secured with SSL/TLS (FTPS) or replaced with SSH File Transfer Protocol (SFTP).

Internet Small Computer Systems Interface or iSCSI is an Internet Protocol-based storage networking standard for linking data storage facilities. iSCSI provides block-level access to storage devices by carrying SCSI commands over a TCP/IP network. iSCSI facilitates data transfers over intranets and to manage storage over long distances. It can be used to transmit data over local area networks (LANs), wide area networks (WANs), or the Internet and can enable location-independent data storage and retrieval.

<span class="mw-page-title-main">InfiniBand</span> Network standard

InfiniBand (IB) is a computer networking communications standard used in high-performance computing that features very high throughput and very low latency. It is used for data interconnect both among and within computers. InfiniBand is also used as either a direct or switched interconnect between servers and storage systems, as well as an interconnect between storage systems. It is designed to be scalable and uses a switched fabric network topology. Between 2014 and June 2016, it was the most commonly used interconnect in the TOP500 list of supercomputers.

Modbus or MODBUS is a client/server data communications protocol in the application layer. It was originally published by Modicon in 1979 for use with its programmable logic controllers (PLCs). Modbus has become a de facto standard communication protocol for communication between industrial electronic devices in a wide range of buses and network.

In computing, remote direct memory access (RDMA) is a direct memory access from the memory of one computer into that of another without involving either one's operating system. This permits high-throughput, low-latency networking, which is especially useful in massively parallel computer clusters.

A network socket is a software structure within a network node of a computer network that serves as an endpoint for sending and receiving data across the network. The structure and properties of a socket are defined by an application programming interface (API) for the networking architecture. Sockets are created only during the lifetime of a process of an application running in the node.

iWARP is a computer networking protocol that implements remote direct memory access (RDMA) for efficient data transfer over Internet Protocol networks. Contrary to some accounts, iWARP is not an acronym.

<span class="mw-page-title-main">Network block device</span> Network storage protocol

On Linux, network block device (NBD) is a network protocol that can be used to forward a block device from one machine to a second machine. As an example, a local machine can access a hard disk drive that is attached to another computer.

<span class="mw-page-title-main">Fibre Channel over Ethernet</span> Computer network technology

Fibre Channel over Ethernet (FCoE) is a computer network technology that encapsulates Fibre Channel frames over Ethernet networks. This allows Fibre Channel to use 10 Gigabit Ethernet networks while preserving the Fibre Channel protocol. The specification was part of the International Committee for Information Technology Standards T11 FC-BB-5 standard published in 2009. FCoE did not see widespread adoption.

In computing the SCSI RDMA Protocol (SRP) is a protocol that allows one computer to access SCSI devices attached to another computer via remote direct memory access (RDMA). The SRP protocol is also known as the SCSI Remote Protocol. The use of RDMA makes higher throughput and lower latency possible than what is generally available through e.g. the TCP/IP communication protocol.

<span class="mw-page-title-main">Storage area network</span> Network which provides access to consolidated, block-level data storage

A storage area network (SAN) or storage network is a computer network which provides access to consolidated, block-level data storage. SANs are primarily used to access data storage devices, such as disk arrays and tape libraries from servers so that the devices appear to the operating system as direct-attached storage. A SAN typically is a dedicated network of storage devices not accessible through the local area network (LAN).

iSCSI conformance testing is testing to determine whether an iSCSI Initiator/Target meets the iSCSI standard.

<span class="mw-page-title-main">LIO (SCSI target)</span> Open-source version of SCSI target

In computing, Linux-IO (LIO) Target is an open-source implementation of the SCSI target that has become the standard one included in the Linux kernel. Internally, LIO does not initiate sessions, but instead provides one or more Logical Unit Numbers (LUNs), waits for SCSI commands from a SCSI initiator, and performs required input/output data transfers. LIO supports common storage fabrics, including FCoE, Fibre Channel, IEEE 1394, iSCSI, iSCSI Extensions for RDMA (iSER), SCSI RDMA Protocol (SRP) and USB. It is included in most Linux distributions; native support for LIO in QEMU/KVM, libvirt, and OpenStack makes LIO also a storage option for cloud deployments.

RDMA over Converged Ethernet (RoCE) or InfiniBand over Ethernet (IBoE) is a network protocol which allows remote direct memory access (RDMA) over an Ethernet network. It does this by encapsulating an InfiniBand (IB) transport packet over Ethernet. There are multiple RoCE versions. RoCE v1 is an Ethernet link layer protocol and hence allows communication between any two hosts in the same Ethernet broadcast domain. RoCE v2 is an internet layer protocol which means that RoCE v2 packets can be routed. Although the RoCE protocol benefits from the characteristics of a converged Ethernet network, the protocol can also be used on a traditional or non-converged Ethernet network.

References

  1. "Open Consortium Developing Specifications for Remote Direct Memory Access Over TCP/IP Networks" (PDF). press release. May 31, 2002. Retrieved May 5, 2011.
  2. Mike Ko; et al. (July 2003). "iSCSI Extensions for RDMA Specification (Version 1.0)" (PDF). Retrieved May 5, 2011.
  3. M. Ko; et al. (October 2007). "Internet Small Computer System Interface (iSCSI) Extensions for Remote Direct Memory Access (RDMA)". RFC 5046 .
  4. M. Chadalapaka; et al. (October 2007). "DA: Datamover Architecture for the Internet Small Computer System Interface (iSCSI)". RFC 5047 .
  5. "OpenFabrics Alliance". official web site. Retrieved May 4, 2011.

Further reading