Ethernet flow control

Wireshark screenshot of an Ethernet pause frame

Ethernet flow control is a mechanism for temporarily stopping the transmission of data on Ethernet family computer networks. The goal of this mechanism is to avoid packet loss in the presence of network congestion.

The first flow control mechanism, the pause frame, was defined by the IEEE 802.3x standard. The follow-on priority-based flow control, defined in the IEEE 802.1Qbb standard, provides a link-level flow control mechanism that can be controlled independently for each class of service (CoS), as defined by IEEE P802.1p. It is applicable to data center bridging (DCB) networks and allows voice over IP (VoIP), video over IP, and database synchronization traffic to be prioritized over default data traffic and bulk file transfers.

Description

A sending station (computer or network switch) may be transmitting data faster than the other end of the link can accept it. Using flow control, the receiving station can signal the sender to suspend transmission until the receiver catches up. Flow control on Ethernet can be implemented at the data link layer.

The first flow control mechanism, the pause frame, was defined by the Institute of Electrical and Electronics Engineers (IEEE) task force that defined full duplex Ethernet link segments. The IEEE standard 802.3x was issued in 1997. [1]

Pause frame

An overwhelmed network node can send a pause frame, which halts the transmission of the sender for a specified period of time. A media access control (MAC) frame (EtherType 0x8808) is used to carry the pause command, with the Control opcode set to 0x0001 (hexadecimal). [1] Only stations configured for full-duplex operation may send pause frames. When a station wishes to pause the other end of a link, it sends a pause frame to either the unique 48-bit destination address of this link or to the 48-bit reserved multicast address of 01-80-C2-00-00-01. [2] :Annex 31B.3.3 The use of a well-known address makes it unnecessary for a station to discover and store the address of the station at the other end of the link.
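
The frame layout described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name and the source address are hypothetical, the frame check sequence is omitted on the assumption that the network interface appends it, and the 16-bit pause time field and 42-byte padding are the ones described later in this article.

```python
import struct

# Values taken from the text above.
PAUSE_DEST = bytes.fromhex("0180C2000001")  # reserved multicast address
MAC_CONTROL_ETHERTYPE = 0x8808
PAUSE_OPCODE = 0x0001


def build_pause_frame(src_mac: bytes, quanta: int) -> bytes:
    """Return a pause frame without the trailing FCS (hypothetical helper)."""
    if len(src_mac) != 6:
        raise ValueError("source MAC address must be 6 bytes")
    if not 0 <= quanta <= 0xFFFF:
        raise ValueError("pause time must fit in 16 bits")
    # Destination, source, EtherType: the usual 14-byte Ethernet header.
    header = PAUSE_DEST + src_mac + struct.pack("!H", MAC_CONTROL_ETHERTYPE)
    # Opcode and pause time, both big-endian 16-bit fields.
    payload = struct.pack("!HH", PAUSE_OPCODE, quanta)
    # Zero padding up to the 46-byte minimum Ethernet payload (42 bytes here).
    payload += bytes(46 - len(payload))
    return header + payload


frame = build_pause_frame(bytes.fromhex("001122334455"), 0xFFFF)
print(len(frame), frame[:18].hex("-"))
```

Sending such a frame requires a raw socket or a packet-crafting library, which is outside the scope of this sketch.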

Another advantage of using this multicast address arises from the use of flow control between network switches. The particular multicast address is selected from a range of addresses reserved by the IEEE 802.1D standard, which specifies the operation of switches used for bridging. Normally, a frame with a multicast destination sent to a switch is forwarded out of all other ports of the switch. However, addresses in this reserved range are special and are not forwarded by an 802.1D-compliant switch. Instead, frames sent to this range are understood to be meant to be acted upon only within the switch.
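
As an illustration of the forwarding rule, the following sketch tests whether a destination address falls in the reserved range. The article does not state the bounds of the range; the sketch assumes the commonly cited block 01-80-C2-00-00-00 through 01-80-C2-00-00-0F, and the function name is hypothetical.

```python
# Assumed bounds of the bridge-reserved block (not stated in the article):
# 01-80-C2-00-00-00 through 01-80-C2-00-00-0F.
RESERVED_PREFIX = bytes.fromhex("0180C20000")
RESERVED_LAST_OCTET_MAX = 0x0F


def is_bridge_reserved(mac: str) -> bool:
    """Return True if a compliant bridge should not forward this address."""
    octets = bytes.fromhex(mac.replace("-", "").replace(":", ""))
    if len(octets) != 6:
        raise ValueError("MAC address must be 6 bytes")
    return octets[:5] == RESERVED_PREFIX and octets[5] <= RESERVED_LAST_OCTET_MAX


print(is_bridge_reserved("01-80-C2-00-00-01"))  # pause frame destination
print(is_bridge_reserved("FF-FF-FF-FF-FF-FF"))  # broadcast
```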

A pause frame includes the requested pause time as a two-byte (16-bit) unsigned integer (0 through 65535). The pause time is measured in units of pause quanta, where each quantum is equal to 512 bit times.
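
Because a quantum is defined in bit times, the wall-clock length of a pause depends on the link speed. The sketch below converts a pause time to seconds; the function name and the example link speeds are assumptions chosen for illustration.

```python
BIT_TIMES_PER_QUANTUM = 512  # from the definition above


def pause_duration_seconds(quanta: int, link_bps: float) -> float:
    """Convert a pause time in quanta to seconds for a given link speed."""
    if not 0 <= quanta <= 0xFFFF:
        raise ValueError("pause time must fit in 16 bits")
    if link_bps <= 0:
        raise ValueError("link speed must be positive")
    # One bit time is the time needed to send one bit: 1 / link_bps seconds.
    return quanta * BIT_TIMES_PER_QUANTUM / link_bps


# Longest pause a single frame can request, at a few example link speeds.
for name, bps in [("100 Mbit/s", 100e6), ("1 Gbit/s", 1e9), ("10 Gbit/s", 10e9)]:
    print(f"{name}: {pause_duration_seconds(0xFFFF, bps) * 1000:.3f} ms")
```

The maximum value of 65535 quanta corresponds to 33,553,920 bit times, or roughly 33.6 ms on a 1 Gbit/s link, so a single pause frame requests a shorter real-time pause as link speed increases.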

By 1999, several vendors supported receiving pause frames, but fewer implemented sending them. [3] [4]

Issues

One original motivation for the pause frame was to handle network interface controllers (NICs) that did not have enough buffering to handle full-speed reception. This problem has become less common with advances in bus speeds and memory sizes. A more likely scenario is network congestion within a switch. For example, a flow can come into a switch on a higher-speed link than the one it goes out on, or several flows can come in over two or more links whose combined rate exceeds an output link's bandwidth. These will eventually exhaust any amount of buffering in the switch. However, blocking the sending link causes all flows over that link to be delayed, even those that are not causing any congestion. This situation is a case of head-of-line (HOL) blocking, and can happen more often in core network switches because of the large number of flows generally being aggregated. Many switches use a technique called virtual output queues to eliminate the HOL blocking internally, and so never send pause frames. [4]

Subsequent efforts

Congestion management

Another effort began in March 2004, and in May 2004 it became the IEEE P802.3ar Congestion Management Task Force. In May 2006, the objectives of the task force were revised to specify a mechanism to limit the transmitted data rate at about 1% granularity. The request was withdrawn and the task force was disbanded in 2008. [5]

Priority flow control

Ethernet flow control disturbs the Ethernet class of service (defined in IEEE 802.1p), because data of all priorities is stopped while the existing buffers, which might also hold low-priority data, are cleared. As a remedy to this problem, Cisco Systems defined its own priority flow control extension to the standard protocol. This mechanism uses 14 bytes of the 42-byte padding in a regular pause frame. The MAC control opcode for a Priority pause frame is 0x0101. Unlike the original pause, Priority pause indicates the pause time in quanta for each of eight priority classes separately. [6] The extension was subsequently standardized by the Priority-based Flow Control (PFC) project authorized on March 27, 2008, as IEEE 802.1Qbb. [7] Draft 2.3 was proposed on June 7, 2010. Claudio DeSanti of Cisco was editor. [8] The effort was part of the data center bridging task group, which developed Fibre Channel over Ethernet. [9]

References

  1. IEEE Standards for Local and Metropolitan Area Networks: Supplements to Carrier Sense Multiple Access with Collision Detection (CSMA/CD) Access Method and Physical Layer Specifications - Specification for 802.3 Full Duplex Operation and Physical Layer Specification for 100 Mb/S Operation on Two Pairs of Category 3 or Better Balanced Twisted Pair Cable (100BASE-T2). Institute of Electrical and Electronics Engineers. 1997. doi:10.1109/IEEESTD.1997.95611. ISBN 978-1-55937-905-2. Archived from the original on July 13, 2012.
  2. IEEE Standard for Ethernet (PDF). IEEE Standards Association. 2018-08-31. doi:10.1109/IEEESTD.2018.8457469. ISBN 978-1-5044-5090-4. Retrieved 2022-11-29.
  3. Ann Sullivan; Greg Kilmartin; Scott Hamilton (September 13, 1999). "Switch Vendors pass interoperability tests". Network World. pp. 81–82. Retrieved May 10, 2011.
  4. "Vendors on flow control". Network World Fusion. September 13, 1999. Archived from the original on 2012-02-07. Vendor comments on flow control in the 1999 test.
  5. "IEEE P802.3ar Congestion Management Task Force". December 18, 2008. Retrieved May 10, 2011.
  6. "Priority Flow Control: Build Reliable Layer 2 Infrastructure" (PDF). White Paper. Cisco Systems. June 2009. Retrieved May 10, 2011.
  7. IEEE 802.1Qbb
  8. "IEEE 802.1Q Priority-based Flow Control". Institute of Electrical and Electronics Engineers. June 7, 2010. Retrieved May 10, 2011.
  9. "Data Center Bridging Task Group". Institute of Electrical and Electronics Engineers. June 7, 2010. Retrieved May 10, 2011.