Explicit Congestion Notification

Explicit Congestion Notification (ECN) is an extension to the Internet Protocol and to the Transmission Control Protocol and is defined in RFC 3168 (2001). ECN allows end-to-end notification of network congestion without dropping packets. ECN is an optional feature that may be used between two ECN-enabled endpoints when the underlying network infrastructure also supports it.

Conventionally, TCP/IP networks signal congestion by dropping packets. When ECN is successfully negotiated, an ECN-aware router may set a mark in the IP header instead of dropping a packet in order to signal impending congestion. The receiver of the packet echoes the congestion indication to the sender, which reduces its transmission rate as if it detected a dropped packet.

Rather than responding properly or ignoring the bits, some outdated or faulty network equipment has historically dropped or mangled packets that have ECN bits set. [1] [2] [3] As of 2015, measurements suggested that the fraction of web servers on the public Internet for which setting ECN prevents network connections had been reduced to less than 1%. [4]

Passive support has existed in Ubuntu Linux since 12.04 and in Windows Server since 2012. [5] Passive support among the most popular websites increased from 8.5% in 2012 to over 70% in May 2017. [5] Adoption across the Internet now requires clients to actively request ECN. In June 2015, Apple announced that ECN would be enabled by default on its supported and future products, to help drive the adoption of ECN signaling industry-wide. [6]

Operation

ECN requires specific support at both the Internet layer and the transport layer for the following reasons:

  • In TCP/IP, routers operate within the Internet layer, while the transmission rate is handled by the endpoints at the transport layer.
  • Congestion may be handled only by the transmitter, but since it is known to have happened only after a packet was sent, there must be an echo of the congestion indication by the receiver to the transmitter.

Without ECN, congestion indication echo is achieved indirectly by the detection of lost packets. With ECN, the congestion is indicated by setting the ECN field within an IP packet to CE (Congestion Experienced) and is echoed back by the receiver to the transmitter by setting proper bits in the header of the transport protocol. For example, when using TCP, the congestion indication is echoed back by setting the ECE bit.

Operation of ECN with IP

ECN uses the two least significant (right-most) bits of the Traffic Class field in the IPv4 or IPv6 header to encode four different code points:

  • 00  Not ECN-Capable Transport, Not-ECT
  • 10  ECN Capable Transport, ECT(0)
  • 01  ECN Capable Transport, ECT(1)
  • 11  Congestion Experienced, CE

When both endpoints support ECN they mark their packets with ECT(0) or ECT(1). Routers treat the ECT(0) and ECT(1) codepoints as equivalent. If the packet traverses an active queue management (AQM) queue (e.g., a queue that uses random early detection (RED)) that is experiencing congestion and the corresponding router supports ECN, it may change the code point to CE instead of dropping the packet. This act is referred to as "marking" and its purpose is to inform the receiving endpoint of impending congestion. At the receiving endpoint, this congestion indication is handled by the upper layer protocol (transport layer protocol) and needs to be echoed back to the transmitting node in order to signal it to reduce its transmission rate.
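
To make the marking rule concrete, the following Python sketch illustrates how a generic ECN-aware AQM queue might treat the four code points under congestion; it is a minimal sketch, not the behavior of any particular router, and the function name and structure are purely illustrative.

    # Illustrative sketch of the ECN handling described above (RFC 3168 code points).
    ECT_0, ECT_1, CE = 0b10, 0b01, 0b11      # Not-ECT is 0b00

    def aqm_decision(traffic_class: int, congested: bool) -> str:
        """What a hypothetical ECN-aware AQM does with one packet."""
        ecn = traffic_class & 0b11           # the ECN field is the two low bits
        if not congested:
            return "enqueue"                 # no impending congestion
        if ecn in (ECT_0, ECT_1):            # routers treat ECT(0) and ECT(1) alike
            return "mark CE (set field to 0b11) and enqueue"
        return "drop"                        # Not-ECT traffic can only be dropped

    # An ECT(0) packet under congestion is marked rather than dropped.
    print(aqm_decision(0b10, congested=True))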

Because the CE indication can only be handled effectively by an upper layer protocol that supports it, ECN is only used in conjunction with upper layer protocols, such as TCP, that support congestion control and have a method for echoing the CE indication to the transmitting endpoint.

Operation of ECN with TCP

TCP supports ECN using two flags in the TCP header. The first, ECN-Echo (ECE), is used to echo back the congestion indication (i.e., to signal the sender to reduce its transmission rate). The second, Congestion Window Reduced (CWR), is used to acknowledge that the congestion-indication echo was received. Use of ECN on a TCP connection is optional; for ECN to be used, it must be negotiated at connection establishment by setting suitable flags in the SYN and SYN-ACK segments.
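
As a rough illustration of that negotiation (RFC 3168 has the initiating host set both ECE and CWR in its SYN, and an ECN-capable responder reply with ECE set but CWR clear in its SYN-ACK), the checks could be sketched in Python as follows; the helper functions are hypothetical.

    # Hedged sketch of the ECN-setup checks on the TCP flags byte.
    SYN, ACK, ECE, CWR = 0x02, 0x10, 0x40, 0x80

    def is_ecn_setup_syn(flags: int) -> bool:
        # "ECN-setup SYN": SYN set (not an ACK), with both ECE and CWR set.
        return flags & (SYN | ACK) == SYN and flags & (ECE | CWR) == (ECE | CWR)

    def is_ecn_setup_syn_ack(flags: int) -> bool:
        # "ECN-setup SYN-ACK": SYN and ACK set, ECE set, CWR clear.
        return flags & (SYN | ACK) == (SYN | ACK) and flags & (ECE | CWR) == ECE

    print(is_ecn_setup_syn(SYN | ECE | CWR))        # True
    print(is_ecn_setup_syn_ack(SYN | ACK | ECE))    # True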

When ECN has been negotiated on a TCP connection, the sender indicates that IP packets that carry TCP segments of that connection are carrying traffic from an ECN Capable Transport by marking them with an ECT code point. This allows intermediate routers that support ECN to mark those IP packets with the CE code point instead of dropping them in order to signal impending congestion.

Upon receiving an IP packet with the Congestion Experienced code point, the TCP receiver echoes back this congestion indication using the ECE flag in the TCP header. When an endpoint receives a TCP segment with the ECE bit it reduces its congestion window as for a packet drop. It then acknowledges the congestion indication by sending a segment with the CWR bit set.

A node keeps transmitting TCP segments with the ECE bit set until it receives a segment with the CWR bit set.

To see affected packets with tcpdump, use the filter predicate (tcp[13] & 0xc0 != 0), which matches any segment with the ECE (0x40) or CWR (0x80) flag set in the TCP flags byte at offset 13.
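
A minimal Python equivalent of the same check, assuming a raw TCP header held in a bytes object, might look like this:

    # Mirror of the tcpdump predicate (tcp[13] & 0xc0 != 0).
    ECE, CWR = 0x40, 0x80

    def has_ecn_flags(tcp_header: bytes) -> bool:
        return tcp_header[13] & (ECE | CWR) != 0

    # Fabricated 20-byte header whose flags byte carries SYN, ECE and CWR.
    hdr = bytearray(20)
    hdr[13] = 0x02 | ECE | CWR
    print(has_ecn_flags(bytes(hdr)))    # True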

ECN and TCP control packets

Since the Transmission Control Protocol (TCP) does not perform congestion control on control packets (pure ACKs, SYN, FIN segments), control packets are usually not marked as ECN-capable.

A 2009 proposal [7] suggests marking SYN-ACK packets as ECN-capable. This improvement, known as ECN+, has been shown to provide dramatic improvements to performance of short-lived TCP connections. [8]

Operation of ECN with other transport protocols

ECN is also defined for other transport layer protocols that perform congestion control, notably DCCP and Stream Control Transmission Protocol (SCTP). The general principle is similar to TCP, although the details of the on-the-wire encoding differ.

It is possible to use ECN with protocols layered above UDP. However, UDP requires that congestion control be performed by the application, and early UDP-based protocols such as DNS did not use ECN. More recent UDP-based protocols such as QUIC use ECN for congestion control.
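
As a small illustration of how an application running over UDP can request ECN marking on its outgoing datagrams, the following Python sketch sets the ECN field to ECT(0) through the IP_TOS socket option on Linux; this is a minimal sketch under those assumptions, the destination is a documentation-only address, and reading CE marks on received datagrams (which a QUIC stack also needs) requires further platform-specific socket options not shown here.

    import socket

    ECT_0 = 0b10    # ECN field value for ECT(0); it occupies the two low bits of the TOS byte

    # Mark outgoing UDP datagrams as coming from an ECN Capable Transport.
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, ECT_0)
    s.sendto(b"hello", ("192.0.2.1", 9))    # 192.0.2.0/24 is reserved for documentation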

Effects on performance

Since ECN is only effective in combination with an Active Queue Management (AQM) policy, the benefits of ECN depend on the precise AQM being used. A few observations, however, appear to hold across different AQMs.

As expected, ECN reduces the number of packets dropped by a TCP connection, which, by avoiding a retransmission, reduces latency and especially jitter. This effect is most pronounced when the TCP connection has a single outstanding segment, [9] when it is able to avoid a retransmission timeout (RTO); this is often the case for interactive connections, such as remote logins, and transactional protocols, such as HTTP requests, the conversational phase of SMTP, or SQL requests.

Effects of ECN on bulk throughput are less clear [10] because modern TCP implementations are fairly good at resending dropped segments in a timely manner when the sender's window is large.

Use of ECN has been found to be detrimental to performance on highly congested networks when using AQM algorithms that never drop packets. [8] Modern AQM implementations avoid this pitfall by dropping rather than marking packets at very high load.

Implementations

Many modern implementations of the TCP/IP protocol suite have some support for ECN; however, they usually ship with ECN disabled.

ECN support in TCP by hosts

Microsoft Windows

Windows versions since Windows Server 2008 and Windows Vista support ECN for TCP. [11] Since Windows Server 2012, it is enabled by default in Windows Server versions, because Data Center Transmission Control Protocol (DCTCP) is used. [12] In previous Windows versions and non-server versions it is disabled by default.

ECN support can be enabled using a shell command such as netsh interface tcp set global ecncapability=enabled.

BSD

On FreeBSD, ECN for TCP can be configured using the net.inet.tcp.ecn.enable sysctl. By default, it is enabled only for incoming connections that request it. It can also be enabled for all connections or disabled entirely. [13]

NetBSD 4.0 implements ECN support for TCP; it can be activated through the sysctl interface by setting the net.inet.tcp.ecn.enable parameter to 1. [14]

Likewise, the sysctl net.inet.tcp.ecn can be used in OpenBSD. [15]

Linux

Since version 2.4.20 of the Linux kernel, released in November 2002, [16] Linux supports three working modes of the ECN for TCP, as configured through the sysctl interface by setting parameter /proc/sys/net/ipv4/tcp_ecn to one of the following values: [17]

  • 0  disable ECN and neither initiate nor accept it
  • 1  enable ECN when requested by incoming connections, and also request ECN on outgoing connection attempts
  • 2  (default) enable ECN when requested by incoming connections, but do not request ECN on outgoing connections

Beginning with version 4.1 of the Linux kernel, released in June 2015, the tcp_ecn_fallback mechanism, as specified in RFC 3168 section 6.1.1.1, [18] is enabled by default [19] when ECN itself is enabled (tcp_ecn set to 1). The fallback mechanism attempts ECN in the initial setup of outgoing connections and falls back gracefully to transmission without ECN, mitigating issues with ECN-intolerant hosts or firewalls.
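
For a quick check of the current mode on a Linux host, the sysctl file can be read directly; the sketch below simply maps the value to the meanings listed above.

    # Inspect the Linux TCP ECN mode via the documented /proc path.
    MEANINGS = {
        "0": "ECN disabled",
        "1": "ECN enabled; also requested on outgoing connections",
        "2": "ECN enabled only when requested by incoming connections (default)",
    }

    with open("/proc/sys/net/ipv4/tcp_ecn") as f:
        mode = f.read().strip()

    print("tcp_ecn=" + mode + ": " + MEANINGS.get(mode, "unknown value"))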

Mac OS X

Mac OS X 10.5 and 10.6 implement ECN support for TCP. It is controlled using the boolean sysctl variables net.inet.tcp.ecn_negotiate_in and net.inet.tcp.ecn_initiate_out. [20] The first variable enables ECN on incoming connections that already have ECN flags set; the second one tries to initiate outgoing connections with ECN enabled. Both variables default to 0, but can be set to 1 to enable the respective behavior.

In June 2015, Apple Inc. announced that OS X 10.11 would have ECN turned on by default, [6] but the OS shipped without that default behavior. In macOS Sierra, ECN is enabled for half of TCP sessions. [21]

iOS

In June 2015, Apple Inc. announced that iOS 9, its next version of iOS, would support ECN and have it turned on by default. [6] TCP ECN negotiation is enabled on 5% of randomly selected connections over Wi-Fi/Ethernet in iOS 9, on 50% of randomly selected connections over Wi-Fi/Ethernet and a few cellular carriers in iOS 10, [22] [23] and on 100% of connections in iOS 11. [24]

Solaris

The Solaris kernel supports three states of ECN for TCP: [25]

  • never  no ECN
  • active  use ECN
  • passive  only advertise ECN support when asked for.

As of Solaris 11.4, the default behavior is active. ECN usage can be modified via ipadm set-prop -p ecn=active tcp. [26]

ECN support in IP by routers

Since ECN marking in routers is dependent on some form of active queue management, routers must be configured with a suitable queue discipline in order to perform ECN marking.

Since IOS version 12.2(8)T, Cisco IOS routers perform ECN marking when configured with the WRED queuing discipline.

Linux routers perform ECN marking if configured with one of the RED or GRED queue disciplines with an explicit ecn parameter, by using the sfb discipline, by using the CoDel Fair Queuing (fq_codel) discipline, or the CAKE [27] queuing discipline.

Modern BSD implementations, such as FreeBSD, NetBSD and OpenBSD, have support for ECN marking in the ALTQ queueing implementation for a number of queuing disciplines, notably RED and Blue. FreeBSD 11 included CoDel, PIE, FQ-CoDel and FQ-PIE queuing disciplines implementation in ipfw/dummynet framework with ECN marking capability. [28]

Data Center TCP

Data Center Transmission Control Protocol (Data Center TCP or DCTCP) utilizes ECN to enhance the Transmission Control Protocol congestion control algorithm. [29] It is used in data center networks. Whereas the standard TCP congestion control algorithm is only able to detect the presence of congestion, DCTCP, using ECN, is able to gauge the extent of congestion. [30]
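
The essence of that estimate, following RFC 8257, is a moving average of the fraction of CE-marked bytes, which then scales the sender's window reduction; the Python sketch below is illustrative only (the variable names follow the RFC, the framing around them is hypothetical).

    # Sketch of the DCTCP congestion estimate and window reduction (RFC 8257).
    def update_alpha(alpha: float, marked_bytes: int, total_bytes: int, g: float = 1 / 16) -> float:
        # alpha estimates the fraction of bytes CE-marked over recent windows;
        # g is the estimation gain (RFC 8257 suggests 1/16).
        frac = marked_bytes / total_bytes if total_bytes else 0.0
        return (1 - g) * alpha + g * frac

    def reduce_cwnd(cwnd: float, alpha: float) -> float:
        # Standard TCP halves cwnd on any congestion signal; DCTCP scales
        # the cut by the estimated extent of congestion instead.
        return cwnd * (1 - alpha / 2)

    alpha = update_alpha(alpha=0.0, marked_bytes=3000, total_bytes=30000)   # 10% of bytes marked
    print(reduce_cwnd(cwnd=100.0, alpha=alpha))                             # a small cut, not 50%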

DCTCP modifies the TCP receiver to always relay the exact ECN marking of incoming packets at the cost of ignoring a function that is meant to preserve signalling reliability. This makes a DCTCP sender vulnerable to loss of ACKs from the receiver, which it has no mechanism to detect or cope with. [31] As of July 2014, algorithms that provide equivalent or better receiver feedback in a more reliable approach are an active research topic. [31]

Related Research Articles

The Transmission Control Protocol (TCP) is one of the main protocols of the Internet protocol suite. It originated in the initial network implementation in which it complemented the Internet Protocol (IP). Therefore, the entire suite is commonly referred to as TCP/IP. TCP provides reliable, ordered, and error-checked delivery of a stream of octets (bytes) between applications running on hosts communicating via an IP network. Major internet applications such as the World Wide Web, email, remote administration, and file transfer rely on TCP, which is part of the Transport Layer of the TCP/IP suite. SSL/TLS often runs on top of TCP.

In computer networking, Layer 2 Tunneling Protocol (L2TP) is a tunneling protocol used to support virtual private networks (VPNs) or as part of the delivery of services by ISPs. It uses encryption ('hiding') only for its own control messages, and does not provide any encryption or confidentiality of content by itself. Rather, it provides a tunnel for Layer 2, and the tunnel itself may be passed over a Layer 3 encryption protocol such as IPsec.

Network congestion in data networking and queueing theory is the reduced quality of service that occurs when a network node or link is carrying more data than it can handle. Typical effects include queueing delay, packet loss or the blocking of new connections. A consequence of congestion is that an incremental increase in offered load leads either only to a small increase or even a decrease in network throughput.

In computer networking, the Datagram Congestion Control Protocol (DCCP) is a message-oriented transport layer protocol. DCCP implements reliable connection setup, teardown, Explicit Congestion Notification (ECN), congestion control, and feature negotiation. The IETF published DCCP as RFC 4340, a proposed standard, in March 2006. RFC 4336 provides an introduction.

Transmission Control Protocol (TCP) uses a congestion control algorithm that includes various aspects of an additive increase/multiplicative decrease (AIMD) scheme, along with other schemes including slow start and congestion window (CWND), to achieve congestion avoidance. The TCP congestion-avoidance algorithm is the primary basis for congestion control in the Internet. Per the end-to-end principle, congestion control is largely a function of internet hosts, not the network itself. There are several variations and versions of the algorithm implemented in protocol stacks of operating systems of computers that connect to the Internet.

Nagle's algorithm is a means of improving the efficiency of TCP/IP networks by reducing the number of packets that need to be sent over the network. It was defined by John Nagle while working for Ford Aerospace. It was published in 1984 as a Request for Comments (RFC) with title Congestion Control in IP/TCP Internetworks in RFC 896.

The type of service (ToS) field is the second byte of the IPv4 header. It has had various purposes over the years, and has been defined in different ways by five RFCs.

TCP tuning techniques adjust the network congestion avoidance parameters of Transmission Control Protocol (TCP) connections over high-bandwidth, high-latency networks. Well-tuned networks can perform up to 10 times faster in some cases. However, blindly following instructions without understanding their real consequences can hurt performance as well.

In computer networking, a host model is an option of designing the TCP/IP stack of a networking operating system like Microsoft Windows or Linux. When a unicast packet arrives at a host, IP must determine whether the packet is locally destined. If the IP stack is implemented with a weak host model, it accepts any locally destined packet regardless of the network interface on which the packet was received. If the IP stack is implemented with a strong host model, it only accepts locally destined packets if the destination IP address in the packet matches an IP address assigned to the network interface on which the packet was received.

Bandwidth management is the process of measuring and controlling the communications on a network link, to avoid filling the link to capacity or overfilling the link, which would result in network congestion and poor performance of the network. Bandwidth is described by bit rate and measured in units of bits per second (bit/s) or bytes per second (B/s).

Compound TCP (CTCP) is a Microsoft algorithm that was introduced as part of the Windows Vista and Windows Server 2008 TCP stack. It is designed to aggressively adjust the sender's congestion window to optimise TCP for connections with large bandwidth-delay products while trying not to harm fairness. It is also available for Linux, as well as for Windows XP and Windows Server 2003 via a hotfix.

In routers and switches, active queue management (AQM) is the policy of dropping packets inside a buffer associated with a network interface controller (NIC) before that buffer becomes full, often with the goal of reducing network congestion or improving end-to-end latency. This task is performed by the network scheduler, which for this purpose uses various algorithms such as random early detection (RED), Explicit Congestion Notification (ECN), or controlled delay (CoDel). RFC 7567 recommends active queue management as a best practice.

The TCP window scale option is an option to increase the receive window size allowed in Transmission Control Protocol above its former maximum value of 65,535 bytes. This TCP option, along with several others, is defined in RFC 7323 which deals with long fat networks (LFNs).

In computing, Microsoft's Windows Vista and Windows Server 2008 introduced in 2007/2008 a new networking stack named Next Generation TCP/IP stack, to improve on the previous stack in several ways. The stack includes native implementation of IPv6, as well as a complete overhaul of IPv4. The new TCP/IP stack uses a new method to store configuration settings that enables more dynamic control and does not require a computer restart after a change in settings. The new stack, implemented as a dual-stack model, depends on a strong host-model and features an infrastructure to enable more modular components that one can dynamically insert and remove.

Bufferbloat is a cause of high latency and jitter in packet-switched networks caused by excess buffering of packets. Bufferbloat can also cause packet delay variation, as well as reduce the overall network throughput. When a router or switch is configured to use excessively large buffers, even very high-speed networks can become practically unusable for many interactive applications like voice over IP (VoIP), audio streaming, online gaming, and even ordinary web browsing.

The Stream Control Transmission Protocol (SCTP) is a computer networking communications protocol in the transport layer of the Internet protocol suite. Originally intended for Signaling System 7 (SS7) message transport in telecommunication, the protocol provides the message-oriented feature of the User Datagram Protocol (UDP), while ensuring reliable, in-sequence transport of messages with congestion control like the Transmission Control Protocol (TCP). Unlike UDP and TCP, the protocol supports multihoming and redundant paths to increase resilience and reliability.

RDMA over Converged Ethernet (RoCE) or InfiniBand over Ethernet (IBoE) is a network protocol that allows remote direct memory access (RDMA) over an Ethernet network. It does this by encapsulating an InfiniBand (IB) transport packet over Ethernet. There are two RoCE versions, RoCE v1 and RoCE v2. RoCE v1 is an Ethernet link layer protocol and hence allows communication between any two hosts in the same Ethernet broadcast domain. RoCE v2 is an internet layer protocol which means that RoCE v2 packets can be routed. Although the RoCE protocol benefits from the characteristics of a converged Ethernet network, the protocol can also be used on a traditional or non-converged Ethernet network.

In computer networking, TCP Fast Open (TFO) is an extension to speed up the opening of successive Transmission Control Protocol (TCP) connections between two endpoints. It works by using a TFO cookie, which is a cryptographic cookie stored on the client and set upon the initial connection with the server. When the client later reconnects, it sends the initial SYN packet along with the TFO cookie data to authenticate itself. If successful, the server may start sending data to the client even before the reception of the final ACK packet of the three-way handshake, thus skipping a round-trip delay and lowering the latency in the start of data transmission.

CoDel is an active queue management (AQM) algorithm in network routing, developed by Van Jacobson and Kathleen Nichols and published as RFC 8289. It is designed to overcome bufferbloat in networking hardware, such as routers, by setting limits on the delay network packets experience as they pass through buffers in this equipment. CoDel aims to improve on the overall performance of the random early detection (RED) algorithm by addressing some of its fundamental misconceptions, as perceived by Jacobson, and by being easier to manage.

A network scheduler, also called packet scheduler, queueing discipline (qdisc) or queueing algorithm, is an arbiter on a node in a packet switching communication network. It manages the sequence of network packets in the transmit and receive queues of the protocol stack and network interface controller. There are several network schedulers available for the different operating systems, that implement many of the existing network scheduling algorithms.

References

  1. Steven Bauer; Robert Beverly; Arthur Berger (2011). "Measuring the State of ECN Readiness in Servers, Clients, and Routers" (PDF). Internet Measurement Conference 2011. Archived (PDF) from the original on 2014-03-22.
  2. Alberto Medina; Mark Allman; Sally Floyd. "Measuring Interactions Between Transport Protocols and Middleboxes" (PDF). Internet Measurement Conference 2004. Archived (PDF) from the original on 2016-03-04.
  3. "TBIT, the TCP Behavior Inference Tool: ECN". Icir.org. Archived from the original on 2013-03-11. Retrieved 2014-03-22.
  4. Brian Trammell; Mirja Kühlewind; Damiano Boppart; Iain Learmonth; Gorry Fairhurst; Richard Scheffenegger (2015). "Enabling Internet-Wide Deployment of Explicit Congestion Notification" (PDF). Proceedings of the Passive and Active Measurement Conference 2015. Archived from the original (PDF) on 15 June 2015. Retrieved 14 June 2015.
  5. David Murray; Terry Koziniec; Sebastian Zander; Michael Dixon; Polychronis Koutsakis (2017). "An Analysis of Changing Enterprise Network Traffic Characteristics" (PDF). The 23rd Asia-Pacific Conference on Communications (APCC 2017). Archived (PDF) from the original on 3 October 2017. Retrieved 3 October 2017.
  6. "Your App and Next Generation Networks". Apple Inc. 2015. Archived from the original on 2015-06-15.
  7. Kuzmanovic, A.; Mondal, A.; Floyd, S.; Ramakrishnan, K. (June 2009). Adding Explicit Congestion Notification Capability to TCP's SYN/ACK Packets. doi:10.17487/RFC5562. RFC 5562.
  8. Aleksandar Kuzmanovic. The power of explicit congestion notification. In Proceedings of the 2005 conference on Applications, technologies, architectures, and protocols for computer communications. 2005.
  9. Jamal Hadi Salim and Uvaiz Ahmed. Performance Evaluation of Explicit Congestion Notification (ECN) in IP Networks. RFC 2884. July 2000.
  10. Marek Malowidzki, Simulation-based Study of ECN Performance in RED Networks, In Proc. SPECTS'03. 2003.
  11. "New Networking Features in Windows Server 2008 and Windows Vista". Archived from the original on 2010-01-15.
  12. "Data Center Transmission Control Protocol (DCTCP) (Windows Server 2012)". Archived from the original on 2017-08-26.
  13. "tcp(4) - Internet Transmission Control Protocol". FreeBSD Kernel Interfaces Manual. Retrieved 3 April 2020.
  14. "Announcing NetBSD 4.0". 2007-12-19. Archived from the original on 2014-10-31. Retrieved 2014-10-13.
  15. Michael Lucas (2013). Absolute OpenBSD: UNIX for the Practical Paranoid. No Starch Press. ISBN 9781593274764. Retrieved 2014-03-22.
  16. "A Map of the Networking Code in Linux Kernel 2.4.20, Technical Report DataTAG-2004-1, FP5/IST DataTAG Project" (PDF). datatag.web.cern.ch. March 2004. Archived (PDF) from the original on 27 October 2015. Retrieved 1 September 2015.
  17. "Documentation/networking/ip-sysctl.txt: /proc/sys/net/ipv4/* Variables". kernel.org. Archived from the original on 2016-03-05. Retrieved 2016-02-15.
  18. The Addition of Explicit Congestion Notification (ECN) to IP. September 2001. doi:10.17487/RFC3168. RFC 3168. Retrieved 2016-02-15.
  19. "Linux man pages". man7.org. 2015-12-05. Archived from the original on 2016-02-16. Retrieved 2016-02-15.
  20. "ECN (Explicit Congestion Notification) in TCP/IP". Archived from the original on 2012-06-19.
  21. "macOS 10.12 Sierra: The Ars Technica review". Ars Technica. 20 September 2016. Archived from the original on 26 April 2018. Retrieved 25 April 2018.
  22. Apple Inc. "Networking for the Modern Internet - WWDC 2016 - Videos - Apple Developer". Apple Developer. Archived from the original on 18 April 2018. Retrieved 18 April 2018.
  23. Bhooma, Padma (March 2017). "TCP ECN — Experience with enabling ECN on the Internet" (PDF). Archived (PDF) from the original on 2018-05-09. Retrieved 2017-05-03.
  24. Apple Inc. "Advances in Networking, Part 1 - WWDC 2017 - Videos - Apple Developer". Apple Developer. Archived from the original on 31 January 2018. Retrieved 18 April 2018.
  25. "ipadm(8)". Oracle Solaris 11.4 Information Library. Oracle. Retrieved 6 May 2021.
  26. "Administering TCP/IP Networks, IPMP, and IP Tunnels in Oracle® Solaris 11.4, Using the TCP ECN Feature". Oracle Solaris 11.4 Information Library. Oracle. Retrieved 6 May 2021.
  27. Høiland-Jørgensen, Toke; Täht, Dave; Morton, Jonathan (2018). "Piece of CAKE: A Comprehensive Queue Management Solution for Home Gateways". arXiv: 1804.07617v2 [cs.NI].
  28. "Import Dummynet AQM version 0.2.1 (CoDel, FQ-CoDel, PIE and FQ-PIE) to FreeBSD 11". The FreeBSD Project, FreeBSD r300779. Retrieved 5 August 2016.
  29. "Data Center TCP (DCTCP)". Archived from the original on 2014-10-31. Retrieved March 7, 2023.
  30. Data Center TCP (DCTCP): TCP Congestion Control for Data Centers. doi:10.17487/RFC8257. RFC 8257. Retrieved August 21, 2021.
  31. Problem Statement and Requirements for Increased Accuracy in Explicit Congestion Notification (ECN) Feedback. August 26, 2015. doi:10.17487/RFC7560. RFC 7560. Retrieved August 21, 2021.