Network congestion

Last updated

Network congestion in data networking and queueing theory is the reduced quality of service that occurs when a network node or link is carrying more data than it can handle. Typical effects include queueing delay, packet loss or the blocking of new connections. A consequence of congestion is that an incremental increase in offered load leads either only to a small increase or even a decrease in network throughput. [1]

Contents

Network protocols that use aggressive retransmissions to compensate for packet loss due to congestion can increase congestion, even after the initial load has been reduced to a level that would not normally have induced network congestion. Such networks exhibit two stable states under the same level of load. The stable state with low throughput is known as congestive collapse.

Networks use congestion control and congestion avoidance techniques to try to avoid collapse. These include: exponential backoff in protocols such as CSMA/CA in 802.11 and the similar CSMA/CD in the original Ethernet, window reduction in TCP, and fair queueing in devices such as routers and network switches. Other techniques that address congestion include priority schemes which transmit some packets with higher priority ahead of others and the explicit allocation of network resources to specific flows through the use of admission control.

Network capacity

Network resources are limited, including router processing time and link throughput. Resource contention may occur on networks in several common circumstances. A wireless LAN is easily filled by a single personal computer. [2] Even on fast computer networks, the backbone can easily be congested by a few servers and client PCs. Denial-of-service attacks by botnets are capable of filling even the largest Internet backbone network links, generating large-scale network congestion. In telephone networks, a mass call event can overwhelm digital telephone circuits, in what can otherwise be defined as a denial-of-service attack.

Congestive collapse

Congestive collapse (or congestion collapse) is the condition in which congestion prevents or limits useful communication. Congestion collapse generally occurs at choke points in the network, where incoming traffic exceeds outgoing bandwidth. Connection points between a local area network and a wide area network are common choke points. When a network is in this condition, it settles into a stable state where traffic demand is high but little useful throughput is available, during which packet delay and loss occur and quality of service is extremely poor.

Congestive collapse was identified as a possible problem by 1984. [3] It was first observed on the early Internet in October 1986, [4] when the NSFNET phase-I backbone dropped three orders of magnitude from its capacity of 32 kbit/s to 40 bit/s, [5] which continued until end nodes started implementing Van Jacobson and Sally Floyd's congestion control between 1987 and 1988. [6] When more packets were sent than could be handled by intermediate routers, the intermediate routers discarded many packets, expecting the endpoints of the network to retransmit the information. However, early TCP implementations had poor retransmission behavior. When this packet loss occurred, the endpoints sent extra packets that repeated the information lost, doubling the incoming rate.

Congestion control

Congestion control modulates traffic entry into a telecommunications network in order to avoid congestive collapse resulting from oversubscription. [7] This is typically accomplished by reducing the rate of packets. Whereas congestion control prevents senders from overwhelming the network, flow control prevents the sender from overwhelming the receiver.

Theory of congestion control

The theory of congestion control was pioneered by Frank Kelly, who applied microeconomic theory and convex optimization theory to describe how individuals controlling their own rates can interact to achieve an optimal network-wide rate allocation. Examples of optimal rate allocation are max-min fair allocation and Kelly's suggestion of proportionally fair allocation, although many others are possible.

Let be the rate of flow , be the capacity of link , and be 1 if flow uses link and 0 otherwise. Let , and be the corresponding vectors and matrix. Let be an increasing, strictly concave function, called the utility, which measures how much benefit a user obtains by transmitting at rate . The optimal rate allocation then satisfies

such that

The Lagrange dual of this problem decouples so that each flow sets its own rate, based only on a price signaled by the network. Each link capacity imposes a constraint, which gives rise to a Lagrange multiplier, . The sum of these multipliers, is the price to which the flow responds.

Congestion control then becomes a distributed optimization algorithm. Many current congestion control algorithms can be modeled in this framework, with being either the loss probability or the queueing delay at link . A major weakness is that it assigns the same price to all flows, while sliding window flow control causes burstiness that causes different flows to observe different loss or delay at a given link.

Classification of congestion control algorithms

Among the ways to classify congestion control algorithms are:

Mitigation

Mechanisms have been invented to prevent network congestion or to deal with a network collapse:

The correct endpoint behavior is usually to repeat dropped information, but progressively slow the repetition rate. Provided all endpoints do this, the congestion lifts and the network resumes normal behavior.[ citation needed ] Other strategies such as slow start ensure that new connections don't overwhelm the router before congestion detection initiates.

Common router congestion avoidance mechanisms include fair queuing and other scheduling algorithms, and random early detection (RED) where packets are randomly dropped as congestion is detected. This proactively triggers the endpoints to slow transmission before congestion collapse occurs.

Some end-to-end protocols are designed to behave well under congested conditions; TCP is a well known example. The first TCP implementations to handle congestion were described in 1984, [8] but Van Jacobson's inclusion of an open source solution in the Berkeley Standard Distribution UNIX ("BSD") in 1988 first provided good behavior.

UDP does not control congestion. Protocols built atop UDP must handle congestion independently. Protocols that transmit at a fixed rate, independent of congestion, can be problematic. Real-time streaming protocols, including many Voice over IP protocols, have this property. Thus, special measures, such as quality of service, must be taken to keep packets from being dropped in the presence of congestion.

Practical network congestion avoidance

Connection-oriented protocols, such as the widely used TCP protocol, watch for packet loss or queuing delay to adjust their transmission rate. Various network congestion avoidance processes support different trade-offs. [9]

TCP/IP congestion avoidance

The TCP congestion avoidance algorithm is the primary basis for congestion control on the Internet. [10] [11] [12] [13] [14]

Problems occur when concurrent TCP flows experience tail-drops, especially when bufferbloat is present. This delayed packet loss interferes with TCP's automatic congestion avoidance. All flows that experience this packet loss begin a TCP retrain at the same moment – this is called TCP global synchronization.

Active queue management

Active queue management (AQM) is the reordering or dropping of network packets inside a transmit buffer that is associated with a network interface controller (NIC). This task is performed by the network scheduler.

Random early detection

One solution is to use random early detection (RED) on the network equipment's egress queue. [15] [16] On networking hardware ports with more than one egress queue, weighted random early detection (WRED) can be used.

RED indirectly signals TCP sender and receiver by dropping some packets, e.g. when the average queue length is more than a threshold (e.g. 50%) and deletes linearly or cubically more packets, [17] up to e.g. 100%, as the queue fills further.

Robust random early detection

The robust random early detection (RRED) algorithm was proposed to improve the TCP throughput against denial-of-service (DoS) attacks, particularly low-rate denial-of-service (LDoS) attacks. Experiments confirmed that RED-like algorithms were vulnerable under LDoS attacks due to the oscillating TCP queue size caused by the attacks. [18]

Flow-based WRED

Some network equipment is equipped with ports that can follow and measure each flow and are thereby able to signal a too big bandwidth flow according to some quality of service policy. A policy could then divide the bandwidth among all flows by some criteria. [19]

Explicit Congestion Notification

Another approach is to use Explicit Congestion Notification (ECN). [20] ECN is used only when two hosts signal that they want to use it. With this method, a protocol bit is used to signal explicit congestion. This is better than the indirect congestion notification signaled by packet loss by the RED/WRED algorithms, but it requires support by both hosts. [21] [15]

When a router receives a packet marked as ECN-capable and the router anticipates congestion, it sets the ECN flag, notifying the sender of congestion. The sender should respond by decreasing its transmission bandwidth, e.g., by decreasing its sending rate by reducing the TCP window size or by other means.

TCP window shaping

Congestion avoidance can be achieved efficiently by reducing traffic. When an application requests a large file, graphic or web page, it usually advertises a window of between 32K and 64K. This results in the server sending a full window of data (assuming the file is larger than the window). When many applications simultaneously request downloads, this data can create a congestion point at an upstream provider. By reducing the window advertisement, the remote servers send less data, thus reducing the congestion. [22] [23]

Backward ECN

Backward ECN (BECN) is another proposed congestion notification mechanism. It uses ICMP source quench messages as an IP signaling mechanism to implement a basic ECN mechanism for IP networks, keeping congestion notifications at the IP level and requiring no negotiation between network endpoints. Effective congestion notifications can be propagated to transport layer protocols, such as TCP and UDP, for the appropriate adjustments. [24]

Side effects of congestive collapse avoidance

The protocols that avoid congestive collapse generally assume that data loss is caused by congestion. On wired networks, errors during transmission are rare. WiFi, 3G and other networks with a radio layer are susceptible to data loss due to interference and may experience poor throughput in some cases. The TCP connections running over a radio-based physical layer see the data loss and tend to erroneously believe that congestion is occurring.

Short-lived connections

The slow-start protocol performs badly for short connections. Older web browsers created many short-lived connections and opened and closed the connection for each file. This kept most connections in the slow start mode. Initial performance can be poor, and many connections never get out of the slow-start regime, significantly increasing latency. To avoid this problem, modern browsers either open multiple connections simultaneously or reuse one connection for all files requested from a particular server.

Admission control

Admission control is any system that requires devices to receive permission before establishing new network connections. If the new connection risks creating congestion, permission can be denied. Examples include Contention-Free Transmission Opportunities (CFTXOPs) in the ITU-T G.hn standard for home networking over legacy wiring, Resource Reservation Protocol for IP networks and Stream Reservation Protocol for Ethernet.

See also

Related Research Articles

Quality of service (QoS) is the description or measurement of the overall performance of a service, such as a telephony or computer network, or a cloud computing service, particularly the performance seen by the users of the network. To quantitatively measure quality of service, several related aspects of the network service are often considered, such as packet loss, bit rate, throughput, transmission delay, availability, jitter, etc.

The Transmission Control Protocol (TCP) is one of the main protocols of the Internet protocol suite. It originated in the initial network implementation in which it complemented the Internet Protocol (IP). Therefore, the entire suite is commonly referred to as TCP/IP. TCP provides reliable, ordered, and error-checked delivery of a stream of octets (bytes) between applications running on hosts communicating via an IP network. Major internet applications such as the World Wide Web, email, remote administration, and file transfer rely on TCP, which is part of the Transport layer of the TCP/IP suite. SSL/TLS often runs on top of TCP.

<span class="mw-page-title-main">Transport layer</span> Layer in the OSI and TCP/IP models providing host-to-host communication services for applications

In computer networking, the transport layer is a conceptual division of methods in the layered architecture of protocols in the network stack in the Internet protocol suite and the OSI model. The protocols of this layer provide end-to-end communication services for applications. It provides services such as connection-oriented communication, reliability, flow control, and multiplexing.

Explicit Congestion Notification (ECN) is an extension to the Internet Protocol and to the Transmission Control Protocol and is defined in RFC 3168 (2001). ECN allows end-to-end notification of network congestion without dropping packets. ECN is an optional feature that may be used between two ECN-enabled endpoints when the underlying network infrastructure also supports it.

FAST TCP is a TCP congestion avoidance algorithm especially targeted at long-distance, high latency links, developed at the Netlab, California Institute of Technology and now being commercialized by FastSoft. FastSoft was acquired by Akamai Technologies in 2012.

<span class="mw-page-title-main">Random early detection</span> Algorithm

Random early detection (RED), also known as random early discard or random early drop, is a queuing discipline for a network scheduler suited for congestion avoidance.

Transmission Control Protocol (TCP) uses a congestion control algorithm that includes various aspects of an additive increase/multiplicative decrease (AIMD) scheme, along with other schemes including slow start and a congestion window (CWND), to achieve congestion avoidance. The TCP congestion-avoidance algorithm is the primary basis for congestion control in the Internet. Per the end-to-end principle, congestion control is largely a function of internet hosts, not the network itself. There are several variations and versions of the algorithm implemented in protocol stacks of operating systems of computers that connect to the Internet.

TCP global synchronization in computer networks is a pattern of each sender decreasing and increasing transmission rates at the same time as other senders. It can happen to Transmission Control Protocol (TCP) flows during periods of congestion because each sender will reduce their transmission rate at the same time when packet loss occurs.

TCP Vegas is a TCP congestion avoidance algorithm that emphasizes packet delay, rather than packet loss, as a signal to help determine the rate at which to send packets. It was developed at the University of Arizona by Lawrence Brakmo and Larry L. Peterson and introduced in 1994.

TCP tuning techniques adjust the network congestion avoidance parameters of Transmission Control Protocol (TCP) connections over high-bandwidth, high-latency networks. Well-tuned networks can perform up to 10 times faster in some cases. However, blindly following instructions without understanding their real consequences can hurt performance as well.

Packet loss occurs when one or more packets of data travelling across a computer network fail to reach their destination. Packet loss is either caused by errors in data transmission, typically across wireless networks, or network congestion. Packet loss is measured as a percentage of packets lost with respect to packets sent.

The additive-increase/multiplicative-decrease (AIMD) algorithm is a feedback control algorithm best known for its use in TCP congestion control. AIMD combines linear growth of the congestion window when there is no congestion with an exponential reduction when congestion is detected. Multiple flows using AIMD congestion control will eventually converge to an equal usage of a shared link. The related schemes of multiplicative-increase/multiplicative-decrease (MIMD) and additive-increase/additive-decrease (AIAD) do not reach stability.

Bandwidth management is the process of measuring and controlling the communications on a network link, to avoid filling the link to capacity or overfilling the link, which would result in network congestion and poor performance of the network. Bandwidth is described by bit rate and measured in units of bits per second (bit/s) or bytes per second (B/s).

Compound TCP (CTCP) is a Microsoft algorithm that was introduced as part of the Windows Vista and Window Server 2008 TCP stack. It is designed to aggressively adjust the sender's congestion window to optimise TCP for connections with large bandwidth-delay products while trying not to harm fairness. It is also available for Linux, as well as for Windows XP and Windows Server 2003 via a hotfix.

In routers and switches, active queue management (AQM) is the policy of dropping packets inside a buffer associated with a network interface controller (NIC) before that buffer becomes full, often with the goal of reducing network congestion or improving end-to-end latency. This task is performed by the network scheduler, which for this purpose uses various algorithms such as random early detection (RED), Explicit Congestion Notification (ECN), or controlled delay (CoDel). RFC 7567 recommends active queue management as a best practice.

In computer networks, goodput is the application-level throughput of a communication; i.e. the number of useful information bits delivered by the network to a certain destination per unit of time. The amount of data considered excludes protocol overhead bits as well as retransmitted data packets. This is related to the amount of time from the first bit of the first packet sent until the last bit of the last packet is delivered.

In computing, Microsoft's Windows Vista and Windows Server 2008 introduced in 2007/2008 a new networking stack named Next Generation TCP/IP stack, to improve on the previous stack in several ways. The stack includes native implementation of IPv6, as well as a complete overhaul of IPv4. The new TCP/IP stack uses a new method to store configuration settings that enables more dynamic control and does not require a computer restart after a change in settings. The new stack, implemented as a dual-stack model, depends on a strong host-model and features an infrastructure to enable more modular components that one can dynamically insert and remove.

Bufferbloat is a cause of high latency and jitter in packet-switched networks caused by excess buffering of packets. Bufferbloat can also cause packet delay variation, as well as reduce the overall network throughput. When a router or switch is configured to use excessively large buffers, even very high-speed networks can become practically unusable for many interactive applications like voice over IP (VoIP), audio streaming, online gaming, and even ordinary web browsing.

Zeta-TCP refers to a set of proprietary Transmission Control Protocol (TCP) algorithms aiming at improving the end-to-end performance of TCP, regardless of whether the peer is Zeta-TCP or any other TCP protocol stack, in other words, to be compatible with the existing TCP algorithms. It was designed and implemented by AppEx Networks Corporation.

<span class="mw-page-title-main">Fast and Secure Protocol</span> Terminal command scheme used to transfer data

The Fast Adaptive and Secure Protocol (FASP) is a proprietary data transfer protocol. FASP is a network-optimized network protocol created by Michelle C. Munson and Serban Simu, productized by Aspera, and now owned by IBM subsequent to its acquisition of Aspera. The associated client/server software packages are also commonly called Aspera. The technology is patented under US Patent #8085781, Bulk Data Transfer, #20090063698, Method and system for aggregate bandwidth control. and others.

References

  1. (Al-Bahadili, 2012, p. 282) Al-Bahadili, H. (2012). Simulation in computer network design and modeling: Use and analysis. Hershey, PA: IGI Global.
  2. den Hartog, F., Raschella, A., Bouhafs, F., Kempker, P., Boltjes, B., & Seyedebrahimi, M. (2017, November). A Pathway to solving the Wi-Fi Tragedy of the Commons in apartment blocks. In 2017 27th International Telecommunication Networks and Applications Conference (ITNAC) (pp. 1-6). IEEE.
  3. RFC   896
  4. Fall, K.R.; Stevens, W.R. (2011). TCP/IP Illustrated, Volume 1: The Protocols (2 ed.). Pearson Education. p. 739. ISBN   9780132808187.
  5. Van Jacobson; Michael J. Karels (November 1988), Congestion Avoidance and Control (PDF), In October of '86, the Internet had the first of what became a series of 'congestion collapses'. During this period, the data throughput from LBL to UC Berkeley (sites separatedby 400 yards and two IMP hops) dropped from 32 Kbps to 40 bps. We were fascinated by this sudden factor-of-thousand drop in bandwidth and embarked on an investigation of why things had gotten so bad. In particular, we wondered if the 4.3BSD (Berkeley UNIX) TCP was mis-behaving or if it could be tuned to work better under abysmal network conditions.The answer to both of these questions was "yes".
  6. Hafner, Katie (4 September 2019). "Sally Floyd, Who Helped Things Run Smoothly Online, Dies at 69". New York Times. Retrieved 5 September 2019.
  7. Nanda, Priyadarsi (2000-11-01). "A Control Theory Approach for Congestion Control in Intranetwork". IFAC Proceedings Volumes. 16th IFAC Workshop on Distributed Computer Control Systems (DCCS 2000), Sydney, Australia, 29 November-1 December 2000. 33 (30): 91–94. doi:10.1016/S1474-6670(17)36735-6. ISSN   1474-6670.
  8. Vinton G. Cerf; Robert E. Kahn (May 1974). "A Protocol for Packet Network Intercommunication" (PDF). IEEE Transactions on Communications. 22 (5): 637–648. doi:10.1109/tcom.1974.1092259. Archived from the original (PDF) on March 4, 2016.
  9. Lee, B.P.; Balan, R.K.; Jacob, L.; Seah, W.K.G.; Ananda, A.L. (2000), "TCP Tunnels: Avoiding Congestion Collapse", Proceedings 25th Annual IEEE Conference on Local Computer Networks. LCN 2000, pp. 408–417, doi:10.1109/LCN.2000.891077, ISBN   0-7695-0912-6, S2CID   34447400
  10. Van Jacobson, Michael J. Karels. Congestion Avoidance and Control (1988). Proceedings of the Sigcomm '88 Symposium, vol.18(4): pp.314329. Stanford, CA. August, 1988. This paper originated many of the congestion avoidance algorithms used in TCP/IP.
  11. RFC 2001 - TCP Slow Start, Congestion Avoidance, Fast Retransmit, and Fast Recovery Algorithms
  12. RFC 2581 - TCP Congestion Control
  13. RFC 3390 - TCP Increasing TCP's Initial Window
  14. TCP Congestion Avoidance Explained via a Sequence Diagram
  15. 1 2 Sally Floyd: RED (Random Early Detection) Queue Management
  16. Sally Floyd, Van Jacobson. Random Early Detection Gateways for Congestion Avoidance (1993). IEEE/ACM Transactions on Networking, vol.1(4): pp.397413. Invented Random Early Detection (RED) gateways.
  17. An Analytical RED Function Design Guaranteeing Stable System Behavior, CiteSeerX   10.1.1.105.5995 , ...The advantage of this function lies not only in avoiding heavy oscillations but also in avoiding link under-utilization at low loads. The applicability of the derived function is independent of the load range, no parameters are to be adjusted. Compared to the original linear drop function applicability is extended by far...Our example with realistic system parameters gives an approximation function of the cubic of the queue size...
  18. Zhang, Changwang; Yin, Jianping; Cai, Zhiping; Chen, Weifeng (2010). "RRED: Robust RED Algorithm to Counter Low-rate Denial-of-Service Attacks" (PDF). IEEE Communications Letters. 14 (5). IEEE: 489–491. doi:10.1109/LCOMM.2010.05.091407. S2CID   1121461.
  19. "Congestion Avoidance Overview". Cisco Systems . Retrieved 2020-08-07.
  20. RFC 3168 - The Addition of Explicit Congestion Notification (ECN) to IP
  21. Comparative study of RED, ECN and TCP Rate Control (1999)
  22. Generalized Window Advertising for TCP CongestionControl (PDF), retrieved 2020-11-13
  23. Pop, O.; Moldován, I.; Simon, Cs.; Bíró, J.; Koike, A.; Ishii, H. (2000), "Advertised Window-Based TCP Flow Control in Routers", Telecommunication Network Intelligence, pp. 197–218, doi: 10.1007/978-0-387-35522-1_12 , ISBN   978-1-4757-6693-6
  24. A proposal for Backward ECN for the Internet Protocol