Bufferbloat

Bufferbloat is a cause of high latency and packet delay variation (jitter) in packet-switched networks, caused by the excess buffering of packets; it can also reduce overall network throughput. When a router or switch is configured to use excessively large buffers, even very high-speed networks can become practically unusable for many interactive applications such as voice over IP (VoIP), audio streaming, online gaming, and even ordinary web browsing.

Some communications equipment manufacturers designed unnecessarily large buffers into some of their network products. In such equipment, bufferbloat occurs when a network link becomes congested, causing packets to become queued for long periods in these oversized buffers. In a first-in first-out queuing system, overly large buffers result in longer queues and higher latency, and do not improve network throughput. It can also be induced by specific slow-speed connections hindering the on-time delivery of other packets.

The bufferbloat phenomenon was described as early as 1985. [1] It gained more widespread attention starting in 2009. [2]

According to some sources, the most frequent cause of high latency ("lag") in online video games is bufferbloat in the local home network. High latency can render modern online gaming impossible. [3]

Buffering

An established rule of thumb for network equipment manufacturers was to provide buffers large enough to accommodate at least 250 ms of buffering for a stream of traffic passing through a device. For example, a router's Gigabit Ethernet interface would require a relatively large 32 MB buffer. [4] Such sizing of the buffers can lead to failure of the TCP congestion control algorithm. The buffers then take some time to drain, before congestion control resets and the TCP connection ramps back up to speed and fills the buffers again. [5] Bufferbloat thus causes problems such as high and variable latency, and choking network bottlenecks for all other flows as the buffer becomes full of the packets of one TCP stream and other packets are then dropped. [6]
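
As an illustration of the arithmetic behind this rule of thumb, the following sketch computes the buffer size implied by 250 ms of buffering at a given line rate; the gigabit case reproduces the roughly 32 MB figure quoted above. This is only a worked example of the sizing rule, not code from any cited source.

```python
def buffer_bytes(line_rate_bps: float, buffering_s: float = 0.25) -> float:
    """Bytes needed to hold `buffering_s` seconds of traffic at `line_rate_bps` bits per second."""
    return line_rate_bps * buffering_s / 8  # convert bits to bytes

# 250 ms of buffering at Gigabit Ethernet line rate:
print(f"{buffer_bytes(1e9) / 1e6:.1f} MB")  # ~31 MB, in line with the ~32 MB figure above
```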

A bloated buffer has an effect only when it is actually used. In other words, oversized buffers have a damaging effect only when the link they buffer becomes a bottleneck. The size of the buffer serving a bottleneck can be measured using the ping utility provided by most operating systems. First, the other host should be pinged continuously; then, a several-seconds-long download from it should be started and stopped a few times. By design, the TCP congestion avoidance algorithm will rapidly fill up the bottleneck on the route. If downloading (and uploading, respectively) correlates with a direct and significant increase of the round-trip time reported by ping, then it demonstrates that the buffer of the current bottleneck in the download (and upload, respectively) direction is bloated. Since the increase of the round-trip time is caused by the buffer on the bottleneck, the maximum increase gives a rough estimate of its size in milliseconds.[ citation needed ]
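
The measurement procedure above can be approximated in a short script. The sketch below pings a host before and during a bulk download and reports the rise in average round-trip time; the host name, download URL, and the Linux-style ping output format it parses are illustrative assumptions, not details from the article.

```python
import re
import subprocess
import threading
import time
import urllib.request

HOST = "example.com"                      # hypothetical host to ping
URL = "http://example.com/largefile.bin"  # hypothetical bulk download from the same host

def average_rtt_ms(count: int = 10) -> float:
    """Ping HOST and parse the average RTT from the ping summary line."""
    out = subprocess.run(["ping", "-c", str(count), HOST],
                         capture_output=True, text=True).stdout
    # Summary looks like: "rtt min/avg/max/mdev = 9.8/12.3/20.1/2.2 ms"
    return float(re.search(r"= [\d.]+/([\d.]+)/", out).group(1))

def saturate(seconds: float = 10.0) -> None:
    """Download for a few seconds so TCP fills the bottleneck buffer."""
    deadline = time.time() + seconds
    with urllib.request.urlopen(URL) as response:
        while time.time() < deadline and response.read(64 * 1024):
            pass

idle = average_rtt_ms()                       # baseline RTT on an idle link
worker = threading.Thread(target=saturate, daemon=True)
worker.start()
loaded = average_rtt_ms()                     # RTT while the download saturates the link
worker.join()
print(f"idle RTT ~{idle:.0f} ms, loaded RTT ~{loaded:.0f} ms")
print(f"rough estimate of bottleneck buffering: {loaded - idle:.0f} ms")
```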

In the previous example, using an advanced traceroute tool such as MTR instead of simple pinging will not only demonstrate the existence of a bloated buffer at the bottleneck, but will also pinpoint its location in the network. Traceroute achieves this by displaying the route (path) and measuring transit delays of packets across the network. The history of the route is recorded as the round-trip times of the packets received from each successive host (remote node) along the route. [7]

Mechanism

Most TCP congestion control algorithms rely on measuring the occurrence of packet drops to determine the available bandwidth between two ends of a connection. The algorithms speed up the data transfer until packets start to drop, then slow down the transmission rate. Ideally, they keep adjusting the transmission rate until it reaches the equilibrium speed of the link. For the algorithms to select a suitable transfer speed, the feedback about packet drops must occur in a timely manner. With a large buffer that has been filled, packets will still arrive at their destination, but with higher latency. Because no packets are dropped, TCP does not slow down once the uplink has been saturated, further filling the buffer. Newly arriving packets are dropped only when the buffer is fully saturated. Once this happens, TCP may even decide that the path of the connection has changed, and again go into the more aggressive search for a new operating point. [8]
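
The delayed feedback can be illustrated with a toy fluid model: a sender transmits slightly faster than the bottleneck drains, the queue grows, and the first drop (TCP's congestion signal) arrives only once the buffer is completely full, by which point the queueing delay equals the buffer size divided by the link rate. The buffer size, link rate and overshoot below are arbitrary illustrative values, not figures from the article.

```python
LINK_RATE = 1e6 / 8          # 1 Mbit/s bottleneck, in bytes per second (assumed)
SEND_RATE = LINK_RATE * 1.2  # sender overshoots the link rate by 20% (assumed)
BUFFER = 256 * 1024          # 256 KiB FIFO buffer at the bottleneck (assumed)

queue = 0.0                  # bytes currently held in the buffer
t = 0.0
dt = 0.01                    # simulation time step, seconds

while queue < BUFFER:        # no drops occur until the buffer is completely full
    queue += (SEND_RATE - LINK_RATE) * dt
    t += dt

print(f"first drop (congestion signal) after {t:.1f} s")
print(f"queueing delay at that point: {queue / LINK_RATE:.2f} s")
```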

Packets are queued within a network buffer before being transmitted; in problematic situations, packets are dropped only if the buffer is full. On older routers, buffers were fairly small, so they filled quickly and packets began to drop shortly after the link became saturated, allowing the TCP protocol to adjust before the issue became apparent. On newer routers, buffers have become large enough to hold several seconds of buffered data. To TCP, a congested link can appear to be operating normally as the buffer fills. The TCP algorithm is unaware the link is congested and does not start to take corrective action until the buffer finally overflows and packets are dropped.

All packets passing through a simple buffer implemented as a single queue experience similar delay, so the latency of every connection that passes through a filled buffer is affected. Available channel bandwidth can also end up being unused, as some fast destinations may not be promptly reached due to buffers clogged with data awaiting delivery to slow destinations. These effects impair the interactivity of applications using other network protocols, including UDP, which is used in latency-sensitive applications like VoIP and online gaming. [9] [ self-published source ]

Impact on applications

Regardless of bandwidth requirements, any type of service which requires consistently low latency or jitter-free transmission can be affected by bufferbloat. Such services include digital voice calls (VoIP), online gaming, video chat, and other interactive applications such as radio streaming, video on demand, and remote login.

When the bufferbloat phenomenon is present and the network is under load, even normal web page loads can take many seconds to complete, or simple DNS queries can fail due to timeouts. [10] In fact, any TCP connection can time out and disconnect, and UDP packets can be discarded. Since the continuation of a TCP download stream depends on acknowledgement (ACK) packets in the upload stream, a bufferbloat problem on the upload path can cause the failure of unrelated download applications, because the client's ACK packets do not reach the Internet server in a timely manner.

Detection

The DSL Reports Speedtest [11] is an easy-to-use test that includes a score for bufferbloat. The ICSI Netalyzr [12] was another online tool that could be used for checking networks for the presence of bufferbloat, together with checking for many other common configuration problems. [13] The service was shut down in March 2019. The bufferbloat.net web site lists tools and procedures for determining whether a connection has excess buffering that will slow it down. [14] [15]

Solutions and mitigations

Several technical solutions exist, which can be broadly grouped into two categories: solutions that target the network and solutions that target the endpoints. The two types of solutions are often complementary. The problem sometimes arises at the junction of a fast and a slow network path, where the faster link can overfill the buffer feeding the slower one.

Network solutions generally take the form of queue management algorithms. This type of solution has been the focus of the IETF AQM working group. [16] Notable examples include:

  - Active queue management (AQM) algorithms such as CoDel and PIE. [17]
  - Hybrid packet scheduling and AQM algorithms such as FQ-CoDel. [18]
  - Amendments to the DOCSIS standard to enable smarter buffer control in cable modems. [19]
  - Integration of queue management (FQ-CoDel) into the WiFi subsystem of the Linux operating system, as Linux is commonly used in wireless access points. [20]

Notable examples of solutions targeting the endpoints include TCP congestion control algorithms, such as BBR, that estimate the available bandwidth and round-trip time rather than relying on packet loss as the congestion signal.

The problem may also be mitigated by reducing the buffer size on the OS [10] and network hardware; however, this is often not configurable, and the optimal buffer size depends on the line rate, which may differ between destinations.
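
At the endpoint, one buffer that is directly configurable is the per-socket send buffer in the operating system, which an application can shrink with setsockopt so that less data queues up before it experiences back-pressure. The sketch below is a minimal illustration; the 16 KiB value is an arbitrary assumption, not a recommendation from the article.

```python
import socket

# Minimal sketch: shrink one socket's send buffer so less data can pile up
# in the OS before the application is throttled.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 16 * 1024)  # 16 KiB, assumed value

# Linux reports roughly double the requested size because it reserves room for overhead.
print("effective send buffer:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF), "bytes")
```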

Utilizing DiffServ (and employing multiple priority-based queues) helps to prioritize the transmission of low-latency traffic such as VoIP, videoconferencing, and gaming, leaving congestion and bufferbloat to affect only the non-prioritized traffic. [21]
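
As a toy illustration of the idea, the sketch below implements a strict two-class priority dequeue: packets marked low-latency are always transmitted before any queued bulk packets, so only the bulk queue can bloat. It is a simplification of DiffServ-style scheduling under assumed class names, not a description of any particular router's implementation.

```python
from collections import deque

class PriorityLink:
    """Strict two-class priority scheduler: latency-sensitive packets bypass the bulk queue."""

    def __init__(self):
        self.latency = deque()  # VoIP, videoconferencing, gaming, ...
        self.bulk = deque()     # everything else; this is the only queue that can bloat

    def enqueue(self, packet, low_latency: bool = False):
        (self.latency if low_latency else self.bulk).append(packet)

    def dequeue(self):
        """Transmit the next packet: latency class first, bulk only when it is empty."""
        if self.latency:
            return self.latency.popleft()
        if self.bulk:
            return self.bulk.popleft()
        return None

link = PriorityLink()
link.enqueue("bulk-1")
link.enqueue("bulk-2")
link.enqueue("voip-1", low_latency=True)
print(link.dequeue())  # -> voip-1, despite arriving after the bulk packets
```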

Optimal buffer size

For the longest-delay TCP connections to still get their fair share of the bandwidth, the buffer size should be at least the bandwidth-delay product divided by the square root of the number of simultaneous streams. [22] [4] A typical rule of thumb is 50 ms of line-rate data, [23] but some popular consumer-grade switches only have 1 ms, [24] which may result in extra bandwidth loss on the longer-delay connections in case of local contention with others.
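
A short worked example of this sizing rule, using purely illustrative numbers (a 10 Gbit/s link, 100 ms round-trip time and 10,000 concurrent flows; none of these figures come from the article):

```python
from math import sqrt

def buffer_size_bytes(link_rate_bps: float, rtt_s: float, num_flows: int) -> float:
    """Buffer of at least the bandwidth-delay product divided by sqrt(number of flows)."""
    bdp_bytes = link_rate_bps * rtt_s / 8
    return bdp_bytes / sqrt(num_flows)

# 10 Gbit/s link, 100 ms RTT, 10,000 simultaneous TCP streams (assumed figures)
bdp = 10e9 * 0.1 / 8
needed = buffer_size_bytes(10e9, 0.1, 10_000)
print(f"bandwidth-delay product: {bdp / 1e6:.0f} MB")            # 125 MB
print(f"buffer needed with 10,000 flows: {needed / 1e6:.2f} MB")  # 1.25 MB
```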

Related Research Articles

Quality of service (QoS) is the description or measurement of the overall performance of a service, such as a telephony or computer network, or a cloud computing service, particularly the performance seen by the users of the network. To quantitatively measure quality of service, several related aspects of the network service are often considered, such as packet loss, bit rate, throughput, transmission delay, availability, jitter, etc.

In telecommunication and computer engineering, the queuing delay or queueing delay is the time a job waits in a queue until it can be executed. It is a key component of network delay. In a switched network, queuing delay is the time between the completion of signaling by the call originator and the arrival of a ringing signal at the call receiver. Queuing delay may be caused by delays at the originating switch, intermediate switches, or the call receiver servicing switch. In a data network, queuing delay is the sum of the delays between the request for service and the establishment of a circuit to the called data terminal equipment (DTE). In a packet-switched network, queuing delay is the sum of the delays encountered by a packet between the time of insertion into the network and the time of delivery to the address.

Traffic shaping is a bandwidth management technique used on computer networks which delays some or all datagrams to bring them into compliance with a desired traffic profile. Traffic shaping is used to optimize or guarantee performance, improve latency, or increase usable bandwidth for some kinds of packets by delaying other kinds. It is often confused with traffic policing, the distinct but related practice of packet dropping and packet marking.

Network congestion in data networking and queueing theory is the reduced quality of service that occurs when a network node or link is carrying more data than it can handle. Typical effects include queueing delay, packet loss or the blocking of new connections. A consequence of congestion is that an incremental increase in offered load leads either only to a small increase or even a decrease in network throughput.

FAST TCP is a TCP congestion avoidance algorithm especially targeted at long-distance, high latency links, developed at the Netlab, California Institute of Technology and now being commercialized by FastSoft. FastSoft was acquired by Akamai Technologies in 2012.

Random early detection (RED), also known as random early discard or random early drop, is a queuing discipline for a network scheduler suited for congestion avoidance.

Transmission Control Protocol (TCP) uses a congestion control algorithm that includes various aspects of an additive increase/multiplicative decrease (AIMD) scheme, along with other schemes including slow start and congestion window (CWND), to achieve congestion avoidance. The TCP congestion-avoidance algorithm is the primary basis for congestion control in the Internet. Per the end-to-end principle, congestion control is largely a function of internet hosts, not the network itself. There are several variations and versions of the algorithm implemented in protocol stacks of operating systems of computers that connect to the Internet.

Nagle's algorithm is a means of improving the efficiency of TCP/IP networks by reducing the number of packets that need to be sent over the network. It was defined by John Nagle while working for Ford Aerospace. It was published in 1984 as a Request for Comments (RFC) with title Congestion Control in IP/TCP Internetworks in RFC 896.

TCP global synchronization in computer networks can happen to TCP/IP flows during periods of congestion because each sender will reduce their transmission rate at the same time when packet loss occurs.

TCP tuning techniques adjust the network congestion avoidance parameters of Transmission Control Protocol (TCP) connections over high-bandwidth, high-latency networks. Well-tuned networks can perform up to 10 times faster in some cases. However, blindly following instructions without understanding their real consequences can hurt performance as well.

Packet loss occurs when one or more packets of data travelling across a computer network fail to reach their destination. Packet loss is either caused by errors in data transmission, typically across wireless networks, or network congestion. Packet loss is measured as a percentage of packets lost with respect to packets sent.

Bandwidth management is the process of measuring and controlling the communications on a network link, to avoid filling the link to capacity or overfilling the link, which would result in network congestion and poor performance of the network. Bandwidth is described by bit rate and measured in units of bits per second (bit/s) or bytes per second (B/s).

In routers and switches, active queue management (AQM) is the policy of dropping packets inside a buffer associated with a network interface controller (NIC) before that buffer becomes full, often with the goal of reducing network congestion or improving end-to-end latency. This task is performed by the network scheduler, which for this purpose uses various algorithms such as random early detection (RED), Explicit Congestion Notification (ECN), or controlled delay (CoDel). RFC 7567 recommends active queue management as a best practice.

In packet switching networks, traffic flow, packet flow or network flow is a sequence of packets from a source computer to a destination, which may be another host, a multicast group, or a broadcast domain. RFC 2722 defines traffic flow as "an artificial logical equivalent to a call or connection." RFC 3697 defines traffic flow as "a sequence of packets sent from a particular source to a particular unicast, anycast, or multicast destination that the source desires to label as a flow. A flow could consist of all packets in a specific transport connection or a media stream. However, a flow is not necessarily 1:1 mapped to a transport connection." Flow is also defined in RFC 3917 as "a set of IP packets passing an observation point in the network during a certain time interval." Packet flow temporal efficiency can be affected by one-way delay (OWD), which is itself made up of several delay components.

CoDel is an active queue management (AQM) algorithm in network routing, developed by Van Jacobson and Kathleen Nichols and published as RFC 8289. It is designed to overcome bufferbloat in networking hardware, such as routers, by setting limits on the delay network packets experience as they pass through buffers in this equipment. CoDel aims to improve on the overall performance of the random early detection (RED) algorithm by addressing some of its fundamental misconceptions, as perceived by Jacobson, and by being easier to manage.

A network scheduler, also called packet scheduler, queueing discipline (qdisc) or queueing algorithm, is an arbiter on a node in a packet switching communication network. It manages the sequence of network packets in the transmit and receive queues of the protocol stack and network interface controller. Several network schedulers are available for different operating systems, implementing many of the existing network scheduling algorithms.

Time-Sensitive Networking (TSN) is a set of standards under development by the Time-Sensitive Networking task group of the IEEE 802.1 working group. The TSN task group was formed in November 2012 by renaming the existing Audio Video Bridging Task Group and continuing its work. The name changed as a result of the extension of the working area of the standardization group. The standards define mechanisms for the time-sensitive transmission of data over deterministic Ethernet networks.

Kathleen Nichols is an American computer scientist and computer networking expert. Nichols is the founder and CEO of Pollere, Inc, a network architecture and performance company based in California, US. Before founding Pollere, Nichols was VP of Network Science at Packet Design, where she was part of the founding team. Prior to Packet Design she was director of advanced Internet architectures in the Office of CTO at Cisco Systems.

Deterministic Networking (DetNet) is an effort by the IETF DetNet Working Group to study implementation of deterministic data paths for real-time applications with extremely low data loss rates, packet delay variation (jitter), and bounded latency, such as audio and video streaming, industrial automation, and vehicle control.

Dave Täht is an American network engineer, musician, lecturer, asteroid exploration advocate, and Internet activist. He is the chief executive officer of TekLibre.

References

  1. "On Packet Switches with Infinite Storage". December 31, 1985.
  2. van Beijnum, Iljitsch (January 7, 2011). "Understanding Bufferbloat and the Network Buffer Arms Race". Ars Technica . Retrieved November 12, 2011.
  3. "Bufferbloat: Dark Buffers in the Internet: Networks without effective AQM may again be vulnerable to congestion collapse". Queue. doi: 10.1145/2063166.2071893 . S2CID   18820360.
  4. Guido Appenzeller; Isaac Keslassy; Nick McKeown (2004). "Sizing Router Buffers" (PDF). ACM SIGCOMM. ACM. Retrieved October 15, 2013.
  5. Nichols, Kathleen; Jacobson, Van (May 6, 2012). "Controlling Queue Delay". ACM Queue. ACM Publishing. Retrieved September 27, 2013.
  6. Gettys, Jim (May–June 2011). "Bufferbloat: Dark Buffers in the Internet". IEEE Internet Computing. Vol. 15. IEEE. pp. 95–96. doi:10.1109/MIC.2011.56. Archived from the original on October 12, 2012. Retrieved February 20, 2012.
  7. "traceroute(8) – Linux man page". die.net. Retrieved September 27, 2013.
  8. Jacobson, Van; Karels, MJ (1988). "Congestion avoidance and control" (PDF). ACM SIGCOMM Computer Communication Review. 18 (4): 314–329. doi:10.1145/52325.52356. Archived from the original (PDF) on June 22, 2004.
  9. "Technical Introduction to Bufferbloat". Bufferbloat.net. Retrieved September 27, 2013.
  10. Gettys, Jim; Nichols, Kathleen (January 2012). "Bufferbloat: Dark Buffers in the Internet". Communications of the ACM. ACM. 55 (1): 57–65. doi:10.1145/2063176.2063196.
  11. "Speed test - how fast is your internet?". dslreports.com. Retrieved October 26, 2017.
  12. "ICSI Netalyzr". berkeley.edu. Archived from the original on April 7, 2019. Retrieved January 30, 2015.
  13. "Understanding your Netalyzr results" . Retrieved October 26, 2017.
  14. "Tests for Bufferbloat". bufferbloat.net. Retrieved October 26, 2017.[ self-published source ]
  15. "Introduction to Bufferbloat". bufferbloat.net. Retrieved May 8, 2023.
  16. "IETF AQM working group". ietf.org. Retrieved October 26, 2017.
  17. Pan, Rong; Natarajan, Preethi; Piglione, Chiara; Prabhu, Mythili; Subramanian, Vijay; Baker, Fred; VerSteeg, Bill (2013). "PIE: A Lightweight Control Scheme To Address the Bufferbloat Problem". 2013 IEEE 14th International Conference on High Performance Switching and Routing (HPSR). IEEE. pp. 148–155. doi:10.1109/HPSR.2013.6602305. ISBN 978-1-4673-4620-7.
  18. Høiland-Jørgensen, Toke; McKenney, Paul; Taht, Dave; Gettys, Jim; Dumazet, Eric. The FlowQueue-CoDel Packet Scheduler and Active Queue Management Algorithm. doi:10.17487/RFC8290. RFC 8290.
  19. "DOCSIS "Upstream Buffer Control" feature". CableLabs. pp. 554–556. Retrieved August 9, 2012.
  20. Høiland-Jørgensen, Toke; Kazior, Michał; Täht, Dave; Hurtig, Per; Brunstrom, Anna (2017). Ending the Anomaly: Achieving Low Latency and Airtime Fairness in WiFi. 2017 USENIX Annual Technical Conference (USENIX ATC 17). USENIX - The Advanced Computing Systems Association. pp. 139–151. ISBN 978-1-931971-38-6. Retrieved September 28, 2017.
  21. Hein, Mathias. "Bufferbloat » ADMIN Magazine". ADMIN Magazine. Retrieved June 11, 2020.
  22. Huston, Geoff (December 12, 2019). "Sizing the buffer". APNIC Blog. Retrieved October 16, 2022.
  23. "Router/Switch Buffer Size Issues". fasterdata.es.net. Retrieved October 16, 2022.
  24. "BCM53115". www.broadcom.com. Retrieved October 16, 2022.