Network scheduler

Last updated
Packets queuing in a FIFO (first in, first out) data structure. Data Queue.svg
Packets queuing in a FIFO (first in, first out) data structure.

A network scheduler, also called packet scheduler, queueing discipline (qdisc) or queueing algorithm, is an arbiter on a node in a packet switching communication network. It manages the sequence of network packets in the transmit and receive queues of the protocol stack and network interface controller. There are several network schedulers available for the different operating systems, that implement many of the existing network scheduling algorithms.

Contents

The network scheduler logic decides which network packet to forward next. The network scheduler is associated with a queuing system, storing the network packets temporarily until they are transmitted. Systems may have a single or multiple queues in which case each may hold the packets of one flow, classification, or priority.

In some cases it may not be possible to schedule all transmissions within the constraints of the system. In these cases the network scheduler is responsible for deciding which traffic to forward and what gets dropped.

Terminology and responsibilities

A network scheduler may have responsibility in implementation of specific network traffic control initiatives. Network traffic control is an umbrella term for all measures aimed at reducing network congestion, latency and packet loss. Specifically, active queue management (AQM) is the selective dropping of queued network packets to achieve the larger goal of preventing excessive network congestion. The scheduler must choose which packets to drop. Traffic shaping smooths the bandwidth requirements of traffic flows by delaying transmission packets when they are queued in bursts. The scheduler decides the timing for the transmitted packets. Quality of service (QoS) is the prioritization of traffic based on service class (Differentiated services) or reserved connection (Integrated services).

Algorithms

In the course of time, many network queueing disciplines have been developed. Each of these provides specific reordering or dropping of network packets inside various transmit or receive buffers. [1] Queuing disciplines are commonly used as attempts to compensate for various networking conditions, like reducing the latency for certain classes of network packets, and are generally used as part of QoS measures. [2] [3] [4]

Classful queueing disciplines allow the creation of classes, which work like branches on a tree. Rules can then be set to filter packets into each class. Each class can itself have assigned other classful or classless queueing discipline. Classless queueing disciplines do not allow adding more queueing disciplines to it. [5]

Examples of algorithms suitable for managing network traffic include:

Queueing Algorithms
AlgorithmAcronymTypeHW Support
Generic cell rate algorithm GCRA
CHOose and Kill for unresponsive flowsCHOKeClassless
Controlled Delay CoDelClassless
Common Applications Kept Enhanced [6] CAKE
Earliest TxTime FirstETFClasslessYes
First In, First Out FIFOClassless
Fair Queue FQClassless
Fair Queuing Controlled Delay FQ-CoDelClassless
Flow Queuing with Proportional Integral controller EnhancedFQ-PIEClassless
Generalized Random Early DetectionGREDClassless
Heavy-Hitter Filter [7] HHFClassless
Multiqueue PriorityMQ-PRIOClasslessYes
MultiqueueMULTIQClasslessYes
Network Emulator [8] NETEMClassless
Proportional Integral controller-Enhanced [9] PIEClassless
Random Early Detection REDClassless
Stochastic Fair Blue SFBClassless
Stochastic Fairness QueueingSFQClassless
Token Bucket Filter TBFClassless
Class Based Queueing CBQClassful
Credit-Based Shaper CBSClassfulYes
Deficit Round Robin [10] DRRClassful
Enhanced Transmission Selection ETSClassful
Hierarchical Fair Service Curve HFSCClassful
Hierarchical Token Bucket [11] HTBClassful
PriorityPRIOClassful
Quick Fair Queueing [12] QFQClassful
Time Aware Priority ShaperTAPRIOClassfulYes

Several of the above have been implemented as Linux kernel modules [13] [14] and are freely available.

Bufferbloat

Bufferbloat is a phenomenon in packet-switched networks in which excess buffering of packets causes high latency and packet delay variation. Bufferbloat can be addressed by a network scheduler that strategically discards packets to avoid an unnecessarily high buffering backlog. Examples include CoDel, FQ-CoDel and Random early detection.

Implementations

Linux kernel

The Linux kernel's packet scheduler is part of the network stack, together with netfilter, nftables, and Berkeley Packet Filter. Simplified Structure of the Linux Kernel.svg
The Linux kernel's packet scheduler is part of the network stack, together with netfilter, nftables, and Berkeley Packet Filter.

The Linux kernel packet scheduler is an integral part of the Linux kernel's network stack and manages the transmit and receive ring buffers of all NICs, by working on the layer 2 of the OSI model and handling Ethernet frames, for example.

The packet scheduler is configured using the utility called tc (short for "traffic control"). As the default queuing discipline, the packet scheduler uses a FIFO implementation called pfifo_fast, [15] although systemd since its version 217 changes the default queuing discipline to fq_codel. [16]

The ifconfig and ip utilities enable system administrators to configure the buffer sizes txqueuelen and rxqueuelen for each device separately in terms of number of Ethernet frames regardless of their size. The Linux kernel's network stack contains several other buffers, which are not managed by the network scheduler. [lower-alpha 1]

Berkeley Packet Filter filters can be attached to the packet scheduler's classifiers. The eBPF functionality brought by version 4.1 of the Linux kernel in 2015 extends the classic BPF programmable classifiers to eBPF. [17] These can be compiled using the LLVM eBPF backend and loaded into a running kernel using the tc utility. [18]

BSD and OpenBSD

ALTQ is the implementation of a network scheduler for BSDs. As of OpenBSD version 5.5 ALTQ was replaced by the HFSC scheduler.

See also

Notes

  1. The overall size of all buffers has been the point of critique by the Bufferbloat project, which provided a partial solution with CoDel that has been primarily tested in OpenWrt.

Related Research Articles

Traffic shaping is a bandwidth management technique used on computer networks which delays some or all datagrams to bring them into compliance with a desired traffic profile. Traffic shaping is used to optimize or guarantee performance, improve latency, or increase usable bandwidth for some kinds of packets by delaying other kinds. It is often confused with traffic policing, the distinct but related practice of packet dropping and packet marking.

<span class="mw-page-title-main">Network interface controller</span> Hardware component that connects a computer to a network

A network interface controller is a computer hardware component that connects a computer to a computer network.

Explicit Congestion Notification (ECN) is an extension to the Internet Protocol and to the Transmission Control Protocol and is defined in RFC 3168 (2001). ECN allows end-to-end notification of network congestion without dropping packets. ECN is an optional feature that may be used between two ECN-enabled endpoints when the underlying network infrastructure also supports it.

Deficit Round Robin (DRR), also Deficit Weighted Round Robin (DWRR), is a scheduling algorithm for the network scheduler. DRR is, like weighted fair queuing (WFQ), a packet-based implementation of the ideal Generalized Processor Sharing (GPS) policy. It was proposed by M. Shreedhar and G. Varghese in 1995 as an efficient and fair algorithm.

Netfilter is a framework provided by the Linux kernel that allows various networking-related operations to be implemented in the form of customized handlers. Netfilter offers various functions and operations for packet filtering, network address translation, and port translation, which provide the functionality required for directing packets through a network and prohibiting packets from reaching sensitive locations within a network.

The token bucket is an algorithm used in packet-switched and telecommunications networks. It can be used to check that data transmissions, in the form of packets, conform to defined limits on bandwidth and burstiness. It can also be used as a scheduling algorithm to determine the timing of transmissions that will comply with the limits set for the bandwidth and burstiness: see network scheduler.

ALTQ is the network scheduler for Berkeley Software Distribution. ALTQ provides queueing disciplines, and other components related to quality of service (QoS), required to realize resource sharing. It is most commonly implemented on BSD-based routers. ALTQ is included in the base distribution of FreeBSD, NetBSD, and DragonFly BSD, and was integrated into the pf packet filter of OpenBSD but later replaced by a new queueing subsystem.

The type of service (ToS) field is the second byte of the IPv4 header. It has had various purposes over the years, and has been defined in different ways by five RFCs.

Linux IP Firewalling Chains, normally called ipchains, is free software to control the packet filter or firewall capabilities in the 2.2 series of Linux kernels. It superseded ipfirewall, but was replaced by iptables in the 2.4 series. Unlike iptables, ipchains is stateless.

seccomp is a computer security facility in the Linux kernel. seccomp allows a process to make a one-way transition into a "secure" state where it cannot make any system calls except exit , sigreturn , read and write to already-open file descriptors. Should it attempt any other system calls, the kernel will either just log the event or terminate the process with SIGKILL or SIGSYS. In this sense, it does not virtualize the system's resources but isolates the process from them entirely.

Packet loss occurs when one or more packets of data travelling across a computer network fail to reach their destination. Packet loss is either caused by errors in data transmission, typically across wireless networks, or network congestion. Packet loss is measured as a percentage of packets lost with respect to packets sent.

In routers and switches, active queue management (AQM) is the policy of dropping packets inside a buffer associated with a network interface controller (NIC) before that buffer becomes full, often with the goal of reducing network congestion or improving end-to-end latency. This task is performed by the network scheduler, which for this purpose uses various algorithms such as random early detection (RED), Explicit Congestion Notification (ECN), or controlled delay (CoDel). RFC 7567 recommends active queue management as a best practice.

The Berkeley Packet Filter (BPF) is a technology used in certain computer operating systems for programs that need to, among other things, analyze network traffic. It provides a raw interface to data link layers, permitting raw link-layer packets to be sent and received. In addition, if the driver for the network interface supports promiscuous mode, it allows the interface to be put into that mode so that all packets on the network can be received, even those destined to other hosts.

Blue is a scheduling discipline for the network scheduler developed by graduate student Wu-chang Feng for Professor Kang G. Shin at the University of Michigan and others at the Thomas J. Watson Research Center of IBM in 1999.

Bufferbloat is a cause of high latency and jitter in packet-switched networks caused by excess buffering of packets. Bufferbloat can also cause packet delay variation, as well as reduce the overall network throughput. When a router or switch is configured to use excessively large buffers, even very high-speed networks can become practically unusable for many interactive applications like voice over IP (VoIP), audio streaming, online gaming, and even ordinary web browsing.

CoDel is an active queue management (AQM) algorithm in network routing, developed by Van Jacobson and Kathleen Nichols and published as RFC8289. It is designed to overcome bufferbloat in networking hardware, such as routers, by setting limits on the delay network packets experience as they pass through buffers in this equipment. CoDel aims to improve on the overall performance of the random early detection (RED) algorithm by addressing some of its fundamental misconceptions, as perceived by Jacobson, and by being easier to manage.

Time-Sensitive Networking (TSN) is a set of standards under development by the Time-Sensitive Networking task group of the IEEE 802.1 working group. The TSN task group was formed in November 2012 by renaming the existing Audio Video Bridging Task Group and continuing its work. The name changed as a result of the extension of the working area of the standardization group. The standards define mechanisms for the time-sensitive transmission of data over deterministic Ethernet networks.

XDP is an eBPF-based high-performance data path used to send and receive network packets at high rates by bypassing most of the operating system networking stack. It is merged in the Linux kernel since version 4.8. This implementation is licensed under GPL. Large technology firms including Amazon, Google and Intel support its development. Microsoft released their free and open source implementation XDP for Windows in May 2022. It is licensed under MIT License.

<span class="mw-page-title-main">Dave Taht</span> American network engineer

Dave Täht is an American network engineer, musician, lecturer, asteroid exploration advocate, and Internet activist. He is the chief executive officer of TekLibre.

eBPF Safe dynamic programs and tools

eBPF is a technology that can run sandboxed programs in a privileged context such as the operating system kernel. It is used to safely and efficiently extend the capabilities of the kernel at runtime without requiring changes to kernel source code or loading kernel modules. Safety is provided through an in-kernel verifier which performs static code analysis and rejects programs which crash, hang or otherwise interfere with the kernel negatively. Examples of programs that are automatically rejected are programs without strong exit guarantees and programs dereferencing pointers without safety-checks. Loaded programs which passed the verifier are either interpreted or in-kernel JIT compiled for native execution performance. The execution model is event-driven and with few exceptions run-to-completion, meaning, programs can be attached to various hook points in the operating system kernel and are run upon triggering of an event. eBPF use cases include networking such as XDP, tracing and security subsystems. Given eBPF's efficiency and flexibility opened up new possibilities to solve production issues, Brendan Gregg famously dubbed eBPF "superpowers for Linux". Linus Torvalds said, "BPF has actually been really useful, and the real power of it is how it allows people to do specialized code that isn't enabled until asked for". Due to its success in Linux, the eBPF runtime has been ported to other operating systems such as Windows.

References

  1. "Traffic Control HOWTO: Classless Queuing Disciplines (qdiscs)". tldp.org. Retrieved November 24, 2013.
  2. "Traffic Control HOWTO: Components of Linux Traffic Control". tldp.org. Retrieved November 24, 2013.
  3. "Traffic Control HOWTO: Traditional Elements of Traffic Control". tldp.org. Retrieved November 24, 2013.
  4. "Queuing Disciplines: Order of Packet Transmission and Dropping" (PDF). tau.ac.il. October 25, 2006. Retrieved March 18, 2014.
  5. "Advanced traffic control - ArchWiki". wiki.archlinux.org. Retrieved 2023-09-11.
  6. "Let them run CAKE". LWN.net.
  7. "Heavy-Hitter Filter qdisc". kernel.org.
  8. "Network emulator Linux kernel network scheduler module". kernel.org . Retrieved 2013-09-07.
  9. "Proportional Integral controller Enhanced (PIE)". kernel.org.
  10. "DRR Linux kernel network scheduler module". kernel.org . Retrieved 2013-09-07.
  11. "HTB Linux kernel network scheduler module". kernel.org . Retrieved 2013-09-07.
  12. "QFQ Linux kernel network scheduler module". kernel.org . Retrieved 2013-09-07.
  13. "The Linux kernel network scheduler". kernel.org. 2012-12-26. Retrieved 2013-09-07.
  14. "tc(8) - Linux manual page". man7.org. Retrieved 2023-09-11.
  15. "Linux Advanced Routing and Traffic Control HOWTO, Section 9.2.1. pfifo_fast". lartc.org. 2012-05-19. Retrieved 2014-09-19.
  16. "systemd System and Service Manager: NEWS file". freedesktop.org. 2015-05-22. Retrieved 2015-06-09.
  17. "Linux kernel 4.1, Section 11. Networking". kernelnewbies.org. 2015-06-21.
  18. "BPF and XDP Reference Guide". Cilium documentation web site.