New API

Last updated

New API (also referred to as NAPI) is an interface to use interrupt mitigation techniques for networking devices in the Linux kernel. Such an approach is intended to reduce the overhead of packet receiving. The idea is to defer incoming message handling until there is a sufficient amount of them so that it is worth handling them all at once.

Contents

Motivation

A straightforward method of implementing a network driver is to interrupt the kernel by issuing an interrupt request (IRQ) for each and every incoming packet. However, servicing IRQs is costly in terms of processor resources and time. Therefore, the straightforward implementation can be very inefficient in high-speed networks, constantly interrupting the kernel with the thousands of packets per second. Overall performance of the system as well as network throughput can suffer as a result.

Polling is an alternative to interrupt-based processing. The kernel can periodically check for the arrival of incoming network packets without being interrupted, which eliminates the overhead of interrupt processing. Establishing an optimal polling frequency is important, however. Too frequent polling wastes CPU resources by repeatedly checking for incoming packets that have not yet arrived. On the other hand, polling too infrequently introduces latency by reducing system reactivity to incoming packets, and it may result in the loss of packets if the incoming packet buffer fills up before being processed.

As a compromise, the Linux kernel uses the interrupt-driven mode by default and only switches to polling mode when the flow of incoming packets exceeds a certain threshold, known as the "weight" of the network interface.

Compliant drivers

A driver using the NAPI interface will work as follow:

Advantages

History

NAPI was an over-three-year effort by Alexey Kuznetsov, Jamal Hadi Salim and Robert Olsson. Initial effort to include NAPI was met with resistance by some members of the community, however David Miller worked hard to ensure NAPI's inclusion.

A lot of real world testing was done in the Uppsala university network before inclusion. In fact, www.slu.se was the first production NAPI-based OS and is still powered to this day by NAPI-based Bifrost/Linux routers. The pktgen traffic generator was also born around this time. Pktgen was extensively used to test NAPI scenarios not induced by real world traffic.

Related Research Articles

<span class="mw-page-title-main">Interrupt</span> Signal to a computer processor emitted by hardware or software

In digital computers, an interrupt is a request for the processor to interrupt currently executing code, so that the event can be processed in a timely manner. If the request is accepted, the processor will suspend its current activities, save its state, and execute a function called an interrupt handler to deal with the event. This interruption is often temporary, allowing the software to resume normal activities after the interrupt handler finishes, although the interrupt could instead indicate a fatal error.

Direct memory access (DMA) is a feature of computer systems that allows certain hardware subsystems to access main system memory independently of the central processing unit (CPU).

<span class="mw-page-title-main">Network interface controller</span> Hardware component that connects a computer to a network

A network interface controller is a computer hardware component that connects a computer to a computer network.

TCP offload engine (TOE) is a technology used in some network interface cards (NIC) to offload processing of the entire TCP/IP stack to the network controller. It is primarily used with high-speed network interfaces, such as gigabit Ethernet and 10 Gigabit Ethernet, where processing overhead of the network stack becomes significant. TOEs are often used as a way to reduce the overhead associated with Internet Protocol (IP) storage protocols such as iSCSI and Network File System (NFS).

The Direct Rendering Manager (DRM) is a subsystem of the Linux kernel responsible for interfacing with GPUs of modern video cards. DRM exposes an API that user-space programs can use to send commands and data to the GPU and perform operations such as configuring the mode setting of the display. DRM was first developed as the kernel-space component of the X Server Direct Rendering Infrastructure, but since then it has been used by other graphic stack alternatives such as Wayland and standalone applications and libraries such as SDL2 and Kodi.

In computer science, asynchronous I/O is a form of input/output processing that permits other processing to continue before the transmission has finished. A name used for asynchronous I/O in the Windows API is overlapped I/O.

<span class="mw-page-title-main">Linux kernel interfaces</span> An overview and comparison of the Linux kernal APIs and ABIs.

The Linux kernel provides several interfaces to user-space applications that are used for different purposes and that have different properties by design. There are two types of application programming interface (API) in the Linux kernel that are not to be confused: the "kernel–user space" API and the "kernel internal" API.

In operating systems, an interrupt storm is an event during which a processor receives an inordinate number of interrupts that consume the majority of the processor's time. Interrupt storms are typically caused by hardware devices that do not support interrupt rate limiting.

Netlink is a socket family used for inter-process communication (IPC) between both the kernel and userspace processes, and between different userspace processes, in a way similar to the Unix domain sockets available on certain Unix-like operating systems, including its original incarnation as a Linux kernel interface, as well as in the form of a later implementation on FreeBSD. Similarly to the Unix domain sockets, and unlike INET sockets, Netlink communication cannot traverse host boundaries. However, while the Unix domain sockets use the file system namespace, Netlink sockets are usually addressed by process identifiers (PIDs).

The Berkeley Packet Filter (BPF) is a technology used in certain computer operating systems for programs that need to, among other things, analyze network traffic. It provides a raw interface to data link layers, permitting raw link-layer packets to be sent and received. In addition, if the driver for the network interface supports promiscuous mode, it allows the interface to be put into that mode so that all packets on the network can be received, even those destined to other hosts.

<span class="mw-page-title-main">Completely Fair Scheduler</span>

The Completely Fair Scheduler (CFS) is a process scheduler that was merged into the 2.6.23 release of the Linux kernel and is the default scheduler of the tasks of the SCHED_NORMAL class. It handles CPU resource allocation for executing processes, and aims to maximize overall CPU utilization while also maximizing interactive performance.

<span class="mw-page-title-main">Linux kernel</span> Operating system kernel

The Linux kernel is a free and open-source, monolithic, modular, multitasking, Unix-like operating system kernel. It was originally authored in 1991 by Linus Torvalds for his i386-based PC, and it was soon adopted as the kernel for the GNU operating system, which was written to be a free (libre) replacement for Unix.

In computer operating system design, kernel preemption is a property possessed by some kernels, in which the CPU can be interrupted in the middle of executing kernel code and assigned other tasks.

cgroups is a Linux kernel feature that limits, accounts for, and isolates the resource usage of a collection of processes.

The Data Plane Development Kit (DPDK) is an open source software project managed by the Linux Foundation. It provides a set of data plane libraries and network interface controller polling-mode drivers for offloading TCP packet processing from the operating system kernel to processes running in user space. This offloading achieves higher computing efficiency and higher packet throughput than is possible using the interrupt-driven processing provided in the kernel.

Interrupt coalescing, also known as interrupt moderation, is a technique in which events which would normally trigger a hardware interrupt are held back, either until a certain amount of work is pending, or a timeout timer triggers. Used correctly, this technique can reduce interrupt load by up to an order of magnitude, while only incurring relatively small latency penalties. Interrupt coalescing is typically combined with either a hardware FIFO or direct memory access, to allow for continued data throughput while interrupts are being held back.

kGraft is a feature of the Linux kernel that implements live patching of a running kernel, which allows kernel patches to be applied while the kernel is still running. By avoiding the need for rebooting the system with a new kernel that contains the desired patches, kGraft aims to maximize the system uptime and availability. At the same time, kGraft allows kernel-related security updates to be applied without deferring them to scheduled downtimes. Internally, kGraft allows entire functions in a running kernel to be replaced with their patched versions, doing that safely by selectively using original versions of functions to ensure per-process consistency while the live patching is performed.

devpts

devpts is a virtual filesystem directory available in the Linux kernel since version 2.1.93. It is normally mounted at /dev/pts and contains solely devices files which represent slaves to the multiplexing master located at /dev/ptmx which in turn is used to implement terminal emulators.

XDP is an eBPF-based high-performance data path used to send and receive network packets at high rates by bypassing most of the operating system networking stack. It is merged in the Linux kernel since version 4.8. This implementation is licensed under GPL. Large technology firms including Amazon, Google and Intel support its development. Microsoft released their free and open source implementation XDP for Windows in May 2022. It is licensed under MIT License.

io_uring is a Linux kernel system call interface for storage device asynchronous I/O operations addressing performance issues with similar interfaces provided by functions like read /write or aio_read /aio_write etc. for operations on data accessed by file descriptors.

References

    Further reading