Interrupt storm

Last updated December 31, 2024

In operating systems, an interrupt storm is an event during which a processor receives an inordinate number of interrupts that consume the majority of the processor's time. Interrupt storms are typically caused by hardware devices that do not support interrupt rate limiting.

Background

Because interrupt processing is typically a non-preemptible task in time-sharing operating systems, an interrupt storm will cause sluggish response to user input, or even appear to freeze the system completely. This state is commonly known as live lock . In such a state, the system is spending most of its resources processing interrupts instead of completing other work. To the end-user, it does not appear to be processing anything at all as there is often no output. An interrupt storm is sometimes mistaken for thrashing, since they both have similar symptoms (unresponsive or sluggish response to user input, little or no output).

Common causes include: misconfigured or faulty hardware, faulty device drivers, flaws in the operating system, or metastability in one or more components. The latter condition rarely occurs outside of prototype or amateur-built hardware.

Most modern hardware and operating systems have methods for mitigating the effect of an interrupt storm. For example, most Ethernet controllers implement interrupt "rate limiting", which causes the controller to wait a programmable amount of time between each interrupt it generates. When not present within the device, similar functionality is usually written into the device driver, and/or the operating system itself.

The most common cause is when a device "behind" another signals an interrupt to an APIC (Advanced Programmable Interrupt Controller). Most computer peripherals generate interrupts through an APIC as the number of interrupts is most always less (typically 15 for the modern PC) than the number of devices. The OS must then query each driver registered to that interrupt to ask if the interrupt originated from its hardware. Faulty drivers may always claim "yes", causing the OS to not query other drivers registered to that interrupt (only one interrupt can be processed at a time). The device which originally requested the interrupt therefore does not get its interrupt serviced, so a new interrupt is generated (or is not cleared) and the processor becomes swamped with continuous interrupt signals. Any operating system can live lock under an interrupt storm caused by such a fault. A kernel debugger can usually break the storm by unloading the faulty driver, allowing the driver "underneath" the faulty one to clear the interrupt, if user input is still possible.

This occurred in an older version of FreeBSD, where PCI cards that were configured to operate in ISA compatibility mode could not properly interact with the ISA interrupt routing. This would either cause interrupts to never be detected by the operating system, or the operating system would never be able to clear them, resulting in an interrupt storm. ^[1]

As drivers are most often implemented by a 3rd party, most operating systems also have a polling mode that queries for pending interrupts at fixed intervals or in a round-robin fashion. This mode can be set globally, on a per-driver, per-interrupt basis, or dynamically if the OS detects a fault condition or excessive interrupt generation. A polling mode may be enabled dynamically when the number of interrupts or the resource use caused by an interrupt, passes certain thresholds. When these thresholds are no longer exceeded, an OS may then change the interrupting driver, interrupt, or interrupt handling globally, from an interrupt mode to a polling mode. Interrupt rate limiting in hardware usually negates the use of a polling mode, but can still happen during normal operation during intense I/O if the processor is unable switch contexts quickly enough to keep pace.

History

Perhaps the first interrupt storm occurred during the Apollo 11's lunar descent in 1969.^[2]

Considerations

Interrupt rate limiting must be carefully configured for optimum results. For example, an Ethernet controller with interrupt rate limiting will buffer the packets it receives from the network in between each interrupt. If the rate is set too low, the controller's buffer will overflow, and packets will be dropped. The rate must take into account how fast the buffer may fill between interrupts, and the interrupt latency between the interrupt and the transfer of the buffer to the system.

Interrupt mitigating

There are hardware-based and software-based approaches to the problem. For example, FreeBSD detects interrupt storms and masks problematic interrupts for some time in response.^{[ citation needed ]}

The system used by NAPI is an example of the hardware-based approach: the system (driver) starts in interrupt enabled state, and the Interrupt handler then disables the interrupt and lets a thread/task handle the event(s) and then task polls the device, processing some number of events and enabling the interrupt.

Another interesting approach using hardware support is one where the device generates interrupt when the event queue state changes from "empty" to "not empty". Then, if there are no free DMA descriptors at the RX FIFO tail, the device drops the event. The event is then added to the tail and the FIFO entry is marked as occupied. If at that point entry (tail−1) is free (cleared), an interrupt will be generated (level interrupt) and the tail pointer will be incremented. If the hardware requires the interrupt be acknowledged, the CPU (interrupt handler) will do that, handle the valid DMA descriptors at the head, and return from the interrupt.

Related Research Articles

In the context of an operating system, a device driver is a computer program that operates or controls a particular type of device that is attached to a computer or automaton. A driver provides a software interface to hardware devices, enabling operating systems and other computer programs to access hardware functions without needing to know precise details about the hardware being used.

In digital computers, an interrupt is a request for the processor to interrupt currently executing code, so that the event can be processed in a timely manner. If the request is accepted, the processor will suspend its current activities, save its state, and execute a function called an interrupt handler to deal with the event. This interruption is often temporary, allowing the software to resume normal activities after the interrupt handler finishes, although the interrupt could instead indicate a fatal error.

In computing, interrupt latency refers to the delay between the start of an Interrupt Request (IRQ) and the start of the respective Interrupt Service Routine (ISR). For many operating systems, devices are serviced as soon as the device's interrupt handler is executed. Interrupt latency may be affected by microprocessor design, interrupt controllers, interrupt masking, and the operating system's (OS) interrupt handling methods.

An operating system (OS) is system software that manages computer hardware and software resources, and provides common services for computer programs.

Direct memory access (DMA) is a feature of computer systems that allows certain hardware subsystems to access main system memory independently of the central processing unit (CPU).

A universal asynchronous receiver-transmitter is a peripheral device for asynchronous serial communication in which the data format and transmission speeds are configurable. It sends data bits one by one, from the least significant to the most significant, framed by start and stop bits so that precise timing is handled by the communication channel. The electric signaling levels are handled by a driver circuit external to the UART. Common signal levels are RS-232, RS-485, and raw TTL for short debugging links. Early teletypewriters used current loops.

In computing, a system call is the programmatic way in which a computer program requests a service from the operating system on which it is executed. This may include hardware-related services, creation and execution of new processes, and communication with integral kernel services such as process scheduling. System calls provide an essential interface between a process and the operating system.

I²C (Inter-Integrated Circuit; pronounced as “eye-squared-see” or “eye-two-see”), alternatively known as I2C or IIC, is a synchronous, multi-controller/multi-target (historically termed as multi-master/multi-slave), single-ended, serial communication bus invented in 1982 by Philips Semiconductors. It is widely used for attaching lower-speed peripheral integrated circuits (ICs) to processors and microcontrollers in short-distance, intra-board communication.

<span class="mw-page-title-main">Amiga 4000</span> 1992 personal computer

The Amiga 4000, or A4000, from Commodore is the successor of the Amiga 2000 and Amiga 3000 computers. There are two models: the A4000/040 released in October 1992 with a Motorola 68040 CPU, and the A4000/030 released in April 1993 with a Motorola 68EC030.

Hardware abstractions are sets of routines in software that provide programs with access to hardware resources through programming interfaces. The programming interface allows all devices in a particular class C of hardware devices to be accessed through identical interfaces even though C may contain different subclasses of devices that each provide a different hardware interface.

The Advanced Host Controller Interface (AHCI) is a technical standard defined by Intel that specifies the register-level interface of Serial ATA (SATA) host controllers in a non-implementation-specific manner in its motherboard chipsets.

The Quick Emulator (QEMU) is a free and open-source emulator that uses dynamic binary translation to emulate a computer's processor; that is, it translates the emulated binary codes to an equivalent binary format which is executed by the machine. It provides a variety of hardware and device models for the virtual machine, enabling it to run different guest operating systems. QEMU can be used with a Kernel-based Virtual Machine (KVM) to emulate hardware at near-native speeds. Additionally, it supports user-level processes, allowing applications compiled for one processor architecture to run on another.

The Intel 8259 is a programmable interrupt controller (PIC) designed for the Intel 8085 and 8086 microprocessors. The initial part was 8259, a later A suffix version was upward compatible and usable with the 8086 or 8088 processor. The 8259 combines multiple interrupt input sources into a single interrupt output to the host microprocessor, extending the interrupt levels available in a system beyond the one or two levels found on the processor chip. The 8259A was the interrupt controller for the ISA bus in the original IBM PC and IBM PC AT.

In computer science, asynchronous I/O is a form of input/output processing that permits other processing to continue before the I/O operation has finished. A name used for asynchronous I/O in the Windows API is overlapped I/O.

The High Precision Event Timer (HPET) is a hardware timer available in modern x86-compatible personal computers. Compared to older types of timers available in the x86 architecture, HPET allows more efficient processing of highly timing-sensitive applications, such as multimedia playback and OS task switching. It was developed jointly by Intel and Microsoft and has been incorporated in PC chipsets since 2005. Formerly referred to by Intel as a Multimedia Timer, the term HPET was selected to avoid confusion with the software multimedia timers introduced in the MultiMedia Extensions to Windows 3.0.

In a computer, an interrupt request is a hardware signal sent to the processor that temporarily stops a running program and allows a special program, an interrupt handler, to run instead. Hardware interrupts are used to handle events such as receiving data from a modem or network card, key presses, or mouse movements.

In computer science, hierarchical protection domains, often called protection rings, are mechanisms to protect data and functionality from faults and malicious behavior.

A kernel is a computer program at the core of a computer's operating system that always has complete control over everything in the system. The kernel is also responsible for preventing and mitigating conflicts between different processes. It is the portion of the operating system code that is always resident in memory and facilitates interactions between hardware and software components. A full kernel controls all hardware resources via device drivers, arbitrates conflicts between processes concerning such resources, and optimizes the utilization of common resources e.g. CPU & cache usage, file systems, and network sockets. On most systems, the kernel is one of the first programs loaded on startup. It handles the rest of startup as well as memory, peripherals, and input/output (I/O) requests from software, translating them into data-processing instructions for the central processing unit.

The POSIX terminal interface is the generalized abstraction, comprising both an application programming interface for programs, and a set of behavioural expectations for users of a terminal, as defined by the POSIX standard and the Single Unix Specification. It is a historical development from the terminal interfaces of BSD version 4 and Seventh Edition Unix.

The Data Plane Development Kit (DPDK) is an open source software project managed by the Linux Foundation. It provides a set of data plane libraries and network interface controller polling-mode drivers for offloading TCP packet processing from the operating system kernel to processes running in user space. This offloading achieves higher computing efficiency and higher packet throughput than is possible using the interrupt-driven processing provided in the kernel.

References

↑ "Problems updating FreeBSD's card system from ISA to PCI". www.usenix.org. Retrieved 2024-05-07.
↑ Murray, Charles (1989). Apollo: The Race to the Moon. Simon and Schuster. pp. 345–355.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[1] "Problems updating FreeBSD's card system from ISA to PCI". www.usenix.org. Retrieved 2024-05-07.

[2] Murray, Charles (1989). Apollo: The Race to the Moon. Simon and Schuster. pp. 345–355.

[1]

[2]