Interrupt handler

Last updated

In computer systems programming, an interrupt handler, also known as an interrupt service routine or ISR, is a special block of code associated with a specific interrupt condition. Interrupt handlers are initiated by hardware interrupts, software interrupt instructions, or software exceptions, and are used for implementing device drivers or transitions between protected modes of operation, such as system calls.

Contents

The traditional form of interrupt handler is the hardware interrupt handler. Hardware interrupts arise from electrical conditions or low-level protocols implemented in digital logic, are usually dispatched via a hard-coded table of interrupt vectors, asynchronously to the normal execution stream (as interrupt masking levels permit), often using a separate stack, and automatically entering into a different execution context (privilege level) for the duration of the interrupt handler's execution. In general, hardware interrupts and their handlers are used to handle high-priority conditions that require the interruption of the current code the processor is executing. [1] [2]

Later it was found convenient for software to be able to trigger the same mechanism by means of a software interrupt (a form of synchronous interrupt). Rather than using a hard-coded interrupt dispatch table at the hardware level, software interrupts are often implemented at the operating system level as a form of callback function.

Interrupt handlers have a multitude of functions, which vary based on what triggered the interrupt and the speed at which the interrupt handler completes its task. For example, pressing a key on a computer keyboard, [1] or moving the mouse, triggers interrupts that call interrupt handlers which read the key, or the mouse's position, and copy the associated information into the computer's memory. [2]

An interrupt handler is a low-level counterpart of event handlers. However, interrupt handlers have an unusual execution context, many harsh constraints in time and space, and their intrinsically asynchronous nature makes them notoriously difficult to debug by standard practice (reproducible test cases generally don't exist), thus demanding a specialized skillset—an important subset of system programming—of software engineers who engage at the hardware interrupt layer.

Interrupt flags

Unlike other event handlers, interrupt handlers are expected to set interrupt flags to appropriate values as part of their core functionality.

Even in a CPU which supports nested interrupts, a handler is often reached with all interrupts globally masked by a CPU hardware operation. In this architecture, an interrupt handler would normally save the smallest amount of context necessary, and then reset the global interrupt disable flag at the first opportunity, to permit higher priority interrupts to interrupt the current handler. It is also important for the interrupt handler to quell the current interrupt source by some method (often toggling a flag bit of some kind in a peripheral register) so that the current interrupt isn't immediately repeated on handler exit, resulting in an infinite loop.

Exiting an interrupt handler with the interrupt system in exactly the right state under every eventuality can sometimes be an arduous and exacting task, and its mishandling is the source of many serious bugs, of the kind that halt the system completely. These bugs are sometimes intermittent, with the mishandled edge case not occurring for weeks or months of continuous operation. Formal validation of interrupt handlers is tremendously difficult, while testing typically identifies only the most frequent failure modes, thus subtle, intermittent bugs in interrupt handlers often ship to end customers.

Execution context

In a modern operating system, upon entry the execution context of a hardware interrupt handler is subtle.

For reasons of performance, the handler will typically be initiated in the memory and execution context of the running process, to which it has no special connection (the interrupt is essentially usurping the running context—process time accounting will often accrue time spent handling interrupts to the interrupted process). However, unlike the interrupted process, the interrupt is usually elevated by a hard-coded CPU mechanism to a privilege level high enough to access hardware resources directly.

Stack space considerations

In a low-level microcontroller, the chip might lack protection modes and have no memory management unit (MMU). In these chips, the execution context of an interrupt handler will be essentially the same as the interrupted program, which typically runs on a small stack of fixed size (memory resources have traditionally been extremely scant at the low end). Nested interrupts are often provided, which exacerbates stack usage. A primary constraint on the interrupt handler in this programming endeavour is to not exceed the available stack in the worst-case condition, requiring the programmer to reason globally about the stack space requirement of every implemented interrupt handler and application task.

When allocated stack space is exceeded (a condition known as a stack overflow), this is not normally detected in hardware by chips of this class. If the stack is exceeded into another writable memory area, the handler will typically work as expected, but the application will fail later (sometimes much later) due to the handler's side effect of memory corruption. If the stack is exceeded into a non-writable (or protected) memory area, the failure will usually occur inside the handler itself (generally the easier case to later debug).

In the writable case, one can implement a sentinel stack guard—a fixed value right beyond the end of the legal stack whose value can be overwritten, but never will be if the system operates correctly. It is common to regularly observe corruption of the stack guard with some kind of watch dog mechanism. This will catch the majority of stack overflow conditions at a point in time close to the offending operation.

In a multitasking system, each thread of execution will typically have its own stack. If no special system stack is provided for interrupts, interrupts will consume stack space from whatever thread of execution is interrupted. These designs usually contain an MMU, and the user stacks are usually configured such that stack overflow is trapped by the MMU, either as a system error (for debugging) or to remap memory to extend the space available. Memory resources at this level of microcontroller are typically far less constrained, so that stacks can be allocated with a generous safety margin.

In systems supporting high thread counts, it is better if the hardware interrupt mechanism switches the stack to a special system stack, so that none of the thread stacks need account for worst-case nested interrupt usage. Tiny CPUs as far back as the 8-bit Motorola 6809 from 1978 have provided separate system and user stack pointers.

Constraints in time and concurrency

For many reasons, it is highly desired that the interrupt handler execute as briefly as possible, and it is highly discouraged (or forbidden) for a hardware interrupt to invoke potentially blocking system calls. In a system with multiple execution cores, considerations of reentrancy are also paramount. If the system provides for hardware DMA, concurrency issues can arise even with only a single CPU core. (It is not uncommon for a mid-tier microcontroller to lack protection levels and an MMU, but still provide a DMA engine with many channels; in this scenario, many interrupts are typically triggered by the DMA engine itself, and the associated interrupt handler is expected to tread carefully.)

A modern practice has evolved to divide hardware interrupt handlers into front-half and back-half elements. The front-half (or first level) receives the initial interrupt in the context of the running process, does the minimal work to restore the hardware to a less urgent condition (such as emptying a full receive buffer) and then marks the back-half (or second level) for execution in the near future at the appropriate scheduling priority; once invoked, the back-half operates in its own process context with fewer restrictions and completes the handler's logical operation (such as conveying the newly received data to an operating system data queue).

Divided handlers in modern operating systems

In several operating systems Linux, Unix,[ citation needed ] macOS, Microsoft Windows, z/OS, DESQview and some other operating systems used in the pastinterrupt handlers are divided into two parts: the First-Level Interrupt Handler (FLIH) and the Second-Level Interrupt Handlers (SLIH). FLIHs are also known as hard interrupt handlers or fast interrupt handlers, and SLIHs are also known as slow/soft interrupt handlers, or Deferred Procedure Calls in Windows.

A FLIH implements at minimum platform-specific interrupt handling similar to interrupt routines. In response to an interrupt, there is a context switch, and the code for the interrupt is loaded and executed. The job of a FLIH is to quickly service the interrupt, or to record platform-specific critical information which is only available at the time of the interrupt, and schedule the execution of a SLIH for further long-lived interrupt handling. [2]

FLIHs cause jitter in process execution. FLIHs also mask interrupts. Reducing the jitter is most important for real-time operating systems, since they must maintain a guarantee that execution of specific code will complete within an agreed amount of time. To reduce jitter and to reduce the potential for losing data from masked interrupts, programmers attempt to minimize the execution time of a FLIH, moving as much as possible to the SLIH. With the speed of modern computers, FLIHs may implement all device and platform-dependent handling, and use a SLIH for further platform-independent long-lived handling.

FLIHs which service hardware typically mask their associated interrupt (or keep it masked as the case may be) until they complete their execution. An (unusual) FLIH which unmasks its associated interrupt before it completes is called a reentrant interrupt handler. Reentrant interrupt handlers might cause a stack overflow from multiple preemptions by the same interrupt vector, and so they are usually avoided. In a priority interrupt system, the FLIH also (briefly) masks other interrupts of equal or lesser priority.

A SLIH completes long interrupt processing tasks similarly to a process. SLIHs either have a dedicated kernel thread for each handler, or are executed by a pool of kernel worker threads. These threads sit on a run queue in the operating system until processor time is available for them to perform processing for the interrupt. SLIHs may have a long-lived execution time, and thus are typically scheduled similarly to threads and processes.

In Linux, FLIHs are called upper half, and SLIHs are called lower half or bottom half. [1] [2] This is different from naming used in other Unix-like systems, where both are a part of bottom half.[ clarification needed ]

See also

Related Research Articles

<span class="mw-page-title-main">Computer multitasking</span> Concurrent execution of multiple processes

In computing, multitasking is the concurrent execution of multiple tasks over a certain period of time. New tasks can interrupt already started ones before they finish, instead of waiting for them to end. As a result, a computer executes segments of multiple tasks in an interleaved manner, while the tasks share common processing resources such as central processing units (CPUs) and main memory. Multitasking automatically interrupts the running program, saving its state and loading the saved state of another program and transferring control to it. This "context switch" may be initiated at fixed time intervals, or the running program may be coded to signal to the supervisory software when it can be interrupted.

In computing, a context switch is the process of storing the state of a process or thread, so that it can be restored and resume execution at a later point, and then restoring a different, previously saved, state. This allows multiple processes to share a single central processing unit (CPU), and is an essential feature of a multiprogramming or multitasking operating system. In a traditional CPU, each process - a program in execution - utilizes the various CPU registers to store data and hold the current state of the running process. However, in a multitasking operating system, the operating system switches between processes or threads to allow the execution of multiple processes simultaneously. For every switch, the operating system must save the state of the currently running process, followed by loading the next process state, which will run on the CPU. This sequence of operations that stores the state of the running process and the loading of the following running process is called a context switch.

<span class="mw-page-title-main">Interrupt</span> Signal to a computer processor emitted by hardware or software

In digital computers, an interrupt is a request for the processor to interrupt currently executing code, so that the event can be processed in a timely manner. If the request is accepted, the processor will suspend its current activities, save its state, and execute a function called an interrupt handler to deal with the event. This interruption is often temporary, allowing the software to resume normal activities after the interrupt handler finishes, although the interrupt could instead indicate a fatal error.

In computing, interrupt latency refers to the delay between the start of an Interrupt Request (IRQ) and the start of the respective Interrupt Service Routine (ISR). For many operating systems, devices are serviced as soon as the device's interrupt handler is executed. Interrupt latency may be affected by microprocessor design, interrupt controllers, interrupt masking, and the operating system's (OS) interrupt handling methods.

<span class="mw-page-title-main">Operating system</span> Software that manages computer hardware resources

An operating system (OS) is system software that manages computer hardware and software resources, and provides common services for computer programs.

A real-time operating system (RTOS) is an operating system (OS) for real-time computing applications that processes data and events that have critically defined time constraints. An RTOS is distinct from a time-sharing operating system, such as Unix, which manages the sharing of system resources with a scheduler, data buffers, or fixed task prioritization in a multitasking or multiprogramming environment. Processing time requirements need to be fully understood and bound rather than just kept as a minimum. All processing must occur within the defined constraints. Real-time operating systems are event-driven and preemptive, meaning the OS can monitor the relevant priority of competing tasks, and make changes to the task priority. Event-driven systems switch between tasks based on their priorities, while time-sharing systems switch the task based on clock interrupts.

In computing, a segmentation fault or access violation is a fault, or failure condition, raised by hardware with memory protection, notifying an operating system (OS) the software has attempted to access a restricted area of memory. On standard x86 computers, this is a form of general protection fault. The operating system kernel will, in response, usually perform some corrective action, generally passing the fault on to the offending process by sending the process a signal. Processes can in some cases install a custom signal handler, allowing them to recover on their own, but otherwise the OS default signal handler is used, generally causing abnormal termination of the process, and sometimes a core dump.

<span class="mw-page-title-main">Process (computing)</span> Particular execution of a computer program

In computing, a process is the instance of a computer program that is being executed by one or many threads. There are many different process models, some of which are light weight, but almost all processes are rooted in an operating system (OS) process which comprises the program code, assigned system resources, physical and logical access permissions, and data structures to initiate, control and coordinate execution activity. Depending on the OS, a process may be made up of multiple threads of execution that execute instructions concurrently.

<span class="mw-page-title-main">Thread (computing)</span> Smallest sequence of programmed instructions that can be managed independently by a scheduler

In computer science, a thread of execution is the smallest sequence of programmed instructions that can be managed independently by a scheduler, which is typically a part of the operating system. The implementation of threads and processes differs between operating systems. In Modern Operating Systems, Tanenbaum shows that many distinct models of process organization are possible. In many cases, a thread is a component of a process. The multiple threads of a given process may be executed concurrently, sharing resources such as memory, while different processes do not share these resources. In particular, the threads of a process share its executable code and the values of its dynamically allocated variables and non-thread-local global variables at any given time.

<span class="mw-page-title-main">System call</span> Way for programs to access kernel services

In computing, a system call is the programmatic way in which a computer program requests a service from the operating system on which it is executed. This may include hardware-related services, creation and execution of new processes, and communication with integral kernel services such as process scheduling. System calls provide an essential interface between a process and the operating system.

<span class="mw-page-title-main">Memory management unit</span> Hardware translating virtual addresses to physical address

A memory management unit (MMU), sometimes called paged memory management unit (PMMU), is a computer hardware unit that examines all memory references on the memory bus, translating these requests, known as virtual memory addresses, into physical addresses in main memory.

Execution in computer and software engineering is the process by which a computer or virtual machine reads and acts on the instructions of a computer program. Each instruction of a program is a description of a particular action which must be carried out, in order for a specific problem to be solved. Execution involves repeatedly following a 'fetch–decode–execute' cycle for each instruction done by control unit. As the executing machine follows the instructions, specific effects are produced in accordance with the semantics of those instructions.

RTLinux is a hard realtime real-time operating system (RTOS) microkernel that runs the entire Linux operating system as a fully preemptive process. The hard real-time property makes it possible to control robots, data acquisition systems, manufacturing plants, and other time-sensitive instruments and machines from RTLinux applications. The design was patented. Despite the similar name, it is not related to the Real-Time Linux project of the Linux Foundation.

<span class="mw-page-title-main">General protection fault</span>

A general protection fault (GPF) in the x86 instruction set architectures (ISAs) is a fault initiated by ISA-defined protection mechanisms in response to an access violation caused by some running code, either in the kernel or a user program. The mechanism is first described in Intel manuals and datasheets for the Intel 80286 CPU, which was introduced in 1983; it is also described in section 9.8.13 in the Intel 80386 programmer's reference manual from 1986. A general protection fault is implemented as an interrupt. Some operating systems may also classify some exceptions not related to access violations, such as illegal opcode exceptions, as general protection faults, even though they have nothing to do with memory protection. If a CPU detects a protection violation, it stops executing the code and sends a GPF interrupt. In most cases, the operating system removes the failing process from the execution queue, signals the user, and continues executing other processes. If, however, the operating system fails to catch the general protection fault, i.e. another protection violation occurs before the operating system returns from the previous GPF interrupt, the CPU signals a double fault, stopping the operating system. If yet another failure occurs, the CPU is unable to recover; since 80286, the CPU enters a special halt state called "Shutdown", which can only be exited through a hardware reset. The IBM PC AT, the first PC-compatible system to contain an 80286, has hardware that detects the Shutdown state and automatically resets the CPU when it occurs. All descendants of the PC AT do the same, so in a PC, a triple fault causes an immediate system reset.

In computing, a memory barrier, also known as a membar, memory fence or fence instruction, is a type of barrier instruction that causes a central processing unit (CPU) or compiler to enforce an ordering constraint on memory operations issued before and after the barrier instruction. This typically means that operations issued prior to the barrier are guaranteed to be performed before operations issued after the barrier.

Signals are standardized messages sent to a running program to trigger specific behavior, such as quitting or error handling. They are a limited form of inter-process communication (IPC), typically used in Unix, Unix-like, and other POSIX-compliant operating systems.

In computer science, asynchronous I/O is a form of input/output processing that permits other processing to continue before the transmission has finished. A name used for asynchronous I/O in the Windows API is overlapped I/O.

In computer security, executable-space protection marks memory regions as non-executable, such that an attempt to execute machine code in these regions will cause an exception. It makes use of hardware features such as the NX bit, or in some cases software emulation of those features. However, technologies that emulate or supply an NX bit will usually impose a measurable overhead while using a hardware-supplied NX bit imposes no measurable overhead.

DioneOS is a multitasking preemptive, real-time operating system (RTOS). The system is designed for microcontrollers, originally released on 2 February 2011 for the Texas Instruments TI MSP430x, and then on 29 March 2013 for the ARM Cortex-M3. Target microcontroller platforms have limited resources, i.e., system clock frequency of tens of MHz, and memory amounts of tens to a few hundred kilobytes (KB). The RTOS is adapted to such conditions by providing a compact and efficient image. The efficiency term here means minimizing further central processing unit (CPU) load caused by system use. According to this definition, the system is more effective when it consumes less CPU time to execute its internal parts, e.g., managing threads.

<span class="mw-page-title-main">ARM Cortex-R</span> Family of microprocessor cores with ARM microarchitecture

The ARM Cortex-R is a family of 32-bit and 64-bit RISC ARM processor cores licensed by Arm Holdings. The cores are optimized for hard real-time and safety-critical applications. Cores in this family implement the ARM Real-time (R) profile, which is one of three architecture profiles, the other two being the Application (A) profile implemented by the Cortex-A family and the Microcontroller (M) profile implemented by the Cortex-M family. The ARM Cortex-R family of microprocessors currently consists of ARM Cortex-R4(F), ARM Cortex-R5(F), ARM Cortex-R7(F), ARM Cortex-R8(F), ARM Cortex-R52(F), and ARM Cortex-R82(F).

References

  1. 1 2 3 "The Linux Kernel Module Programming Guide, Chapter 12. Interrupt Handlers". The Linux Documentation Project. May 18, 2007. Retrieved February 20, 2015.
  2. 1 2 3 4 Jonathan Corbet; Alessandro Rubini; Greg Kroah-Hartman (January 27, 2005). "Linux Device Drivers, Chapter 10. Interrupt Handling" (PDF). O'Reilly Media . Retrieved February 20, 2015.