Double fault

Last updated November 22, 2021

On the x86 architecture, a double fault exception occurs if the processor encounters a problem while trying to service a pending interrupt or exception. An example situation when a double fault would occur is when an interrupt is triggered but the segment in which the interrupt handler resides is invalid. If the processor encounters a problem when calling the double fault handler, a triple fault is generated and the processor shuts down.

As double faults can only happen due to kernel bugs, they are rarely caused by user space programs in a modern protected mode operating system, unless the program somehow gains kernel access (some viruses and also some low-level DOS programs). Other processors like PowerPC or SPARC generally save state to predefined and reserved machine registers. A double fault will then be a situation where another exception happens while the processor is still using the contents of these registers to process the exception. SPARC processors have four levels of such registers, i.e. they have a 4-window register system.

Related Research Articles

In computing, a context switch is the process of storing the state of a process or thread, so that it can be restored and resume execution at a later point. This allows multiple processes to share a single central processing unit (CPU), and is an essential feature of a multitasking operating system.

In digital computers, an interrupt is a response by the processor to an event that needs attention from the software. An interrupt condition alerts the processor and serves as a request for the processor to interrupt the currently executing code when permitted, so that the event can be processed in a timely manner. If the request is accepted, the processor responds by suspending its current activities, saving its state, and executing a function called an interrupt handler to deal with the event. This interruption is temporary, and, unless the interrupt indicates a fatal error, the processor resumes normal activities after the interrupt handler finishes.

In computing, a segmentation fault or access violation is a fault, or failure condition, raised by hardware with memory protection, notifying an operating system (OS) the software has attempted to access a restricted area of memory. On standard x86 computers, this is a form of general protection fault. The operating system kernel will, in response, usually perform some corrective action, generally passing the fault on to the offending process by sending the process a signal. Processes can in some cases install a custom signal handler, allowing them to recover on their own, but otherwise the OS default signal handler is used, generally causing abnormal termination of the process, and sometimes a core dump.

In computing and computer programming, exception handling is the process of responding to the occurrence of exceptions – anomalous or exceptional conditions requiring special processing – during the execution of a program. In general, an exception breaks the normal flow of execution and executes a pre-registered exception handler; the details of how this is done depend on whether it is a hardware or software exception and how the software exception is implemented. Exception handling, if provided, is facilitated by specialized programming language constructs, hardware mechanisms like interrupts, or operating system (OS) inter-process communication (IPC) facilities like signals. Some exceptions, especially hardware ones, may be handled so gracefully that execution can resume where it was interrupted.

On the x86 computer architecture, a triple fault is a special kind of exception generated by the CPU when an exception occurs while the CPU is trying to invoke the double fault exception handler, which itself handles exceptions occurring while trying to invoke a regular exception handler.

A translation lookaside buffer (TLB) is a memory cache that is used to reduce the time taken to access a user memory location. It is a part of the chip's memory-management unit (MMU). The TLB stores the recent translations of virtual memory to physical memory and can be called an address-translation cache. A TLB may reside between the CPU and the CPU cache, between CPU cache and the main memory or between the different levels of the multi-level cache. The majority of desktop, laptop, and server processors include one or more TLBs in the memory-management hardware, and it is nearly always present in any processor that utilizes paged or segmented virtual memory.

A general protection fault (GPF) in the x86 instruction set architectures (ISAs) is a fault initiated by ISA-defined protection mechanisms in response to an access violation caused by some running code, either in the kernel or a user program. The mechanism is first described in Intel manuals and datasheets for the Intel 80286 CPU, which was introduced in 1983; it is also described in section 9.8.13 in the Intel 80386 programmer's reference manual from 1986. A general protection fault is implemented as an interrupt. Some operating systems may also classify some exceptions not related to access violations, such as illegal opcode exceptions, as general protection faults, even though they have nothing to do with memory protection. If a CPU detects a protection violation, it stops executing the code and sends a GPF interrupt. In most cases, the operating system removes the failing process from the execution queue, signals the user, and continues executing other processes. If, however, the operating system fails to catch the general protection fault, i.e. another protection violation occurs before the operating system returns from the previous GPF interrupt, the CPU signals a double fault, stopping the operating system. If yet another failure occurs, the CPU is unable to recover; since 80286, the CPU enters a special halt state called "Shutdown", which can only be exited through a hardware reset. The IBM PC AT, the first PC-compatible system to contain an 80286, has hardware that detects the Shutdown state and automatically resets the CPU when it occurs. All descendants of the PC AT do the same, so in a PC, a triple fault causes an immediate system reset.

In computer systems programming, an interrupt handler, also known as an interrupt service routine or ISR, is a special block of code associated with a specific interrupt condition. Interrupt handlers are initiated by hardware interrupts, software interrupt instructions, or software exceptions, and are used for implementing device drivers or transitions between protected modes of operation, such as system calls.

Signals are standardized messages sent to a running program to trigger specific behavior, such as quitting or error handling. They are a limited form of inter-process communication (IPC), typically used in Unix, Unix-like, and other POSIX-compliant operating systems.

In computing, a page fault is an exception that the memory management unit (MMU) raises when a process accesses a memory page without proper preparations. Accessing the page requires a mapping to be added to the process's virtual address space. Besides, the actual page contents may need to be loaded from a backing store, such as a disk. The MMU detects the page fault, but the operating system's kernel handles the exception by making the required page accessible in the physical memory or denying an illegal memory access.

The Pentium F00F bug is a design flaw in the majority of Intel Pentium, Pentium MMX, and Pentium OverDrive processors. Discovered in 1997, it can result in the processor ceasing to function until the computer is physically rebooted. The bug has been circumvented through operating system updates.

In operating systems, an interrupt storm is an event during which a processor receives an inordinate number of interrupts that consume the majority of the processor's time. Interrupt storms are typically caused by hardware devices that do not support interrupt rate limiting.

The interrupt priority level (IPL) is a part of the current system interrupt state, which indicates the interrupt requests that will currently be accepted. The IPL may be indicated in hardware by the registers in a Programmable Interrupt Controller, or in software by a bitmask or integer value and source code of threads

In computing ntoskrnl.exe, also known as kernel image, provides the kernel and executive layers of the Microsoft Windows NT kernel space, and is responsible for various system services such as hardware abstraction, process and memory management, thus making it a fundamental part of the system. It contains the cache manager, the executive, the kernel, the security reference monitor, the memory manager, and the scheduler (Dispatcher).

The task state segment (TSS) is a structure on x86-based computers which holds information about a task. It is used by the operating system kernel for task management. Specifically, the following information is stored in the TSS:

In computing and operating systems, a trap, also known as an exception or a fault, is typically a type of synchronous interrupt caused by an exceptional condition. A trap usually results in a switch to kernel mode, wherein the operating system performs some action before returning control to the originating process. A trap in a kernel process is more serious than a trap in a user process, and in some systems is fatal. In some usages, the term trap refers specifically to an interrupt intended to initiate a context switch to a monitor program or debugger.

In programming and software design, an event is an action or occurrence recognized by software, often originating asynchronously from the external environment, that may be handled by the software. Computer events can be generated or triggered by the system, by the user, or in other ways. Typically, events are handled synchronously with the program flow; that is, the software may have one or more dedicated places where events are handled, frequently an event loop. A source of events includes the user, who may interact with the software through the computer's peripherals - for example, by typing on the keyboard. Another source is a hardware device such as a timer. Software can also trigger its own set of events into the event loop, e.g. to communicate the completion of a task. Software that changes its behavior in response to events is said to be event-driven, often with the goal of being interactive.

The Interrupt flag (IF) is a flag bit in the CPU's FLAGS register, which determines whether or not the (CPU) will respond immediately to maskable hardware interrupts. If the flag is set to 1 maskable interrupts are enabled. If reset such interrupts will be disabled until interrupts are enabled. The Interrupt flag does not affect the handling of non-maskable interrupts (NMIs) or software interrupts generated by the INT instruction.

Blue screen of death Error screen displayed after a fatal system error on a Windows computer

A blue screen of death (BSoD), officially known as a stop error or blue screen error, is an error screen that the Windows operating system displays in the event of a fatal system error. It indicates a system crash, in which the operating system has reached a critical condition where it can no longer operate safely, e.g., hardware failure or a unexpected termination of a crucial process.

In computing, rebooting is the process by which a running computer system is restarted, either intentionally or unintentionally. Reboots can be either cold, in which the power to the system is physically turned off and back on again causing an initial boot of the machine, or warm in which the system restarts without the need to interrupt the power. The term restart is used to refer to a reboot when the operating system closes all programs and finalizes all pending input and output operations before initiating a soft reboot.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

Double fault

See also

Further reading

Related Research Articles