Fatal system error

Figure: Linux 3.8 kernel panic on Ubuntu 13.04 in VirtualBox

A fatal system error (also known as a system crash, stop error, kernel error, or bug check) occurs when an operating system halts because it has reached a condition where it can no longer operate safely (i.e. where critical data could be lost or the system damaged in other ways).

In Microsoft Windows, a fatal system error can be deliberately caused from a kernel-mode driver with either the KeBugCheck or KeBugCheckEx function. [1] However, this should only be done as a last resort, when a critical driver is corrupted and cannot be recovered. This design parallels that of OpenVMS. The Unix kernel panic concept is very similar.
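As a rough illustration of the data a bug check carries, the sketch below models the stop code and the four code-specific parameters that KeBugCheckEx accepts, and formats them in the style of the classic stop line. It is a user-mode Python model, not driver code: the `BugCheck` class and its `stop_line` method are hypothetical names, and the parameter values are invented for the example.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BugCheck:
    """Hypothetical model of the arguments passed to KeBugCheckEx."""
    code: int                          # the bug check (stop) code
    params: Tuple[int, int, int, int]  # four parameters whose meaning depends on the code

    def stop_line(self) -> str:
        """Render the code and parameters as 8-digit hex, stop-line style."""
        args = ", ".join(f"0x{p:08X}" for p in self.params)
        return f"*** STOP: 0x{self.code:08X} ({args})"


# Illustrative values only; 0x0000000A is the IRQL_NOT_LESS_OR_EQUAL bug check.
example = BugCheck(code=0x0000000A, params=(0x10, 0x2, 0x0, 0x804E1234))
print(example.stop_line())
```

The bug check code identifies the class of failure, while the four parameters carry code-specific detail, which is why reference documentation is usually needed to interpret them.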

In Windows

When a bug check is issued, a crash dump file will be created if the system is configured to create them. [2] This file contains a "snapshot" of low-level information about the system that can be used to debug the root cause of the problem, along with other system activity at the time of the crash.

If the user has enabled it, the system will also write an entry to the system event log. The log entry contains information about the bug check (including the bug check code and its parameters) as well as a link that reports the bug and, if the cause of the check is definitive and well known, provides the user with prescriptive suggestions.

Next, if a kernel debugger is connected and active when the bug check occurs, the system will break into the debugger, where the cause of the crash can be investigated. If no debugger is attached, a blue text screen, commonly known as a blue screen or bug check screen, is displayed with information about why the error occurred.

The user will only see the blue screen if the system is not configured to automatically restart (which became the default setting in Windows XP SP2). Otherwise, it appears as though the system simply rebooted (though a blue screen may be visible briefly).

In Windows, bug checks are only supported by the Windows NT kernel. The corresponding system routine in Windows 9x, named SHELL_SYSMODAL_Message, does not halt the system like bug checks do. Instead, it displays the infamous "blue screen of death" (BSoD) and allows the user to attempt to continue.
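The Windows NT bug check sequence described in this section can be summarized as a small decision procedure. The following is a minimal sketch that only mirrors the order in which the text presents the steps, which is not necessarily the order in which Windows performs them. The `CrashSettings` class, its field names, and the action strings are hypothetical and do not correspond to real Windows setting names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CrashSettings:
    """Hypothetical flags standing in for the system's crash configuration."""
    write_dump: bool = True          # system is configured to create crash dumps
    log_event: bool = True           # event logging for bug checks is enabled
    debugger_attached: bool = False  # a kernel debugger is connected and active
    auto_restart: bool = True        # system restarts automatically after a bug check


def bug_check_actions(settings: CrashSettings) -> List[str]:
    """Return the steps described in the text for a given configuration."""
    actions = []
    if settings.write_dump:
        actions.append("write crash dump")
    if settings.log_event:
        actions.append("write event log entry")
    if settings.debugger_attached:
        actions.append("break into kernel debugger")
    elif settings.auto_restart:
        # The blue screen may flash briefly before the reboot.
        actions.append("restart (blue screen may be visible briefly)")
    else:
        actions.append("display blue screen")
    return actions


print(bug_check_actions(CrashSettings()))
print(bug_check_actions(CrashSettings(auto_restart=False)))
```

With the default values above, the final step is an automatic restart, which matches the observation that the machine may appear to have simply rebooted.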

The Windows DDK and the WinDbg documentation both have reference information about most bug checks. The WinDbg package is available as a free download and can be installed by most users. The Windows DDK is larger and more complicated to install.

Related Research Articles

In computing, a core dump, memory dump, crash dump, storage dump, system dump, or ABEND dump consists of the recorded state of the working memory of a computer program at a specific time, generally when the program has crashed or otherwise terminated abnormally. In practice, other key pieces of program state are usually dumped at the same time, including the processor registers, which may include the program counter and stack pointer, memory management information, and other processor and operating system flags and information. A snapshot dump is a memory dump requested by the computer operator or by the running program, after which the program is able to continue. Core dumps are often used to assist in diagnosing and debugging errors in computer programs.

Debugger: Computer program used to test and debug other programs

A debugger or debugging tool is a computer program used to test and debug other programs. The main use of a debugger is to run the target program under controlled conditions that permit the programmer to track its execution and monitor changes in computer resources that may indicate malfunctioning code. Typical debugging facilities include the ability to run or halt the target program at specific points, display the contents of memory, CPU registers or storage devices, and modify memory or register contents in order to enter selected test data that might be a cause of faulty program execution.

Kernel panic: Fatal error condition associated with Unix-like computer operating systems

A kernel panic is a safety measure taken by an operating system's kernel upon detecting an internal fatal error in which either it is unable to safely recover or continuing to run the system would have a higher risk of major data loss. The term is largely specific to Unix and Unix-like systems. The equivalent on Microsoft Windows operating systems is a stop error, often called a "blue screen of death".

Crash (computing): When a computer program stops functioning properly and self-terminates

In computing, a crash, or system crash, occurs when a computer program such as a software application or an operating system stops functioning properly and exits. On some operating systems or individual applications, a crash reporting service will report the crash and any details relating to it, usually to the developer(s) of the application. If the program is a critical part of the operating system, the entire system may crash or hang, often resulting in a kernel panic or fatal system error.

NTLDR is the boot loader for all releases of the Windows NT operating system from 1993, with the release of Windows NT 3.1, up until Windows XP and Windows Server 2003. From Windows Vista onwards it was replaced by the BOOTMGR bootloader. NTLDR is typically run from the primary storage device, but it can also run from portable storage devices such as a CD-ROM, USB flash drive, or floppy disk. NTLDR can also load a non-NT-based operating system given the appropriate boot sector in a file.

In computing, a page fault is an exception that the memory management unit (MMU) raises when a process accesses a memory page without proper preparations. Accessing the page requires a mapping to be added to the process's virtual address space. In addition, the actual page contents may need to be loaded from a backing store, such as a disk. The MMU detects the page fault, but the operating system's kernel handles the exception by making the required page accessible in physical memory or by denying an illegal memory access.

In computing, a non-maskable interrupt (NMI) is a hardware interrupt that standard interrupt-masking techniques in the system cannot ignore. It typically occurs to signal attention for non-recoverable hardware errors. Some NMIs may be masked, but only by using proprietary methods specific to the particular NMI. With regard to SPARC, the non-maskable interrupt (NMI), despite having the highest priority among interrupts, can be prevented from occurring through the use of an interrupt mask.

Dr. Watson (debugger): Application debugger for Microsoft Windows

Dr. Watson is an application debugger included with the Microsoft Windows operating system. It may be named drwatson.exe, drwtsn32.exe or dwwin.exe, depending on the version of Windows.

WinDbg is a multipurpose debugger for the Microsoft Windows computer operating system, distributed by Microsoft. Debugging is the process of finding and resolving errors in a system; in computing it also includes exploring the internal operation of software as a help to development. It can be used to debug user mode applications, device drivers, and the operating system itself in kernel mode.

Crash reporter: System software that identifies and reports crash details

A crash reporter is usually system software whose function is to identify and report crash details and to alert when crashes occur, in production or in development and testing environments. Crash reports often include data such as stack traces, type of crash, trends and version of software. These reports help software developers (of web, SaaS, mobile apps and more) diagnose and fix the underlying problem causing the crashes. Crash reports may contain sensitive information such as passwords, email addresses, and contact information, and so have become objects of interest for researchers in the field of computer security.

Linux kernel oops: Serious, non-fatal error in the Linux kernel

In computing, an oops is a serious but non-fatal error in the Linux kernel. An oops may precede a kernel panic, but it may also allow continued operation with compromised reliability. The term is not an acronym; it simply refers to a mistake.

Windows Error Reporting: Crash reporting technology

Windows Error Reporting (WER) is a crash reporting technology introduced by Microsoft with Windows XP and included in later Windows versions and Windows Mobile 5.0 and 6.0. Unlike the Dr. Watson debugging tool, which left the memory dump on the user's local machine, Windows Error Reporting collects post-error debug information and offers to send it over the Internet to Microsoft when an application crashes or stops responding on a user's desktop. No data is sent without the user's consent. When a crash dump reaches the Microsoft server, it is analyzed, and information about a solution is sent back to the user if available. Solutions are served using Windows Error Reporting Responses. Windows Error Reporting runs as a Windows service. Kinshuman Kinshumann is the original architect of WER. WER was also included in the Association for Computing Machinery (ACM) hall of fame for its impact on the computing industry.

Screen of death: Fatal error displays in operating systems

In computing, a screen of death, colloquially referred to as a blue screen of death, is an informal term for a type of computer operating system error message displayed onscreen when the system has experienced a fatal system error. The fatal error typically results in unsaved work being lost and often indicates serious problems with the system's hardware or software. These error screens are usually the result of a kernel panic, although the terms are frequently used interchangeably. Most screens of death are displayed on an even background color with a message advising the user to restart the computer.

Windows Vista contains a range of new technologies and features that are intended to help network administrators and power users better manage their systems. Notable changes include a complete replacement of both the Windows Setup and the Windows startup processes, completely rewritten deployment mechanisms, new diagnostic and health monitoring tools such as a random access memory diagnostic program, support for per-application Remote Desktop sessions, a completely new Task Scheduler, and a range of new Group Policy settings covering many of the features new to Windows Vista. Subsystem for UNIX Applications, which provides a POSIX-compatible environment, is also introduced.

Blue screen of death: Error screen displayed after a fatal system error on a computer running Microsoft Windows or ReactOS

The blue screen of death is a critical error screen displayed by the Microsoft Windows and ReactOS operating systems in the event of a fatal system error. It indicates a system crash, in which the operating system has reached a critical condition where it can no longer operate safely.

Driver Verifier: Windows driver troubleshooter

Driver Verifier is a tool included in Microsoft Windows that replaces the default operating system subroutines with ones that are specifically developed to catch device driver bugs. Once enabled, it monitors and stresses drivers to detect illegal function calls or actions that may be causing system corruption. It acts within the kernel mode and can target specific device drivers for continual checking or make driver verifier functionality multithreaded, so that several device drivers can be stressed at the same time. It can simulate certain conditions such as low memory, I/O verification, pool tracking, IRQL checking, deadlock detection, DMA checks, IRP logging, etc. The verifier works by forcing drivers to work with minimal resources, making potential errors that might happen only rarely in a working system manifest immediately. Typically fatal system errors are generated by the stressed drivers in the test environment, producing core dumps that can be analysed and debugged immediately; without stressing, intermittent faults would occur in the field, without proper troubleshooting facilities or personnel.

In engineering, debugging is the process of finding the root cause of and workarounds and possible fixes for bugs.

In computing, rebooting is the process by which a running computer system is restarted, either intentionally or unintentionally. Reboots can be either a cold reboot, in which the power to the system is physically turned off and back on again, or a warm reboot, in which the system restarts while still powered up. The term restart is used to refer to a reboot when the operating system closes all programs and finalizes all pending input and output operations before initiating a soft reboot.

Memory forensics is forensic analysis of a computer's memory dump. Its primary application is investigation of advanced computer attacks which are stealthy enough to avoid leaving data on the computer's hard drive. Consequently, the memory (RAM) must be analyzed for forensic information.

Timeout Detection and Recovery or TDR is a feature of the Windows operating system (OS) introduced in Windows Vista. It detects response problems from a graphics card (GPU), and if a timeout occurs, the OS will attempt a card reset to recover a functional and responsive desktop environment. If the attempt is unsuccessful, however, the result is a Blue Screen of Death (BSOD). The recovery tries to mitigate the scenario where an end user superfluously reboots their device should it become unresponsive.

References

  1. "KeBugCheckEx function (wdm.h)". Microsoft Learn. 25 February 2022. Retrieved 1 May 2024.
  2. "Kernel-Mode Dump Files". Microsoft Learn. 28 December 2023. Retrieved 1 May 2024.