Crash (computing)

Last updated

A kernel panic displayed on an iMac. This is the most common form of an operating system failure in Unix-like systems. Crashed computer.jpg
A kernel panic displayed on an iMac. This is the most common form of an operating system failure in Unix-like systems.

In computing, a crash, or system crash, occurs when a computer program such as a software application or an operating system stops functioning properly and exits. On some operating systems or individual applications, a crash reporting service will report the crash and any details relating to it (or give the user the option to do so), usually to the developer(s) of the application. If the program is a critical part of the operating system, the entire system may crash or hang, often resulting in a kernel panic or fatal system error.

Contents

Most crashes are the result of a software bug. Typical causes include accessing invalid memory addresses, [lower-alpha 1] incorrect address values in the program counter, buffer overflow, overwriting a portion of the affected program code due to an earlier bug, executing invalid machine instructions (an illegal or unauthorized opcode), or triggering an unhandled exception. The original software bug that started this chain of events is typically considered to be the cause of the crash, which is discovered through the process of debugging. The original bug can be far removed from the code that actually triggered the crash.

In early personal computers, attempting to write data to hardware addresses outside the system's main memory could cause hardware damage. Some crashes are exploitable and let a malicious program or hacker execute arbitrary code, allowing the replication of viruses or the acquisition of data which would normally be inaccessible.

Application crashes

A display at Frankfurt Airport running a program under Windows XP that has crashed due to a memory read access violation Computer crash airport.jpg
A display at Frankfurt Airport running a program under Windows XP that has crashed due to a memory read access violation

An application typically crashes when it performs an operation that is not allowed by the operating system. The operating system then triggers an exception or signal in the application. Unix applications traditionally responded to the signal by dumping core. Most Windows and Unix GUI applications respond by displaying a dialogue box (such as the one shown to the right) with the option to attach a debugger if one is installed. Some applications attempt to recover from the error and continue running instead of exiting.

An application can also contain code to crash [lower-alpha 2] after detecting a severe error.

Typical errors that result in application crashes include:

Crash to desktop

A "crash to desktop" is said to occur when a program (commonly a video game) unexpectedly quits, abruptly taking the user back to the desktop. Usually, the term is applied only to crashes where no error is displayed, hence all the user sees as a result of the crash is the desktop. Many times there is no apparent action that causes a crash to desktop. During normal function, the program may freeze for a shorter period of time, and then close by itself. Also during normal function, the program may become a black screen and repeatedly play the last few seconds of sound (depending on the size of the audio buffer) that was being played before it crashes to desktop. Other times it may appear to be triggered by a certain action, such as loading an area.

Crash to desktop bugs are considered particularly problematic for users. Since they frequently display no error message, it can be very difficult to track down the source of the problem, especially if the times they occur and the actions taking place right before the crash do not appear to have any pattern or common ground. One way to track down the source of the problem for games is to run them in windowed-mode. Windows Vista has a feature that can help track down the cause of a CTD problem when it occurs on any program.[ clarification needed ] Windows XP included a similar feature as well.[ clarification needed ]

Some computer programs, such as StepMania and BBC's Bamzooki , also crash to desktop if in full-screen, but display the error in a separate window when the user has returned to the desktop.

Web server crashes

The software running the web server behind a website may crash, rendering it inaccessible entirely or providing only an error message instead of normal content.

For example: if a site is using an SQL database (such as MySQL) for a script (such as PHP) and that SQL database server crashes, then PHP will display a connection error.

Operating system crashes

A Blue Screen of Death as displayed in Windows XP, Vista, and 7 Windows XP BSOD.png
A Blue Screen of Death as displayed in Windows XP, Vista, and 7
A kernel panic as displayed in OS X Mountain Lion OS X Mountain Lion kernel panic.jpg
A kernel panic as displayed in OS X Mountain Lion

An operating system crash commonly occurs when a hardware exception occurs that cannot be handled. Operating system crashes can also occur when internal sanity-checking logic within the operating system detects that the operating system has lost its internal self-consistency.

Modern multi-tasking operating systems, such as Linux, and macOS, usually remain unharmed when an application program crashes.

Some operating systems, e.g., z/OS, have facilities for Reliability, availability and serviceability (RAS) and the OS can recover from the crash of a critical component, whether due to hardware failure, e.g., uncorrectable ECC error, or to software failure, e.g., a reference to an unassigned page.

Abnormal end

An Abnormal end or ABEND is an abnormal termination of software, or a program crash. Errors or crashes on the Novell NetWare network operating system are usually called ABENDs. Communities of NetWare administrators sprung up around the Internet, such as abend.org.

This usage derives from the ABEND macro on IBM OS/360, ..., z/OS operating systems. Usually capitalized, but may appear as "abend". Some common ABEND codes are System ABEND 0C7 (data exception) and System ABEND 0CB (division by zero). [1] [2] [3] Abends can be "soft" (allowing automatic recovery) or "hard" (terminating the activity). [4] The term is jocularly claimed to be derived from the German word "abend" meaning "evening". [5]

Security and privacy implications of crashes

Depending on the application, the crash may contain the user's sensitive and private information. [6] Moreover, many software bugs which cause crashes are also exploitable for arbitrary code execution and other types of privilege escalation. [7] [8] For example, a stack buffer overflow can overwrite the return address of a subroutine with an invalid value, which will cause, e.g., a segmentation fault, when the subroutine returns. However, if an exploit overwrites the return address with a valid value, the code in that address will be executed.

Crash reproduction

When crashes are collected in the field using a crash reporter, the next step for developers is to be able to reproduce them locally. For this, several techniques exist: STAR uses symbolic execution, [9] EvoCrash performs evolutionary search. [10]

See also

Notes

  1. Types of invalid addresses include:
  2. In OS/360 and successors the application normally uses an ABEND macro with a user completion code.

Related Research Articles

<span class="mw-page-title-main">Buffer overflow</span> Anomaly in computer security and programming

In programming and information security, a buffer overflow or buffer overrun is an anomaly whereby a program writes data to a buffer beyond the buffer's allocated memory, overwriting adjacent memory locations.

<span class="mw-page-title-main">BIOS</span> Firmware for hardware initialization and OS runtime services

In computing, BIOS is firmware used to provide runtime services for operating systems and programs and to perform hardware initialization during the booting process. The BIOS firmware comes pre-installed on an IBM PC or IBM PC compatible's system board and exists in some UEFI-based systems to maintain compatibility with operating systems that do not support UEFI native operation. The name originates from the Basic Input/Output System used in the CP/M operating system in 1975. The BIOS originally proprietary to the IBM PC has been reverse engineered by some companies looking to create compatible systems. The interface of that original system serves as a de facto standard.

<span class="mw-page-title-main">MVS</span> Operating system for IBM mainframes

Multiple Virtual Storage, more commonly called MVS, is the most commonly used operating system on the System/370, System/390 and IBM Z IBM mainframe computers. IBM developed MVS, along with OS/VS1 and SVS, as a successor to OS/360. It is unrelated to IBM's other mainframe operating system lines, e.g., VSE, VM, TPF.

<span class="mw-page-title-main">Operating system</span> Software that manages computer hardware resources

An operating system (OS) is system software that manages computer hardware and software resources, and provides common services for computer programs.

In computing, a segmentation fault or access violation is a fault, or failure condition, raised by hardware with memory protection, notifying an operating system (OS) the software has attempted to access a restricted area of memory. On standard x86 computers, this is a form of general protection fault. The operating system kernel will, in response, usually perform some corrective action, generally passing the fault on to the offending process by sending the process a signal. Processes can in some cases install a custom signal handler, allowing them to recover on their own, but otherwise the OS default signal handler is used, generally causing abnormal termination of the process, and sometimes a core dump.

<span class="mw-page-title-main">Virtual memory</span> Computer memory management technique

In computing, virtual memory, or virtual storage, is a memory management technique that provides an "idealized abstraction of the storage resources that are actually available on a given machine" which "creates the illusion to users of a very large (main) memory".

<span class="mw-page-title-main">Thread (computing)</span> Smallest sequence of programmed instructions that can be managed independently by a scheduler

In computer science, a thread of execution is the smallest sequence of programmed instructions that can be managed independently by a scheduler, which is typically a part of the operating system. In many cases, a thread is a component of a process.

In computing, a core dump, memory dump, crash dump, storage dump, system dump, or ABEND dump consists of the recorded state of the working memory of a computer program at a specific time, generally when the program has crashed or otherwise terminated abnormally. In practice, other key pieces of program state are usually dumped at the same time, including the processor registers, which may include the program counter and stack pointer, memory management information, and other processor and operating system flags and information. A snapshot dump is a memory dump requested by the computer operator or by the running program, after which the program is able to continue. Core dumps are often used to assist in diagnosing and debugging errors in computer programs.

<span class="mw-page-title-main">Memory management unit</span> Hardware translating virtual addresses to physical address

A memory management unit (MMU), sometimes called paged memory management unit (PMMU), is a computer hardware unit that examines all memory references on the memory bus, translating these requests, known as virtual memory addresses, into physical addresses in main memory.

In computing, a bus error is a fault raised by hardware, notifying an operating system (OS) that a process is trying to access memory that the CPU cannot physically address: an invalid address for the address bus, hence the name. In modern use on most architectures these are much rarer than segmentation faults, which occur primarily due to memory access violations: problems in the logical address or permissions.

Memory protection is a way to control memory access rights on a computer, and is a part of most modern instruction set architectures and operating systems. The main purpose of memory protection is to prevent a process from accessing memory that has not been allocated to it. This prevents a bug or malware within a process from affecting other processes, or the operating system itself. Protection may encompass all accesses to a specified area of memory, write accesses, or attempts to execute the contents of the area. An attempt to access unauthorized memory results in a hardware fault, e.g., a segmentation fault, storage violation exception, generally causing abnormal termination of the offending process. Memory protection for computer security includes additional techniques such as address space layout randomization and executable-space protection.

<span class="mw-page-title-main">General protection fault</span>

A general protection fault (GPF) in the x86 instruction set architectures (ISAs) is a fault initiated by ISA-defined protection mechanisms in response to an access violation caused by some running code, either in the kernel or a user program. The mechanism is first described in Intel manuals and datasheets for the Intel 80286 CPU, which was introduced in 1983; it is also described in section 9.8.13 in the Intel 80386 programmer's reference manual from 1986. A general protection fault is implemented as an interrupt. Some operating systems may also classify some exceptions not related to access violations, such as illegal opcode exceptions, as general protection faults, even though they have nothing to do with memory protection. If a CPU detects a protection violation, it stops executing the code and sends a GPF interrupt. In most cases, the operating system removes the failing process from the execution queue, signals the user, and continues executing other processes. If, however, the operating system fails to catch the general protection fault, i.e. another protection violation occurs before the operating system returns from the previous GPF interrupt, the CPU signals a double fault, stopping the operating system. If yet another failure occurs, the CPU is unable to recover; since 80286, the CPU enters a special halt state called "Shutdown", which can only be exited through a hardware reset. The IBM PC AT, the first PC-compatible system to contain an 80286, has hardware that detects the Shutdown state and automatically resets the CPU when it occurs. All descendants of the PC AT do the same, so in a PC, a triple fault causes an immediate system reset.

In computing, a page fault is an exception that the memory management unit (MMU) raises when a process accesses a memory page without proper preparations. Accessing the page requires a mapping to be added to the process's virtual address space. Besides, the actual page contents may need to be loaded from a backing store, such as a disk. The MMU detects the page fault, but the operating system's kernel handles the exception by making the required page accessible in the physical memory or denying an illegal memory access.

In computing, a null pointer or null reference is a value saved for indicating that the pointer or reference does not refer to a valid object. Programs routinely use null pointers to represent conditions such as the end of a list of unknown length or the failure to perform some action; this use of null pointers can be compared to nullable types and to the Nothing value in an option type.

BIOS implementations provide interrupts that can be invoked by operating systems and application programs to use the facilities of the firmware on IBM PC compatible computers. Traditionally, BIOS calls are mainly used by DOS programs and some other software such as boot loaders. BIOS runs in the real address mode of the x86 CPU, so programs that call BIOS either must also run in real mode or must switch from protected mode to real mode before calling BIOS and then switching back again. For this reason, modern operating systems that use the CPU in Protected mode or Long mode generally do not use the BIOS interrupt calls to support system functions, although they use the BIOS interrupt calls to probe and initialize hardware during booting. Real mode has the 1MB memory limitation, modern boot loaders use the unreal mode or protected mode to access up to 4GB memory.

<span class="mw-page-title-main">Protection ring</span> Layer of protection in computer systems

In computer science, hierarchical protection domains, often called protection rings, are mechanisms to protect data and functionality from faults and malicious behavior.

<span class="mw-page-title-main">Windows Error Reporting</span> Crash reporting technology

Windows Error Reporting (WER) is a crash reporting technology introduced by Microsoft with Windows XP and included in later Windows versions and Windows Mobile 5.0 and 6.0. Not to be confused with the Dr. Watson debugging tool which left the memory dump on the user's local machine, Windows Error Reporting collects and offers to send post-error debug information using the Internet to Microsoft when an application crashes or stops responding on a user's desktop. No data is sent without the user's consent. When a crash dump reaches the Microsoft server, it is analyzed, and information about a solution is sent back to the user if available. Solutions are served using Windows Error Reporting Responses. Windows Error Reporting runs as a Windows service. Kinshuman Kinshumann is the original architect of WER. WER was also included in the Association for Computing Machinery (ACM) hall of fame for its impact on the computing industry.

<span class="mw-page-title-main">Blue screen of death</span> Error screen displayed after a fatal system error on a computer running Microsoft Windows or ReactOS

The blue screen of death is a critical error screen displayed by the Microsoft Windows and ReactOS operating systems in the event of a fatal system error. It indicates a system crash, in which the operating system has reached a critical condition where it can no longer operate safely.

<span class="mw-page-title-main">Kernel (operating system)</span> Core of a computer operating system

The kernel is a computer program at the core of a computer's operating system and generally has complete control over everything in the system. The kernel is also responsible for preventing and mitigating conflicts between different processes. It is the portion of the operating system code that is always resident in memory and facilitates interactions between hardware and software components. A full kernel controls all hardware resources via device drivers, arbitrates conflicts between processes concerning such resources, and optimizes the utilization of common resources e.g. CPU & cache usage, file systems, and network sockets. On most systems, the kernel is one of the first programs loaded on startup. It handles the rest of startup as well as memory, peripherals, and input/output (I/O) requests from software, translating them into data-processing instructions for the central processing unit.

In operating systems, memory management is the function responsible for managing the computer's primary memory.

References

  1. "ABEND" (PDF). OS Release 21 - System/360 Operating System - Supervisor Services and Macro Instructions (PDF) (Eighth ed.). IBM. September 1974. pp. 97–99. GC28-6646-7. Retrieved 8 July 2023.
  2. "0Cx - z/OS MVS System Codes". IBM.
  3. List of ABEND codes Archived 2018-09-16 at the Wayback Machine on madisoncollege.edu
  4. Parziale, Lydia (2008). z/VM and Linux Operations for z/OS System Programmers. IBM Redbooks. ISBN   9780738431598. page 352
  5. "Abend" Archived September 29, 2011, at the Wayback Machine on dictionary.die.net
  6. Satvat, Kiavash; Saxena, Nitesh (2018). "Crashing Privacy: An Autopsy of a Web Browser's Leaked Crash Reports". arXiv: 1808.01718 [cs.CR].
  7. "Analyze Crashes to Find Security Vulnerabilities in Your Apps". Msdn.microsoft.com. 26 April 2007. Archived from the original on 11 December 2011. Retrieved 26 June 2014.
  8. "Jesse Ruderman » Memory safety bugs in C++ code". Squarefree.com. 1 November 2006. Archived from the original on 11 December 2013. Retrieved 26 June 2014.
  9. Chen, Ning; Kim, Sunghun (2015). "STAR: Stack Trace Based Automatic Crash Reproduction via Symbolic Execution". IEEE Transactions on Software Engineering. 41 (2): 198–220. doi:10.1109/TSE.2014.2363469. ISSN   0098-5589. S2CID   6299263.
  10. Soltani, Mozhan; Panichella, Annibale; van Deursen, Arie (2017). "A Guided Genetic Algorithm for Automated Crash Reproduction". 2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE). pp. 209–220. doi:10.1109/ICSE.2017.27. ISBN   978-1-5386-3868-2. S2CID   199514177. Archived from the original on 25 January 2022. Retrieved 21 December 2020.