Linux kernel oops

Linux kernel oops on SPARC
Linux kernel oops on PA-RISC with a dead ASCII cow

In computing, an oops is a serious but non-fatal error in the Linux kernel. An oops may precede a kernel panic, but it may also allow continued operation with compromised reliability. The term is not an acronym; it simply denotes a mistake.


Functioning

When the kernel detects a problem, it kills any offending processes and prints an oops message, which Linux kernel engineers can use to debug the condition that caused the oops and to fix the underlying programming error. After an oops, some internal resources may no longer be operational, so even if the system appears to work correctly, killing the active task may have had undesirable side effects. A kernel oops often leads to a kernel panic when the system later attempts to use resources that have been lost. The kernel can also be configured to panic once a set number of oopses (10,000 by default) has occurred. [1] [2] This limit exists because an attacker could otherwise trigger an oops repeatedly, leaking an associated resource each time, until an integer overflows and allows further exploitation. [3] [4]
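The oops limit described above can be inspected from user space. The following Python sketch is illustrative only: it assumes a kernel that exposes `/proc/sys/kernel/oops_limit`, `/proc/sys/kernel/panic_on_oops` and `/sys/kernel/oops_count`, and the helper names `read_int` and `oops_policy` are hypothetical. On kernels that lack these files, it reports that no limit is exposed.

```python
from pathlib import Path
from typing import Optional


def read_int(path: Path) -> Optional[int]:
    """Return the integer stored in a sysctl-style file, or None if unavailable."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def oops_policy(root: Path = Path("/")) -> str:
    """Summarise how the kernel under `root` is configured to react to oopses.

    `root` exists only so the function can be pointed at a test directory;
    on a real system the default "/" is used.
    """
    panic_on_oops = read_int(root / "proc/sys/kernel/panic_on_oops")
    oops_limit = read_int(root / "proc/sys/kernel/oops_limit")
    oops_count = read_int(root / "sys/kernel/oops_count")

    if panic_on_oops == 1:
        return "panic on the first oops"
    if oops_limit is None:
        return "no oops limit exposed by this kernel"
    if oops_limit == 0:
        # A limit of 0 disables the count check.
        return "oops limit disabled"
    seen = "unknown" if oops_count is None else str(oops_count)
    return f"panic once {oops_limit} oopses have occurred ({seen} seen so far)"


if __name__ == "__main__":
    print(oops_policy())
```

Reading these files normally requires no special privileges, whereas changing the limit does.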

The official Linux kernel documentation regarding oops messages resides in the file Documentation/admin-guide/bug-hunting.rst [5] of the kernel sources. Some logger configurations may affect the ability to collect oops messages. [6] The kerneloops software can collect and submit kernel oopses to a repository such as the www.kerneloops.org website, [7] which provides statistics and public access to reported oopses.
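As a rough illustration of what collecting oops messages involves, the Python sketch below scans kernel log text for oops header lines. It is a simplified assumption rather than the method used by kerneloops: it recognises only x86-style lines of the form `Oops: 0002 [#1] SMP`, where the bracketed number is a running counter, and the names `OopsRecord`, `find_oopses` and `SAMPLE_LOG` are hypothetical. The sample log is invented for the example.

```python
import re
from typing import List, NamedTuple


class OopsRecord(NamedTuple):
    line_number: int  # 1-based position of the header line in the log
    sequence: int     # counter printed by the kernel as [#N]
    text: str         # the header line itself, stripped of whitespace


# Matches header lines such as "Oops: 0002 [#1] SMP" (x86-style format assumed).
OOPS_LINE = re.compile(r"\bOops:.*\[#(\d+)\]")


def find_oopses(log_text: str) -> List[OopsRecord]:
    """Return one record per oops header line found in `log_text`."""
    records = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        match = OOPS_LINE.search(line)
        if match:
            records.append(OopsRecord(number, int(match.group(1)), line.strip()))
    return records


# Invented log excerpt, for illustration only.
SAMPLE_LOG = """\
[   42.000001] BUG: kernel NULL pointer dereference, address: 0000000000000000
[   42.000002] Oops: 0002 [#1] SMP
[   42.000003] CPU: 0 PID: 1234 Comm: example Not tainted
[   97.500000] Oops: 0000 [#2] SMP
"""

if __name__ == "__main__":
    for record in find_oopses(SAMPLE_LOG):
        print(record.line_number, record.sequence, record.text)
```

A real collector would also need to capture the register dump and call trace that follow each header, and other architectures format these lines differently.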

To someone unfamiliar with the technical details of computers and operating systems, an oops message can look confusing. Unlike operating systems such as Windows or macOS, Linux presents the details of the kernel crash rather than a simplified, user-friendly message such as the Blue Screen of Death (BSoD) on Windows. A simplified crash screen has been proposed several times, but none is currently in development. [8]


Related Research Articles

GNU Hurd: Operating system kernel designed as a replacement for Unix

GNU Hurd is a collection of microkernel servers written as part of GNU, for the GNU Mach microkernel. It has been under development since 1990 by the GNU Project of the Free Software Foundation, designed as a replacement for the Unix kernel, and released as free software under the GNU General Public License. When the Linux kernel proved to be a viable solution, development of GNU Hurd slowed, at times alternating between stasis and renewed activity and interest.

XFS is a high-performance 64-bit journaling file system created by Silicon Graphics, Inc (SGI) in 1993. It was the default file system in SGI's IRIX operating system starting with its version 5.3. XFS was ported to the Linux kernel in 2001; as of June 2014, XFS is supported by most Linux distributions; Red Hat Enterprise Linux uses it as its default file system.

ext3, or third extended filesystem, is a journaled file system that is commonly used by the Linux kernel. It used to be the default file system for many popular Linux distributions. Stephen Tweedie first revealed that he was working on extending ext2 in the 1998 paper Journaling the Linux ext2fs Filesystem, and later in a February 1999 kernel mailing list posting. The filesystem was merged into the mainline Linux kernel in November 2001, from version 2.4.15 onward. Its main advantage over ext2 is journaling, which improves reliability and eliminates the need to check the file system after an unclean shutdown. Its successor is ext4.

Kernel panic: Fatal error condition associated with Unix-like computer operating systems

A kernel panic is a safety measure taken by an operating system's kernel upon detecting an internal fatal error in which either it is unable to safely recover or continuing to run the system would have a higher risk of major data loss. The term is largely specific to Unix and Unix-like systems. The equivalent on Microsoft Windows operating systems is a stop error, often called a "blue screen of death".

Hexspeak, like leetspeak, is a novelty form of variant English spelling using the hexadecimal digits. Created by programmers as memorable magic numbers, hexspeak words can serve as a clear and unique identifier with which to mark memory or data.

The Linux kernel mailing list (LKML) is the main electronic mailing list for Linux kernel development, where the majority of the announcements, discussions, debates, and flame wars over the kernel take place. Many other mailing lists exist to discuss the different subsystems and ports of the Linux kernel, but LKML is the principal communication channel among Linux kernel developers. It is a very high-volume list, usually receiving about 1,000 messages each day, most of which are kernel code patches.

Fatal system error: Error that stops the operating system

A fatal system error occurs when an operating system halts because it has reached a condition where it can no longer operate safely.

The magic SysRq key is a key combination understood by the Linux kernel, which allows the user to perform various low-level commands regardless of the system's state. It is often used to recover from freezes, or to reboot a computer without corrupting the filesystem. Its effect is similar to the computer's hardware reset button but with many more options and much more control.

seccomp is a computer security facility in the Linux kernel. seccomp allows a process to make a one-way transition into a "secure" state where it cannot make any system calls except exit, sigreturn, read and write to already-open file descriptors. Should it attempt any other system calls, the kernel will either just log the event or terminate the process with SIGKILL or SIGSYS. In this sense, it does not virtualize the system's resources but isolates the process from them entirely.

Crash reporter: System software that identifies and reports crash details

A crash reporter is usually system software whose function is to identify and report crash details and to alert when crashes occur, in production or in development and testing environments. Crash reports often include data such as stack traces, type of crash, trends and version of software. These reports help software developers (of web, SaaS, mobile and other applications) to diagnose and fix the underlying problem causing the crashes. Crash reports may contain sensitive information such as passwords, email addresses, and contact information, and so have become objects of interest for researchers in the field of computer security.

Configuration Menu Language (CML) was used, in Linux kernel versions prior to 2.5.45, to configure the values that determine the composition and exact functionality of the kernel. Many possible variations in kernel functionality can exist; and customization is possible, for instance for the specifications of the exact hardware it will run on. It can also be tuned for administrator preferences.

A machine check exception (MCE) is a type of computer error that occurs when a problem involving the computer's hardware is detected. With most mass-market personal computers, an MCE indicates faulty or misconfigured hardware.

kernel.org is the main distribution point of source code for the Linux kernel, which is the base of the Linux operating system.

Blue screen of death: Error screen displayed after a fatal system error on a computer running Microsoft Windows or ReactOS

The Blue Screen of Death (BSoD), officially known as a Stop error, Blue screen error, fatal error, bugcheck, Stop error screen, Stop message, or Blue Screen, is a critical error screen displayed by the Microsoft Windows or ReactOS operating systems in the event of a fatal system error. It indicates a system crash, in which the operating system has reached a critical condition where it can no longer operate safely. Possible causes include hardware failure, a problem with a device driver, or unexpected termination of a crucial process or thread.

Linux kernel: Operating system kernel

The Linux kernel is a free and open-source, monolithic, modular, multitasking, Unix-like operating system kernel. It was originally authored in 1991 by Linus Torvalds for his i386-based PC, and it was soon adopted as the kernel for the GNU operating system, which was written to be a free (libre) replacement for Unix.

nftables is a subsystem of the Linux kernel providing filtering and classification of network packets, datagrams and frames. It has been available since Linux kernel 3.13, released on 19 January 2014.

kdump is a feature of the Linux kernel that creates crash dumps in the event of a kernel crash. When triggered, kdump exports a memory image that can be analyzed for the purposes of debugging and determining the cause of a crash. The dumped image of main memory, exported as an Executable and Linkable Format (ELF) object, can be accessed either directly through /proc/vmcore during the handling of a kernel crash, or it can be automatically saved to a locally accessible file system, to a raw device, or to a remote system accessible over network.

RPMsg is a protocol enabling inter-processor communication inside multi-core processors.

Bcachefs is a copy-on-write (COW) file system for Linux-based operating systems. Its primary developer, Kent Overstreet, first announced it in 2015, and efforts are ongoing to have it included in the mainline Linux kernel. It is intended to compete with the modern features of ZFS or Btrfs, and the speed and performance of ext4 or XFS.

This article documents the version history of the Linux kernel. The Linux kernel is a free and open-source, monolithic, Unix-like operating system kernel. It was conceived and created in 1991 by Linus Torvalds.

References

  1. Horn, Jann (7 November 2022). "[PATCH] exit: Put an upper limit on how often we can oops". lore.kernel.org. Retrieved 31 January 2023.
  2. "Documentation for /proc/sys/kernel/". docs.kernel.org. Retrieved 31 January 2023.
  3. Corbet, Jonathan (18 November 2022). "Averting excessive oopses". LWN.net.
  4. Jenkins, Seth (19 January 2023). "Exploiting null-dereferences in the Linux kernel". Google Project Zero.
  5. "bug-hunting". kernel.org.
  6. "DevDocs/KernelOops". madwifi-project.org. Archived from the original on 2020-08-03. Retrieved 2010-08-21.
  7. "kerneloops(8) - Linux man page". Retrieved 31 January 2023.
  8. Larabel, Michael (10 March 2019). "A DRM-Based Linux Oops Viewer Is Being Proposed Again - Similar To Blue Screen of Death". Phoronix.

Further reading