Supervisor Mode Access Prevention (SMAP) is a feature of some CPU implementations such as the Intel Broadwell microarchitecture that allows supervisor mode programs to optionally set user-space memory mappings so that access to those mappings from supervisor mode will cause a trap. This makes it harder for malicious programs to "trick" the kernel into using instructions or data from a user-space program. [1] [2]
Supervisor Mode Access Prevention is designed to complement Supervisor Mode Execution Prevention (SMEP), which was introduced earlier. SMEP can be used to prevent supervisor mode from unintentionally executing user-space code. SMAP extends this protection to reads and writes. [2]
Without Supervisor Mode Access Prevention, supervisor code usually has full read and write access to user-space memory mappings (or has the ability to obtain full access). This has led to the development of several security exploits, including privilege escalation exploits, which operate by causing the kernel to access user-space memory when it did not intend to. [3] Operating systems can block these exploits by using SMAP to force unintended user-space memory accesses to trigger page faults. Additionally, SMAP can expose flawed kernel code which does not follow the intended procedures for accessing user-space memory. [1]
However, the use of SMAP in an operating system may lead to a larger kernel size and slower user-space memory accesses from supervisor code, because SMAP must be temporarily disabled any time supervisor code intends to access user-space memory. [4]
Processors indicate support for Supervisor Mode Access Prevention through the Extended Features CPUID leaf.
SMAP is enabled when memory paging is active and the SMAP bit in the CR4 control register is set. SMAP can be temporarily disabled for explicit memory accesses by setting the EFLAGS.AC (Alignment Check) flag. The stac
(Set AC Flag) and clac
(Clear AC Flag) instructions can be used to easily set or clear the flag. [5]
When the SMAP bit in CR4 is set, explicit memory reads and writes to user-mode pages performed by code running with a privilege level less than 3 will always result in a page fault if the EFLAGS.AC flag is not set. Implicit reads and writes (such as those made to descriptor tables) to user-mode pages will always trigger a page fault if SMAP is enabled, regardless of the value of EFLAGS.AC. [5]
Linux kernel support for Supervisor Mode Access Prevention was implemented by H. Peter Anvin. [1] It was merged into the mainline Linux 3.7 kernel (released December 2012) and it is enabled by default for processors which support the feature. [4]
FreeBSD has supported Supervisor Mode Execution Prevention since 2012 [6] and Supervisor Mode Access Prevention since 2018. [7]
OpenBSD has supported Supervisor Mode Access Prevention and the related Supervisor Mode Execution Prevention since 2012, [8] with OpenBSD 5.3 being the first release with support for the feature enabled. [9]
NetBSD support for Supervisor Mode Execution Prevention (SMEP) was implemented by Maxime Villard in December 2015. [10] Support for Supervisor Mode Access Prevention (SMAP) was also implemented by Maxime Villard, in August 2017. [11] NetBSD 8.0 was the first release with both features supported and enabled. [12]
Haiku support for Supervisor Mode Execution Prevention (SMEP) was implemented by Jérôme Duval in January 2018. [13]
macOS has support for SMAP at least since macOS 10.13 released 2017. [14]
An operating system (OS) is system software that manages computer hardware and software resources, and provides common services for computer programs.
In computing, a system call is the programmatic way in which a computer program requests a service from the operating system on which it is executed. This may include hardware-related services, creation and execution of new processes, and communication with integral kernel services such as process scheduling. System calls provide an essential interface between a process and the operating system.
x86-64 is a 64-bit version of the x86 instruction set, first announced in 1999. It introduced two new modes of operation, 64-bit mode and compatibility mode, along with a new 4-level paging mode.
In computing, Physical Address Extension (PAE), sometimes referred to as Page Address Extension, is a memory management feature for the x86 architecture. PAE was first introduced by Intel in the Pentium Pro, and later by AMD in the Athlon processor. It defines a page table hierarchy of three levels (instead of two), with table entries of 64 bits each instead of 32, allowing these CPUs to directly access a physical address space larger than 4 gigabytes (232 bytes).
In the 80386 microprocessor and later, virtual 8086 mode allows the execution of real mode applications that are incapable of running directly in protected mode while the processor is running a protected mode operating system. It is a hardware virtualization technique that allowed multiple 8086 processors to be emulated by the 386 chip. It emerged from the painful experiences with the 80286 protected mode, which by itself was not suitable to run concurrent real-mode applications well. John Crawford developed the Virtual Mode bit at the register set, paving the way to this environment.
Unified Extensible Firmware Interface is a specification that defines the architecture of the platform firmware used for booting the computer hardware and its interface for interaction with the operating system. Examples of firmware that implement the specification are AMI Aptio, Phoenix SecureCore, TianoCore EDK II, InsydeH2O. UEFI replaces the BIOS which was present in the boot ROM of all personal computers that are IBM PC compatible, although it can provide backwards compatibility with the BIOS using CSM booting. Intel developed the original Extensible Firmware Interface (EFI) specification. Some of the EFI's practices and data formats mirror those of Microsoft Windows. In 2005, UEFI deprecated EFI 1.10.
Address space layout randomization (ASLR) is a computer security technique involved in preventing exploitation of memory corruption vulnerabilities. In order to prevent an attacker from reliably jumping to, for example, a particular exploited function in memory, ASLR randomly arranges the address space positions of key data areas of a process, including the base of the executable and the positions of the stack, heap and libraries.
W^X is a security feature in operating systems and virtual machines. It is a memory protection policy whereby every page in a process's or kernel's address space may be either writable or executable, but not both. Without such protection, a program can write CPU instructions in an area of memory intended for data and then run those instructions. This can be dangerous if the writer of the memory is malicious. W^X is the Unix-like terminology for a strict use of the general concept of executable space protection, controlled via the mprotect
system call.
QEMU is a free and open-source emulator. It emulates a computer's processor through dynamic binary translation and provides a set of different hardware and device models for the machine, enabling it to run a variety of guest operating systems. It can interoperate with Kernel-based Virtual Machine (KVM) to run virtual machines at near-native speed. QEMU can also do emulation for user-level processes, allowing applications compiled for one architecture to run on another.
The Direct Rendering Manager (DRM) is a subsystem of the Linux kernel responsible for interfacing with GPUs of modern video cards. DRM exposes an API that user-space programs can use to send commands and data to the GPU and perform operations such as configuring the mode setting of the display. DRM was first developed as the kernel-space component of the X Server Direct Rendering Infrastructure, but since then it has been used by other graphic stack alternatives such as Wayland and standalone applications and libraries such as SDL2 and Kodi.
In computer science, hierarchical protection domains, often called protection rings, are mechanisms to protect data and functionality from faults and malicious behavior.
In computer security, executable-space protection marks memory regions as non-executable, such that an attempt to execute machine code in these regions will cause an exception. It makes use of hardware features such as the NX bit, or in some cases software emulation of those features. However, technologies that emulate or supply an NX bit will usually impose a measurable overhead while using a hardware-supplied NX bit imposes no measurable overhead.
Minix 3 is a small, Unix-like operating system. It is published under a BSD-3-Clause license and is a successor project to the earlier versions, Minix 1 and 2.
The Berkeley Packet Filter (BPF) is a technology used in certain computer operating systems for programs that need to, among other things, analyze network traffic. It provides a raw interface to data link layers, permitting raw link-layer packets to be sent and received. In addition, if the driver for the network interface supports promiscuous mode, it allows the interface to be put into that mode so that all packets on the network can be received, even those destined to other hosts.
The kernel is a computer program at the core of a computer's operating system and generally has complete control over everything in the system. The kernel is also responsible for preventing and mitigating conflicts between different processes It is the portion of the operating system code that is always resident in memory and facilitates interactions between hardware and software components. A full kernel controls all hardware resources via device drivers, arbitrates conflicts between processes concerning such resources, and optimizes the utilization of common resources e.g. CPU & cache usage, file systems, and network sockets. On most systems, the kernel is one of the first programs loaded on startup. It handles the rest of startup as well as memory, peripherals, and input/output (I/O) requests from software, translating them into data-processing instructions for the central processing unit.
Mode setting is a software operation that activates a display mode for a computer's display controller by using VESA BIOS Extensions or UEFI Graphics extensions.
In computing, the term 3 GB barrier refers to a limitation of some 32-bit operating systems running on x86 microprocessors. It prevents the operating systems from using all of 4 GiB (4 × 10243 bytes) of main memory. The exact barrier varies by motherboard and I/O device configuration, particularly the size of video RAM; it may be in the range of 2.75 GB to 3.5 GB. The barrier is not present with a 64-bit processor and 64-bit operating system, or with certain x86 hardware and an operating system such as Linux or certain versions of Windows Server and macOS that allow use of Physical Address Extension (PAE) mode on x86 to access more than 4 GiB of RAM.
Second Level Address Translation (SLAT), also known as nested paging, is a hardware-assisted virtualization technology which makes it possible to avoid the overhead associated with software-managed shadow page tables.
Meltdown is one of the two original transient execution CPU vulnerabilities. Meltdown affects Intel x86 microprocessors, IBM POWER processors, and some ARM-based microprocessors. It allows a rogue process to read all memory, even when it is not authorized to do so.
A virtual kernel architecture (vkernel) is an operating system virtualisation paradigm where kernel code can be compiled to run in the user space, for example, to ease debugging of various kernel-level components, in addition to general-purpose virtualisation and compartmentalisation of system resources. It is used by DragonFly BSD in its vkernel implementation since DragonFly 1.7, having been first revealed in September 2006, and first released in the stable branch with DragonFly 1.8 in January 2007. The long-term goal, in addition to easing kernel development, is to make it easier to support internet-connected computer clusters without compromising local security. Similar concepts exist in other operating systems as well; in Linux, a similar virtualisation concept is known as user-mode Linux; whereas in NetBSD since the summer of 2007, it has been the initial focus of the rump kernel infrastructure.