Kernel page-table isolation

One set of page tables, used in kernel mode, includes both kernel space and user space. The second set, used in user mode, contains a copy of user space and a minimal set of kernel-space mappings for handling system calls and interrupts.

Kernel page-table isolation (KPTI or PTI, [1] previously called KAISER) [2] [3] is a Linux kernel feature that mitigates the Meltdown security vulnerability (affecting mainly Intel's x86 CPUs) [4] and improves kernel hardening against attempts to bypass kernel address space layout randomization (KASLR). It works by better isolating user space and kernel space memory. [5] [6] KPTI was merged into Linux kernel version 4.15, [7] and backported to Linux kernels 4.14.11, 4.9.75, and 4.4.110. [8] [9] [10] Windows [11] and macOS [12] released similar updates. KPTI does not address the related Spectre vulnerability. [13]

Background on KAISER

The KPTI patches were based on KAISER (short for Kernel Address Isolation to have Side-channels Efficiently Removed), [6] a technique conceived in 2016 [14] and published in June 2017, before Meltdown was known. KAISER makes it harder to defeat KASLR, a 2014 mitigation for a much less severe issue.

In 2014, the Linux kernel adopted kernel address space layout randomization (KASLR), [15] which makes it more difficult to exploit other kernel vulnerabilities. [16] KASLR relies on kernel address mappings remaining hidden from user space. [17] Although access to these kernel mappings is prohibited, several side-channel attacks on modern processors can leak the location of this memory, making it possible to work around KASLR. [6] [18] [19] [20]

KAISER addressed these problems in KASLR by eliminating some sources of address leakage. [6] Whereas KASLR merely prevents address mappings from leaking, KAISER also prevents the data from leaking, thereby covering the Meltdown case. [21]

KPTI is based on KAISER. Without KPTI enabled, Linux keeps its entire kernel memory mapped in the page tables whenever user-space code (applications) is executing, although that memory is protected from access. The advantage is that when the application makes a system call into the kernel or an interrupt is received, kernel page tables are always present, so most context-switching overheads (TLB flush, page-table swapping, etc.) can be avoided. [5]

Meltdown vulnerability and KPTI

In January 2018, the Meltdown vulnerability was published; it was known to affect Intel's x86 CPUs and the ARM Cortex-A75. [22] [23] It was far more severe than the KASLR bypass that KAISER was originally intended to fix: the contents of kernel memory could also be leaked, not just the locations of memory mappings, as previously thought.

KPTI (conceptually based on KAISER) mitigates Meltdown by keeping most protected locations out of the page tables that are active while user-space code runs.

AMD x86 processors are not currently known to be affected by Meltdown and do not need KPTI as a mitigation. [13] [24] However, AMD processors are still susceptible to KASLR bypass when KPTI is disabled. [20]

Implementation

KPTI fixes these leaks by separating user-space and kernel-space page tables entirely. One set of page tables includes both kernel-space and user-space addresses, as before, but is used only when the system is running in kernel mode. The second set, used in user mode, contains a copy of user space and a minimal set of kernel-space mappings that provide the information needed to enter or exit system calls, interrupts, and exceptions. [5]
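The split described above can be illustrated with a toy model. This is only a sketch: the function names, addresses, and page labels are hypothetical and do not correspond to Linux kernel data structures. It shows the one property that matters here, namely that the user-mode set has no mapping at all for most kernel addresses.

```python
# Toy model of the two page-table sets; every name and address below is
# a hypothetical illustration, not a Linux kernel data structure.

def build_page_tables(user_pages, kernel_pages, entry_addrs):
    """Return (kernel_mode_table, user_mode_table) as dicts: vaddr -> frame."""
    # Kernel-mode set: user space plus all of kernel space, as before KPTI.
    kernel_mode = {**user_pages, **kernel_pages}
    # User-mode set: a copy of user space plus only the kernel mappings
    # needed to enter or exit system calls, interrupts, and exceptions.
    user_mode = dict(user_pages)
    for addr in entry_addrs:
        user_mode[addr] = kernel_pages[addr]
    return kernel_mode, user_mode


def translate(table, vaddr):
    """Return the mapped frame, or None when the address is not mapped."""
    return table.get(vaddr)


USER_PAGES = {0x1000: "user code", 0x2000: "user stack"}
KERNEL_PAGES = {0xF000: "syscall entry stub", 0xF100: "kernel secrets"}
ENTRY_ADDRS = [0xF000]

KERNEL_MODE, USER_MODE = build_page_tables(USER_PAGES, KERNEL_PAGES, ENTRY_ADDRS)
```

In this model, `translate(USER_MODE, 0xF100)` returns `None` because the address is simply absent from the user-mode set, whereas the same lookup in `KERNEL_MODE` succeeds. Without the split, the mapping would be present in both cases and protected only by permission checks.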

On processors that support process-context identifiers (PCID), a translation lookaside buffer (TLB) flush can be avoided, [5] but even then KPTI comes at a significant performance cost, particularly in syscall-heavy and interrupt-heavy workloads. [25]
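Why PCID helps can be seen in a deliberately simplified TLB model. This is a sketch under stated assumptions: the class, the PCID values, and the page numbers are hypothetical, and the model ignores TLB capacity and every other cost of switching page tables, so it does not reproduce the remaining overhead mentioned above.

```python
# Toy TLB model; names and numbers are illustrative assumptions only.

USER_PCID = 1
KERNEL_PCID = 2


class ToyTLB:
    """Caches translations keyed by (tag, virtual page)."""

    def __init__(self, use_pcid):
        self.use_pcid = use_pcid
        self.entries = set()
        self.current_tag = 0
        self.flushes = 0

    def switch(self, pcid):
        """Switch to another page-table set (user <-> kernel)."""
        if self.use_pcid:
            # Entries are tagged, so cached translations can stay.
            self.current_tag = pcid
        else:
            # Untagged entries would be stale after the switch: flush.
            self.entries.clear()
            self.flushes += 1

    def lookup(self, vpage):
        """Return True on a TLB hit; cache the translation on a miss."""
        key = (self.current_tag, vpage)
        hit = key in self.entries
        self.entries.add(key)
        return hit


def count_misses(use_pcid, round_trips):
    """Count TLB misses over repeated user -> kernel -> user transitions."""
    tlb = ToyTLB(use_pcid)
    misses = 0
    for _ in range(round_trips):
        tlb.switch(USER_PCID)
        misses += 0 if tlb.lookup(0x10) else 1
        tlb.switch(KERNEL_PCID)
        misses += 0 if tlb.lookup(0x80) else 1
    return misses
```

With three round trips, the tagged model misses only on the first touch of each page, while the untagged model misses on every access because each switch discards the cache.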

KAISER's original authors measured the overhead at 0.28%; [6] a Linux developer measured it at roughly 5% for most workloads and up to 30% in some cases, even with the PCID optimization. [5] For the database engine PostgreSQL, the impact on read-only tests on an Intel Skylake processor was 7–17% (or 16–23% without PCID), [26] while a full benchmark lost 13–19% (Coffee Lake vs. Broadwell-E). [27] Phoronix has run many benchmarks. [28] [29] [1] Redis slowed by 6–7%, [27] and Linux kernel compilation slowed by 5% on Haswell. [30]

KPTI can be partially disabled with the "nopti" kernel boot option. Provisions were also made to disable KPTI if newer processors fix the information leaks. [2]
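On a running Linux system, one way to check whether the "nopti" option was passed and what the kernel reports about Meltdown is sketched below. The helper names are hypothetical. The sketch assumes the kernel exposes `/proc/cmdline` and the sysfs file `/sys/devices/system/cpu/vulnerabilities/meltdown`, and that an active mitigation is reported as a line containing "Mitigation" and "PTI"; neither detail comes from the text above, and both files may be absent on older kernels or other systems.

```python
from pathlib import Path

# Assumed locations; they may not exist on every kernel or platform.
PROC_CMDLINE = Path("/proc/cmdline")
SYSFS_MELTDOWN = Path("/sys/devices/system/cpu/vulnerabilities/meltdown")


def cmdline_requests_nopti(cmdline):
    """True if 'nopti' appears as a standalone boot parameter."""
    return "nopti" in cmdline.split()


def sysfs_reports_pti(status):
    """True if the sysfs status line indicates that PTI is active."""
    text = status.strip()
    return text.startswith("Mitigation") and "PTI" in text


def read_text_or_none(path):
    """Read a file, returning None if it is missing or unreadable."""
    try:
        return path.read_text()
    except OSError:
        return None


def describe_kpti():
    """Summarise what this system reports, without guessing missing data."""
    cmdline = read_text_or_none(PROC_CMDLINE)
    status = read_text_or_none(SYSFS_MELTDOWN)
    return {
        "nopti_on_cmdline": None if cmdline is None else cmdline_requests_nopti(cmdline),
        "pti_reported_active": None if status is None else sysfs_reports_pti(status),
    }


if __name__ == "__main__":
    print(describe_kpti())
```

A value of `None` in the result means the corresponding file could not be read, which is kept distinct from a definite "no".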

References

  1. Larabel, Michael (2018-01-03). "Further Analyzing The Intel CPU "x86 PTI Issue" On More Systems". Phoronix.
  2. Corbet, Jonathan (2017-12-20). "The current state of kernel page-table isolation". LWN.net.
  3. Cimpanu, Catalin (2018-01-03). "OS Makers Preparing Patches for Secret Intel CPU Security Bug". Bleeping Computer.
  4. "Spectre, Meltdown: Critical CPU Security Flaws Explained – ExtremeTech". ExtremeTech. 2018-01-04. Retrieved 2018-01-05.
  5. Corbet, Jonathan (2017-11-15). "KAISER: hiding the kernel from user space". LWN.net.
  6. Gruss, Daniel; Lipp, Moritz; Schwarz, Michael; Fellner, Richard; Maurice, Clémentine; Mangard, Stefan (2017-06-24). KASLR is Dead: Long Live KASLR (PDF). Engineering Secure Software and Systems 2017.
  7. Corbet, Jonathan (2017-12-20). "Kernel page-table isolation merged". LWN.net.
  8. Kroah-Hartman, Greg (2018-01-02). "Linux 4.14.11 Changelog". kernel.org.
  9. Kroah-Hartman, Greg (2018-01-05). "Linux 4.9.75 Changelog". kernel.org.
  10. Kroah-Hartman, Greg (2018-01-05). "Linux 4.4.110 Changelog".
  11. @aionescu (2017-11-14). "Windows 17035 Kernel ASLR/VA Isolation In Practice" (Tweet) via Twitter.
  12. "Apple has already partially implemented fix in macOS for 'KPTI' Intel CPU security flaw". AppleInsider. 2018-01-03. Retrieved 2018-01-03.
  13. Coldewey, Devin (2018-01-04). "Kernel panic! What are Meltdown and Spectre, the bugs affecting nearly every computer and device?". TechCrunch.
  14. Gruss, Daniel (2018-01-03). "#FunFact: We submitted #KAISER to #bhusa17 and got it rejected". Archived from the original on 2018-01-08. Retrieved 2018-01-08 via Twitter.
  15. "Linux kernel 3.14, Section 1.7. Kernel address space randomization". kernelnewbies.org. 2014-03-30. Retrieved 2014-04-02.
  16. Bhattacharjee, Abhishek; Lustig, Daniel (2017-09-29). Architectural and Operating System Support for Virtual Memory. Morgan & Claypool Publishers. p. 56. ISBN 978-1-62705-933-6.
  17. Kerner, Sean Michael (2018-01-03). "KPTI Intel Chip Flaw Exposes Security Risks". eWEEK.
  18. Jang, Yeongjin; Lee, Sangho; Kim, Taesoo (2016). "Breaking Kernel Address Space Layout Randomization with Intel TSX" (PDF). Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. CCS '16. New York, NY, USA: ACM. pp. 380–392. doi:10.1145/2976749.2978321. ISBN 978-1-4503-4139-4.
  19. Gruss, Daniel; Maurice, Clémentine; Fogh, Anders; Lipp, Moritz; Mangard, Stefan (2016). "Prefetch Side-Channel Attacks" (PDF). Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. CCS '16. New York, NY, USA: ACM. pp. 368–379. doi:10.1145/2976749.2978356. ISBN 978-1-4503-4139-4. S2CID 15973158.
  20. Hund, R.; Willems, C.; Holz, T. (May 2013). "Practical Timing Side Channel Attacks against Kernel Space ASLR" (PDF). 2013 IEEE Symposium on Security and Privacy. pp. 191–205. doi:10.1109/sp.2013.23. ISBN 978-0-7695-4977-4. S2CID 215754624.
  21. "Meltdown" (PDF).
  22. "Spectre, Meltdown: Critical CPU Security Flaws Explained – ExtremeTech". ExtremeTech. 2018-01-04. Retrieved 2018-01-05.
  23. Coldewey, Devin (2018-01-04). "Kernel panic! What are Meltdown and Spectre, the bugs affecting nearly every computer and device?". TechCrunch.
  24. "An Update on AMD Processor Security". AMD. 2018-01-04.
  25. Leyden, John; Williams, Chris (2018-01-02). "Kernel-memory-leaking Intel processor design flaw forces Linux, Windows redesign". The Register.
  26. Freund, Andres (2018-01-02). "heads up: Fix for intel hardware bug will lead to performance regressions". PostgreSQL development mailing list (pgsql-hackers).
  27. Larabel, Michael (2018-01-02). "Initial Benchmarks Of The Performance Impact Resulting From Linux's x86 Security Changes". Phoronix.
  28. Larabel, Michael (2018-01-02). "Linux Gaming Performance Doesn't Appear Affected By The x86 PTI Work". Phoronix.
  29. Larabel, Michael (2018-01-03). "VM Performance Showing Mixed Impact With Linux 4.15 KPTI Patches – Phoronix". Phoronix.
  30. Velvindron, Loganaden (2018-01-04). "Linux KPTI performance hit on real workloads". Loganaden Velvindron. Retrieved 2018-01-05.