Lazy FP state restore

Last updated

Lazy FPU state leak (CVE - 2018-3665), also referred to as Lazy FP State Restore [1] or LazyFP, [2] [3] is a security vulnerability affecting Intel Core CPUs. [1] [4] The vulnerability is caused by a combination of flaws in the speculative execution technology present within the affected CPUs [1] and how certain operating systems handle context switching on the floating point unit (FPU). [2] By exploiting this vulnerability, a local process can leak the content of the FPU registers that belong to another process. This vulnerability is related to the Spectre and Meltdown vulnerabilities that were publicly disclosed in January 2018.

Contents

It was announced by Intel on 13 June 2018, after being discovered by employees at Amazon, Cyberus Technology and SYSGO. [1] [lower-alpha 1]

Besides being used for floating point arithmetic, the FPU registers are also used for other purposes, including for storing cryptographic data when using the AES instruction set, present in many Intel CPUs. [3] This means that this vulnerability may allow for key material to be compromised. [3]

Mechanism

The floating point and SIMD registers are large, and not used by every task (or thread) in the system. To make context switching faster, most common microprocessors support lazy state switching. Rather than storing the full state during a context switch, the operating system can simply mark the FPU "not available" in the hopes that the switched-to task will not need it. If the operating system has guessed correctly, time is saved. If the guess is wrong, the first FPU or SIMD instruction will cause a trap to the operating system, which can then save the state to the previous task and load the correct state for the current task.

In out-of-order CPUs, the "FPU not available" condition is not detected immediately. (In fact, it almost cannot be detected immediately, as there may be multiple fault-causing instructions executing simultaneously and the processor must take the first fault encountered to preserve the illusion of in-order execution. The information about which is first is not available until the in-order retire stage.) The processor speculatively executes the instruction using the previous task's register contents, and some following instructions, and only later detects the FPU not available condition. Although all architectural state is reverted to the beginning of the faulting instruction, it is possible to use part of the FPU state as the address in a memory load, triggering a load into the processor's cache. Exploitation then follows the same pattern as all Spectre-family vulnerabilities: as the cache state is not architectural state (the cache only affect speed, not correctness), the cache load is not undone and the address, including part of the previous task's register state, can later be detected by measuring the time taken to access different memory addresses.

It is possible to exploit this bug without actually triggering any operating system traps. By placing the FPU access in the shadow of a forced branch misprediction (e.g. using a retpoline) the processor will still speculatively execute the code, but will rewind to the mispredicted branch and never actually execute the operating system trap. This allows the attack to be rapidly repeated, quickly reading out the entire FPU and SIMD register state.

Mitigation

It is possible to mitigate the vulnerability at the operating system and hypervisor levels by always restoring the FPU state when switching process contexts. [6] With such a fix, no firmware upgrade is required. Some operating systems already did not lazily restore the FPU registers by default, protecting those operating systems on affected hardware platforms, even if the underlying hardware issue existed. [6] On Linux operating system using kernel 3.7 or higher, it is possible to force the kernel to eagerly restore the FPU registers by using the eagerfpu=on kernel parameter. [3] Also, many system software vendors and projects, including Linux distributions, [7] OpenBSD, [8] and Xen [4] have released patches to address the vulnerability.

Notes

  1. The OpenBSD project claims to have discovered the vulnerability independently. [5]

See also

Related Research Articles

Central processing unit Central computer component which executes instructions

A central processing unit (CPU), also called a central processor, main processor or just processor, is the electronic circuitry that executes instructions comprising a computer program. The CPU performs basic arithmetic, logic, controlling, and input/output (I/O) operations specified by the instructions in the program. This contrasts with external components such as main memory and I/O circuitry, and specialized processors such as graphics processing units (GPUs).

i386 32-bit microprocessor by Intel

The Intel 386, originally released as 80386 and later renamed i386, is a 32-bit microprocessor introduced in 1985. The first versions had 275,000 transistors and were the CPU of many workstations and high-end personal computers of the time. As the original implementation of the 32-bit extension of the 80286 architecture, the i386 instruction set, programming model, and binary encodings are still the common denominator for all 32-bit x86 processors, which is termed the i386-architecture, x86, or IA-32, depending on context.

x86 Family of instruction set architectures

x86 is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel based on the Intel 8086 microprocessor and its 8088 variant. The 8086 was introduced in 1978 as a fully 16-bit extension of Intel's 8-bit 8080 microprocessor, with memory segmentation as a solution for addressing more memory than can be covered by a plain 16-bit address. The term "x86" came into being because the names of several successors to Intel's 8086 processor end in "86", including the 80186, 80286, 80386 and 80486 processors.

Floating-point unit Part of a computer system

A floating-point unit is a part of a computer system specially designed to carry out operations on floating-point numbers. Typical operations are addition, subtraction, multiplication, division, and square root. Some FPUs can also perform various transcendental functions such as exponential or trigonometric calculations, but the accuracy can be very low, so that some systems prefer to compute these functions in software.

The 88000 is a RISC instruction set architecture developed by Motorola during the 1980s. The MC88100 arrived on the market in 1988, some two years after the competing SPARC and MIPS. Due to the late start and extensive delays releasing the second-generation MC88110, the m88k achieved very limited success outside of the MVME platform and embedded controller environments. When Motorola joined the AIM alliance in 1991 to develop the PowerPC, further development of the 88000 ended.

Single instruction, multiple data Type of parallel processing

Single instruction, multiple data (SIMD) is a type of parallel processing in Flynn's taxonomy. SIMD can be internal and it can be directly accessible through an instruction set architecture (ISA), but it should not be confused with an ISA. SIMD describes computers with multiple processing elements that perform the same operation on multiple data points simultaneously.

In computing, Streaming SIMD Extensions (SSE) is a single instruction, multiple data (SIMD) instruction set extension to the x86 architecture, designed by Intel and introduced in 1999 in their Pentium III series of Central processing units (CPUs) shortly after the appearance of Advanced Micro Devices (AMD's) 3DNow!. SSE contains 70 new instructions, most of which work on single precision floating-point data. SIMD instructions can greatly increase performance when exactly the same operations are to be performed on multiple data objects. Typical applications are digital signal processing and graphics processing.

The Intel i860 is a RISC microprocessor design introduced by Intel in 1989. It is one of Intel's first attempts at an entirely new, high-end instruction set architecture since the failed Intel iAPX 432 from the beginning of the 1980s. It was released with considerable fanfare, slightly obscuring the earlier Intel i960, which was successful in some niches of embedded systems, and which many considered to be a better design. The i860 never achieved commercial success and the project was terminated in the mid-1990s.

3DNow! is a deprecated extension to the x86 instruction set developed by Advanced Micro Devices (AMD). It adds single instruction multiple data (SIMD) instructions to the base x86 instruction set, enabling it to perform vector processing of floating-point vector-operations using Vector registers, which improves the performance of many graphic-intensive applications. The first microprocessor to implement 3DNow was the AMD K6-2, which was introduced in 1998. When the application was appropriate, this raised the speed by about 2–4 times.

SSE2 is one of the Intel SIMD processor supplementary instruction sets first introduced by Intel with the initial version of the Pentium 4 in 2000. It extends the earlier SSE instruction set, and is intended to fully replace MMX. Intel extended SSE2 to create SSE3 in 2004. SSE2 added 144 new instructions to SSE, which has 70 instructions. Competing chip-maker AMD added support for SSE2 with the introduction of their Opteron and Athlon 64 ranges of AMD64 64-bit CPUs in 2003.

Simultaneous multithreading (SMT) is a technique for improving the overall efficiency of superscalar CPUs with hardware multithreading. SMT permits multiple independent threads of execution to better use the resources provided by modern processor architectures.

Coprocessor Type of computer processor

A coprocessor is a computer processor used to supplement the functions of the primary processor. Operations performed by the coprocessor may be floating-point arithmetic, graphics, signal processing, string processing, cryptography or I/O interfacing with peripheral devices. By offloading processor-intensive tasks from the main processor, coprocessors can accelerate system performance. Coprocessors allow a line of computers to be customized, so that customers who do not need the extra performance do not need to pay for it.

Emotion Engine

The Emotion Engine is a central processing unit developed and manufactured by Sony Computer Entertainment and Toshiba for use in the PlayStation 2 video game console. It was also used in early PlayStation 3 models sold in Japan and North America to provide PlayStation 2 game support. Mass production of the Emotion Engine began in 1999 and ended in late 2012 with the discontinuation of the PlayStation 2.

Microarchitecture Component of computer engineering

In computer engineering, microarchitecture, also called computer organization and sometimes abbreviated as µarch or uarch, is the way a given instruction set architecture (ISA) is implemented in a particular processor. A given ISA may be implemented with different microarchitectures; implementations may vary due to different goals of a given design or due to shifts in technology.

Advanced Vector Extensions (AVX) are extensions to the x86 instruction set architecture for microprocessors from Intel and Advanced Micro Devices (AMD). They were proposed by Intel in March 2008 and first supported by Intel with the Sandy Bridge processor shipping in Q1 2011 and later by AMD with the Bulldozer processor shipping in Q3 2011. AVX provides new features, new instructions and a new coding scheme.

Meltdown (security vulnerability) Microprocessor security vulnerability

Meltdown is a hardware vulnerability affecting Intel x86 microprocessors, IBM POWER processors, and some ARM-based microprocessors. It allows a rogue process to read all memory, even when it is not authorized to do so.

Spectre (security vulnerability) Processor security vulnerability

Spectre is a subset of security vulnerabilities within the class of vulnerabilities known as microarchitectural timing side-channel attacks. These affect modern microprocessors that perform branch prediction and other forms of speculation. On most processors, the speculative execution resulting from a branch misprediction may leave observable side effects that may reveal private data to attackers. For example, if the pattern of memory accesses performed by such speculative execution depends on private data, the resulting state of the data cache constitutes a side channel through which an attacker may be able to extract information about the private data using a timing attack.

Microarchitectural Data Sampling CPU vulnerabilities

The Microarchitectural Data Sampling (MDS) vulnerabilities are a set of weaknesses in Intel x86 microprocessors that use hyper-threading, and leak data across protection boundaries that are architecturally supposed to be secure. The attacks exploiting the vulnerabilities have been labeled Fallout, RIDL, ZombieLoad., and ZombieLoad 2.

Transient execution CPU vulnerabilities are vulnerabilities in a computer system in which a speculative execution optimization implemented in a microprocessor is exploited to leak secret data to an unauthorized party. The classic example is Spectre that gave its name to this kind of side-channel attack, but since January 2018 many different vulnerabilities have been identified.

SWAPGS, also known as Spectre variant 1 (swapgs), is a computer security vulnerability that utilizes the branch prediction used in modern microprocessors. Most processors use a form of speculative execution, this feature allows the processors to make educated guesses about the instructions that will most likely need to be executed in the near future. This speculation can leave traces in the cache, which attackers use to extract data using a timing attack, similar to side-channel exploitation of Spectre.

References

  1. 1 2 3 4 "Lazy FP state restore". Intel. 2018-06-13. Retrieved 2018-06-18.
  2. 1 2 Stecklina, Julian; Prescher, Thomas (2018-06-19). "LazyFP: Leaking FPU Register State using Microarchitectural Side-Channels". arXiv: 1806.07480 [cs.OS].
  3. 1 2 3 4 Prescher, Thomas; Stecklina, Julian; Galowicz, Jacek. "Intel LazyFP vulnerability: Exploiting lazy FPU state switching". Cyberus Technology. Retrieved 2018-06-18.
  4. 1 2 "Xen Security Advisory CVE-2018-3665 / XSA-267, version 3". 2018-06-13. Retrieved 2018-06-18.
  5. de Raadt, Theo (2018-06-14). "Inflamation by Bryan Cantrill". openbsd-tech (Mailing list). Retrieved 2018-06-18 via marc.info.
  6. 1 2 "Lazy FPU Save/Restore (CVE-2018-3665)". RedHat. 2018-06-14. Retrieved 2018-06-18.
  7. "CVE-2018-3665". Debian . Retrieved 2018-06-17.
  8. "OpenBSD 6.3 Errata". OpenBSD . Retrieved 2018-06-18.