Pacman (security vulnerability)

Last updated
Pacman
PACMAN vulnerability logo.png
Date discoveredPublicly disclosed June 10, 2022;21 months ago (2022-06-10)
Date patchedUnable to be patched
Affected hardwareApple M1 processors
Website pacmanattack.com

Pacman [lower-alpha 1] is a side-channel vulnerability in certain ARM CPUs that was made public by Massachusetts Institute of Technology security researchers on June 10, 2021. It affects the pointer authentication (PAC) mechanism in many ARMv8.3 chips, including Apple's M1 CPU. [1] Pacman creates an 'oracle' that lets an attacker guess a pointer's PAC signature without crashing the program if the guess is wrong. PAC signatures are typically less than 16 bits wide, so an attacker can use the oracle to guess the signature in 216 tries or fewer. It is unfixable without hardware changes because it is caused by the inherent design of CPU caches and branch predictors.

Contents

Impact and response

Pacman alone is not an exploitable vulnerability. PAC is a 'last line of defense' [2] that detects when software running on the CPU is being exploited by a memory corruption attack and reacts by crashing the software before the attacker completes their exploit. Apple stated that they did not believe the vulnerability posed a serious threat to users because it requires specific conditions to be exploited. [3]

Background

Pacman is similar to Spectre, abusing two key CPU optimizations to create a PAC oracle: branch prediction and memory caching. [4]

PAC (Pointer Authentication Codes)

PAC is a security feature in ARMv8.3-based computer processors that mitigates against return-oriented programming by adding a cryptographic signature to the upper bits of pointers. Compilers emit PAC 'sign' instructions before storing pointers to memory, and 'verify' instructions after loading pointers from memory. If an attacker tampers with the pointer, the signature becomes invalid and the program crashes when the pointer is next accessed. [5] PAC signatures are not cryptographically secure because they need to be small enough to fit into the unused upper bits of pointers. Therefore, if an attacker can reliably test whether a guessed signature is correct without crashing the program, they can brute-force the correct signature. [6]

Branch prediction

Modern CPUs employ branch prediction to reduce the number of pipeline stalls caused by conditional branches. Branch prediction uses heuristics to guess the direction of a conditional branch and begin executing the predicted path – while the condition is still being evaluated. Instructions executed during this period are 'speculative', and the CPU holds their results in the re-order buffer (ROB) without writing them back to memory. Once the CPU finishes evaluating the condition and determines that its initial prediction was correct, it 'retires' the instructions in the ROB by writing their changes back to memory and propagating any exceptions produced. If the speculation was incorrect, the CPU flushes the ROB and resumes execution at the correct location. [7]

Memory caching

A simplified schematic of a set-associative cache Set-associative cache.png
A simplified schematic of a set-associative cache

CPU caches accelerate memory accesses by caching frequently accessed memory on the CPU die. This lowers the cost of memory accesses from hundreds of cycles to fewer than 10, by reducing the amount of time spent communicating with the physically separate northbridge and RAM chip. When an uncached address is loaded, the CPU immediately stashes the loaded data into the cache, evicting another entry if the cache is full. These changes are not held in the ROB because the presence or absence of an address in the cache is considered 'unobservable', so stashes and evictions that occur during speculative execution persist after the ROB has been flushed, even if that path was not ultimately taken. [8]

Mechanism

Principle

Pacman tricks the CPU into checking the validity of a guessed PAC signature within a mispredicted branch so that exceptions produced by potentially incorrect guesses are discarded during the ROB flush. [9] If the guess was incorrect, the exception thrown during speculative execution forces the CPU to stall, preventing further instructions from being speculatively executed. A Pacman gadget is a sequence of instructions of the following form:

if(condition):ptr=verify(attacker_tamperable_pointer)load(ptr)

Sequences of this form are common and can be found in most compiled programs supporting PAC. [10] When the CPU mispredicts the condition, it begins speculatively executing the PAC verification instruction. If the attacker's guess was correct, the verification instruction succeeds and the CPU proceeds to load the address from memory; if the guess was incorrect, the verification instruction throws an exception. This exception is held in the ROB and then discarded once the CPU finds the condition to be false. The attacker then uses a hardware side-channel to determine whether the load instruction was executed, therefore determining whether their guessed signature was correct. [11]

Attack

Ravichandran et al. demonstrate that the cache-based Prime and Probe technique can be used to determine whether the load instruction executed. [11] The attacker determines if the load instruction in a Pacman gadget was executed by filling the cache with data, calling the gadget, and checking the latency of accessing the previously loaded addresses. If one of the addresses takes longer than before, it was evicted by the gadget and the attacker knows that their guess was correct. The attacker may then use this forged pointer elsewhere in the program to hijack it.

1. Train

The attacker calls the Pacman gadget many times with condition = true. The branch predictor is now trained to guess that the condition is true on subsequent calls. During this period, attacker_tamperable_pointer is its original value with a valid PAC signature.

2. Prime

The attacker fills the L1 cache by loading from addresses they control. The contents of these memory locations does not matter – the attacker just needs to be able to precisely measure their access latency. [12]

Initial L1 cache
Set 0
AddressValue
......
......
......
...
Set 0x1FFF
AddressValue
......
......
......
Primed L1 cache
Set 0
AddressValue
0xab30000AAAAAAAAA
0x1230000AAAAAAAAA
0x1920000AAAAAAAAA
...
Set 0x1FFF
AddressValue
0x2001FFFAAAAAAAAA
0x6a21FFFAAAAAAAAA
0x09b1FFFAAAAAAAAA

3. Evict

The attacker overwrites attacker_tamperable_pointer with their target pointer and guess for the target pointer's PAC signature. They then call the Pacman gadget with condition = false, causing the branch to be mispredicted. The branch predictor will speculatively execute the contents of the if statement, before eventually flushing the pipeline and rolling back. [13]

During this speculative execution, two things can occur:

  1. The speculative execution proceeds to the load() instruction. This means that the verify() instruction did not fault, implying the guessed signature was correct. The load() instruction will then load the target pointer into cache, evicting an address in the attacker's eviction set.
  2. Speculative execution faults on the verify instruction, preventing execution of the load(). This implies the guessed signature was wrong. Since this was speculatively executed within a mispredicted branch, the fault is not propagated to the program.
L1 Cache after running gadget if guess was correct
Set 0
AddressValue
0xab30000AAAAAAAAA
0x1230000AAAAAAAAA
0x1920000AAAAAAAAA
...
Set 0x1FFF
AddressValue
pacman address my data
0x6a21FFFAAAAAAAAA
0x09b1FFFAAAAAAAAA
L1 Cache after running gadget if guess was wrong
Set 0
AddressValue
0xab30000AAAAAAAAA
0x1230000AAAAAAAAA
0x1920000AAAAAAAAA
...
Set 0x1FFF
AddressValue
0x2001FFFAAAAAAAAA
0x6a21FFFAAAAAAAAA
0x09b1FFFAAAAAAAAA

4. Probe

The attacker measures the access time for each element in their eviction set. If one of the elements was evicted (i.e., the access is slow) then the guess was correct. If none of the elements were evicted (i.e., all accesses are fast) then the guess was wrong. This process can be repeated with different guesses until the correct signature is found. [11]

Notes

  1. Stylized PACMAN.

Related Research Articles

ARM is a family of RISC instruction set architectures (ISAs) for computer processors. Arm Ltd. develops the ISAs and licenses them to other companies, who build the physical devices that use the instruction set. It also designs and licenses cores that implement these ISAs.

Speculative execution is an optimization technique where a computer system performs some task that may not be needed. Work is done before it is known whether it is actually needed, so as to prevent a delay that would have to be incurred by doing the work after it is known that it is needed. If it turns out the work was not needed after all, most changes made by the work are reverted and the results are ignored.

In the history of computer hardware, some early reduced instruction set computer central processing units used a very similar architectural solution, now called a classic RISC pipeline. Those CPUs were: MIPS, SPARC, Motorola 88000, and later the notional CPU DLX invented for education.

<span class="mw-page-title-main">Branch predictor</span> Digital circuit

In computer architecture, a branch predictor is a digital circuit that tries to guess which way a branch will go before this is known definitively. The purpose of the branch predictor is to improve the flow in the instruction pipeline. Branch predictors play a critical role in achieving high performance in many modern pipelined microprocessor architectures.

In cryptography, a timing attack is a side-channel attack in which the attacker attempts to compromise a cryptosystem by analyzing the time taken to execute cryptographic algorithms. Every logical operation in a computer takes time to execute, and the time can differ based on the input; with precise measurements of the time for each operation, an attacker can work backwards to the input. Finding secrets through timing information may be significantly easier than using cryptanalysis of known plaintext, ciphertext pairs. Sometimes timing information is combined with cryptanalysis to increase the rate of information leakage.

Explicitly parallel instruction computing (EPIC) is a term coined in 1997 by the HP–Intel alliance to describe a computing paradigm that researchers had been investigating since the early 1980s. This paradigm is also called Independence architectures. It was the basis for Intel and HP development of the Intel Itanium architecture, and HP later asserted that "EPIC" was merely an old term for the Itanium architecture. EPIC permits microprocessors to execute software instructions in parallel by using the compiler, rather than complex on-die circuitry, to control parallel instruction execution. This was intended to allow simple performance scaling without resorting to higher clock frequencies.

Addressing modes are an aspect of the instruction set architecture in most central processing unit (CPU) designs. The various addressing modes that are defined in a given instruction set architecture define how the machine language instructions in that architecture identify the operand(s) of each instruction. An addressing mode specifies how to calculate the effective memory address of an operand by using information held in registers and/or constants contained within a machine instruction or elsewhere.

A branch is an instruction in a computer program that can cause a computer to begin executing a different instruction sequence and thus deviate from its default behavior of executing instructions in order. Branch may also refer to the act of switching execution to a different instruction sequence as a result of executing a branch instruction. Branch instructions are used to implement control flow in program loops and conditionals.

<span class="mw-page-title-main">Microarchitecture</span> Component of computer engineering

In electronics, computer science and computer engineering, microarchitecture, also called computer organization and sometimes abbreviated as µarch or uarch, is the way a given instruction set architecture (ISA) is implemented in a particular processor. A given ISA may be implemented with different microarchitectures; implementations may vary due to different goals of a given design or due to shifts in technology.

Memory disambiguation is a set of techniques employed by high-performance out-of-order execution microprocessors that execute memory access instructions out of program order. The mechanisms for performing memory disambiguation, implemented using digital logic inside the microprocessor core, detect true dependencies between memory operations at execution time and allow the processor to recover when a dependence has been violated. They also eliminate spurious memory dependencies and allow for greater instruction-level parallelism by allowing safe out-of-order execution of loads and stores.

Memory ordering describes the order of accesses to computer memory by a CPU. The term can refer either to the memory ordering generated by the compiler during compile time, or to the memory ordering generated by a CPU during runtime.

Runahead is a technique that allows a computer processor to speculatively pre-process instructions during cache miss cycles. The pre-processed instructions are used to generate instruction and data stream prefetches by executing instructions leading to cache misses before they would normally occur, effectively hiding memory latency. In runahead, the processor uses the idle execution resources to calculate instruction and data stream addresses using the available information that is independent of a cache miss. Once the processor has resolved the initial cache miss, all runahead results are discarded, and the processor resumes execution as normal. The primary use case of the technique is to mitigate the effects of the memory wall. The technique may also be used for other purposes, such as pre-computing branch outcomes to achieve highly accurate branch prediction.

Return-oriented programming (ROP) is a computer security exploit technique that allows an attacker to execute code in the presence of security defenses such as executable space protection and code signing.

<span class="mw-page-title-main">AArch64</span> 64-bit extension of the ARM architecture

AArch64 or ARM64 is the 64-bit extension of the ARM architecture family. It was first introduced with the Armv8-A architecture, and had many extension updates.

Latency oriented processor architecture is the microarchitecture of a microprocessor designed to serve a serial computing thread with a low latency. This is typical of most central processing units (CPU) being developed since the 1970s. These architectures, in general, aim to execute as many instructions as possible belonging to a single serial thread, in a given window of time; however, the time to execute a single instruction completely from fetch to retire stages may vary from a few cycles to even a few hundred cycles in some cases. Latency oriented processor architectures are the opposite of throughput-oriented processors which concern themselves more with the total throughput of the system, rather than the service latencies for all individual threads that they work on.

<span class="mw-page-title-main">Meltdown (security vulnerability)</span> Microprocessor security vulnerability

Meltdown is one of the two original transient execution CPU vulnerabilities. Meltdown affects Intel x86 microprocessors, IBM POWER processors, and some ARM-based microprocessors. It allows a rogue process to read all memory, even when it is not authorized to do so.

Lazy FPU state leak, also referred to as Lazy FP State Restore or LazyFP, is a security vulnerability affecting Intel Core CPUs. The vulnerability is caused by a combination of flaws in the speculative execution technology present within the affected CPUs and how certain operating systems handle context switching on the floating point unit (FPU). By exploiting this vulnerability, a local process can leak the content of the FPU registers that belong to another process. This vulnerability is related to the Spectre and Meltdown vulnerabilities that were publicly disclosed in January 2018.

Transient execution CPU vulnerabilities are vulnerabilities in a computer system in which a speculative execution optimization implemented in a microprocessor is exploited to leak secret data to an unauthorized party. The archetype is Spectre, and transient execution attacks like Spectre belong to the cache-attack category, one of several categories of side-channel attacks. Since January 2018 many different cache-attack vulnerabilities have been identified.

The ARM Cortex-A77 is a central processing unit implementing the ARMv8.2-A 64-bit instruction set designed by ARM Holdings' Austin design centre. ARM announced an increase of 23% and 35% in integer and floating point performance, respectively. Memory bandwidth increased 15% relative to the A76.

SWAPGS, also known as Spectre variant 1, is a computer security vulnerability that utilizes the branch prediction used in modern microprocessors. Most processors use a form of speculative execution, this feature allows the processors to make educated guesses about the instructions that will most likely need to be executed in the near future. This speculation can leave traces in the cache, which attackers use to extract data using a timing attack, similar to side-channel exploitation of Spectre.

References

See also