Transactional Synchronization Extensions

Transactional Synchronization Extensions (TSX), also called Transactional Synchronization Extensions New Instructions (TSX-NI), is an extension to the x86 instruction set architecture (ISA) that adds hardware transactional memory support, speeding up execution of multi-threaded software through lock elision. According to different benchmarks, TSX/TSX-NI can provide around 40% faster application execution in specific workloads, and 4–5 times more database transactions per second (TPS). [1] [2] [3] [4]

TSX/TSX-NI was documented by Intel in February 2012, and debuted in June 2013 on selected Intel microprocessors based on the Haswell microarchitecture. [5] [6] [7] Haswell processors below 45xx as well as R-series and K-series (with unlocked multiplier) SKUs do not support TSX/TSX-NI. [8] In August 2014, Intel announced a bug in the TSX/TSX-NI implementation on current steppings of Haswell, Haswell-E, Haswell-EP and early Broadwell CPUs, which resulted in disabling the TSX/TSX-NI feature on affected CPUs via a microcode update. [9] [10]

In 2016, a side-channel timing attack was found that abuses the way TSX/TSX-NI handles transactional faults (i.e. page faults) in order to break kernel address space layout randomization (KASLR) on all major operating systems. [11] In 2021, Intel released a microcode update that disabled the TSX/TSX-NI feature on CPU generations from Skylake to Coffee Lake, as a mitigation for discovered security issues. [12]

Support for TSX/TSX-NI emulation is provided as part of the Intel Software Development Emulator. [13] There is also experimental support for TSX/TSX-NI emulation in a QEMU fork. [14]

Features

TSX/TSX-NI provides two software interfaces for designating code regions for transactional execution. Hardware Lock Elision (HLE) is an instruction prefix-based interface designed to be backward compatible with processors without TSX/TSX-NI support. Restricted Transactional Memory (RTM) is a new instruction set interface that provides greater flexibility for programmers. [15]

TSX/TSX-NI enables optimistic execution of transactional code regions. The hardware monitors multiple threads for conflicting memory accesses, while aborting and rolling back transactions that cannot be successfully completed. Mechanisms are provided for software to detect and handle failed transactions. [15]

In other words, lock elision through transactional execution uses memory transactions as a fast path where possible, while the slow (fallback) path is still a normal lock.
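The fast-path and fallback-path split can be sketched in C with the RTM intrinsics `_xbegin`, `_xend` and `_xabort` (RTM itself is described under Restricted Transactional Memory below). This is a minimal sketch, not a production implementation: it assumes GCC or Clang on x86, and the names `fallback_lock`, `shared_counter`, `try_add_transactional` and `counter_add`, as well as the retry limit of three, are hypothetical choices made for illustration.

```c
#include <stdatomic.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#include <immintrin.h>
#define TSX_SKETCH_X86 1
#else
#define TSX_SKETCH_X86 0
#endif

/* Hypothetical fallback lock (0 = free, 1 = held) and protected data. */
static atomic_int fallback_lock;
static long shared_counter;

static void spin_lock(atomic_int *l) {
    /* Spin until the exchange returns 0, i.e. the lock was free. */
    while (atomic_exchange_explicit(l, 1, memory_order_acquire) != 0) {
    }
}

static void spin_unlock(atomic_int *l) {
    atomic_store_explicit(l, 0, memory_order_release);
}

#if TSX_SKETCH_X86
/* CPUID.(EAX=7,ECX=0):EBX bit 11 reports RTM support. */
static int cpu_has_rtm(void) {
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((ebx >> 11) & 1u);
}

/* Fast path: returns 1 if the update committed as a transaction. */
__attribute__((target("rtm")))
static int try_add_transactional(long delta) {
    for (int attempt = 0; attempt < 3; ++attempt) {
        unsigned int status = _xbegin();
        if (status == _XBEGIN_STARTED) {
            /* Reading the lock adds it to the read set, so a thread that
               later takes the fallback lock aborts this transaction. */
            if (atomic_load_explicit(&fallback_lock,
                                     memory_order_relaxed) != 0)
                _xabort(0xff);
            shared_counter += delta;
            _xend();
            return 1;
        }
        if (!(status & _XABORT_RETRY))
            break; /* the hardware does not suggest retrying */
    }
    return 0;
}
#endif

/* Slow path is an ordinary lock, used whenever the fast path fails. */
void counter_add(long delta) {
#if TSX_SKETCH_X86
    /* A real implementation would cache the CPUID result. */
    if (cpu_has_rtm() && try_add_transactional(delta))
        return;
#endif
    spin_lock(&fallback_lock);
    shared_counter += delta;
    spin_unlock(&fallback_lock);
}
```

On a processor without usable RTM, every call simply takes the spinlock, so the result is the same either way; only the cost differs.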

Hardware Lock Elision

Hardware Lock Elision (HLE) adds two new instruction prefixes, XACQUIRE and XRELEASE. These two prefixes reuse the opcodes of the existing REPNE / REPE prefixes (F2H / F3H). On processors that do not support HLE, REPNE / REPE prefixes are ignored on instructions for which the XACQUIRE / XRELEASE are valid, thus enabling backward compatibility. [16]

The XACQUIRE prefix hint can only be used with the following instructions with an explicit LOCK prefix: ADD, ADC, AND, BTC, BTR, BTS, CMPXCHG, CMPXCHG8B, DEC, INC, NEG, NOT, OR, SBB, SUB, XOR, XADD, and XCHG. The XCHG instruction can be used without the LOCK prefix as well.

The XRELEASE prefix hint can be used both with the instructions listed above, and with the MOV mem, reg and MOV mem, imm instructions.

HLE allows optimistic execution of a critical section by skipping the write to a lock, so that the lock appears to be free to other threads. A failed transaction results in execution restarting from the XACQUIRE-prefixed instruction, but treating the instruction as if the XACQUIRE prefix were not present.
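As an illustration, GCC exposes HLE through two x86-specific memory-order flags, `__ATOMIC_HLE_ACQUIRE` and `__ATOMIC_HLE_RELEASE`, which can be OR-ed into the memory order of its `__atomic` builtins. The sketch below assumes GCC on x86; with other compilers the flags fall back to zero and the code degrades to a plain spinlock. The names `HLE_ACQUIRE`, `HLE_RELEASE`, `hle_lock_word`, `hle_lock` and `hle_unlock` are hypothetical.

```c
/* Use GCC's x86-only HLE flags when present, otherwise no hint at all. */
#if defined(__ATOMIC_HLE_ACQUIRE) && defined(__ATOMIC_HLE_RELEASE)
#define HLE_ACQUIRE __ATOMIC_HLE_ACQUIRE
#define HLE_RELEASE __ATOMIC_HLE_RELEASE
#else
#define HLE_ACQUIRE 0
#define HLE_RELEASE 0
#endif

static int hle_lock_word; /* 0 = free, 1 = held */

static void hle_lock(int *lock) {
    /* With the HLE flag this compiles to an XACQUIRE-prefixed XCHG.
       Processors without HLE ignore the prefix and run a normal XCHG. */
    while (__atomic_exchange_n(lock, 1,
                               __ATOMIC_ACQUIRE | HLE_ACQUIRE) != 0) {
        /* Wait with plain loads until the lock looks free again. */
        while (__atomic_load_n(lock, __ATOMIC_RELAXED) != 0) {
        }
    }
}

static void hle_unlock(int *lock) {
    /* With the HLE flag this compiles to an XRELEASE-prefixed MOV. */
    __atomic_store_n(lock, 0, __ATOMIC_RELEASE | HLE_RELEASE);
}
```

Because the prefixes are ignored where HLE is absent, the same binary runs on older processors, which is the backward compatibility described above.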

Restricted Transactional Memory

Restricted Transactional Memory (RTM) is an alternative implementation to HLE which gives the programmer the flexibility to specify a fallback code path that is executed when a transaction cannot be successfully executed. Unlike HLE, RTM is not backward compatible with processors that do not support it. For backward compatibility, programs are required to detect support for RTM in the CPU before using the new instructions.
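Detection is done with the CPUID instruction. The following is a minimal sketch assuming GCC or Clang, whose `<cpuid.h>` header provides `__get_cpuid_count`; the struct and function names are hypothetical. HLE and RTM are reported in CPUID leaf 7, sub-leaf 0, in EBX bits 4 and 11, and TSXLDTRK (covered in a later section) in EDX bit 16.

```c
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Hypothetical container for the feature bits of interest. */
struct tsx_support {
    bool hle;      /* CPUID.(EAX=7,ECX=0):EBX bit 4  */
    bool rtm;      /* CPUID.(EAX=7,ECX=0):EBX bit 11 */
    bool tsxldtrk; /* CPUID.(EAX=7,ECX=0):EDX bit 16 */
};

static struct tsx_support detect_tsx(void) {
    struct tsx_support s = { false, false, false };
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    /* Returns 0 when leaf 7 is not available on this processor. */
    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        s.hle = ((ebx >> 4) & 1u) != 0;
        s.rtm = ((ebx >> 11) & 1u) != 0;
        s.tsxldtrk = ((edx >> 16) & 1u) != 0;
    }
#endif
    return s;
}
```

A program would call `detect_tsx()` once at start-up and take the RTM code path only when `rtm` is true. Note that system software may mask these bits, so a clear bit does not necessarily describe the silicon.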

RTM adds three new instructions: XBEGIN, XEND and XABORT. The XBEGIN and XEND instructions mark the start and the end of a transactional code region; the XABORT instruction explicitly aborts a transaction. Transaction failure redirects the processor to the fallback code path specified by the XBEGIN instruction, with the abort status returned in the EAX register.

EAX register bit position    Meaning
0                            Set if abort caused by XABORT instruction.
1                            If set, the transaction may succeed on a retry. This bit is always clear if bit 0 is set.
2                            Set if another logical processor conflicted with a memory address that was part of the transaction that aborted.
3                            Set if an internal buffer overflowed.
4                            Set if debug breakpoint was hit.
5                            Set if an abort occurred during execution of a nested transaction.
23:6                         Reserved.
31:24                        XABORT argument (only valid if bit 0 set, otherwise reserved).
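The status bits listed above can be unpacked with ordinary bit operations. The following sketch is plain C with no TSX dependency; the struct and function names are hypothetical, and the field layout simply mirrors the table.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded view of the RTM abort status returned in EAX. */
struct rtm_abort_info {
    bool explicit_abort;  /* bit 0: caused by XABORT */
    bool may_retry;       /* bit 1: a retry may succeed */
    bool conflict;        /* bit 2: memory conflict with another processor */
    bool capacity;        /* bit 3: internal buffer overflowed */
    bool debug;           /* bit 4: debug breakpoint hit */
    bool nested;          /* bit 5: abort inside a nested transaction */
    uint8_t xabort_code;  /* bits 31:24, meaningful only if bit 0 is set */
};

static struct rtm_abort_info decode_rtm_abort(uint32_t eax) {
    struct rtm_abort_info info;
    info.explicit_abort = (eax & (1u << 0)) != 0;
    info.may_retry      = (eax & (1u << 1)) != 0;
    info.conflict       = (eax & (1u << 2)) != 0;
    info.capacity       = (eax & (1u << 3)) != 0;
    info.debug          = (eax & (1u << 4)) != 0;
    info.nested         = (eax & (1u << 5)) != 0;
    /* The XABORT argument is reserved unless bit 0 is set. */
    info.xabort_code = info.explicit_abort ? (uint8_t)(eax >> 24) : 0;
    return info;
}
```

A fallback handler might, for example, retry only when `may_retry` is set and go straight to the lock when `capacity` is set, since repeating a transaction that overflowed is unlikely to help.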

XTEST instruction

TSX/TSX-NI provides a new XTEST instruction that returns whether the processor is executing a transactional region. This instruction is supported by the processor if it supports HLE or RTM or both.

TSX Suspend Load Address Tracking

TSX/TSX-NI Suspend Load Address Tracking (TSXLDTRK) is an instruction set extension that allows load tracking to be temporarily disabled for a section of code within a transactional region. This feature extends HLE and RTM, and its support in the processor must be detected separately.

TSXLDTRK introduces two new instructions, XSUSLDTRK and XRESLDTRK, for suspending and resuming load address tracking, respectively. While the tracking is suspended, any loads from memory will not be added to the transaction read set. This means that, unless these memory locations were added to the transaction read or write sets outside the suspend region, writes to these locations by other threads will not cause a transaction abort. Suspending load address tracking for a portion of code within a transactional region reduces the amount of memory that needs to be tracked for read-write conflicts and therefore increases the probability that the transaction commits successfully.

Implementation

Intel's TSX/TSX-NI specification describes how the transactional memory is exposed to programmers, but withholds details on the actual transactional memory implementation. [17] Intel specifies in its developer's and optimization manuals that Haswell maintains both read-sets and write-sets at the granularity of a cache line, tracking addresses in the L1 data cache of the processor. [18] [19] [20] [21] Intel also states that data conflicts are detected through the cache coherence protocol. [19]

Haswell's L1 data cache has an associativity of eight. This means that in this implementation, a transactional execution that writes to nine distinct locations mapping to the same cache set will abort. However, due to micro-architectural implementations, this does not mean that fewer accesses to the same set are guaranteed to never abort. Additionally, in CPU configurations with Hyper-Threading Technology, the L1 cache is shared between the two threads on the same core, so operations in a sibling logical processor of the same core can cause evictions. [19]
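The arithmetic behind the nine-writes example can be worked through in a few lines of C. The eight-way associativity comes from the text above, and the 32 KB size and 64-byte line size from the description quoted in reference 21; the simple modulo set indexing used here is an assumption made for illustration, not a documented property of the hardware, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

enum {
    L1D_BYTES  = 32 * 1024, /* L1 data cache size (per reference 21) */
    LINE_BYTES = 64,        /* cache line size */
    L1D_WAYS   = 8,         /* associativity */
    L1D_SETS   = L1D_BYTES / (LINE_BYTES * L1D_WAYS) /* 64 sets */
};

/* Assumed indexing: the line number modulo the number of sets. */
static unsigned l1d_set_index(uintptr_t addr) {
    return (unsigned)((addr / LINE_BYTES) % L1D_SETS);
}

/* True if this many distinct lines written in one set cannot all be
   held in the set at once, so the transaction would have to abort. */
static bool write_set_overflows(unsigned distinct_lines_in_one_set) {
    return distinct_lines_in_one_set > L1D_WAYS;
}
```

Under these assumptions, addresses that are 4096 bytes apart (64 sets of 64 bytes each) fall into the same set, so a transaction writing to nine such addresses exceeds the eight available ways. As the text notes, staying at or below eight is not a guarantee against aborts.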

Independent research suggests that Haswell’s transactional memory is most likely a deferred update system using the per-core caches for transactional data and register checkpoints. [17] In other words, Haswell is more likely to use the cache-based transactional memory system, as it is a much less risky implementation choice. On the other hand, Intel's Skylake or later may combine this cache-based approach with the memory ordering buffer (MOB) for the same purpose, possibly also providing multi-versioned transactional memory that is more amenable to speculative multithreading. [22]

History and bugs

In August 2014, Intel announced that a bug exists in the TSX/TSX-NI implementation on Haswell, Haswell-E, Haswell-EP and early Broadwell CPUs, which resulted in disabling the TSX/TSX-NI feature on affected CPUs via a microcode update. [9] [10] [23] The bug was fixed in F-0 steppings of the vPro-enabled Core M-5Y70 Broadwell CPU in November 2014. [24]

The bug was found and then reported during a diploma thesis in the School of Electrical and Computer Engineering of the National Technical University of Athens. [25]

In October 2018, Intel disclosed a TSX/TSX-NI memory ordering issue found in some Skylake processors. [26] As a result of a microcode update, HLE support was disabled in the affected CPUs, and RTM was mitigated by sacrificing one performance counter when used outside of Intel SGX mode or System Management Mode (SMM). System software would have to either effectively disable RTM or update performance monitoring tools not to use the affected performance counter.

In June 2021, Intel published a microcode update that further disables TSX/TSX-NI on various Xeon and Core processor models from Skylake through Coffee Lake and Whiskey Lake as a mitigation for the TSX Asynchronous Abort (TAA) vulnerability. The earlier mitigation for the memory ordering issue was removed. [27] By default, with the updated microcode, the processor still indicates support for RTM but always aborts the transaction. System software is able to detect this mode of operation and mask support for TSX/TSX-NI from the CPUID instruction, preventing detection of TSX/TSX-NI by applications. System software may also enable the "Unsupported Software Development Mode", where RTM is fully active, but in this case RTM usage may be subject to the issues described earlier, and therefore this mode should not be enabled on production systems. On some systems RTM cannot be re-enabled when SGX is active. HLE is always disabled.

According to the Intel 64 and IA-32 Architectures Software Developer's Manual from May 2020, Volume 1, Chapter 2.5 Intel Instruction Set Architecture And Features Removed, [18] HLE has been removed from Intel products released in 2019 and later. RTM is not documented as removed. However, Intel 10th generation Comet Lake and Ice Lake client processors, which were released in 2020, do not support TSX/TSX-NI, [28] [29] [30] [31] [32] including both HLE and RTM. Engineering versions of Comet Lake processors still retained TSX/TSX-NI support.

In Intel Architecture Instruction Set Extensions Programming Reference revision 41 from October 2020, [33] a new TSXLDTRK instruction set extension was documented. It was first included in Sapphire Rapids processors released in January 2023.

References

  1. Richard M. Yoo; Christopher J. Hughes; Konrad Lai; Ravi Rajwar (November 2013). "Performance Evaluation of Intel Transactional Synchronization Extensions for High-Performance Computing" (PDF). intel-research.net. Archived from the original (PDF) on 2016-10-24. Retrieved 2013-11-14.
  2. Tomas Karnagel; Roman Dementiev; Ravi Rajwar; Konrad Lai; Thomas Legler; Benjamin Schlegel; Wolfgang Lehner (February 2014). "Improving In-Memory Database Index Performance with Intel Transactional Synchronization Extensions" (PDF). software.intel.com. Retrieved 2014-03-03.
  3. "Performance Evaluation of Intel Transactional Synchronization Extensions for High Performance Computing". supercomputing.org. November 2013. Archived from the original on 2013-10-29. Retrieved 2013-11-14.
  4. "Benchmarks: Haswell's TSX and Memory Transaction Throughput (HLE and RTM)". sisoftware.co.uk. Retrieved 2013-11-14.
  5. "Transactional Synchronization in Haswell". Software.intel.com. Retrieved 2012-02-07.
  6. "Transactional memory going mainstream with Intel Haswell". Ars Technica. 2012-02-08. Retrieved 2012-02-09.
  7. "The Core i7-4770K Review". Tom's Hardware. 2013-06-01. Retrieved 2012-06-03.
  8. "Intel Comparison Table of Haswell Pentium, i3, i5, and i7 models". intel.com. Retrieved 2014-02-11.
  9. Scott Wasson (2014-08-12). "Errata prompts Intel to disable TSX in Haswell, early Broadwell CPUs". techreport.com. Retrieved 2014-08-12.
  10. "Desktop 4th Generation Intel Core Processor Family, Desktop Intel Pentium Processor Family, and Desktop Intel Celeron Processor Family: Specification Update (Revision 014)" (PDF). Intel. June 2014. p. 46. Retrieved 2014-08-13. Under a complex set of internal timing conditions and system events, software using the Intel TSX/TSX-NI (Transactional Synchronization Extensions) instructions may observe unpredictable system behavior.
  11. "Breaking Kernel Address Space Layout Randomization with Intel TSX" (PDF). 2016.
  12. Gareth Halfacree (2021-06-29). "Intel sticks another nail in the coffin of TSX with feature-disabling microcode update". The Register. Retrieved 2012-10-17.
  13. Wooyoung Kim (2013-07-25). "Fun with Intel Transactional Synchronization Extensions". Intel. Retrieved 2013-11-12.
  14. Sebastien Dabdoub; Stephen Tu. "Supporting Intel Transactional Synchronization Extensions in QEMU" (PDF). mit.edu. Retrieved 2013-11-12.
  15. Johan De Gelas (2012-09-20). "Making Sense of the Intel Haswell Transactional Synchronization eXtensions". AnandTech. Retrieved 2013-10-20.
  16. "Hardware Lock Elision Overview". intel.com. Archived from the original on 2013-10-29. Retrieved 2013-10-27.
  17. David Kanter (2012-08-21). "Analysis of Haswell's Transactional Memory". Real World Technologies. Retrieved 2013-11-19.
  18. "Intel 64 and IA-32 Architectures Software Developer's Manual Combined Volumes: 1, 2A, 2B, 2C, 3A, 3B, and 3C" (PDF). Intel. September 2013. p. 342. Retrieved 2013-11-19.
  19. "Intel 64 and IA-32 Architectures Optimization Reference Manual" (PDF). Intel. September 2013. p. 446. Retrieved 2013-11-19.
  20. "Intel TSX implementation properties". Intel. 2013. Retrieved 2013-11-14. The processor tracks both the read-set addresses and the write-set addresses in the first level data cache (L1 cache) of the processor.
  21. De Gelas, Johan (September 20, 2012). "Making Sense of the Intel Haswell Transactional Synchronization eXtensions". AnandTech. Retrieved 23 December 2013. The whole "CPU does the fine grained locks" is based upon tagging the L1 (64 B) cachelines and there are 512 of them to be specific (64 x 512 = 32 KB). There is only one "lock tag" per cacheline.
  22. David Kanter (2012-08-21). "Haswell Transactional Memory Alternatives". Real World Technologies. Retrieved 2013-11-14.
  23. Ian Cutress (2014-08-12). "Intel Disables TSX Instructions: Erratum Found in Haswell, Haswell-E/EP, Broadwell-Y". AnandTech. Retrieved 2014-08-30.
  24. "Intel Core M Processor Family. Specification Update. December 2014. Revision 003. 330836-003" (PDF). Intel. December 2014. p. 10. Retrieved 2014-12-28. BDM53 1 E-0: X, F-0:, Status: Fixed ERRATA: Intel TSX Instructions Not Available. 1. Applies to Intel Core M-5Y70 processor. Intel TSX is supported on Intel Core M-5Y70 processor with Intel vPro Technology. Intel TSX is not supported on other processor SKUs.
  25. "HiPEAC info" (PDF). p. 12. Archived from the original (PDF) on 2017-03-05.
  26. "Performance Monitoring Impact of Intel® Transactional Synchronization Extension Memory Ordering Issue White Paper, June 2021, Revision 1.4" (PDF). Intel. 2021-06-12. p. 5. The October 2018 microcode update also disabled the HLE instruction prefix of Intel TSX and force all RTM transactions to abort when operating in Intel SGX mode or System Management Mode (SMM).
  27. "Intel® Transactional Synchronization Extensions (Intel® TSX) Memory and Performance Monitoring Update for Intel® Processors". Intel. 2021-06-12.
  28. "Intel® Core™ i9-10900K Processor specifications". Intel. 2020. Retrieved 2020-10-10.
  29. "Intel® Core™ i9-10980HK Processor specifications". Intel. 2020. Retrieved 2020-10-10.
  30. "Intel® Core™ i7-10810U Processor specifications". Intel. 2020. Retrieved 2020-10-10.
  31. "Intel® Xeon® W-1290P Processor specifications". Intel. 2020. Retrieved 2020-10-10.
  32. "Intel® Core™ i7-1068NG7 Processor specifications". Intel. 2020. Retrieved 2020-10-10.
  33. "Intel® Architecture Instruction Set Extensions Programming Reference" (PDF). Intel. 2020. Retrieved 2020-10-21.

Further reading