Advanced Matrix Extensions

Last updated

Advanced Matrix Extensions (AMX), also known as Intel Advanced Matrix Extensions (Intel AMX), are extensions to the x86 instruction set architecture (ISA) for microprocessors from Intel designed to work on matrices to accelerate artificial intelligence (AI) and machine learning (ML) workloads. [1]

Contents

Extensions

AMX was introduced by Intel in June 2020 and first supported by Intel with the Sapphire Rapids microarchitecture for Xeon servers, released in January 2023. [2] [3] It introduced 2-dimensional registers called tiles upon which accelerators can perform operations. It is intended as an extensible architecture; the first accelerator implemented is called tile matrix multiply unit (TMUL). [4] [5]

In Intel Architecture Instruction Set Extensions and Future Features revision 46, published in September 2022, a new AMX-FP16 extension was documented. This extension adds support for half-precision floating-point numbers. In revision 48 from March 2023, AMX-COMPLEX was documented, adding support for half-precision floating-point complex numbers. Both extensions are planned for inclusion in the future Granite Rapids processors (AMX-COMPLEX - only in Granite Rapids-D [6] ).

Tile matrix multiply unit

TMUL unit supports BF16 and INT8 input types. [7] AMX-FP16 also adds support for real and complex FP16 numbers. The register file consists of 8 tiles, each with 16 rows of size of 64 bytes (32 BF16/FP16 or 64 INT8 elements). The only supported operation is matrix multiplication [4]

4th Gen Intel Xeon Scalable processor can perform 2048 INT8 or 1024 BF16 operations per cycle: [8] [9] the maximal input sizes are for A and for B, where J is 64 for INT8 and 32 BF16. The matrix multiplication requires multiplication and additions, thus performing operations in 16 cycles. [9]

Software support

Related Research Articles

<span class="mw-page-title-main">Itanium</span> Family of 64-bit Intel microprocessors

Itanium is a discontinued family of 64-bit Intel microprocessors that implement the Intel Itanium architecture. The Itanium architecture originated at Hewlett-Packard (HP), and was later jointly developed by HP and Intel. Launched in June 2001, Intel initially marketed the processors for enterprise servers and high-performance computing systems. In the concept phase, engineers said "we could run circles around PowerPC, that we could kill the x86." Early predictions were that IA-64 would expand to the lower-end servers, supplanting Xeon, and eventually penetrate into the personal computers, eventually to supplant RISC and complex instruction set computing (CISC) architectures for all general-purpose applications.

x86 Family of instruction set architectures

x86 is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel based on the Intel 8086 microprocessor and its 8088 variant. The 8086 was introduced in 1978 as a fully 16-bit extension of Intel's 8-bit 8080 microprocessor, with memory segmentation as a solution for addressing more memory than can be covered by a plain 16-bit address. The term "x86" came into being because the names of several successors to Intel's 8086 processor end in "86", including the 80186, 80286, 80386 and 80486 processors. Colloquially, their names were "186", "286", "386" and "486".

<span class="mw-page-title-main">Single instruction, multiple data</span> Type of parallel processing

Single instruction, multiple data (SIMD) is a type of parallel processing in Flynn's taxonomy. SIMD can be internal and it can be directly accessible through an instruction set architecture (ISA), but it should not be confused with an ISA. SIMD describes computers with multiple processing elements that perform the same operation on multiple data points simultaneously.

The x86 instruction set refers to the set of instructions that x86-compatible microprocessors support. The instructions are usually part of an executable program, often stored as a computer file and executed on the processor.

<span class="mw-page-title-main">LLVM</span> Compiler backend for multiple programming languages

LLVM is a set of compiler and toolchain technologies that can be used to develop a frontend for any programming language and a backend for any instruction set architecture. LLVM is designed around a language-independent intermediate representation (IR) that serves as a portable, high-level assembly language that can be optimized with a variety of transformations over multiple passes. The name LLVM originally stood for Low Level Virtual Machine, though the project has expanded and the name is no longer officially an initialism.

In C and related programming languages, long double refers to a floating-point data type that is often more precise than double precision though the language standard only requires it to be at least as precise as double. As with C's other floating-point types, it may not necessarily map to an IEEE format.

Intel oneAPI DPC++/C++ Compiler and Intel C++ Compiler Classic are Intel’s C, C++, SYCL, and Data Parallel C++ (DPC++) compilers for Intel processor-based systems, available for Windows, Linux, and macOS operating systems.

Advanced Vector Extensions (AVX) are SIMD extensions to the x86 instruction set architecture for microprocessors from Intel and Advanced Micro Devices (AMD). They were proposed by Intel in March 2008 and first supported by Intel with the Sandy Bridge processor shipping in Q1 2011 and later by AMD with the Bulldozer processor shipping in Q3 2011. AVX provides new features, new instructions, and a new coding scheme.

The FMA instruction set is an extension to the 128 and 256-bit Streaming SIMD Extensions instructions in the x86 microprocessor instruction set to perform fused multiply–add (FMA) operations. There are two variants:

In computing, half precision is a binary floating-point computer number format that occupies 16 bits in computer memory. It is intended for storage of floating-point values in applications where higher precision is not essential, in particular image processing and neural networks.

<span class="mw-page-title-main">Intel Graphics Technology</span> Series of integrated graphics processors by Intel

Intel Graphics Technology (GT) is the collective name for a series of integrated graphics processors (IGPs) produced by Intel that are manufactured on the same package or die as the central processing unit (CPU). It was first introduced in 2010 as Intel HD Graphics and renamed in 2017 as Intel UHD Graphics.

<span class="mw-page-title-main">Xeon Phi</span> Series of x86 manycore processors from Intel

Xeon Phi was a series of x86 manycore processors designed and made by Intel. It was intended for use in supercomputers, servers, and high-end workstations. Its architecture allowed use of standard programming languages and application programming interfaces (APIs) such as OpenMP.

Transactional Synchronization Extensions (TSX), also called Transactional Synchronization Extensions New Instructions (TSX-NI), is an extension to the x86 instruction set architecture (ISA) that adds hardware transactional memory support, speeding up execution of multi-threaded software through lock elision. According to different benchmarks, TSX/TSX-NI can provide around 40% faster applications execution in specific workloads, and 4–5 times more database transactions per second (TPS).

AVX-512 are 512-bit extensions to the 256-bit Advanced Vector Extensions SIMD instructions for x86 instruction set architecture (ISA) proposed by Intel in July 2013, and first implemented in the 2016 Intel Xeon Phi x200, and then later in a number of AMD and other Intel CPUs. AVX-512 consists of multiple extensions that may be implemented independently. This policy is a departure from the historical requirement of implementing the entire instruction block. Only the core extension AVX-512F is required by all AVX-512 implementations.

<span class="mw-page-title-main">AArch64</span> 64-bit extension of the ARM architecture

AArch64 or ARM64 is the 64-bit extension of the ARM architecture family.

Intel MPX are discontinued set of extensions to the x86 instruction set architecture. With compiler, runtime library and operating system support, Intel MPX claimed to enhance security to software by checking pointer references whose normal compile-time intentions are maliciously exploited at runtime due to buffer overflows. In practice, there have been too many flaws discovered in the design for it to be useful, and support has been deprecated or removed from most compilers and operating systems. Intel has listed MPX as removed in 2019 and onward hardware in section 2.5 of its Intel® 64 and IA-32 Architectures Software Developer's Manual Volume 1.

Sapphire Rapids is a codename for Intel's server and workstation processors based on Intel 7.

Golden Cove is a codename for a CPU microarchitecture developed by Intel and released in November 2021. It succeeds four microarchitectures: Sunny Cove, Skylake, Willow Cove, and Cypress Cove. It is fabricated using Intel's Intel 7 process node, previously referred to as 10 nm Enhanced SuperFin (10ESF).

Downfall, known as Gather Data Sampling (GDS) by Intel, is a computer security vulnerability found in 6th through 11th generations of consumer and 1st through 4th generations of Xeon Intel x86-64 microprocessors. It is a transient execution CPU vulnerability which relies on speculative execution of Advanced Vector Extensions (AVX) instructions to reveal the content of vector registers.

References

  1. Hemsoth, Nicole (August 19, 2021). "With AMX, Intel Adds AI/ML Sparkle to Sapphire Rapids". The Next Platform.
  2. online, heise (28 June 2020). "Intel AMX: Erste Informationen zur Advanced Matrix Extensions Architecture". heise online.
  3. Cutress, Ian. "Intel Xeon Sapphire Rapids: How To Go Monolithic with Tiles". AnandTech .
  4. 1 2 "Intel® Architecture Instruction Set Extensions and Future Features".
  5. Schor, David (June 29, 2020). "The x86 Advanced Matrix Extension (AMX) Brings Matrix Operations; To Debut with Sapphire Rapids".
  6. Larabel, Michael (July 12, 2023). "Intel Granite Rapids D Support Merged Into GCC 14". Phoronix .
  7. "Advanced Matrix Extension (AMX) - x86 - WikiChip". en.wikichip.org.
  8. "Accelerate Artificial Intelligence (AI) Workloads with Intel Advanced Matrix Extensions (Intel AMX)" (PDF). Intel. Retrieved 2023-04-13.
  9. 1 2 "Intel® 64 and IA-32 Architectures Optimization Reference Manual Volume 1". Intel.
  10. "What's New in LLVM for 4th Gen Intel® Xeon® & Max Series CPUs" . Retrieved 21 April 2023.
  11. Larabel, Michael (2020-07-02). "Intel AMX Support Begins Landing In LLVM". Phoronix . Retrieved 2020-07-02.
  12. "[X86-64] Support Intel AMX instructions". GitHub . 2020-07-02. Retrieved 2020-07-02.
  13. 1 2 Larabel, Michael (2020-07-02). "Intel AMX Support Lands In The GNU Assembler". Phoronix . Retrieved 2020-07-02.
  14. "GCC 11 Release Series — Changes, New Features, and Fixes - GNU Project" . Retrieved 21 April 2023.
  15. "[PATCH] Enable GCC support for AMX". 2020-07-06. Retrieved 2020-07-09.
  16. "Enable GCC support for AMX-TILE,AMX-INT8,AMX-BF16. · gcc-mirror/gcc@5c60984". GitHub. Retrieved 2022-09-05.
  17. "commits with Intel AMX". 2020-07-02. Retrieved 2020-07-02.
  18. "x86: Detect Intel Advanced Matrix Extensions". 2020-07-02. Retrieved 2020-07-02.
  19. "Linux 5.16 Features Include FUTEX2, Intel AMX, Folios, DG2/Alchemist, More Apple Silicon Support". Phoronix .
  20. "Accessing Sapphire Rapids AMX instructions on vSphere". Earl C. Ruby III. 2023-08-24.