VEX prefix

Last updated

The VEX prefix (from "vector extensions") and VEX coding scheme are an extension to the IA-32 and x86-64 instruction set architecture for microprocessors from Intel, AMD and others.

Contents

Features

The VEX coding scheme allows the definition of new instructions and the extension or modification of previously existing instruction codes. This serves the following purposes:

The VEX prefix replaces the most commonly used instruction prefix bytes and escape bytes. In many cases, the number of prefix bytes and escape bytes that are replaced is the same as the number of bytes in the VEX prefix, so that the total length of the VEX-encoded instruction is the same as the length of the legacy instruction code. In other cases, the VEX-encoded version is longer or shorter than the legacy code. In 32-bit mode VEX encoded instructions can only access the first 8 YMM/XMM registers; the encodings for the other registers would be interpreted as the legacy LDS and LES instructions that are not supported in 64-bit mode.

Instruction encoding

Intel 64 instruction format using VEX prefix
# of bytes0, 2, 3110, 10, 1, 2, 40, 1
Part[Prefixes][VEX]OPCODEModR/M[SIB][DISP][IMM]

The VEX coding scheme uses a code prefix consisting of two or three bytes, which may be added to existing or new instruction codes. [1]

In x86 architecture, instructions with a memory operand may use the ModR/M byte which specifies the addressing mode. This byte has three bit fields:

The base-plus-index and scale-plus-index forms of 32-bit addressing (encoded with r/m = 100 and mod ≠ 11) require another addressing byte, the SIB byte. It has the following fields:

REX and VEX encoding
ByteBit
REX
76543210
0 (0x4_)0100WRXB
VEX3 (3-byte VEX)
76543210
0 (0xC4)11000100
1m4m3m2m1m0
2W3210Lp1p0
VEX2 (2-byte VEX)
76543210
0 (0xC5)11000101
13210Lp1p0
REX2 (2-byte REX)
76543210
0 (0xD5)11010101
1M0R4X4B4WR3X3B3

The REX prefix provides additional space for encoding 64-bit addressing modes and additional registers present in the x86-64 architecture. Bit-field W changes the operand size to 64 bits, R expands reg to 4 bits, B expands r/m (or opreg in the few opcodes that encode the register in the 3 lowest opcode bits, such as "POP reg"), and X and B expand index and base in the SIB byte.

The VEX3 prefix contains all bit-fields from the REX prefix as well as various other prefixes, expanding addressing mode, register enumeration, operand size and width:

The VEX2 prefix is a 2-byte variant of the VEX3 prefix, that differs from the latter in the following points:

Instructions that require any of these bit-fields need to be encoded with the VEX3 prefix.

The REX2 prefix is a 2-byte variant of the REX prefix, introduced with Intel APX extensions which add 16 Extended GPR registers.

Register addressing in 64-bit mode using VEX prefix
Addressing modeBit 3Bits [2:0]Register typeCommon usage
REGVEX.RModRM.regGeneral purpose, mask, vectorRegister operand
RM (if ModRM.mod = 11)VEX.BModRM.r/mGPR, mask, vectorRegister operand
RMVEX.BModRM.r/mGPRRegister memory address
BASEVEX.BSIB.baseGPRBase + index × scale memory address
INDEXVEX.XSIB.indexGPRBase + index × scale memory address
VIDXVEX.XSIB.indexVectorBase + vector index × scale memory address
NDS/NDDVEX.v3v2v1v0GPR, mask, vectorRegister operand
IS4Imm8[7:4]VectorRegister operand

Technical description

Instructions coded with the VEX prefix can have up to four variable operands (in registers or memory) and one constant operand (immediate value). Instructions that need more than three variable operands use immediate operand bits to specify a 4th register operand (IS4 above). At most one of the operands can be a memory operand; and at most one of the operands can be an immediate constant of 4 or 8 bits. The remaining operands are registers.

The AVX instruction set is the first instruction set extension to use the VEX coding scheme. The AVX instruction set uses VEX prefix only for instructions using the SIMD XMM registers.

However, the VEX coding scheme has been used for other instruction types as well in subsequent expansions of the instruction set. For example:

The VEX prefix's initial-byte values, 0xC4 and 0xC5, are the same as the opcodes of the LDS and LES instructions. Not supported in 64-bit mode, the ambiguity is resolved in 32-bit mode by exploiting the fact that a legal LDS or LES's ModR/M byte cannot specify a register source operand; i.e., be of the form 11xxxxxx. Various bit-fields in the VEX prefix's second byte are inverted to ensure that the byte is always of this form. Similarly, the REX prefix's one-byte form has the four high-order bits set to four, which replaces sixteen opcodes numbered 0x40–0x4F. Previously, those opcodes were individual INC and DEC instructions for the eight standard processor registers; x86-64 code must use ModR/M INC and DEC instructions. [5]

Legacy SIMD instructions with a VEX prefix added are equivalent to the same instructions without VEX prefix with the following differences:

Instructions that use the whole 256-bit YMM register should not be mixed with non-VEX instructions that leave the upper half of the register unchanged, for reasons of efficiency. [6] [7]

The VEX prefix is not supported in real mode and virtual-8086 mode (all instructions with the VEX prefix will cause #UD in these modes).

History

Related Research Articles

x86 Family of instruction set architectures

x86 is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel based on the 8086 microprocessor and its 8088 variant. The 8086 was introduced in 1978 as a fully 16-bit extension of 8-bit Intel's 8080 microprocessor, with memory segmentation as a solution for addressing more memory than can be covered by a plain 16-bit address. The term "x86" came into being because the names of several successors to Intel's 8086 processor end in "86", including the 80186, 80286, 80386 and 80486. Colloquially, their names were "186", "286", "386" and "486".

<span class="mw-page-title-main">MMX (instruction set)</span> Instruction set designed by Intel

MMX is a single instruction, multiple data (SIMD) instruction set architecture designed by Intel, introduced on January 8, 1997 with its Pentium P5 (microarchitecture) based line of microprocessors, named "Pentium with MMX Technology". It developed out of a similar unit introduced on the Intel i860, and earlier the Intel i750 video pixel processor. MMX is a processor supplementary capability that is supported on IA-32 processors by Intel and other vendors as of 1997. AMD also added MMX instruction set in its K6 processor.

In computing, Streaming SIMD Extensions (SSE) is a single instruction, multiple data (SIMD) instruction set extension to the x86 architecture, designed by Intel and introduced in 1999 in their Pentium III series of central processing units (CPUs) shortly after the appearance of Advanced Micro Devices (AMD's) 3DNow!. SSE contains 70 new instructions, most of which work on single precision floating-point data. SIMD instructions can greatly increase performance when exactly the same operations are to be performed on multiple data objects. Typical applications are digital signal processing and graphics processing.

<span class="mw-page-title-main">MCS-51</span> Single chip microcontroller series by Intel

The Intel MCS-51 is a single chip microcontroller (MCU) series developed by Intel in 1980 for use in embedded systems. The architect of the Intel MCS-51 instruction set was John H. Wharton. Intel's original versions were popular in the 1980s and early 1990s, and enhanced binary compatible derivatives remain popular today. It is a complex instruction set computer, but also has some of the features of RISC architectures, such as a large register set and register windows, and has separate memory spaces for program instructions and data.

x86 assembly language is the name for the family of assembly languages which provide some level of backward compatibility with CPUs back to the Intel 8008 microprocessor, which was launched in April 1972. It is used to produce object code for the x86 class of processors.

x86-64 64-bit version of x86 architecture

x86-64 is a 64-bit version of the x86 instruction set, first announced in 1999. It introduced two new modes of operation, 64-bit mode and compatibility mode, along with a new 4-level paging mode.

SSE2 is one of the Intel SIMD processor supplementary instruction sets introduced by Intel with the initial version of the Pentium 4 in 2000. SSE2 instructions allow the use of XMM (SIMD) registers on x86 instruction set architecture processors. These registers can load up to 128 bits of data and perform instructions, such as vector addition and multiplication, simultaneously.

The x86 instruction set refers to the set of instructions that x86-compatible microprocessors support. The instructions are usually part of an executable program, often stored as a computer file and executed on the processor.

TLCS is a prefix applied to microcontrollers made by Toshiba. The product line includes multiple families of CISC and RISC architectures. Individual components generally have a part number beginning with "TMP". E.g. the TMP8048AP is a member of the TLCS-48 family.

The SSE5 was a SIMD instruction set extension proposed by AMD on August 30, 2007 as a supplement to the 128-bit SSE core instructions in the AMD64 architecture.

Advanced Vector Extensions are SIMD extensions to the x86 instruction set architecture for microprocessors from Intel and Advanced Micro Devices (AMD). They were proposed by Intel in March 2008 and first supported by Intel with the Sandy Bridge processor shipping in Q1 2011 and later by AMD with the Bulldozer processor shipping in Q3 2011. AVX provides new features, new instructions, and a new coding scheme.

The XOP instruction set, announced by AMD on May 1, 2009, is an extension to the 128-bit SSE core instructions in the x86 and AMD64 instruction set for the Bulldozer processor core, which was released on October 12, 2011. However AMD removed support for XOP from Zen (microarchitecture) onward.

The FMA instruction set is an extension to the 128 and 256-bit Streaming SIMD Extensions instructions in the x86 microprocessor instruction set to perform fused multiply–add (FMA) operations. There are two variants:

Carry-less Multiplication (CLMUL) is an extension to the x86 instruction set used by microprocessors from Intel and AMD which was proposed by Intel in March 2008 and made available in the Intel Westmere processors announced in early 2010. Mathematically, the instruction implements multiplication of polynomials over the finite field GF(2) where the bitstring represents the polynomial . The CLMUL instruction also allows a more efficient implementation of the closely related multiplication of larger finite fields GF(2k) than the traditional instruction set.

AVX-512 are 512-bit extensions to the 256-bit Advanced Vector Extensions SIMD instructions for x86 instruction set architecture (ISA) proposed by Intel in July 2013, and first implemented in the 2016 Intel Xeon Phi x200, and then later in a number of AMD and other Intel CPUs. AVX-512 consists of multiple extensions that may be implemented independently. This policy is a departure from the historical requirement of implementing the entire instruction block. Only the core extension AVX-512F is required by all AVX-512 implementations.

The F16C instruction set is an x86 instruction set architecture extension which provides support for converting between half-precision and standard IEEE single-precision floating-point formats.

The EVEX prefix and corresponding coding scheme is an extension to the 32-bit x86 (IA-32) and 64-bit x86-64 (AMD64) instruction set architecture. EVEX is based on, but should not be confused with the MVEX prefix used by the Knights Corner processor.

<span class="mw-page-title-main">STM8</span>

The STM8 is an 8-bit microcontroller family by STMicroelectronics. The STM8 microcontrollers use an extended variant of the ST7 microcontroller architecture. STM8 microcontrollers are particularly low cost for a full-featured 8-bit microcontroller.

The ModR/M byte is an important part of instruction encoding for the x86 instruction set.

References

  1. Intel Corporation (January 2009). "Intel Advanced Vector Extensions Programming Reference".
  2. Intel® Xeon Phi™ Coprocessor Instruction Set Architecture Reference Manual (PDF). Sep 7, 2012. p. 73. 327364-001. Archived (PDF) from the original on Aug 4, 2021.
  3. Intel ® Architecture Instruction Set Extensions and Future Features (PDF). Sep 2023. p. 103. 314933-050. Archived (PDF) from the original on Dec 12, 2023.
  4. Intel, Software Developers Manual, order no. 325462-081, sep 2023, vol 2, section 2.7.11.3, p. 588. Archived on Dec 6, 2023
  5. Intel Corporation (2016-09-01). "Intel® 64 and IA-32 Architectures Developer's Manual: Vol. 2A". p. 2-8. Retrieved 2021-09-13.
  6. Intel, Avoiding AVX-SSE Transition Penalties, 2011. Archived on 26 Oct 2023.
  7. Stack Overflow, Why is this SSE code 6 times slower without VZEROUPPER on Skylake?, December 2016. Archived on 6 Jul 2023.
  8. "128-Bit SSE5 Instruction Set". AMD Developer Central. Retrieved 2009-06-02.
  9. Hruska, Joel (November 14, 2008). "AMD Fusion now pushed back to 2011". Ars Technica.
  10. "Intel Software Network". Intel. Archived from the original on 2008-04-07. Retrieved 2008-04-05.
  11. "AMD and Intel incompatible - What to do?". AMD Developer Forums. Retrieved 2012-08-10.
  12. "AMD64 Architecture Programmer's Manual Volume 4: 128-Bit and 256-Bit Media Instructions" (PDF). AMD. December 22, 2010.
  13. "Striking a balance". Dave Christie, AMD Developer blogs. Archived from the original on 2013-11-09. Retrieved 2012-08-10.