SSE5

Last updated

The SSE5 (short for Streaming SIMD Extensions version 5) was a SIMD instruction set extension proposed by AMD on August 30, 2007 as a supplement to the 128-bit SSE core instructions in the AMD64 architecture.

Contents

AMD chose not to implement SSE5 as originally proposed. In May 2009, AMD replaced SSE5 with three smaller instruction set extensions named as XOP, FMA4, and F16C, which retain the proposed functionality of SSE5, but encode the instructions differently for better compatibility with Intel's proposed AVX instruction set.

The three SSE5-derived instruction sets were introduced in the Bulldozer processor core, released in October 2011 on a 32 nm process. [1]

Compatibility

AMD's SSE5 extension bundle does not include the full set of Intel's SSE4 instructions, making it a competitor to SSE4 rather than a successor.

SSE5 enhancements

The proposed SSE5 instruction set consisted of 170 instructions (including 46 base instructions), many of which are designed to improve single-threaded performance. Some SSE5 instructions are 3-operand instructions, the use of which will increase the average number of instructions per cycle achievable by x86 code. [2] Selected new instructions include: [3]

AMD claimed SSE5 would provide dramatic performance improvements, particularly in high-performance computing (HPC), multimedia, and computer security applications, including a 5x performance gain for AES encryption and a 30% performance gain for the discrete cosine transform (DCT) used for example in video processing. [2]

2009 revision

The SSE5 specification included a proposed extension to the general coding scheme of x86 instructions in order to allow instructions to have more than two operands. In 2008, Intel announced their planned AVX instruction set which proposed a different way of coding instructions with more than two operands. The two proposed coding schemes, SSE5 and AVX, are mutually incompatible, although the AVX scheme has certain advantages over the SSE5 scheme: most importantly, AVX has plenty of space for future extensions, including larger vector sizes.

In May 2009, AMD published a revised specification for the planned future instructions. This revision changes the coding scheme to make it compatible with the AVX scheme, but with a differing prefix byte in order to avoid overlap between instructions introduced by AMD and instructions introduced by Intel.

The revised instruction set no longer carries the name SSE5, which has been criticized for being misleading, but most of the instructions in the new revision are functionally identical to the original SSE5 specification—only the way the instructions are coded differs. The planned additions to the AMD instruction set consists of three subsets:

  1. XOP: Integer vector multiply–accumulate instructions, integer vector horizontal addition, integer vector compare, shift and rotate instructions, byte permutation and conditional move instructions, floating point fraction extraction.
  2. FMA4: Floating-point vector multiply–accumulate.
  3. F16C: Half-precision floating-point conversion.

Both XOP and FMA4 are removed in newer AMD processors using the Zen microarchitecture. [4]

See also

Related Research Articles

x86 Family of instruction set architectures

x86 is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel based on the 8086 microprocessor and its 8-bit-external-bus variant, the 8088. The 8086 was introduced in 1978 as a fully 16-bit extension of 8-bit Intel's 8080 microprocessor, with memory segmentation as a solution for addressing more memory than can be covered by a plain 16-bit address. The term "x86" came into being because the names of several successors to Intel's 8086 processor end in "86", including the 80186, 80286, 80386 and 80486. Colloquially, their names were "186", "286", "386" and "486".

<span class="mw-page-title-main">Single instruction, multiple data</span> Type of parallel processing

Single instruction, multiple data (SIMD) is a type of parallel processing in Flynn's taxonomy. SIMD can be internal and it can be directly accessible through an instruction set architecture (ISA), but it should not be confused with an ISA. SIMD describes computers with multiple processing elements that perform the same operation on multiple data points simultaneously.

In computing, Streaming SIMD Extensions (SSE) is a single instruction, multiple data (SIMD) instruction set extension to the x86 architecture, designed by Intel and introduced in 1999 in its Pentium III series of central processing units (CPUs) shortly after the appearance of Advanced Micro Devices (AMD's) 3DNow!. SSE contains 70 new instructions, most of which work on single precision floating-point data. SIMD instructions can greatly increase performance when exactly the same operations are to be performed on multiple data objects. Typical applications are digital signal processing and graphics processing.

x86-64 64-bit version of x86 architecture

x86-64 is a 64-bit version of the x86 instruction set, first announced in 1999. It introduced two new modes of operation, 64-bit mode and compatibility mode, along with a new 4-level paging mode.

SSE2 is one of the Intel SIMD processor supplementary instruction sets introduced by Intel with the initial version of the Pentium 4 in 2000. SSE2 instructions allow the use of XMM (SIMD) registers on x86 instruction set architecture processors. These registers can load up to 128 bits of data and perform instructions, such as vector addition and multiplication, simultaneously.

Supplemental Streaming SIMD Extensions 3 is a SIMD instruction set created by Intel and is the fourth iteration of the SSE technology.

SSE4 is a SIMD CPU instruction set used in the Intel Core microarchitecture and AMD K10 (K8L). It was announced on September 27, 2006, at the Fall 2006 Intel Developer Forum, with vague details in a white paper; more precise details of 47 instructions became available at the Spring 2007 Intel Developer Forum in Beijing, in the presentation. SSE4 extended the SSE3 instruction set which was released in early 2004. All software using previous Intel SIMD instructions are compatible with modern microprocessors supporting SSE4 instructions. All existing software continues to run correctly without modification on microprocessors that incorporate SSE4, as well as in the presence of existing and new applications that incorporate SSE4.

Advanced Vector Extensions are SIMD extensions to the x86 instruction set architecture for microprocessors from Intel and Advanced Micro Devices (AMD). They were proposed by Intel in March 2008 and first supported by Intel with the Sandy Bridge microarchitecture shipping in Q1 2011 and later by AMD with the Bulldozer microarchitecture shipping in Q4 2011. AVX provides new features, new instructions, and a new coding scheme.

The XOP instruction set, announced by AMD on May 1, 2009, is an extension to the 128-bit SSE core instructions in the x86 and AMD64 instruction set for the Bulldozer processor core, which was released on October 12, 2011. However AMD removed support for XOP from Zen (microarchitecture) onward.

The VEX prefix and VEX coding scheme are an extension to the IA-32 and x86-64 instruction set architecture for microprocessors from Intel, AMD and others.

The FMA instruction set is an extension to the 128 and 256-bit Streaming SIMD Extensions instructions in the x86 microprocessor instruction set to perform fused multiply–add (FMA) operations. There are two variants:

Carry-less Multiplication (CLMUL) is an extension to the x86 instruction set used by microprocessors from Intel and AMD which was proposed by Intel in March 2008 and made available in the Intel Westmere processors announced in early 2010. Mathematically, the instruction implements multiplication of polynomials over the finite field GF(2) where the bitstring represents the polynomial . The CLMUL instruction also allows a more efficient implementation of the closely related multiplication of larger finite fields GF(2k) than the traditional instruction set.

AVX-512 are 512-bit extensions to the 256-bit Advanced Vector Extensions SIMD instructions for x86 instruction set architecture (ISA) proposed by Intel in July 2013, and first implemented in the 2016 Intel Xeon Phi x200, and then later in a number of AMD and other Intel CPUs. AVX-512 consists of multiple extensions that may be implemented independently. This policy is a departure from the historical requirement of implementing the entire instruction block. Only the core extension AVX-512F is required by all AVX-512 implementations.

The F16C instruction set is an x86 instruction set architecture extension which provides support for converting between half-precision and standard IEEE single-precision floating-point formats.

Bit manipulation instructions sets are extensions to the x86 instruction set architecture for microprocessors from Intel and AMD. The purpose of these instruction sets is to improve the speed of bit manipulation. All the instructions in these sets are non-SIMD and operate only on general-purpose registers.

The x86 instruction set has several times been extended with SIMD instruction set extensions. These extensions, starting from the MMX instruction set extension introduced with Pentium MMX in 1997, typically define sets of wide registers and instructions that subdivide these registers into fixed-size lanes and perform a computation for each lane in parallel.

References

  1. Hruska, Joel (November 14, 2008). "AMD Fusion now pushed back to 2011". Ars Technica.
  2. 1 2 Vance, Ashlee (August 30, 2007). "AMD plots single thread boost with x86 extensions". The Register.
  3. "128-Bit SSE5 Instruction Set". AMD Developer Central. Archived from the original on January 15, 2008. Retrieved January 28, 2008.
  4. Michael Larabel (March 3, 2017). "The Impact Of GCC Zen Compiler Tuning On AMD Ryzen Performance". Phoronix . But with Zen being a clean-sheet design, there are some instruction set extensions found in Bulldozer processors not found in Zen/znver1. Those no longer present include FMA4 and XOP.