SSSE3

Last updated

Supplemental Streaming SIMD Extensions 3 (SSSE3 or SSE3S) is a SIMD instruction set created by Intel and is the fourth iteration of the SSE technology.

Contents

History

SSSE3 was first introduced with Intel processors based on the Core microarchitecture on June 26, 2006 with the "Woodcrest" Xeons.

SSSE3 has been referred to by the codenames Tejas New Instructions (TNI) or Merom New Instructions (MNI) for the first processor designs intended to support it.

Functionality

SSSE3 contains 16 new discrete instructions. Each instruction can act on 64-bit MMX or 128-bit XMM registers. Therefore, Intel's materials refer to 32 new instructions. They include: [1]

CPUs with SSSE3

New instructions

In the table below, satsw(X) (read as 'saturate to signed word') takes a signed integer X, and converts it to −32768 if it is less than −32768, to +32767 if it is greater than 32767, and leaves it unchanged otherwise. As normal for the Intel architecture, bytes are 8 bits, words 16 bits, and dwords 32 bits; 'register' refers to an MMX or XMM vector register. [1]

PSIGNB, PSIGNW, PSIGNDPacked SignNegate the elements of a register of bytes, words or dwords if the sign of the corresponding elements of another register is negative.
PABSB, PABSW, PABSDPacked Absolute ValueFill the elements of a register of bytes, words or dwords with the absolute values of the elements of another register
PALIGNRPacked Align Righttake two registers, concatenate their values, and pull out a register-length section from an offset given by an immediate value encoded in the instruction.
PSHUFBPacked Shuffle Bytestakes registers of bytes A = [a0 a1 a2 ...] and B = [b0 b1 b2 ...] and replaces A with [ab0 ab1 ab2 ...]; except that it replaces the ith entry with 0 if the top bit of bi is set.
PMULHRSWPacked Multiply High with Round and Scaletreat the 16-bit words in registers A and B as signed 16-bit fixed-point numbers between −1.00000000 and +0.99996948... (e.g. 0x4000 is treated as +0.5 and 0xA000 as −0.75), and multiply them together with correct rounding.
PMADDUBSWMultiply and Add Packed Signed and Unsigned BytesTake the bytes in registers A and B, multiply them together, add pairs, signed-saturate and store. I.e. [a0 a1 a2 …] pmaddubsw [b0 b1 b2 …] = [satsw(a0b0+a1b1) satsw(a2b2+a3b3) …]
PHSUBW, PHSUBDPacked Horizontal Subtract (Words or Doublewords)takes registers A = [a0 a1 a2 …] and B = [b0 b1 b2 …] and outputs [a0−a1 a2−a3 … b0−b1 b2−b3 …]
PHSUBSWPacked Horizontal Subtract and Saturate Wordslike PHSUBW, but outputs [satsw(a0−a1) satsw(a2−a3) … satsw(b0−b1) satsw(b2−b3) …]
PHADDW, PHADDDPacked Horizontal Add (Words or Doublewords)takes registers A = [a0 a1 a2 …] and B = [b0 b1 b2 …] and outputs [a0+a1 a2+a3 … b0+b1 b2+b3 …]
PHADDSWPacked Horizontal Add and Saturate Wordslike PHADDW, but outputs [satsw(a0+a1) satsw(a2+a3) … satsw(b0+b1) satsw(b2+b3) …]

See also

Related Research Articles

x86 Family of instruction set architectures

x86 is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel based on the Intel 8086 microprocessor and its 8088 variant. The 8086 was introduced in 1978 as a fully 16-bit extension of Intel's 8-bit 8080 microprocessor, with memory segmentation as a solution for addressing more memory than can be covered by a plain 16-bit address. The term "x86" came into being because the names of several successors to Intel's 8086 processor end in "86", including the 80186, 80286, 80386 and 80486 processors.

<span class="mw-page-title-main">MMX (instruction set)</span> Instruction set designed by Intel

MMX is a single instruction, multiple data (SIMD) instruction set architecture designed by Intel, introduced on January 8, 1997 with its Pentium P5 (microarchitecture) based line of microprocessors, named "Pentium with MMX Technology". It developed out of a similar unit introduced on the Intel i860, and earlier the Intel i750 video pixel processor. MMX is a processor supplementary capability that is supported on IA-32 processors by Intel and other vendors as of 1997.

In computing, Streaming SIMD Extensions (SSE) is a single instruction, multiple data (SIMD) instruction set extension to the x86 architecture, designed by Intel and introduced in 1999 in their Pentium III series of Central processing units (CPUs) shortly after the appearance of Advanced Micro Devices (AMD's) 3DNow!. SSE contains 70 new instructions, most of which work on single precision floating-point data. SIMD instructions can greatly increase performance when exactly the same operations are to be performed on multiple data objects. Typical applications are digital signal processing and graphics processing.

3DNow! is a deprecated extension to the x86 instruction set developed by Advanced Micro Devices (AMD). It adds single instruction multiple data (SIMD) instructions to the base x86 instruction set, enabling it to perform vector processing of floating-point vector operations using vector registers, which improves the performance of many graphics-intensive applications. The first microprocessor to implement 3DNow! was the AMD K6-2, which was introduced in 1998. When the application was appropriate, this raised the speed by about 2–4 times.

SSE2 is one of the Intel SIMD processor supplementary instruction sets first introduced by Intel with the initial version of the Pentium 4 in 2000. It extends the earlier SSE instruction set, and is intended to fully replace MMX. Intel extended SSE2 to create SSE3 in 2004. SSE2 added 144 new instructions to SSE, which has 70 instructions. Competing chip-maker AMD added support for SSE2 with the introduction of their Opteron and Athlon 64 ranges of AMD64 64-bit CPUs in 2003.

SSE3, Streaming SIMD Extensions 3, also known by its Intel code name Prescott New Instructions (PNI), is the third iteration of the SSE instruction set for the IA-32 (x86) architecture. Intel introduced SSE3 in early 2004 with the Prescott revision of their Pentium 4 CPU. In April 2005, AMD introduced a subset of SSE3 in revision E of their Athlon 64 CPUs. The earlier SIMD instruction sets on the x86 platform, from oldest to newest, are MMX, 3DNow!, SSE, and SSE2.

The x86 instruction set refers to the set of instructions that x86-compatible microprocessors support. The instructions are usually part of an executable program, often stored as a computer file and executed on the processor.

<span class="mw-page-title-main">Pentium</span> Brand of microprocessors produced by Intel

Pentium is a series of x86 architecture-compatible microprocessors produced by Intel. The original Pentium was first released on March 22, 1993.

SSE4 is a SIMD CPU instruction set used in the Intel Core microarchitecture and AMD K10 (K8L). It was announced on September 27, 2006, at the Fall 2006 Intel Developer Forum, with vague details in a white paper; more precise details of 47 instructions became available at the Spring 2007 Intel Developer Forum in Beijing, in the presentation. SSE4 is fully compatible with software written for previous generations of Intel 64 and IA-32 architecture microprocessors. All existing software continues to run correctly without modification on microprocessors that incorporate SSE4, as well as in the presence of existing and new applications that incorporate SSE4.

<span class="mw-page-title-main">VIA Nano</span>

The VIA Nano is a 64-bit CPU for personal computers. The VIA Nano was released by VIA Technologies in 2008 after five years of development by its CPU division, Centaur Technology. This new Isaiah 64-bit architecture was designed from scratch, unveiled on 24 January 2008, and launched on 29 May, including low-voltage variants and the Nano brand name. The processor supports a number of VIA-specific x86 extensions designed to boost efficiency in low-power appliances.

The SSE5 was a SIMD instruction set extension proposed by AMD on August 30, 2007 as a supplement to the 128-bit SSE core instructions in the AMD64 architecture.

Advanced Vector Extensions (AVX) are extensions to the x86 instruction set architecture for microprocessors from Intel and Advanced Micro Devices (AMD). They were proposed by Intel in March 2008 and first supported by Intel with the Sandy Bridge processor shipping in Q1 2011 and later by AMD with the Bulldozer processor shipping in Q3 2011. AVX provides new features, new instructions and a new coding scheme.

The XOP instruction set, announced by AMD on May 1, 2009, is an extension to the 128-bit SSE core instructions in the x86 and AMD64 instruction set for the Bulldozer processor core, which was released on October 12, 2011. However AMD removed support for XOP from Zen (microarchitecture) onward.

The VEX prefix and VEX coding scheme are an extension to the x86 and x86-64 instruction set architecture for microprocessors from Intel, AMD and others.

AVX-512 are 512-bit extensions to the 256-bit Advanced Vector Extensions SIMD instructions for x86 instruction set architecture (ISA) proposed by Intel in July 2013, and implemented in Intel's Xeon Phi x200 and Skylake-X CPUs; this includes the Core-X series, as well as the new Xeon Scalable Processor Family and Xeon D-2100 Embedded Series. AVX-512 consists of multiple extensions that may be implemented independently. This policy is a departure from the historical requirement of implementing the entire instruction block. Only the core extension AVX-512F is required by all AVX-512 implementations.

References

  1. 1 2 "2.9.5". Intel 64 and IA-32 Architectures Optimization Reference Manual (PDF) (Technical report). Intel.com. 2016. pp. 92–93. Archived (PDF) from the original on June 20, 2018. Retrieved June 22, 2018.