List of discontinued x86 instructions

This list covers instructions that have at some point been present as documented instructions in one or more x86 processors, but whose processor series have since been discontinued or superseded, with no known plans to reintroduce the instructions.

Intel instructions

i386 instructions

The following instructions were introduced in the Intel 80386, but later discontinued:

Instruction | Opcode | Description | Eventual fate
XBTS r, r/m | 0F A6 /r | Extract Bit String | Discontinued from revision B1 of the 80386 onwards. Opcodes briefly reused for CMPXCHG in Intel 486 stepping A only; CMPXCHG was moved to different opcodes from 486 stepping B onwards. Opcodes later reused for VIA PadLock.
IBTS r/m, r | 0F A7 /r | Insert Bit String | Same as XBTS.
MOV r32,TRx | 0F 24 /r | Move from test register | Present in Intel 386 and 486; not present in Intel Pentium or any later Intel CPUs, except the i486-derived Quark X1000. Present in all Cyrix CPUs.
MOV TRx,r32 | 0F 26 /r | Move to test register | Same as MOV r32,TRx.

Itanium instructions

These instructions are only present in the x86 operation mode of early Intel Itanium processors with hardware support for x86. This support was added in "Merced" and removed in "Montecito", replaced with software emulation.

Instruction | Opcode | Description
JMPE r/m16, JMPE r/m32 | 0F 00 /6 | Jump To Intel Itanium Instruction Set. [1]
JMPE disp16/32 | 0F B8 rel16/32 | Jump To Intel Itanium Instruction Set. [1]

MPX instructions

These instructions were introduced in 6th generation Intel Core "Skylake" CPUs. The last CPU generation to support them was the 9th generation Core "Coffee Lake" CPUs.

Intel MPX adds 4 new registers, BND0 to BND3, that each contains a pair of addresses. MPX also defines a bounds-table as a 2-level directory/table data structure in memory that contains sets of upper/lower bounds.

Instruction | Opcode [lower-alpha 1] | Description
BNDMK b, m | F3 0F 1B /r [lower-alpha 2] | Make lower and upper bound from memory address expression. The lower bound is given by the base component of the address, the upper bound by the 1's complement of the address as a whole.
BNDCL b, r/m | F3 0F 1A /r | Check address against lower bound. BNDCL, BNDCU and BNDCN all produce a #BR exception if the bounds check fails.
BNDCU b, r/m | F2 0F 1A /r | Check address against upper bound in 1's-complement form.
BNDCN b, r/m | F2 0F 1B /r | Check address against upper bound.
BNDMOV b, b/m | 66 0F 1A /r | Move a pair of memory bounds to/from memory or between bounds-registers.
BNDMOV b/m, b | 66 0F 1B /r | Move a pair of memory bounds to/from memory or between bounds-registers.
BNDLDX b,mib | NP 0F 1A /r [lower-alpha 3] | Load bounds from the bounds-table, using address translation using an sib-addressing expression mib. [lower-alpha 4]
BNDSTX mib,b | NP 0F 1B /r [lower-alpha 3] | Store bounds into the bounds-table, using address translation using an sib-addressing expression mib. [lower-alpha 4]
BND | F2 | Instruction prefix used with certain branch instructions [lower-alpha 5] to indicate that they should not clear the bounds registers.
  1. For all of the MPX instructions, 16-bit addressing is disallowed − this effectively makes the address-size override prefix 67h mandatory in 16-bit mode and prohibited in 32-bit mode. In 64-bit mode, the 67h prefix is ignored for the MPX instructions − address size is always 64-bit. These behaviors are unique to the MPX instructions.
  2. For BNDMK in 64-bit mode, RIP-relative addressing is not permitted and will cause #UD.
  3. The BNDLDX and BNDSTX instructions require memory addressing modes that use the SIB byte − non-SIB addressing modes cause #UD.
  4. The BNDLDX and BNDSTX instructions produce a #BR exception if the bounds directory entry is not valid (which prevents address translation).
  5. The branch instructions that can accept a BND prefix are the near forms of JMP (opcodes E9 and FF /4), CALL (opcodes E8 and FF /2), RET (opcodes C2 and C3), and the short/near forms of the Jcc instructions (opcodes 70..7F and 0F 80..8F). If the BNDPRESERVE config bit is not set, then executing any of these branch instructions without the BND prefix will clear all four bounds registers. (Other branch instructions, such as far jumps, short jumps (EB), LOOP and IRET, do not clear the bounds registers regardless of whether an F2h prefix is present or not.)
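
The bounds semantics in the table above can be sketched in a few lines of Python. This is only an illustrative model, not Intel's definition: the function names mirror the mnemonics, a bounds register is modelled as a (lower, upper) tuple whose upper half is kept in 1's-complement form, the 64-bit width is assumed, and the BoundRangeExceeded class is a hypothetical stand-in for the #BR exception.

```python
MASK64 = (1 << 64) - 1  # assumed 64-bit address width


class BoundRangeExceeded(Exception):
    """Hypothetical stand-in for the #BR exception."""


def bndmk(base, offset):
    """Model BNDMK: lower = base, upper = 1's complement of the full address."""
    lower = base & MASK64
    upper = ~(base + offset) & MASK64
    return (lower, upper)


def bndcl(bnd, addr):
    """Model BNDCL: fail if the address is below the lower bound."""
    if addr < bnd[0]:
        raise BoundRangeExceeded(hex(addr))


def bndcu(bnd, addr):
    """Model BNDCU: the stored upper bound is in 1's-complement form,
    so it is inverted before the comparison."""
    if addr > (~bnd[1] & MASK64):
        raise BoundRangeExceeded(hex(addr))


def bndcn(bnd, addr):
    """Model BNDCN: compare against the stored upper value as-is."""
    if addr > bnd[1]:
        raise BoundRangeExceeded(hex(addr))


def in_bounds(bnd, addr):
    """Convenience helper (not an MPX instruction): True if both checks pass."""
    try:
        bndcl(bnd, addr)
        bndcu(bnd, addr)
        return True
    except BoundRangeExceeded:
        return False
```

For a 16-byte buffer assumed to start at 0x1000, `bndmk(0x1000, 15)` yields bounds for which `in_bounds` accepts 0x1000 through 0x100F and rejects 0x1010.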

Hardware Lock Elision

The Hardware Lock Elision feature of Intel TSX is marked in the Intel SDM as removed from 2019 onwards. [2] This feature took the form of two instruction prefixes, XACQUIRE and XRELEASE, that could be attached to memory atomics/stores to elide the memory locking that they represent.

Instruction prefix | Opcode | Description
XACQUIRE | F2 | Instruction prefix to indicate start of hardware lock elision, used with memory atomic instructions only (for other instructions, the F2 prefix may have other meanings). When used with such instructions, may start a transaction instead of performing the memory atomic operation.
XRELEASE | F3 | Instruction prefix to indicate end of hardware lock elision, used with memory atomic/store instructions only (for other instructions, the F3 prefix may have other meanings). When used with such instructions during hardware lock elision, will end the associated transaction instead of performing the store/atomic.

VP2Intersect instructions

The VP2INTERSECT instructions (an AVX-512 subset) were introduced in Tiger Lake (11th generation mobile Core processors), but were never officially supported on any other Intel processors. They are now considered deprecated [3] and are listed in the Intel SDM as removed from 2023 onwards. [2]

Instruction | Opcode | Description
VP2INTERSECTD k1+1, xmm2, xmm3/m128/m32bcst; VP2INTERSECTD k1+1, ymm2, ymm3/m256/m32bcst; VP2INTERSECTD k1+1, zmm2, zmm3/m512/m32bcst | EVEX.NDS.F2.0F38.W0 68 /r | Store, in an even/odd pair of mask registers, the indicators of the locations of value matches between 32-bit lanes in the two vector source arguments.
VP2INTERSECTQ k1+1, xmm2, xmm3/m128/m64bcst; VP2INTERSECTQ k1+1, ymm2, ymm3/m256/m64bcst; VP2INTERSECTQ k1+1, zmm2, zmm3/m512/m64bcst | EVEX.NDS.F2.0F38.W1 68 /r | Store, in an even/odd pair of mask registers, the indicators of the locations of value matches between 64-bit lanes in the two vector source arguments.
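
As a rough illustration of the description above, the following Python sketch computes the two result masks for a pair of lane vectors. It assumes that the even mask register receives the match indicators for the first source and the odd mask register those for the second source; the function name and the use of plain Python lists for vector lanes are illustrative choices only.

```python
def vp2intersect(src1, src2):
    """Return (k_even, k_odd) bitmasks of matching lanes.

    Bit i of k_even is set if src1[i] equals any lane of src2;
    bit j of k_odd is set if src2[j] equals any lane of src1.
    """
    k_even = 0
    k_odd = 0
    for i, a in enumerate(src1):
        for j, b in enumerate(src2):
            if a == b:
                k_even |= 1 << i
                k_odd |= 1 << j
    return k_even, k_odd
```

With four lanes, `vp2intersect([1, 2, 3, 4], [4, 9, 1, 1])` marks lanes 0 and 3 of the first source and lanes 0, 2 and 3 of the second.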

Instructions specific to Xeon Phi processors

"Knights Corner" instructions

The first generation Xeon Phi processors, codenamed "Knights Corner" (KNC), supported a large number of instructions that are not seen in any later x86 processor. An instruction reference is available [4] − the instructions/opcodes unique to KNC are the ones with VEX and MVEX prefixes (except for the KMOV, KNOT and KORTEST instructions − these are kept with the same opcodes and function in AVX-512, but with an added "W" appended to their instruction names).

Most of these KNC-unique instructions are similar but not identical to instructions in AVX-512 − later Xeon Phi processors replaced these instructions with AVX-512.

Early versions of AVX-512 avoided the instruction encodings used by KNC's MVEX prefix. However, with the introduction of Intel APX (Advanced Performance Extensions) in 2023, some of the old KNC MVEX instruction encodings have been reused for new APX encodings. For example, both KNC and APX accept the instruction encoding 62 F1 79 48 6F 04 C1 as valid, but assign different meanings to it:

  • KNC: VMOVDQA32 zmm0, k0, xmmword ptr [rcx+rax*8]{uint8} - vector load with data conversion
  • APX: VMOVDQA32 zmm0, [rcx+r16*8] - vector load with one of the new APX extended-GPRs used as scaled index
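
The byte sequence above can be pulled apart with a short Python sketch. The ModR/M and SIB decoding follows the standard x86 layout. The reading of bit 2 of the second payload byte is an assumption based on the commonly documented prefix layouts (fixed to 1 in original EVEX, 0 in MVEX, and reused by APX as an inverted extension bit for the SIB index register); it should be checked against the KNC and APX references. The function and field names are made up for this example.

```python
ENCODING = bytes.fromhex("62 F1 79 48 6F 04 C1")


def split_fields(enc):
    """Split a 62h-prefixed encoding with a SIB byte into raw bit fields."""
    if enc[0] != 0x62:
        raise ValueError("not a 62h-prefixed encoding")
    p0, p1, p2, opcode, modrm, sib = enc[1:7]
    return {
        "p1_bit2": (p1 >> 2) & 1,    # assumed: 1 in original EVEX, 0 here
        "opcode": opcode,
        "mod": modrm >> 6,
        "reg": (modrm >> 3) & 7,
        "rm": modrm & 7,             # 4 means a SIB byte follows
        "scale": 1 << (sib >> 6),
        "index": (sib >> 3) & 7,     # 0 = rax before any extension bits
        "base": sib & 7,             # 1 = rcx
    }


fields = split_fields(ENCODING)
```

Under these assumptions, the base/index/scale fields give [rcx+rax*8] for both interpretations, and the cleared payload bit is what would turn the index from rax into r16 under APX.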

"Knights Landing" and "Knights Mill" instructions

Some of the AVX-512 instructions in the Xeon Phi "Knights Landing" and later models belong to the AVX-512 subsets "AVX512ER", "AVX512_4FMAPS", "AVX512PF" and "AVX512_4VNNIW", all of which are unique to the Xeon Phi series of processors. The ER and PF subsets were introduced in "Knights Landing" − the 4FMAPS and 4VNNIW instructions were later added in "Knights Mill".

The ER and 4FMAPS instructions are floating-point arithmetic instructions that all follow a given pattern where:

  • EVEX.W is used to specify floating-point format (0=FP32, 1=FP64)
  • The bottom opcode bit is used to select between packed and scalar operation (0: packed, 1:scalar)
  • For a given operation, all the scalar/packed variants belong to the same AVX-512 subset.
  • The instructions all support result masking by opmask registers. The AVX512ER instructions also all support broadcast of memory operands.
  • The only supported vector width is 512 bits.
Xeon Phi specific instructions (ER, 4FMAPS):

Operation | AVX-512 subset | Basic opcode | FP32 packed (W=0) | FP32 scalar (W=0) | FP64 packed (W=1) | FP64 scalar (W=1) | RC/SAE
Reciprocal approximation with an accuracy of 2^-28 [lower-alpha 1] | ER | EVEX.66.0F38 (CA/CB) /r | VRCP28PS z,z/m512 | VRCP28SS x,x,x/m32 | VRCP28PD z,z/m512 | VRCP28SD x,x,x/m64 | SAE
Reciprocal square root approximation with an accuracy of 2^-28 [lower-alpha 1] | ER | EVEX.66.0F38 (CC/CD) /r | VRSQRT28PS z,z/m512 | VRSQRT28SS x,x,x/m32 | VRSQRT28PD z,z/m512 | VRSQRT28SD x,x,x/m64 | SAE
Exponential (2^x) approximation with relative error 2^-23 [lower-alpha 1] | ER | EVEX.66.0F38 C8 /r | VEXP2PS z,z/m512 | No | VEXP2PD z,z/m512 | No | SAE
Fused-multiply-add, 4 iterations | 4FMAPS | EVEX.F2.0F38 (9A/9B) /r | V4FMADDPS z,z+3,m128 | V4FMADDSS x,x+3,m128 | No | No | No
Fused negate-multiply-add, 4 iterations | 4FMAPS | EVEX.F2.0F38 (AA/AB) /r | V4FNMADDPS z,z+3,m128 | V4FNMADDSS x,x+3,m128 | No | No | No
  1. For the AVX512ER instructions, a numerically exact reference is available as C code. [5]
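
The encoding pattern described above (EVEX.W selects the floating-point format, the bottom opcode bit selects packed or scalar) can be sketched as a small lookup. The table contents are taken from the rows listed above; the PATTERN dictionary and the mnemonic function are illustrative names, not part of any toolchain.

```python
# Keyed by the even (packed) opcode: (stem, has scalar form, has FP64 form).
PATTERN = {
    0xCA: ("VRCP28", True, True),
    0xCC: ("VRSQRT28", True, True),
    0xC8: ("VEXP2", False, True),
    0x9A: ("V4FMADD", True, False),
    0xAA: ("V4FNMADD", True, False),
}


def mnemonic(opcode, evex_w):
    """Return the mnemonic for an opcode/EVEX.W pair, or None if unlisted."""
    entry = PATTERN.get(opcode & ~1)
    if entry is None:
        return None
    stem, has_scalar, has_fp64 = entry
    scalar = bool(opcode & 1)           # bottom opcode bit: 0=packed, 1=scalar
    if (scalar and not has_scalar) or (evex_w and not has_fp64):
        return None
    shape = "S" if scalar else "P"
    fmt = "D" if evex_w else "S"        # EVEX.W: 0=FP32, 1=FP64
    return stem + shape + fmt
```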

The AVX512PF instructions are a set of 16 prefetch instructions. These instructions all use VSIB encoding, where a memory addressing mode using the SIB byte is required, and where the index part of the SIB byte is taken to index into the AVX512 vector register file rather than the GPR register file. The selected AVX512 vector register is then interpreted as a vector of indexes, causing the standard x86 base+index+displacement address calculation to be performed for each vector lane, causing one associated memory operation (prefetches in case of the AVX512PF instructions) to be performed for each active lane. The instruction encodings all follow a pattern where:

  • EVEX.W is used to specify format of the prefetchable data (0:FP32, 1:FP64)
  • The bottom bit of the opcode is used to indicate whether the AVX512 index register is considered a vector of sixteen signed 32-bit indexes (bit 0 not set) or eight signed 64-bit indexes (bit 0 set)
  • The instructions all support operation masking by opmask registers.
  • The only supported vector width is 512 bits.
Operation | Basic opcode | 32-bit indexes (opcode C6), FP32 prefetch (W=0) | 32-bit indexes (opcode C6), FP64 prefetch (W=1) | 64-bit indexes (opcode C7), FP32 prefetch (W=0) | 64-bit indexes (opcode C7), FP64 prefetch (W=1)
Prefetch into L1 cache (T0 hint) | EVEX.66.0F38 (C6/C7) /1 /vsib | VGATHERPF0DPS vm32z {k1} | VGATHERPF0DPD vm32y {k1} | VGATHERPF0QPS vm64z {k1} | VGATHERPF0QPD vm64y {k1}
Prefetch into L2 cache (T1 hint) | EVEX.66.0F38 (C6/C7) /2 /vsib | VGATHERPF1DPS vm32z {k1} | VGATHERPF1DPD vm32y {k1} | VGATHERPF1QPS vm64z {k1} | VGATHERPF1QPD vm64y {k1}
Prefetch into L1 cache (T0 hint) with intent to write | EVEX.66.0F38 (C6/C7) /5 /vsib | VSCATTERPF0DPS vm32z {k1} | VSCATTERPF0DPD vm32y {k1} | VSCATTERPF0QPS vm64z {k1} | VSCATTERPF0QPD vm64y {k1}
Prefetch into L2 cache (T1 hint) with intent to write | EVEX.66.0F38 (C6/C7) /6 /vsib | VSCATTERPF1DPS vm32z {k1} | VSCATTERPF1DPD vm32y {k1} | VSCATTERPF1QPS vm64z {k1} | VSCATTERPF1QPD vm64y {k1}
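
The per-lane address calculation performed by VSIB addressing, as described above, can be modelled with a minimal Python sketch. It only computes which addresses would be touched; it does not model caches or prefetch hints. The function name, parameter names and the assumed 64-bit address width are illustrative.

```python
MASK64 = (1 << 64) - 1  # assumed 64-bit address width


def vsib_addresses(base, indexes, scale, disp, opmask):
    """Return base + index*scale + disp for every lane enabled in opmask.

    indexes is a list of signed per-lane index values (the vector index
    register); bit n of opmask enables lane n.
    """
    addresses = []
    for lane, index in enumerate(indexes):
        if (opmask >> lane) & 1:
            addresses.append((base + index * scale + disp) & MASK64)
    return addresses
```

For example, with base 0x1000, indexes [0, 2, -1, 5], scale 4, displacement 8 and opmask 0b1011, lanes 0, 1 and 3 are active and the negative index in lane 2 is skipped.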

The AVX512_4VNNIW instructions read a 128-bit data item from memory, containing 4 two-component vectors (each component being signed 16-bit). Then, for each of 4 consecutive AVX-512 registers, they will, for each 32-bit lane, interpret the lane as a two-component vector (signed 16-bit) and perform a dot-product with the corresponding two-component vector that was read from memory (the first two-component vector from memory is used for the first AVX-512 source register, and so on). These results are then accumulated into a destination vector register.

Instruction | Opcode | Description
VP4DPWSSD zmm1{k1}{z}, zmm2+3, m128 | EVEX.512.F2.0F38.W0 52 /r | Dot-product of signed words with dword accumulation, 4 iterations
VP4DPWSSDS zmm1{k1}{z}, zmm2+3, m128 | EVEX.512.F2.0F38.W0 53 /r | Dot-product of signed words with dword accumulation and saturation, 4 iterations
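
The four-iteration dot-product accumulation described above can be sketched as follows. This is a simplified model under stated assumptions: lanes are plain Python lists, opmask handling is left out, and wrap-around or saturation is applied after each of the four iterations. The function and parameter names are illustrative.

```python
INT32_MIN = -(1 << 31)
INT32_MAX = (1 << 31) - 1


def _wrap32(value):
    """Wrap an integer to the signed 32-bit range."""
    return ((value - INT32_MIN) % (1 << 32)) + INT32_MIN


def vp4dpwssd(acc, sources, mem_pairs, saturate=False):
    """Model VP4DPWSSD (saturate=False) or VP4DPWSSDS (saturate=True).

    acc:       per-lane signed 32-bit accumulator values
    sources:   4 source registers, each a list of (lo, hi) signed 16-bit pairs
    mem_pairs: the 4 (lo, hi) signed 16-bit pairs read from the 128-bit operand
    """
    result = list(acc)
    for register, (m_lo, m_hi) in zip(sources, mem_pairs):
        for lane, (lo, hi) in enumerate(register):
            total = result[lane] + lo * m_lo + hi * m_hi
            if saturate:
                total = max(INT32_MIN, min(INT32_MAX, total))
            else:
                total = _wrap32(total)
            result[lane] = total
    return result
```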

Xeon Phi processors (from Knights Landing onwards) also featured the PREFETCHWT1 m8 instruction (opcode 0F 0D /2, prefetch into L2 cache with intent to write) − these were the only Intel CPUs to officially support this instruction, but it continues to be supported on some non-Intel processors (e.g. Zhaoxin YongFeng).

AMD instructions

Am386 SMM instructions

A handful of instructions to support System Management Mode were introduced in the Am386SXLV and Am386DXLV processors. [6] [7] They were also present in the later Am486SXLV and Am486DXLV processors.

The SMM functionality of these processors was implemented using Intel ICE microcode without a valid license, resulting in a lawsuit that AMD lost in 1994. [8] As a result of this loss, the ICE microcode was removed from all later AMD CPUs, and the SMM instructions removed with it.

Instruction | Opcode | Description
SMI | F1 | Call SMM interrupt handler (only if DR7 bit 12 is set)
UMOV r/m8, r8 | 0F 10 /r | Move data between registers and main system memory
UMOV r/m, r16/32 | 0F 11 /r | Move data between registers and main system memory
UMOV r8, r/m8 | 0F 12 /r | Move data between registers and main system memory
UMOV r16/32, r/m | 0F 13 /r | Move data between registers and main system memory
RES3 | 0F 07 | Return from SMM interrupt handler (Am386SXLV/DXLV only). Takes a pointer in ES:EDI to a processor save state to resume from − this save state has a format nearly identical to that of the undocumented Intel 386 LOADALL instruction. [9]
RES4 | 0F 07 | Return from SMM interrupt handler (Am486SXLV/DXLV only). Similar to RES3, but with a different save state format. [10]

These SMM instructions were also present on the IBM 386SLC and its derivatives (albeit with the LOADALL-like SMM return opcode 0F 07 named ICERET). [9] [11]

3DNow! instructions

The 3DNow! instruction set extension was introduced in the AMD K6-2, mainly adding support for floating-point SIMD instructions using the MMX registers (two FP32 components in a 64-bit vector register). The instructions were mainly promoted by AMD, but were supported on some non-AMD CPUs as well.

Instruction | Opcode | Instruction description
PFADD mm1,mm2/m64 | 0F 0F /r 9E | Packed floating-point addition: dst <- dst + src
PFSUB mm1,mm2/m64 | 0F 0F /r 9A | Packed floating-point subtraction: dst <- dst − src
PFSUBR mm1,mm2/m64 | 0F 0F /r AA | Packed floating-point reverse subtraction: dst <- src − dst
PFMUL mm1,mm2/m64 | 0F 0F /r B4 | Packed floating-point multiplication: dst <- dst * src
PFMAX mm1,mm2/m64 | 0F 0F /r A4 | Packed floating-point maximum: dst <- (dst > src) ? dst : src
PFMIN mm1,mm2/m64 | 0F 0F /r 94 | Packed floating-point minimum: dst <- (dst < src) ? dst : src
PFCMPEQ mm1,mm2/m64 | 0F 0F /r B0 | Packed floating-point comparison, equal: dst <- (dst == src) ? 0xFFFFFFFF : 0
PFCMPGE mm1,mm2/m64 | 0F 0F /r 90 | Packed floating-point comparison, greater than or equal: dst <- (dst >= src) ? 0xFFFFFFFF : 0
PFCMPGT mm1,mm2/m64 | 0F 0F /r A0 | Packed floating-point comparison, greater than: dst <- (dst > src) ? 0xFFFFFFFF : 0
PF2ID mm1,mm2/m64 | 0F 0F /r 1D | Converts packed floating-point operand to packed 32-bit signed integer, with round-to-zero
PI2FD mm1,mm2/m64 | 0F 0F /r 0D | Packed 32-bit signed integer to floating-point conversion, with round-to-zero
PFRCP mm1,mm2/m64 | 0F 0F /r 96 | Floating-point reciprocal approximation (at least 14 bit precision): temp <- approx(1.0/src[31:0]); dst[31:0] <- temp; dst[63:32] <- temp
PFRSQRT mm1,mm2/m64 | 0F 0F /r 97 | Floating-point reciprocal square root approximation (at least 15 bit precision): temp <- approx(1.0/sqrt(src[31:0])); dst[31:0] <- temp; dst[63:32] <- temp
PFRCPIT1 mm1,mm2/m64 | 0F 0F /r A6 | Packed floating-point reciprocal, first iteration step
PFRSQIT1 mm1,mm2/m64 | 0F 0F /r A7 | Packed floating-point reciprocal square root, first iteration step
PFRCPIT2 mm1,mm2/m64 | 0F 0F /r B6 | Packed floating-point reciprocal/reciprocal square root, second iteration step
PFACC mm1,mm2/m64 | 0F 0F /r AE | Floating-point accumulate (horizontal add): dst[31:0] <- dst[31:0] + dst[63:32]; dst[63:32] <- src[31:0] + src[63:32]
PMULHRW mm1,mm2/m64, [lower-alpha 2] PMULHRWA mm1,mm2/m64 | 0F 0F /r B7 | Multiply signed packed 16-bit integers with rounding and store the high 16 bits: dst <- ((dst * src) + 0x8000) >> 16
PAVGUSB mm1,mm2/m64 | 0F 0F /r BF | Average of unsigned packed 8-bit integers: dst <- (src+dst+1) >> 1
FEMMS | 0F 0E | Faster Enter/Exit of the MMX or x87 floating-point state [lower-alpha 3]

Note on PFRCP, PFRSQRT, PFRCPIT1, PFRSQIT1 and PFRCPIT2: the 3DNow! specification [12] does not directly specify the operation performed by the PFRCPIT1, PFRSQIT1 and PFRCPIT2 instructions − instead, it imposes requirements on the results of using these instructions together in specific ways: [lower-alpha 1]

If the bottom 32 bits of mm0 initially contain a value X in FP32 format, then the instruction sequence:

    PFRCP mm1,mm0
    PFRCPIT1 mm0,mm1
    PFRCPIT2 mm0,mm1

must fill both 32-bit lanes of mm0 with 1/X in FP32 format, computed with an error of at most 1 ulp.

Similarly, the instruction sequence:

    PFRSQRT mm1,mm0
    MOVQ mm2,mm1
    PFMUL mm1,mm1
    PFRSQIT1 mm1,mm0
    PFRCPIT2 mm1,mm2

must fill both 32-bit lanes of mm1 with 1/sqrt(X) in FP32 format, computed with an error of at most 1 ulp.

  1. The 3DNow! precision requirements can be fulfilled in several different ways, for example:
    • On AMD K6-2, the PFRCPIT1, PFRSQIT1 and PFRCPIT2 instructions would perform various parts of a Newton-Raphson iteration to improve the precision of a low-precision initial result from PFRCP/PFRSQRT. [13]
    • On AMD Geode LX, the PFRCP and PFRSQRT instructions would instead compute their results with full 24-bit precision − this made it possible to turn the PFRCPIT1, PFRSQIT1 and PFRCPIT2 instructions into pure data movement instructions, performing the same operation as MOVQ. [14]
  2. The 3DNow! PMULHRW instruction has the same mnemonic as the Cyrix EMMI PMULHRW instruction, however its opcode and function differ (the EMMI instruction right-shifts its multiply-result by 15 bits, while the 3DNow! instruction right-shifts by 16 bits).

    Some assemblers/disassemblers, such as NASM, resolve this ambiguity by using the mnemonic PMULHRWA for the 3DNow! instruction and PMULHRWC for the EMMI instruction.

  3. The FEMMS instruction differs from the standard MMX EMMS instruction in that FEMMS makes the FP/MMX register contents undefined after the instruction is executed.
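
The integer formulas given above for PMULHRW and PAVGUSB can be checked with a small Python model. This is a sketch only: lanes are plain Python lists rather than packed 64-bit registers, and the function names simply reuse the mnemonics.

```python
def pmulhrw(dst, src):
    """3DNow! PMULHRW on four signed 16-bit lanes:
    ((dst * src) + 0x8000) >> 16, keeping the low 16 bits as a signed value."""
    result = []
    for a, b in zip(dst, src):
        high = ((a * b + 0x8000) >> 16) & 0xFFFF
        result.append(high - 0x10000 if high & 0x8000 else high)
    return result


def pavgusb(dst, src):
    """PAVGUSB on unsigned 8-bit lanes: (src + dst + 1) >> 1."""
    return [(a + b + 1) >> 1 for a, b in zip(dst, src)]
```

The Cyrix EMMI variant mentioned in the notes would shift by 15 bits rather than 16 and is not modelled here.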

3DNow! also introduced a couple of prefetch instructions: PREFETCH m8 (opcode 0F 0D /0) and PREFETCHW m8 (opcode 0F 0D /1). These instructions, unlike the rest of 3DNow!, are not discontinued but continue to be supported on modern AMD CPUs. The PREFETCHW instruction is also supported on Intel CPUs starting with 65 nm Pentium 4, [15] albeit executed as NOP until Broadwell.

3DNow+ instructions added with Athlon and K6-2+

Instruction | Opcode | Instruction description
PF2IW mm1,mm2/m64 | 0F 0F /r 1C | Packed 32-bit floating-point to 16-bit signed integer conversion, with round-to-zero [lower-alpha 1]
PI2FW mm1,mm2/m64 | 0F 0F /r 0C | Packed 16-bit signed integer to 32-bit floating-point conversion [lower-alpha 1]
PSWAPD mm1,mm2/m64 | 0F 0F /r BB [lower-alpha 2] | Packed Swap Doubleword: dst[31:0] <- src[63:32]; dst[63:32] <- src[31:0]
PFNACC mm1,mm2/m64 | 0F 0F /r 8A | Packed Floating-Point Negative Accumulate: dst[31:0] <- dst[31:0] − dst[63:32]; dst[63:32] <- src[31:0] − src[63:32]
PFPNACC mm1,mm2/m64 | 0F 0F /r 8E | Packed Floating-Point Positive-Negative Accumulate: dst[31:0] <- dst[31:0] − dst[63:32]; dst[63:32] <- src[31:0] + src[63:32]
  1. The PF2IW and PI2FW instructions also existed as undocumented instructions on the original K6-2.

    The undocumented variant of PF2IW in K6-2 would set the top 16 bits of each 32-bit result lane to all-0s, while the documented variant in later processors would sign-extend the 16-bit result to 32 bits. [16] [17]

  2. The PSWAPD instruction uses same opcode as the older undocumented K6-2 PSWAPW instruction. [17]
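
The difference between the documented and undocumented PF2IW variants described in the notes above can be illustrated with a short Python sketch, alongside the PSWAPD lane swap. The sketch assumes inputs whose truncated value already fits in a signed 16-bit integer; behaviour for out-of-range inputs is deliberately not modelled. The function names and the k6_2_undocumented flag are illustrative.

```python
def pf2iw_lane(value, k6_2_undocumented=False):
    """Return the 32-bit result lane of PF2IW as an unsigned bit pattern."""
    truncated = int(value)                 # int() rounds toward zero
    if not -0x8000 <= truncated <= 0x7FFF:
        raise ValueError("out-of-range inputs are not modelled in this sketch")
    if k6_2_undocumented:
        return truncated & 0xFFFF          # top 16 bits forced to zero
    return truncated & 0xFFFFFFFF          # sign-extended to 32 bits


def pswapd(src):
    """PSWAPD on a [low, high] pair of 32-bit lanes: swap the two lanes."""
    low, high = src
    return [high, low]
```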

3DNow! instructions specific to Geode GX and LX

Instruction | Opcode | Instruction description
PFRCPV mm1,mm2/m64 | 0F 0F /r 86 | Packed Floating-point Reciprocal Approximation
PFRSQRTV mm1,mm2/m64 | 0F 0F /r 87 | Packed Floating-point Reciprocal Square Root Approximation

SSE5 derived instructions

SSE5 was a proposed SSE extension by AMD, using a new "DREX" instruction encoding to add support for new 3-operand and 4-operand instructions to SSE. [18] The bundle did not include the full set of Intel's SSE4 instructions, making it a competitor to SSE4 rather than a successor.

AMD chose not to implement SSE5 as originally proposed − it was instead reworked into FMA4 and XOP, [19] which provided similar functionality but with a quite different instruction encoding − using the VEX prefix for the FMA4 instructions and the new VEX-like XOP prefix for most of the remaining instructions.

XOP instructions

Introduced with the Bulldozer processor core, removed again from the Zen microarchitecture onward.

A revision of most of the SSE5 instruction set.

The XOP instructions mostly make use of the XOP prefix, which is a 3-byte prefix with the following layout:

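Byte | Bits | Usage
Byte 0 | 7:0 | 8Fh
Byte 1 | 7 | R̅
Byte 1 | 6 | X̅
Byte 1 | 5 | B̅
Byte 1 | 4:0 | mmmmm
Byte 2 | 7 | W
Byte 2 | 6:3 | v̅v̅v̅v̅
Byte 2 | 2 | L
Byte 2 | 1:0 | pp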

where:

  • Overlines indicate inverted bits.
  • The R/X/B bits are argument extension bits similar to the RXB bits of the REX prefix.
  • mmmmm is an opcode-map specifier. While capable of encoding values from 8 to 31 (values 0 to 7 map to ModR/M-encoded variants of the older POP instruction, making them unusable for XOP), only maps 8, 9 and 0Ah were ever used: map 8 for instructions that take an 8-bit immediate, map 9 for instructions that don't take an immediate, and map 0Ah for instructions that take a 32-bit immediate.
  • W is used in a couple of different ways:
    • For XOP vector instructions, W is used to swap the last two vector source arguments to the instruction. For instructions that allow W=1, encodings with W=0 allow the second-to-last vector argument to be a memory argument, while encodings with W=1 allow the last vector argument to be a memory argument. For instructions that don't allow their last two vector arguments to be swapped, W is required to be 0.
    • For XOP-encoded integer-register instructions (the TBM and LWP instruction set extensions, see below), W is used for operand size. (0=32-bit, 1=64-bit)
  • vvvv is an extra source register argument, normally the first non-r/m source argument for instructions with ≥3 register arguments.
  • L is a vector length specifier. L=1 indicates 256-bit operation, L=0 indicates scalar or 128-bit operation.
  • pp is an embedded prefix − nominally 0/1/2/3=none/66h/F2h/F3h, but only 0 was ever used with any of the instructions defined for the XOP prefix.
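
The field layout above can be turned into a small decoder. The sketch below splits a 3-byte XOP prefix into its fields, un-inverting the overlined bits, and rejects map values below 8 since those byte patterns belong to POP. The function name and the dictionary keys are illustrative, and the example bytes in the usage note were constructed by hand from the layout rather than taken from an assembler.

```python
def parse_xop_prefix(byte0, byte1, byte2):
    """Split a 3-byte XOP prefix into its fields (inverted bits un-inverted)."""
    if byte0 != 0x8F:
        raise ValueError("XOP prefix must start with 8Fh")
    opcode_map = byte1 & 0x1F
    if opcode_map < 8:
        raise ValueError("map values 0 to 7 encode POP, not XOP")
    return {
        "R": ((byte1 >> 7) & 1) ^ 1,        # stored inverted
        "X": ((byte1 >> 6) & 1) ^ 1,        # stored inverted
        "B": ((byte1 >> 5) & 1) ^ 1,        # stored inverted
        "map": opcode_map,
        "W": (byte2 >> 7) & 1,
        "vvvv": (~(byte2 >> 3)) & 0xF,      # stored inverted
        "L": (byte2 >> 2) & 1,
        "pp": byte2 & 3,
    }
```

For instance, `parse_xop_prefix(0x8F, 0xE9, 0x78)` yields map 9 with W=0, L=0, pp=0 and all extension bits clear, which is the shape of prefix a 128-bit map-9 instruction without an immediate would carry.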

The XOP instructions encoded with the XOP prefix are as follows:

Variant | Instruction mnemonics | Opcode | W=1 swap allowed | L=1 (256b) allowed

Extract fractional portion of floating-point value.

Packed FP32 | VFRCZPS ymm1,ymm2/m256 | XOP.9 80 /r | No | Yes
Packed FP64 | VFRCZPD ymm1,ymm2/m256 | XOP.9 81 /r | No | Yes
Scalar FP32 | VFRCZSS xmm1,xmm2/m32 | XOP.9 82 /r | No | No
Scalar FP64 | VFRCZSD xmm1,xmm2/m64 | XOP.9 83 /r | No | No

Vector per-bit-lane conditional move.

VPCMOV dst,src1,src2,src3 performs the equivalent of dst <- (src1 AND src3) OR (src2 AND NOT(src3))

(only form) | VPCMOV ymm1,ymm2,ymm3/m256,ymm4 | XOP.8 A2 /r /is4 | Yes | Yes

Vector integer compare.

For each vector-register lane, compare src1 to src2, then set destination to all-1s if the comparison passes, all-0s if it fails. The imm8 argument specifies comparison function to perform:

  • 0: LT (less-than)
  • 1: LE (less-than-or-equal)
  • 2: GT (greater-than)
  • 3: GE (greater-than-or-equal)
  • 4: EQ (equal)
  • 5: NE (not-equal)
  • 6: FALSE (always-false)
  • 7: TRUE (always-true)

Signed 8-bit lanes | VPCOMB xmm1,xmm2,xmm3/m128,imm8 [lower-alpha 1] | XOP.8 CC /r ib | No | No
Signed 16-bit lanes | VPCOMW xmm1,xmm2,xmm3/m128,imm8 [lower-alpha 1] | XOP.8 CD /r ib | No | No
Signed 32-bit lanes | VPCOMD xmm1,xmm2,xmm3/m128,imm8 [lower-alpha 1] | XOP.8 CE /r ib | No | No
Signed 64-bit lanes | VPCOMQ xmm1,xmm2,xmm3/m128,imm8 [lower-alpha 1] | XOP.8 CF /r ib | No | No
Unsigned 8-bit lanes | VPCOMUB xmm1,xmm2,xmm3/m128,imm8 [lower-alpha 1] | XOP.8 EC /r ib | No | No
Unsigned 16-bit lanes | VPCOMUW xmm1,xmm2,xmm3/m128,imm8 [lower-alpha 1] | XOP.8 ED /r ib | No | No
Unsigned 32-bit lanes | VPCOMUD xmm1,xmm2,xmm3/m128,imm8 [lower-alpha 1] | XOP.8 EE /r ib | No | No
Unsigned 64-bit lanes | VPCOMUQ xmm1,xmm2,xmm3/m128,imm8 [lower-alpha 1] | XOP.8 EF /r ib | No | No

Vector Integer Horizontal Add.

For each N-bit lane, split the lane into a series of M-bit lanes, add the M-bit lanes together, then store the result into the destination as an N-bit zero/sign-extended value.

2x8bit -> 16bit, signed | VPHADDBW xmm1,xmm2/m128 | XOP.9 C1 /r | No | No
4x8bit -> 32bit, signed | VPHADDBD xmm1,xmm2/m128 | XOP.9 C2 /r | No | No
8x8bit -> 64bit, signed | VPHADDBQ xmm1,xmm2/m128 | XOP.9 C3 /r | No | No
2x16bit -> 32bit, signed | VPHADDWD xmm1,xmm2/m128 | XOP.9 C6 /r | No | No
4x16bit -> 64bit, signed | VPHADDWQ xmm1,xmm2/m128 | XOP.9 C7 /r | No | No
2x32bit -> 64bit, signed | VPHADDDQ xmm1,xmm2/m128 | XOP.9 CB /r | No | No
2x8bit -> 16bit, unsigned | VPHADDUBW xmm1,xmm2/m128 | XOP.9 D1 /r | No | No
4x8bit -> 32bit, unsigned | VPHADDUBD xmm1,xmm2/m128 | XOP.9 D2 /r | No | No
8x8bit -> 64bit, unsigned | VPHADDUBQ xmm1,xmm2/m128 | XOP.9 D3 /r | No | No
2x16bit -> 32bit, unsigned | VPHADDUWD xmm1,xmm2/m128 | XOP.9 D6 /r | No | No
4x16bit -> 64bit, unsigned | VPHADDUWQ xmm1,xmm2/m128 | XOP.9 D7 /r | No | No
2x32bit -> 64bit, unsigned | VPHADDUDQ xmm1,xmm2/m128 | XOP.9 DB /r | No | No

Vector Integer Horizontal Subtract.

For each N-bit lane, split the lane into two signed sub-lanes of N/2 bits each, then subtract the upper lane from the lower lane, then store the result as a signed N-bit result.

2x8bit -> 16bit | VPHSUBBW xmm1,xmm2/m128 | XOP.9 E1 /r | No | No
2x16bit -> 32bit | VPHSUBWD xmm1,xmm2/m128 | XOP.9 E2 /r | No | No
2x32bit -> 64bit | VPHSUBDQ xmm1,xmm2/m128 | XOP.9 E3 /r | No | No

Vector Signed Integer Multiply-Add.

For each N-bit lane, perform dest <- src1*src2 + src3

For src1 and src2, the factors to multiply may be taken as signed values from the low half of each lane, high half of each lane or the lane in full (picked in the same way for src1 and src2) − the addend and the result use the full lane.

16-bit, full-lane | VPMACSWW xmm1,xmm2,xmm3/m128,xmm4 | XOP.8 95 /r /is4 | No | No
32-bit, low-half | VPMACSWD xmm1,xmm2,xmm3/m128,xmm4 | XOP.8 96 /r /is4 | No | No
64-bit, low-half | VPMACSDQL xmm1,xmm2,xmm3/m128,xmm4 | XOP.8 97 /r /is4 | No | No
32-bit, full-lane | VPMACSDD xmm1,xmm2,xmm3/m128,xmm4 | XOP.8 9E /r /is4 | No | No
64-bit, high-half | VPMACSDQH xmm1,xmm2,xmm3/m128,xmm4 | XOP.8 9F /r /is4 | No | No
16-bit, full-lane, saturating | VPMACSSWW xmm1,xmm2,xmm3/m128,xmm4 | XOP.8 85 /r /is4 | No | No
32-bit, low-half, saturating | VPMACSSWD xmm1,xmm2,xmm3/m128,xmm4 | XOP.8 86 /r /is4 | No | No
64-bit, low-half, saturating | VPMACSSDQL xmm1,xmm2,xmm3/m128,xmm4 | XOP.8 87 /r /is4 | No | No
32-bit, full-lane, saturating | VPMACSSDD xmm1,xmm2,xmm3/m128,xmm4 | XOP.8 8E /r /is4 | No | No
64-bit, high-half, saturating | VPMACSSDQH xmm1,xmm2,xmm3/m128,xmm4 | XOP.8 8F /r /is4 | No | No

Packed multiply, add and accumulate signed word to signed doubleword.

For each 32-bit lane, treat src1 and src2 as 2-component vectors of signed 16-bit values, then compute their dot-product, then add src3 as a 32-bit value.

with saturation | VPMADCSSWD xmm1,xmm2,xmm3/m128,xmm4 | XOP.8 A6 /r /is4 | No | No
without saturation | VPMADCSWD xmm1,xmm2,xmm3/m128,xmm4 | XOP.8 B6 /r /is4 | No | No

Packed Permute Bytes.

For VPPERM dst,src1,src2,src3, src2:src1 are considered a 32-element vector of bytes. For each byte-lane, the byte in src3 is used to index into this 32-byte vector and transform the element:

  • bits 4:0 are used to pick one of the 32 bytes.
  • bits 7:6 specify a transform to perform on the byte (0=keep, 1=bitreverse, 2=set-to-zero, 3=replicate-MSB)
  • bit 5, if set, inverts the result after the transform.

(only form) | VPPERM xmm1,xmm2,xmm3/m128,xmm4 | XOP.8 A3 /r /is4 | Yes | No

Packed left-rotate.

Rotation amount is given in the last source argument. It may be provided as an immediate or a vector register − in the latter case, the rotation amount is provided on a per-lane basis.

8-bit lanes | VPROTB xmm1,xmm2/m128,xmm3 | XOP.9 90 /r | Yes | No
8-bit lanes | VPROTB xmm1,xmm2/m128,imm8 | XOP.8 C0 /r ib | No | No
16-bit lanes | VPROTW xmm1,xmm2/m128,xmm3 | XOP.9 91 /r | Yes | No
16-bit lanes | VPROTW xmm1,xmm2/m128,imm8 | XOP.8 C1 /r ib | No | No
32-bit lanes | VPROTD xmm1,xmm2/m128,xmm3 | XOP.9 92 /r | Yes | No
32-bit lanes | VPROTD xmm1,xmm2/m128,imm8 | XOP.8 C2 /r ib | No | No
64-bit lanes | VPROTQ xmm1,xmm2/m128,xmm3 | XOP.9 93 /r | Yes | No
64-bit lanes | VPROTQ xmm1,xmm2/m128,imm8 | XOP.8 C3 /r ib | No | No

Packed shift, with signed shift-amounts.

Shift-amount is provided on a per-vector-lane basis, and is taken from the bottom 8 bits of each lane of the last source argument. The shift-amount is considered signed − a positive value will cause left-shift, while a negative value causes right-shift.

8-bit, signed | VPSHAB xmm1,xmm2/m128,xmm3 | XOP.9 98 /r | Yes | No
16-bit, signed | VPSHAW xmm1,xmm2/m128,xmm3 | XOP.9 99 /r | Yes | No
32-bit, signed | VPSHAD xmm1,xmm2/m128,xmm3 | XOP.9 9A /r | Yes | No
64-bit, signed | VPSHAQ xmm1,xmm2/m128,xmm3 | XOP.9 9B /r | Yes | No
8-bit, unsigned | VPSHLB xmm1,xmm2/m128,xmm3 | XOP.9 94 /r | Yes | No
16-bit, unsigned | VPSHLW xmm1,xmm2/m128,xmm3 | XOP.9 95 /r | Yes | No
32-bit, unsigned | VPSHLD xmm1,xmm2/m128,xmm3 | XOP.9 96 /r | Yes | No
64-bit, unsigned | VPSHLQ xmm1,xmm2/m128,xmm3 | XOP.9 97 /r | Yes | No
  1. For each VPCOM* instruction, a series of alias mnemonics is available, one for each of the eight comparison functions encodable in the imm8 argument. These alias mnemonics specify the comparison to perform after the "VPCOM" part of the mnemonic. For example:
    • VPCOMEQB xmm1,xmm2,xmm3 is an alias for VPCOMB xmm1,xmm2,xmm3,4
    • VPCOMFALSEUQ xmm1,xmm2,[ebx] is an alias for VPCOMUQ xmm1,xmm2,[ebx],6
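
Three of the XOP operations described above (VPCOM with its imm8 predicate, VPCMOV, and the per-byte transform of VPPERM) lend themselves to a compact Python model. This is a sketch under stated assumptions: lanes are Python lists or byte strings, only the low 3 bits of imm8 are consulted, src1 is taken to supply bytes 0 to 15 of the 32-byte VPPERM table and src2 bytes 16 to 31, and all names are illustrative.

```python
import operator

# Index in this list = imm8 comparison code from the list above.
VPCOM_PREDICATES = [
    ("LT", operator.lt), ("LE", operator.le),
    ("GT", operator.gt), ("GE", operator.ge),
    ("EQ", operator.eq), ("NE", operator.ne),
    ("FALSE", lambda a, b: False), ("TRUE", lambda a, b: True),
]


def vpcom(src1, src2, imm8, lane_bits):
    """Per-lane compare: all-1s where the predicate holds, else 0."""
    _, predicate = VPCOM_PREDICATES[imm8 & 7]
    ones = (1 << lane_bits) - 1
    return [ones if predicate(a, b) else 0 for a, b in zip(src1, src2)]


def vpcmov(src1, src2, src3):
    """Bitwise select: (src1 AND src3) OR (src2 AND NOT src3)."""
    return (src1 & src3) | (src2 & ~src3)


def vpperm_byte(selector, src1, src2):
    """Apply one VPPERM selector byte to the 32-byte table src2:src1."""
    table = bytes(src1) + bytes(src2)
    value = table[selector & 0x1F]          # bits 4:0: pick one of 32 bytes
    transform = selector >> 6               # bits 7:6: transform
    if transform == 1:
        value = int(f"{value:08b}"[::-1], 2)    # bit-reverse
    elif transform == 2:
        value = 0x00                            # set to zero
    elif transform == 3:
        value = 0xFF if value & 0x80 else 0x00  # replicate MSB
    if selector & 0x20:                     # bit 5: invert the result
        value ^= 0xFF
    return value
```

The alias mnemonics from the note above correspond to fixing the imm8 argument: for example, VPCOMEQB matches `vpcom(a, b, 4, 8)` in this model.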

XOP also included two vector instructions that used the VEX prefix instead of the XOP prefix:

Instruction description | Instruction mnemonics | Opcode | W=1 swap allowed | L=1 (256b) allowed
Permute two-source double-precision floating-point values. | VPERMIL2PD ymm1,ymm2,ymm3/m256,ymm4,imm4 | VEX.66.0F3A 49 /r /is4 | Yes | Yes
Permute two-source single-precision floating-point values. | VPERMIL2PS ymm1,ymm2,ymm3/m256,ymm4,imm4 | VEX.66.0F3A 48 /r /is4 | Yes | Yes

The instructions VPERMIL2PD and VPERMIL2PS were originally defined by Intel in early drafts of the AVX specification [20] − they were removed in later drafts [21] and were never implemented in any Intel processor. They were, however, implemented by AMD, who designated them as being a part of the XOP instruction set extension. (Like the other parts of XOP, they've been removed in AMD Zen.)

FMA4 instructions

Supported in AMD processors starting with the Bulldozer architecture, removed in Zen. Not supported by any Intel chip as of 2023.

Fused multiply-add with four operands. FMA4 was realized in hardware before FMA3.

Instruction | Opcode | Meaning | Notes
VFMADDPD xmm0, xmm1, xmm2, xmm3 | C4E3 WvvvvL01 69 /r /is4 | Fused Multiply-Add of Packed Double-Precision Floating-Point Values |
VFMADDPS xmm0, xmm1, xmm2, xmm3 | C4E3 WvvvvL01 68 /r /is4 | Fused Multiply-Add of Packed Single-Precision Floating-Point Values |
VFMADDSD xmm0, xmm1, xmm2, xmm3 | C4E3 WvvvvL01 6B /r /is4 | Fused Multiply-Add of Scalar Double-Precision Floating-Point Values |
VFMADDSS xmm0, xmm1, xmm2, xmm3 | C4E3 WvvvvL01 6A /r /is4 | Fused Multiply-Add of Scalar Single-Precision Floating-Point Values |
VFMADDSUBPD xmm0, xmm1, xmm2, xmm3 | C4E3 WvvvvL01 5D /r /is4 | Fused Multiply-Alternating Add/Subtract of Packed Double-Precision Floating-Point Values |
VFMADDSUBPS xmm0, xmm1, xmm2, xmm3 | C4E3 WvvvvL01 5C /r /is4 | Fused Multiply-Alternating Add/Subtract of Packed Single-Precision Floating-Point Values |
VFMSUBADDPD xmm0, xmm1, xmm2, xmm3 | C4E3 WvvvvL01 5F /r /is4 | Fused Multiply-Alternating Subtract/Add of Packed Double-Precision Floating-Point Values |
VFMSUBADDPS xmm0, xmm1, xmm2, xmm3 | C4E3 WvvvvL01 5E /r /is4 | Fused Multiply-Alternating Subtract/Add of Packed Single-Precision Floating-Point Values |
VFMSUBPD xmm0, xmm1, xmm2, xmm3 | C4E3 WvvvvL01 6D /r /is4 | Fused Multiply-Subtract of Packed Double-Precision Floating-Point Values |
VFMSUBPS xmm0, xmm1, xmm2, xmm3 | C4E3 WvvvvL01 6C /r /is4 | Fused Multiply-Subtract of Packed Single-Precision Floating-Point Values |
VFMSUBSD xmm0, xmm1, xmm2, xmm3 | C4E3 WvvvvL01 6F /r /is4 | Fused Multiply-Subtract of Scalar Double-Precision Floating-Point Values |
VFMSUBSS xmm0, xmm1, xmm2, xmm3 | C4E3 WvvvvL01 6E /r /is4 | Fused Multiply-Subtract of Scalar Single-Precision Floating-Point Values |
VFNMADDPD xmm0, xmm1, xmm2, xmm3 | C4E3 WvvvvL01 79 /r /is4 | Fused Negative Multiply-Add of Packed Double-Precision Floating-Point Values |
VFNMADDPS xmm0, xmm1, xmm2, xmm3 | C4E3 WvvvvL01 78 /r /is4 | Fused Negative Multiply-Add of Packed Single-Precision Floating-Point Values |
VFNMADDSD xmm0, xmm1, xmm2, xmm3 | C4E3 WvvvvL01 7B /r /is4 | Fused Negative Multiply-Add of Scalar Double-Precision Floating-Point Values |
VFNMADDSS xmm0, xmm1, xmm2, xmm3 | C4E3 WvvvvL01 7A /r /is4 | Fused Negative Multiply-Add of Scalar Single-Precision Floating-Point Values |
VFNMSUBPD xmm0, xmm1, xmm2, xmm3 | C4E3 WvvvvL01 7D /r /is4 | Fused Negative Multiply-Subtract of Packed Double-Precision Floating-Point Values |
VFNMSUBPS xmm0, xmm1, xmm2, xmm3 | C4E3 WvvvvL01 7C /r /is4 | Fused Negative Multiply-Subtract of Packed Single-Precision Floating-Point Values |
VFNMSUBSD xmm0, xmm1, xmm2, xmm3 | C4E3 WvvvvL01 7F /r /is4 | Fused Negative Multiply-Subtract of Scalar Double-Precision Floating-Point Values |
VFNMSUBSS xmm0, xmm1, xmm2, xmm3 | C4E3 WvvvvL01 7E /r /is4 | Fused Negative Multiply-Subtract of Scalar Single-Precision Floating-Point Values |
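
The FMA4 mnemonics follow a regular naming scheme. As a rough illustration only, the C sketch below models the sign conventions of the four basic operations and the alternating lane patterns on doubles. All function names are made up for this example. The plain expression `a * b + c` is used for brevity and is not guaranteed to round only once, so a faithful model of the fused behaviour would call C99 `fma()` instead.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar models of the four basic operations. The result is returned
   separately, mirroring the four-operand form in which none of the three
   sources a, b, c is overwritten. */
static double fmadd (double a, double b, double c) { return   a * b  + c; }
static double fmsub (double a, double b, double c) { return   a * b  - c; }
static double fnmadd(double a, double b, double c) { return -(a * b) + c; }
static double fnmsub(double a, double b, double c) { return -(a * b) - c; }

/* VFMADDSUBPD-style pattern: even-numbered lanes subtract, odd lanes add. */
static void fmaddsub_pd(double *d, const double *a, const double *b,
                        const double *c, size_t lanes)
{
    assert(d && a && b && c);
    for (size_t i = 0; i < lanes; i++)
        d[i] = (i & 1) ? fmadd(a[i], b[i], c[i]) : fmsub(a[i], b[i], c[i]);
}

/* VFMSUBADDPD-style pattern: even-numbered lanes add, odd lanes subtract. */
static void fmsubadd_pd(double *d, const double *a, const double *b,
                        const double *c, size_t lanes)
{
    assert(d && a && b && c);
    for (size_t i = 0; i < lanes; i++)
        d[i] = (i & 1) ? fmsub(a[i], b[i], c[i]) : fmadd(a[i], b[i], c[i]);
}
```

With two lanes, a = {1, 2}, b = {3, 4} and c = {5, 6}, the first pattern gives {3 - 5, 8 + 6} and the second gives {3 + 5, 8 - 6}.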

Trailing Bit Manipulation Instructions

AMD introduced TBM together with BMI1 in its Piledriver [22] line of processors; later AMD Jaguar and Zen-based processors do not support TBM. [23] No Intel processors (as of 2023) support TBM.

The TBM instructions are all encoded using the XOP prefix. They are all available in 32-bit and 64-bit forms, selected with the XOP.W bit (0=32bit, 1=64bit). (XOP.W is ignored outside 64-bit mode.) Like all instructions encoded with VEX/XOP prefixes, they are unavailable in Real Mode and Virtual-8086 mode.

Instruction | Opcode | Description [24] | Equivalent C expression [25]
BEXTR reg,r/m,imm32 | XOP.A 10 /r imm32 | Bit field extract (immediate form). [lower-alpha 1] The imm32 is interpreted as follows: bits 7:0 = start position; bits 15:8 = length; bits 31:16 = ignored. | (src >> start) & ((1 << len) - 1)
BLCFILL reg,r/m | XOP.9 01 /1 | Fill from lowest clear bit | x & (x + 1)
BLCI reg,r/m | XOP.9 02 /6 | Isolate lowest clear bit | x | ~(x + 1)
BLCIC reg,r/m | XOP.9 01 /5 | Isolate lowest clear bit and complement | ~x & (x + 1)
BLCMSK reg,r/m | XOP.9 02 /1 | Mask from lowest clear bit | x ^ (x + 1)
BLCS reg,r/m | XOP.9 01 /3 | Set lowest clear bit | x | (x + 1)
BLSFILL reg,r/m | XOP.9 01 /2 | Fill from lowest set bit | x | (x - 1)
BLSIC reg,r/m | XOP.9 01 /6 | Isolate lowest set bit and complement | ~x | (x - 1)
T1MSKC reg,r/m | XOP.9 01 /7 | Inverse mask from trailing ones | ~x | (x + 1)
TZMSK reg,r/m | XOP.9 01 /4 | Mask from trailing zeros | ~x & (x - 1)
  1. For BEXTR, a register form is available as part of BMI1.
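
The C expressions in the table can be wrapped directly into portable functions, which is also a common software fallback on CPUs without TBM. The sketch below covers the 32-bit forms only. The function names are illustrative, and the guards in `bextr_imm` for out-of-range start and length fields are an assumption added to keep the C shifts well defined; the table does not describe that case.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit forms (XOP.W=0). The 64-bit forms are the same expressions on uint64_t. */
static uint32_t blcfill(uint32_t x) { return  x &  (x + 1); }
static uint32_t blci   (uint32_t x) { return  x | ~(x + 1); }
static uint32_t blcic  (uint32_t x) { return ~x &  (x + 1); }
static uint32_t blcmsk (uint32_t x) { return  x ^  (x + 1); }
static uint32_t blcs   (uint32_t x) { return  x |  (x + 1); }
static uint32_t blsfill(uint32_t x) { return  x |  (x - 1); }
static uint32_t blsic  (uint32_t x) { return ~x |  (x - 1); }
static uint32_t t1mskc (uint32_t x) { return ~x |  (x + 1); }
static uint32_t tzmsk  (uint32_t x) { return ~x &  (x - 1); }

/* BEXTR, immediate form: imm32 bits 7:0 = start, bits 15:8 = length,
   bits 31:16 ignored. */
static uint32_t bextr_imm(uint32_t src, uint32_t imm32)
{
    unsigned start = imm32 & 0xFF;
    unsigned len   = (imm32 >> 8) & 0xFF;

    /* Assumption of this sketch: out-of-range fields yield 0 or an
       unmasked value, so that no C shift count reaches the type width. */
    if (start >= 32 || len == 0)
        return 0;
    src >>= start;
    return (len >= 32) ? src : (src & ((UINT32_C(1) << len) - 1));
}

/* Worked examples on 0x57 (0101 0111b) and 0x58 (0101 1000b). */
void tbm_examples(void)
{
    assert(blcfill(0x57) == 0x50);   /* trailing ones cleared        */
    assert(blcs(0x57)    == 0x5F);   /* lowest clear bit (bit 3) set */
    assert(tzmsk(0x58)   == 0x07);   /* mask of the trailing zeros   */
    assert(bextr_imm(0xABCD1234, 0x0804) == 0x23);  /* 8 bits from bit 4 */
}
```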

Lightweight Profiling instructions

The AMD Lightweight Profiling (LWP) feature was introduced in AMD Bulldozer and removed in AMD Zen. On all supported CPUs, the latest available microcode updates have disabled LWP due to Spectre mitigations. [26]

These instructions are available in Ring 3, but not available in Real Mode and Virtual-8086 mode. All of them use the XOP prefix.

Instruction | Opcode | Description
LLWPCB r32/64 | XOP.9 12 /0 | Load LWPCB (Lightweight Profiling Control Block) address. [lower-alpha 1] Loading an address of 0 disables LWP. Loading a nonzero address will cause the CPU to perform validation of the specified LWPCB, then enable LWP if the validation passed. If LWP was already enabled, state for the previous LWPCB is flushed to memory.
SLWPCB r32/64 | XOP.9 12 /1 | Store LWPCB address [lower-alpha 1] to register, and flush LWP state to memory. If LWP is not enabled, the stored address is 0.
LWPINS r32/64, r/m32, imm32 | XOP.A 12 /0 imm32 | Insert user event record with EventID=255 in LWP ring buffer. The arguments are inserted into the event record as follows: the first argument is stored in bytes 23:16 (zero-extended if 32-bit); the second argument is stored in bytes 7:4; the low 16 bits of the imm32 are stored in bytes 3:2 (the high 16 bits are ignored). The LWPINS instruction sets CF=1 if LWP is enabled and the ring buffer is full, CF=0 otherwise.
LWPVAL r32/64, r/m32, imm32 | XOP.A 12 /1 imm32 | Decrement the event counter associated with the programmed value sample event. If the resulting counter value ends up negative, insert an event record with EventID=1 in LWP ring buffer. (The instruction arguments are inserted in this record in the same way as for LWPINS.) Executes as NOP if LWP is not enabled or if the event counter is not enabled. If no event record is inserted, then the second argument (which may be a memory argument) is not accessed.

  1. The address used by LLWPCB and SLWPCB is an effective-address, specified relative to the DS: segment base address. LLWPCB converts this effective-address to a linear-address by adding the DS base address to it, and SLWPCB converts it back by subtracting the DS base address. Changing the DS base address while LWP is enabled will thereby cause SLWPCB to return a different address than what was specified to LLWPCB, and may also cause XSAVE to fail to save LWP state properly.
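
As a rough sketch of the LWPINS argument placement described in the table, the following C function fills a byte buffer with the three arguments at the stated byte positions. The 32-byte record size, the position of the EventID in byte 0, the little-endian byte order within each field, and the zeroing of all other bytes are assumptions of this example and are not stated in the table; `lwpins_fill` and `put_le` are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { LWP_RECORD_BYTES = 32 };   /* assumed size of one event record */

/* Store the low nbytes of v at p, least significant byte first. */
static void put_le(uint8_t *p, uint64_t v, unsigned nbytes)
{
    for (unsigned i = 0; i < nbytes; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/* Model of the fields LWPINS fills from its arguments. A 32-bit first
   argument is passed zero-extended in arg1. */
static void lwpins_fill(uint8_t *rec, uint64_t arg1, uint32_t arg2,
                        uint32_t imm32)
{
    assert(rec != NULL);
    memset(rec, 0, LWP_RECORD_BYTES);     /* other fields: left zero here  */
    rec[0] = 255;                         /* EventID (byte 0 is assumed)   */
    put_le(&rec[2],  imm32 & 0xFFFF, 2);  /* bytes 3:2, high half ignored  */
    put_le(&rec[4],  arg2, 4);            /* bytes 7:4                     */
    put_le(&rec[16], arg1, 8);            /* bytes 23:16                   */
}
```

Real hardware writes further fields into the record that this sketch does not model.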

Instructions from other vendors

Instructions specific to NEC V-series processors

These instructions are specific to the NEC V20/V30 CPUs and their successors, and do not appear in any non-NEC CPUs. Many of their opcodes have been reassigned to other instructions in later non-NEC CPUs.

Instruction | Opcode | Description | Available on
TEST1 r/m8, CL; TEST1 r/m16, CL | 0F 10 /0; 0F 11 /0 | Test one bit. First argument specifies an 8/16-bit register or memory location. Second argument specifies which bit to test. | All V-series [27] except V30MZ [28]
TEST1 r/m8, imm8; TEST1 r/m16, imm8 | 0F 18 /0 ib; 0F 19 /0 ib | (as above) | (as above)
CLR1 r/m8, CL; CLR1 r/m16, CL | 0F 12 /0; 0F 13 /0 | Clear one bit. | (as above)
CLR1 r/m8, imm8; CLR1 r/m16, imm8 | 0F 1A /0 ib; 0F 1B /0 ib | (as above) | (as above)
SET1 r/m8, CL; SET1 r/m16, CL | 0F 14 /0; 0F 15 /0 | Set one bit. | (as above)
SET1 r/m8, imm8; SET1 r/m16, imm8 | 0F 1C /0 ib; 0F 1D /0 ib | (as above) | (as above)
NOT1 r/m8, CL; NOT1 r/m16, CL | 0F 16 /0; 0F 17 /0 | Invert one bit. | (as above)
NOT1 r/m8, imm8; NOT1 r/m16, imm8 | 0F 1E /0 ib; 0F 1F /0 ib | (as above) | (as above)
ADD4S | 0F 20 | Add Nibble Strings. Performs a string addition of integers in packed BCD format (2 BCD digits per byte). DS:SI points to a source integer, ES:DI to a destination integer, and CL provides the number of digits to add. The operation is then: destination <- destination + source | (as above)
SUB4S | 0F 22 | Subtract Nibble Strings. destination <- destination − source | (as above)
CMP4S | 0F 26 | Compare Nibble Strings. | (as above)
ROL4 r/m8 | 0F 28 /0 | Rotate Left Nibble. Concatenates its 8-bit argument with the bottom 4 bits of AL to form a 12-bit bitvector, then left-rotates this bitvector by 4 bits, then writes this bitvector back to its argument and the bottom 4 bits of AL. | (as above)
ROR4 r/m8 | 0F 2A /0 | Rotate Right Nibble. Similar to ROL4, except that it performs a right-rotate by 4 bits. | (as above)
EXT r8,r8 | 0F 33 /r | Bitfield extract. Perform a bitfield read from memory. DS:SI (DS0:IX in NEC nomenclature) points to memory location to read from, first argument specifies bit-offset to read from, and second argument specifies the number of bits to read minus 1. The result is placed in AX. After the bitfield read, SI and the first argument are updated to point just beyond the just-read bitfield. | (as above)
EXT r8,imm8 | 0F 3B /0 ib | (as above) | (as above)
INS r8,r8 | 0F 31 /r | Bitfield Insert. Perform a bitfield write to memory. ES:DI (DS1:IY in NEC nomenclature) points to memory location to write to, AX contains data to write, first argument specifies bit-offset to write to, and second argument specifies the number of bits to write minus 1. After the bitfield write, DI and the first argument are updated to point just beyond the just-written bitfield. | (as above)
INS r8,imm8 | 0F 39 /0 ib | (as above) | (as above)
REPC | 65 | Repeat if carry. Instruction prefix for use with CMPS/SCAS. | (as above)
REPNC | 64 | Repeat if not carry. Instruction prefix for use with CMPS/SCAS. | (as above)
FPO2 | 66 /r; 67 /r | "Floating Point Operation 2": extra escape opcodes for floating-point coprocessor, in addition to the standard D8-DF ones used for x87. The FPO2 escape opcodes are used by the NEC 72291 floating-point coprocessor. This coprocessor also uses the standard D8-DF escape opcodes, but uses them to encode an instruction set that is unique to the 72291 and not compatible with x87. A listing of the opcodes/instructions supported by the 72291 is available. [29] | (as above)
BRKEM imm8 | 0F FF ib | Break to 8080 emulation mode. Jump to an address picked from the IVT (Interrupt Vector Table) using the imm8 argument, similar to the 8086 INT instruction, but start executing as Intel 8080 code rather than x86 code. | V20, V30, V40, V50 [27]
BRKXA imm8 | 0F E0 ib | Break to Extended Address Mode. Jump to an address picked from the IVT using the imm8 argument. Enables a simple memory paging mechanism after reading the IVT but before executing the jump. The paging mechanism uses an on-chip page table with 16Kbyte pages and no access rights checking. [30] | V33, V53 [27]
RETXA imm8 | 0F F0 ib | Return from Extended Address Mode. Jump to an address picked from the IVT using the imm8 argument. Disables paging after reading the IVT but before executing the jump. | (as above)
MOVSPA | 0F 25 | Transfer both SS and SP of old register bank after the bank has been switched by an interrupt or BRKCS instruction. | V25, V35, [31] V55 [32]
BRKCS r16 | 0F 2D /0 | Perform software interrupt with context switch to register bank specified by low 3 bits of r16. | (as above)
RETRBI | 0F 91 | Return from register bank context switch interrupt. | (as above)
FINT | 0F 92 | Finish Interrupt. | (as above)
TSKSW r16 | 0F 94 /7 | Perform task switch to register bank indicated by low 3 bits of r16. | (as above)
MOVSPB r16 | 0F 95 /7 | Transfer SS and SP of current register bank to register bank indicated by low 3 bits of r16. | (as above)
BTCLR imm8,imm8,cb | 0F 9C ib ib rel8 | Bit Test and Clear. The first argument specifies a V25/V35 Special Function Register to test a bit in. The second argument specifies a bit position in that register. The third argument specifies a short branch offset. If the bit was set to 1, then it is cleared and a short branch is taken, else the branch is not taken. | (as above)
STOP | 0F 9E | CPU Halt. Differs from the conventional 8086 HLT instruction in that the clock is stopped too, so that an NMI or CPU reset is needed to resume operation. | (as above)
BRKS imm8 | F1 ib | Break and Enable Software Guard. Jump to an address picked from the IVT using the imm8 argument, and then continue execution with "Software Guard" enabled. The "Software Guard" is an 8-bit substitution cipher that, during instruction fetch/decode, translates opcode bytes using a 256-entry lookup table stored in an on-chip Mask ROM. | V25, V35 "Software Guard" [33]
BRKN imm8 | 63 ib | Break and Enable Native Mode. Similar to BRKS, except that it disables "Software Guard" rather than enabling it. | (as above)
MOV r/m,DS3 | 8C /6 | Move to/from the DS2 and DS3 extended segment registers. The DS2 and DS3 registers (which are specific to the NEC V55) act similarly to regular x86 real mode segment registers, except that they are left-shifted by 8 rather than 4, enabling access to 16MB of memory. Block transfer instructions, such as MOVBKW, can access the 16MB memory space by simultaneously prefixing with DS2 and DS3. [34] | V55 [32]
MOV r/m,DS2 | 8C /7 | (as above) | (as above)
MOV DS3,r/m | 8E /6 | (as above) | (as above)
MOV DS2,r/m | 8E /7 | (as above) | (as above)
PUSH DS3 | 0F 76 [35] | (as above) | (as above)
POP DS3 | 0F 77 | (as above) | (as above)
PUSH DS2 | 0F 7E | (as above) | (as above)
POP DS2 | 0F 7F | (as above) | (as above)
MOV DS3,r16,m32 | 0F 36 /r | Instructions to load both extended segment register and general-purpose register at once, similar to 8086's LDS and LES instructions | (as above)
MOV DS2,r16,m32 | 0F 3E /r | (as above) | (as above)
DS2: | 63 | Segment-override prefixes for the DS2 and DS3 extended segments. | (as above)
DS3: | D6 | (as above) | (as above)
IRAM: | F1 | Register File Override Prefix. Will cause memory operands to index into register file rather than general memory. | (as above)
BSCH r/m8; BSCH r/m16 | 0F 3C /0; 0F 3D /0 | Count Trailing Zeroes and store result in CL. Sets ZF=1 for all-0s input. | (as above)
RSTWDT imm8,imm8 | 0F 96 ib ib | Watchdog Timer Manipulation Instruction. | (as above)
BTCLRL imm8,imm8,cb | 0F 9D ib ib rel8 | Bit test and clear for second bank of special purpose registers (similar to BTCLR). | (as above)
QHOUT imm16 | 0F E0 iw | Queue manipulation instructions. | (as above)
QOUT imm16 | 0F E1 iw | (as above) | (as above)
QTIN imm16 | 0F E2 iw | (as above) | (as above)
IDLE | 0F 9F | Put CPU in idle mode. | V55SC [36]
ALBIT | 0F 9A | Dedicated fax instructions. | V55PI [32]
COLTRP | 0F 9B | (as above) | (as above)
MHENC | 0F 93 | (as above) | (as above)
MRENC | 0F 97 | (as above) | (as above)
SCHEOL | 0F 78 | (as above) | (as above)
GETBIT | 0F 79 | (as above) | (as above)
MHDEC | 0F 7C | (as above) | (as above)
MRDEC | 0F 7D | (as above) | (as above)
CNVTRP | 0F 7A | (as above) | (as above)
(no mnemonic) | 63 | Designated opcode for termination of the x86 emulation mode on the NEC V60. [37] | V60, V70
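
Two of the entries above lend themselves to a short software model: the nibble rotates ROL4/ROR4 and the address formation used by the V55 extended segment registers. The C sketch below is illustrative only. It assumes that the AL nibble forms the most significant part of the 12-bit vector and that the upper nibble of AL is left unchanged, neither of which the table states explicitly; all function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* ROL4: rotate the 12-bit vector {AL[3:0], operand[7:0]} left by 4 bits. */
static void rol4(uint8_t *operand, uint8_t *al)
{
    assert(operand && al);
    unsigned v = ((unsigned)(*al & 0x0F) << 8) | *operand;
    v = ((v << 4) | (v >> 8)) & 0xFFF;
    *operand = (uint8_t)(v & 0xFF);
    *al = (uint8_t)((*al & 0xF0) | (v >> 8));   /* upper nibble kept (assumed) */
}

/* ROR4: same vector, rotated right by 4 bits. */
static void ror4(uint8_t *operand, uint8_t *al)
{
    assert(operand && al);
    unsigned v = ((unsigned)(*al & 0x0F) << 8) | *operand;
    v = ((v >> 4) | (v << 8)) & 0xFFF;
    *operand = (uint8_t)(v & 0xFF);
    *al = (uint8_t)((*al & 0xF0) | (v >> 8));
}

/* Conventional real-mode address: segment shifted left by 4. */
static uint32_t real_mode_linear(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}

/* DS2/DS3 address: segment shifted left by 8, giving a 16MB reach.
   Behaviour when the sum exceeds 24 bits is not modelled here. */
static uint32_t v55_ext_linear(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 8) + off;
}
```

Under these assumptions, ROL4 on operand 12h with AL = CAh leaves operand = 2Ah and AL = C1h, and a following ROR4 restores both values.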

Instructions specific to Cyrix and Geode CPUs

These instructions are present in Cyrix CPUs as well as NatSemi/AMD Geode CPUs derived from Cyrix microarchitectures (Geode GX and LX, but not NX). They are also present in Cyrix manufacturing partner CPUs from IBM, ST and TI, as well as the VIA Cyrix III ("Joshua" core only, not "Samuel") and a few SoCs such as STPC ATLAS and ZFMicro ZFx86. [38] Many of these opcodes have been reassigned to other instructions in later non-Cyrix CPUs.

Instruction | Opcode | Description | Available on
SVDC m80,sreg | 0F 78 /r | Save segment register and descriptor to memory as a 10-byte data structure. The first 8 bytes are the descriptor, the last two bytes are the selector. [39] | System Management Mode instructions. [lower-alpha 1] Not present on stepping A of Cx486SLC and Cx486DLC. [40] Present on Cx486SLC/e [41] and all later Cyrix CPUs. Present on all Cyrix-derived Geode CPUs.
RSDC sreg,m80 [lower-alpha 2] | 0F 79 /r | Restore segment register and descriptor from memory | (as above)
SVLDT m80 | 0F 7A /0 | Save LDTR and descriptor | (as above)
RSLDT m80 | 0F 7B /0 | Restore LDTR and descriptor | (as above)
SVTS m80 | 0F 7C /0 | Save TSR and descriptor | (as above)
RSTS m80 | 0F 7D /0 | Restore TSR and descriptor | (as above)
SMINT [lower-alpha 3] | 0F 7E; 0F 38 | System management software interrupt. Uses 0F 7E encoding on Cyrix 486, 5x86, 6x86 and ZFx86. Uses 0F 38 encoding on Cyrix 6x86MX, MII, MediaGX and Geode. | (as above)
RDSHR r/m32 | 0F 36 /0 [lower-alpha 4] | Read SMM Header Pointer Register | Cyrix 6x86MX [43] and MII; VIA Cyrix III [46]
WRSHR r/m32 | 0F 37 /0 [lower-alpha 4] | Write SMM Header Pointer Register | (as above)
BB0_RESET | 0F 3A | Reset BLT Buffer Pointer 0 to base | Cyrix MediaGX and MediaGXm [47]; NatSemi Geode GXm, GXLV, GX1
BB1_RESET | 0F 3B | Reset BLT Buffer Pointer 1 to base | (as above)
CPU_WRITE | 0F 3C | Write to CPU internal special register (EBX=register-index, EAX=data) | (as above)
CPU_READ | 0F 3D | Read from CPU internal special register (EBX=register-index, EAX=data) | (as above)
DMINT | 0F 39 | Debug Management Mode Interrupt | NatSemi Geode GX2; AMD Geode GX, LX [42]
RDM | 0F 3A | Return from Debug Management Mode | (as above)
  1. The Cyrix SMM instructions also include RSM (0F AA; Return from System Management mode), however, RSM is not a Cyrix-specific instruction, and it continues to exist in modern non-Cyrix x86 processors.
  2. RSDC with CS as a destination register is only supported on NatSemi Geode GX2 and AMD Geode GX/LX [42] - on other processors, it causes #UD.
  3. Some assemblers/disassemblers, such as NASM, use the instruction mnemonic SMINTOLD for the 0F 7E encoding.
  4. For the RDSHR and WRSHR instructions, Cyrix's documentation [43] specifies that the instruction accepts a ModR/M byte but does not specify the encoding of the ModR/M byte's reg field. NASM v0.98.31 and later uses /0 for these instructions, [44] while sandpile.org's opcode tables [45] indicate that the reg field is ignored for these instructions.
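
The 10-byte memory image used by SVDC and RSDC can be pictured as a C structure. The sketch below only reflects what the table states (8 descriptor bytes followed by 2 selector bytes); the type and function names are hypothetical, the little-endian selector order is the usual x86 convention, and the internal layout of the descriptor bytes is deliberately left opaque.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* "m80" operand of SVDC/RSDC. Byte arrays are used so that the compiler
   cannot insert padding and the structure stays exactly 10 bytes. */
typedef struct {
    uint8_t descriptor[8];   /* bytes 0..7: segment descriptor */
    uint8_t selector[2];     /* bytes 8..9: segment selector   */
} svdc_image;

/* Read the selector from the image (little-endian). */
static uint16_t svdc_selector(const svdc_image *img)
{
    assert(img != NULL);
    return (uint16_t)(img->selector[0] | (img->selector[1] << 8));
}

/* Write the selector into the image (little-endian). */
static void svdc_set_selector(svdc_image *img, uint16_t sel)
{
    assert(img != NULL);
    img->selector[0] = (uint8_t)(sel & 0xFF);
    img->selector[1] = (uint8_t)(sel >> 8);
}
```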

Cyrix EMMI instructions

These instructions were introduced in the Cyrix 6x86MX and MII processors, and were also present in the MediaGXm and Geode GX1 [48] processors. (In later non-Cyrix processors, all of their opcodes have been used for SSE or SSE2 instructions.)

These instructions are integer SIMD instructions acting on 64-bit vectors in MMX registers or memory. Each instruction takes two explicit operands, where the first one is an MMX register operand and the second one is either a memory operand or a second MMX register. In addition, several of the instructions take an implied operand, which is an MMX register implied from the first operand as follows:

First explicit operand | mm0 | mm1 | mm2 | mm3 | mm4 | mm5 | mm6 | mm7
Implied operand | mm1 | mm0 | mm3 | mm2 | mm5 | mm4 | mm7 | mm6
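
The pairing in this table amounts to flipping the lowest bit of the MMX register number. A minimal C sketch (the function name is made up for illustration):

```c
#include <assert.h>

/* Register number of the implied operand for explicit operand mm<n>. */
static unsigned emmi_implied_reg(unsigned n)
{
    assert(n < 8);      /* MMX registers are mm0..mm7 */
    return n ^ 1u;      /* mm0<->mm1, mm2<->mm3, mm4<->mm5, mm6<->mm7 */
}
```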

In the instruction descriptions in the table below, arg1 and arg2 refer to the two explicit operands of the instruction, and imp to the implied operand.

Instruction | Opcode | Description
PAVEB mm,mm/m64 | 0F 50 /r | Packed average bytes: [lower-alpha 1] arg1 <- (arg1+arg2) >> 1
PADDSIW mm,mm/m64 | 0F 51 /r | Packed add signed words with saturation, using implied destination: imp <- saturate_s16(arg1+arg2)
PMAGW mm,mm/m64 | 0F 52 /r | Packed signed word magnitude maximum value: if (abs(arg2) > abs(arg1)) then arg1 <- arg2
PDISTIB mm,m64 [lower-alpha 2] | 0F 54 /r | Packed unsigned byte distance and accumulate to implied destination, with saturation: imp <- saturate_u8(imp + (abs(arg1-arg2)))
PSUBSIW mm,mm/m64 | 0F 55 /r | Packed subtract signed words with saturation, using implied destination: imp <- saturate_s16(arg1-arg2)
PMULHRW mm,mm/m64 [lower-alpha 3]; PMULHRWC mm,mm/m64 | 0F 59 /r | Packed signed word multiply high with rounding: arg1 <- (arg1*arg2+0x4000)>>15
PMULHRIW mm,mm/m64 | 0F 5D /r | Packed signed word multiply high with rounding and implied destination: imp <- (arg1*arg2+0x4000)>>15
PMACHRIW mm,m64 [lower-alpha 2] | 0F 5E /r | Packed signed word multiply high with rounding and accumulation to implied destination: imp <- imp + ((arg1*arg2+0x4000)>>15)
PMVZB mm,m64 [lower-alpha 2] | 0F 58 /r | if (imp == 0) then arg1 <- arg2. Packed conditional load from memory to MMX register. Condition is evaluated on a per-byte-lane basis, by comparing byte lanes in the implied source to zero (with signed compare) − if the comparison passes, then the corresponding destination lane is loaded from memory, otherwise it keeps its original value.
PMVNZB mm,m64 [lower-alpha 2] | 0F 5A /r | if (imp != 0) then arg1 <- arg2. (Otherwise as above.)
PMVLZB mm,m64 [lower-alpha 2] | 0F 5B /r | if (imp < 0) then arg1 <- arg2. (Otherwise as above.)
PMVGEZB mm,m64 [lower-alpha 2] | 0F 5C /r | if (imp >= 0) then arg1 <- arg2. (Otherwise as above.)
  1. Implementations differ on whether the PAVEB instruction treats the bytes as signed or unsigned. [49]
  2. For PDISTIB, PMACHRIW and the PMV* instructions, the second explicit operand is required to be a memory operand − register operands are not supported.
  3. The Cyrix EMMI PMULHRW instruction has the same mnemonic as the 3DNow! PMULHRW instruction, however its opcode and function differ (the EMMI instruction right-shifts its multiply-result by 15 bits, while the 3DNow! instruction right-shifts by 16 bits).

    Some assemblers/disassemblers, such as NASM, resolve this ambiguity by using the mnemonic PMULHRWA for the 3DNow! instruction and PMULHRWC for the EMMI instruction.
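
The per-lane pseudocode in the table can be tried out in plain C. The sketch below models PMULHRIW, PADDSIW and PMVZB on a small array standing in for the MMX register file; the type and function names are hypothetical. It assumes that `>>` on a negative `int32_t` is an arithmetic shift and that narrowing to `int16_t` wraps, which holds on mainstream compilers but is implementation-defined in C.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int16_t w[4]; } mmx_words;   /* four signed word lanes */

static int16_t saturate_s16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* One lane of the rounded multiply: (a*b + 0x4000) >> 15. */
static int16_t mulhr_lane(int16_t a, int16_t b)
{
    int32_t t = (int32_t)a * b + 0x4000;
    return (int16_t)(t >> 15);   /* arithmetic shift assumed */
}

/* PMULHRIW mm<dst>, src: result goes to the implied register mm<dst^1>. */
static void pmulhriw(mmx_words mm[8], unsigned dst, const mmx_words *src)
{
    assert(dst < 8 && src);
    mmx_words *imp = &mm[dst ^ 1u];
    for (int i = 0; i < 4; i++)
        imp->w[i] = mulhr_lane(mm[dst].w[i], src->w[i]);
}

/* PADDSIW mm<dst>, src: saturated sum goes to the implied register. */
static void paddsiw(mmx_words mm[8], unsigned dst, const mmx_words *src)
{
    assert(dst < 8 && src);
    mmx_words *imp = &mm[dst ^ 1u];
    for (int i = 0; i < 4; i++)
        imp->w[i] = saturate_s16((int32_t)mm[dst].w[i] + src->w[i]);
}

/* PMVZB: load a byte lane from memory where the implied lane is zero. */
static void pmvzb(uint8_t arg1[8], const uint8_t imp[8], const uint8_t mem[8])
{
    for (int i = 0; i < 8; i++)
        if (imp[i] == 0)
            arg1[i] = mem[i];
}
```

Read as Q15 fixed point, 0x4000 times 0x4000 (0.5 × 0.5) yields 0x2000 (0.25), which shows why the product is shifted by 15 rather than 16 bits.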

Instructions specific to Chips and Technologies CPUs

The C&T F8680 PC/Chip is a system-on-a-chip featuring an 80186-compatible CPU core, with a few additional instructions to support the F8680-specific "SuperState R" [50] supervisor/system-management feature. Some of the added instructions for "SuperState R" are: [51]

Instruction | Opcode | Description
LFEAT AX | FE F8 | Load datum into F8680 "CREG" configuration register (AH=register-index, AL=datum) [52]
STFEAT AL,imm8 | FE F0 ib | Read F8680 status register into AL (imm8=register-index)

C&T also developed a 386-compatible processor known as the Super386. This processor supports, in addition to the basic Intel 386 instruction set, a number of instructions to support the Super386-specific "SuperState V" system-management feature. The added instructions for "SuperState V" are: [6]

Instruction | Opcode | Description
SCALL r/m | 0F 18 /0 | Call SMM interrupt handler [53] [54]
SRET | 0F 19 | Return from SMM interrupt handler
SRESUME | 0F 1A | Return from SMM with interrupts disabled for one instruction
SVECTOR | 0F 1B | Exit from SMM and issue a shutdown cycle
EPIC | 0F 1E | Load one of the six interrupt or I/O traps
RARF1 | 0F 3C | Read from bank 1 of the register file (includes visible and invisible CPU registers)
RARF2 | 0F 3D | Read from bank 2 of the register file
RARF3 | 0F 3E | Read from bank 3 of the register file
LTLB | 0F F0 | Load TLB with page table entry
RCT | 0F F1 | Read cache tag
WCT | 0F F2 | Write cache tag
RCD | 0F F3 | Read cache data
WCD | 0F F4 | Write cache data
RTLBPA | 0F F5 | Read TLB data (physical address)
RTLBLA | 0F F6 | Read TLB tag (linear address)
LCFG | 0F F7 | Load configuration register
SCFG | 0F F8 | Store configuration register
RGPR | 0F F9 | Read general-purpose register or any bank of register file
RARF0 | 0F FA | Read from bank 0 of the register file
RARFE | 0F FB | Read from extra bank of the register file
WGPR | 0F FD | Write general-purpose register or any bank of register file
WARFE | 0F FE | Write extra bank of the register file

Instructions specific to ALi/Nvidia/DM&P M6117 MCUs

The M6117 series of embedded microcontrollers features a 386SX-class CPU core with a few M6117-specific additions to the Intel 386 instruction set. The ones documented for DM&P M6117D are: [55]

Instruction | Opcode | Description
BRKPM | F1 | System management interrupt − enters "hyper state mode"
RETPM | D6 E6 | Return from "hyper state mode"
LDUSR UGRS,EAX | D6 CA 03 A0 | Set page address of SMI entry point
(mnemonic not listed) | D6 C8 03 A0 | Read page address of SMI entry point
MOV PWRCR,EAX | D6 FA 03 02 | Write to power control register

Instructions present in specific 80387 clones

Several 80387-class floating-point coprocessors provided extra instructions in addition to the standard 80387 ones − none of these are supported in later processors:

Instruction | Opcode | Description | Available on
FRSTPM | DB F4 [56] or DB E5 [9] | FPU Reset Protected Mode. Instruction to signal to the FPU that the main CPU is exiting protected mode, similar to how the FSETPM instruction is used to signal to the FPU that the CPU is entering protected mode. Different sources provide different encodings for this instruction. | Intel 287XL
FNSTDW AX | DF E1 | Store FPU Device Word to AX | Intel 387SL [9] [57]
FNSTSG AX | DF E2 | Store FPU Signature Register to AX [lower-alpha 1] | (as above)
FSBP0 | DB E8 | Select Coprocessor Register Bank 0 | IIT 2c87, 3c87 [9] [59]
FSBP1 | DB EB | Select Coprocessor Register Bank 1 | (as above)
FSBP2 | DB EA | Select Coprocessor Register Bank 2 | (as above)
FSBP3 | DB E9 [60] | Select Coprocessor Register Bank 3 (undocumented) | (as above)
F4X4, FMUL4X4 | DB F1 | Multiply 4-component vector with 4x4 matrix. For proper operation, the matrix must be preloaded into Coprocessor Register banks 1 and 2 (unique to IIT FPUs), and the vector must be loaded into Coprocessor Register Bank 0. Example code is available. [59] [61] | (as above)
FTSTP | D9 E6 | Equivalent to FTST followed by a stack pop. | Cyrix 387+ [61]
FRINT2 | DB FC | Round st(0) to integer, with round-to-nearest rounding. | Cyrix EMC87, 83s87, 83d87, 387+ [61] [9]
FRICHOP | DD FC | Round st(0) to integer, with round-to-zero rounding. | (as above)
FRINEAR | DF FC | Round st(0) to integer, with round-to-nearest ties-away-from-zero rounding. | (as above)
  1. The FNSTSG AX instruction can be executed not just on the Intel 387SL FPU but on the Intel 387SX as well - executing the instruction immediately after an FNINIT will cause the instruction to return 0000h on 387SX, but a nonzero signature value on the 387SL. [58]
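
For readers unfamiliar with the operation behind F4X4, the arithmetic is an ordinary 4x4 matrix by 4-component vector product. The C sketch below shows that arithmetic only. The row-major matrix layout, the column-vector convention, the use of `double` instead of the FPU's extended precision, and the function name are all assumptions of this example; the table does not say how the matrix is distributed across the register banks or where the result is placed.

```c
#include <assert.h>

/* out = m * v for a 4x4 matrix and a 4-component column vector.
   A temporary is used so that out may alias v. */
static void mat4_mul_vec4(const double m[4][4], const double v[4],
                          double out[4])
{
    double tmp[4];

    assert(m && v && out);
    for (int r = 0; r < 4; r++) {
        tmp[r] = 0.0;
        for (int c = 0; c < 4; c++)
            tmp[r] += m[r][c] * v[c];
    }
    for (int r = 0; r < 4; r++)
        out[r] = tmp[r];
}
```

A typical use is applying an affine transform in homogeneous coordinates, for example a scale by (1, 2, 3) combined with a translation by (10, 20, 30) applied to the point (1, 1, 1, 1).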

References

  1. Intel Itanium Architecture Software Developer's Manual, volume 4, (document number: 323208, revision 2.3, May 2010).
  2. Intel SDM, volume 1, order no. 253665-083, Mar 2024, chapter 2.5
  3. R. Singhal, Yes. Deprecated. (about VP2INTERSECT), Jul 19, 2023. Archived on Jul 23, 2023.
  4. Intel, Intel® Xeon Phi™ Coprocessor Instruction Set Architecture Reference Manual, Sep 2012, order no. 327364-001. Archived on 4 Aug 2021.
  5. Intel, Reference Implementations for Intel® Architecture Approximation Instructions VRCP14, VRSQRT14, VRCP28, VRSQRT28, and VEXP2, id #671685, Dec 28, 2015. Archived on Sep 18, 2023.

    C code "RECIP28EXP2.c" archived on Sep 18, 2023.

  6. Microprocessor Report, System Management Mode Explained (vol 6, no. 8, June 17, 1992) − includes a listing of the AMD/Cyrix SMM opcodes and the C&T Super386 "SuperState V" opcodes. Archived on 29 Jun 2022.
  7. "Am386®SX/SXL/SXLV High-Performance, Low-Power, Embedded Microprocessors" (PDF), publication #21020, rev A, Apr 1997 − has SMM instruction descriptions on pages 5 and 6.
  8. Intel vs AMD, "Case No.C-93-20301 PVT, Findings of fact and conclusions of law following "ICE" module of trial". Oct 7, 1994. Archived from the original on 10 May 2021.
  9. Potemkin's Hackers Group, OPCODE.LST v4.51, 15 Oct 1999. Archived on 21 May 2001.
  10. Hans Peter Messmer, "The Indispensable PC Hardware Book" (ISBN 0201403994), chapter 10.6.1, pages 280-281
  11. Frank van Gilluwe, "The Undocumented PC, second edition", 1997, ISBN 0-201-47950-8, page 120
  12. AMD, 3DNow! Technology Manual, pub.no. 21928G/0, March 2000. Archived on 9 Oct 2018.
  13. AMD, AMD64 Architecture Programmer’s Manual Volume 5, pub.no.26569, rev 3.16, Nov 2021 − provides details on how PFRCPIT1, PFRSQIT1 and PFRCPIT2 perform their Newton-Raphson iterations on pages 118 to 125. Archived on 24 Sep 2023.
  14. AMD, Geode LX Processors Data Book, pub.no. 33234H, Feb 2009, page 673. Archived on 15 Mar 2019.
  15. "Windows 10 64-bit requirements: Does my CPU support CMPXCHG16b, PrefetchW and LAHF/SAHF?".
  16. Grzegorz Mazur, AMD 3DNow! undocumented instructions
  17. "Undocumented 3DNow! Instructions". grafi.ii.pw.edu.pl. Archived from the original on 30 January 2003. Retrieved 22 February 2022.
  18. AMD, AMD64 Technology: 128-bit SSE5 Instruction Set, pub.no. 43479, rev 3.01, Aug 2007. Archived from the original on Jan 24, 2009.
  19. AMD, AMD64 Architecture Programmer’s Manual Volume 6: 128-Bit and 256-Bit XOP and FMA4, pub.no. 43479, rev 3.04, Nov 2009. Archived on Oct 11, 2018.
  20. Intel, Advanced Vector Extensions Programming Reference, order no. 319433-003, August 2008 − contains specifications of VPERMIL2PD and VPERMIL2PS on pages 416 and 425, as well as FMA4 instructions on pages 618 to 665. Archived on Sep 24, 2023.
  21. Intel, Advanced Vector Extensions Programming Reference, order no. 319433-004, December 2008 − does not contain specifications of VPERMIL2PD and VPERMIL2PS and has FMA3 instead of FMA4. Archived on Sep 24, 2023.
  22. Hollingsworth, Brent. "New "Bulldozer" and "Piledriver" instructions" (PDF). Advanced Micro Devices, Inc. Archived from the original (PDF) on 26 Jul 2014. Retrieved 11 December 2014.
  23. "Family 16h AMD A-Series Data Sheet" (PDF). amd.com. AMD. October 2013. Archived from the original (PDF) on 7 Nov 2013. Retrieved 2014-01-02.
  24. "AMD64 Architecture Programmer's Manual, Volume 3: General-Purpose and System Instructions" (PDF). amd.com. AMD. October 2013. Archived from the original (PDF) on 4 Jan 2014. Retrieved 2014-01-02.
  25. "tbmintrin.h from GCC 4.8". Archived from the original on 23 Feb 2017. Retrieved 2014-03-17.
  26. Xen-devel mailing list, x86/svm: Drop support for AMD's Lightweight Profiling, 20 May 2019
  27. NEC, 16-bit V-series User's Manual, Sep 2000. Archived on Dec 2, 2021.
  28. NEC, V30MZ Preliminary User's Manual, 1998, page 14. Archived on Dec 2, 2021.
  29. NEC 72291 FPU: an instruction listing can be found in the HP 64873 V-series Cross Assembler Reference, pages F-31 to F-34.
  30. NEC 16-bit V-series Microprocessor Data Book, 1991, p. 360-361
  31. Renesas Data Sheet MOS Integrated Circuit uPD70320. Archived on Jan 6, 2022.
  32. Renesas, NEC V55PI 16-bit microprocessor Data Sheet, U11775E. Archived on Jul 27, 2023.
  33. NEC 16-bit V-series Microprocessor Data Book, 1991, p. 765-766
  34. "V55PI 16-BIT MICROPROCESSOR". pp. 21–22. Retrieved 2024-01-18.
  35. Renesas, NEC V55PI Users Manual Instruction, U10231J (Japanese). Opcodes for PUSH/POP DS2/DS3 listed in macro definitions on p. 378. Archived on Dec 11, 2022.
  36. NEC V55SC 16-bit Microprocessor Preliminary Data Sheet (O.D.No ID-8206A, March 1993), pages 70 and 127. Located on Apr 20, 2022 by searching for "nec v55sc" at datasheetarchive.com. Archived on Nov 22, 2022.
  37. NEC uPD70616 Programmer's Reference Manual (November 1986), p. 287. Archived on Dec 5, 2006.
  38. ZFMicro, ZFx86 System-on-a-chip Data Book 1.0 Rev D, June 5, 2005, section 2.2.6.3, page 76. Archived on Feb 11, 2009.
  39. Texas Instruments, TI486 Microprocessor Reference Guide, 1993, section A.14, page 308
  40. Debbie Wiles, CPU identification, archived on 2004-06-04
  41. Cyrix 486SLC/e Data Sheet (1992), section 2.6.4
  42. AMD, Geode LX Processors Data Book, Feb 2009, publication ID 33234H, section 8.3.4, pages 643-657. Archived on 3 Dec 2023.
  43. Cyrix 6x86MX Data Book, section 2.15.3
  44. NASM 0.98.31 documentation at SourceForge, see sections B.275 and B.331. Archived on Jul 21, 2023.
  45. Sandpile, x86 architecture 2 byte opcodes. Archived on Nov 3, 2011.
  46. VIA, Cyrix III Processor Data Book, v1.00, Jan 25, 2000, p. 103.
  47. Cyrix MediaGX Data Book, section 4.1.5
  48. AMD, AMD Geode GX1 Processor Data Book, rev 5.0, Dec 2003, p. 226. Archived on 20 Apr 2020.
  49. Cyrix, Application Note 108 − Cyrix Extensions to the Multimedia Instruction Set, rev 0.93, 9 Sep 1998, page 7
  50. BYTE Magazine, November 1991, page 245
  51. Institute Of Oceanographic Sciences, Sonic buoy − Formatter Handbook contains some F8680 instruction macros on page 34. Archived on Nov 4, 2018.
  52. The F8680 PC/Chip System Design Guide contains descriptions of many of the F8680 CREG registers.
  53. Michal Necasek, More on the C&T Super386
  54. Corexor, Calling C&T SCALL safely, 5 Dec 2015. Archived on 27 Oct 2020.
  55. DM&P, M6117D : System on a chip, pages 31, 34, 68. Archived on Jul 20, 2006.
  56. Intel "Intel287 XL/XLT Math Coprocessor" (Oct 1992, order no 290376-003), p. 33
  57. Intel "Intel387 SL Mobile Math Coprocessor" (Feb 1992, order no 290427-001), appendix A. Located on Jan 7, 2022 by searching for "intel387 sl" at datasheetarchive.com. Archived on Jan 7, 2022.
  58. Desmond Yuen, Intel's SL Architecture: Designing Portable Applications, (1993, ISBN 0-07-911336-2) p.127
  59. IIT 3c87 Advanced Math CoProcessor Data Book
  60. Harald Feldmann, Hamarsoft 86BUGS List
  61. Norbert Juffa "Everything You Always Wanted To Know About Math Coprocessors", 01-oct-94 revision