List of x86 cryptographic instructions

Last updated January 02, 2025

This is a list of instructions that have been added to the x86 instruction set to assist efficient calculation of cryptographic primitives, such as AES encryption, SHA hash calculation and random number generation.

Intel AES instructions

6 new instructions.

Instruction	Encoding	Description
`AESENC xmm1,xmm2/m128`	`66 0F 38 DC /r`	Perform one round of an AES encryption flow
`AESENCLAST xmm1,xmm2/m128`	`66 0F 38 DD /r`	Perform the last round of an AES encryption flow
`AESDEC xmm1,xmm2/m128`	`66 0F 38 DE /r`	Perform one round of an AES decryption flow
`AESDECLAST xmm1,xmm2/m128`	`66 0F 38 DF /r`	Perform the last round of an AES decryption flow
`AESKEYGENASSIST xmm1,xmm2/m128,imm8`	`66 0F 3A DF /r ib`	Assist in AES round key generation
`AESIMC xmm1,xmm2/m128`	`66 0F 38 DB /r`	Assist in AES Inverse Mix Columns

CLMUL instructions

Instruction	Opcode	Description
`PCLMULQDQ xmm1,xmm2/m128,imm8`	`66 0F 3A 44 /r ib`	Perform a carry-less multiplication of two 64-bit polynomials over the finite field GF(2^k).
`PCLMULLQLQDQ xmm1,xmm2/m128`	`66 0F 3A 44 /r 00`	Multiply the low halves of the two 128-bit operands.
`PCLMULHQLQDQ xmm1,xmm2/m128`	`66 0F 3A 44 /r 01`	Multiply the high half of the destination register by the low half of the source operand.
`PCLMULLQHQDQ xmm1,xmm2/m128`	`66 0F 3A 44 /r 10`	Multiply the low half of the destination register by the high half of the source operand.
`PCLMULHQHQDQ xmm1,xmm2/m128`	`66 0F 3A 44 /r 11`	Multiply the high halves of the two 128-bit operands.
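The operand selection described in the table can be sketched as a small software model. This is an illustration only, not how the hardware computes the product: the function names `clmul64` and `pclmulqdq` are hypothetical, 128-bit registers are modelled as Python integers, and the immediate values in the table (00, 01, 10, 11) are read as hexadecimal, so that bit 0 of imm8 selects the half of the destination and bit 4 selects the half of the source.

```python
MASK64 = (1 << 64) - 1


def clmul64(a: int, b: int) -> int:
    """Carry-less multiplication of two 64-bit values (XOR instead of addition)."""
    a &= MASK64
    b &= MASK64
    result = 0
    while b:
        if b & 1:
            result ^= a  # "add" the shifted multiplicand without carries
        a <<= 1
        b >>= 1
    return result


def pclmulqdq(xmm1: int, xmm2: int, imm8: int) -> int:
    """Model of the half selection: imm8 bit 0 picks the xmm1 half,
    imm8 bit 4 picks the xmm2 half (0 = low 64 bits, 1 = high 64 bits)."""
    a = (xmm1 >> 64) & MASK64 if imm8 & 0x01 else xmm1 & MASK64
    b = (xmm2 >> 64) & MASK64 if imm8 & 0x10 else xmm2 & MASK64
    return clmul64(a, b)
```

With this model, `pclmulqdq(x, y, 0x01)` corresponds to the `PCLMULHQLQDQ` row (high half of the destination, low half of the source).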

RDRAND and RDSEED

Instruction	Encoding	Description	Added in
`RDRAND r16` `RDRAND r32`	`NFx 0F C7 /6`	Return a random number that has been generated with a CSPRNG (Cryptographically Secure Pseudo-Random Number Generator) compliant with NIST SP 800-90A.^[a]	Ivy Bridge, Silvermont, Excavator, Puma, ZhangJiang, Knights Landing
`RDRAND r64`	`NFx REX.W 0F C7 /6`
`RDSEED r16` `RDSEED r32`	`NFx 0F C7 /7`	Return a random number that has been generated with a HRNG/TRNG (Hardware/"True" Random Number Generator) compliant with NIST SP 800-90B and C.^[a]	Broadwell, ZhangJiang, Knights Landing, Zen 1, Gracemont
`RDSEED r64`	`NFx REX.W 0F C7 /7`		Broadwell, ZhangJiang, Knights Landing, Zen 1, Gracemont

↑ The RDRAND and RDSEED instructions may fail to obtain and return a random number if the CPU's random number generators cannot keep up with the issuing of these instructions – if this happens, then software may retry the instructions (although the number of retries should be limited, in order to ensure forward progress^[1]). The instructions set EFLAGS.CF to 1 if a random number was successfully obtained and 0 otherwise. Failure to obtain a random number will also set the instruction's destination register to 0.
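The bounded-retry pattern described in this note can be sketched as follows. The sketch does not execute RDRAND itself: `step` is a hypothetical callable standing in for one execution of the instruction and returning the pair (CF, destination value), and the default limit of 10 retries is an illustrative assumption rather than a value taken from this article.

```python
from typing import Callable, Optional, Tuple


def rdrand_with_retry(step: Callable[[], Tuple[bool, int]],
                      max_retries: int = 10) -> Optional[int]:
    """Invoke `step` until it reports CF=1, giving up after `max_retries` attempts.

    Returns the random value on success, or None if every attempt failed,
    so that the caller can fall back instead of spinning forever.
    """
    for _ in range(max_retries):
        carry, value = step()
        if carry:          # CF=1: a random number was obtained
            return value
        # CF=0: the destination is 0 and must not be used as random data
    return None
```

Returning `None` rather than the zeroed destination keeps a failed attempt from being mistaken for a valid random value of 0.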

Intel SHA instructions

7 new instructions.

Instruction	Encoding	Description
`SHA1RNDS4 xmm1,xmm2/m128,imm8`	`NP 0F 3A CC /r ib`	Perform Four Rounds of SHA1 Operation
`SHA1NEXTE xmm1,xmm2/m128`	`NP 0F 38 C8 /r`	Calculate SHA1 State Variable E after Four Rounds
`SHA1MSG1 xmm1,xmm2/m128`	`NP 0F 38 C9 /r`	Perform an Intermediate Calculation for the Next Four SHA1 Message Dwords
`SHA1MSG2 xmm1,xmm2/m128`	`NP 0F 38 CA /r`	Perform a Final Calculation for the Next Four SHA1 Message Dwords
`SHA256RNDS2 xmm1,xmm2/m128,<XMM0>`	`NP 0F 38 CB /r`	Perform Two Rounds of SHA256 Operation
`SHA256MSG1 xmm1,xmm2/m128`	`NP 0F 38 CC /r`	Perform an Intermediate Calculation for the Next Four SHA256 Message Dwords
`SHA256MSG2 xmm1,xmm2/m128`	`NP 0F 38 CD /r`	Perform a Final Calculation for the Next Four SHA256 Message Dwords

Intel Key Locker instructions

These instructions, available in Tiger Lake and later Intel processors, are designed to enable encryption/decryption with an AES key without having access to any unencrypted copies of the key during the actual encryption/decryption process.

Key Locker subset	Instruction	Encoding	Description
KL Key Locker common instructions.	`LOADIWKEY xmm1,xmm2`	`F3 0F 38 DC /r`	Load internal wrapping key ("IWKey") from xmm1, xmm2 and `XMM0`. The two explicit operands (which must be register operands) specify a 256-bit encryption key. The implicit operand in `XMM0` specifies a 128-bit integrity key. `EAX` contains flags controlling operation of instruction.^[a] After being loaded, the IWKey cannot be directly read from software, but is used for the key wrapping done by `ENCODEKEY128/256` and checked by the Key Locker encode/decode instructions. `LOADIWKEY` is privileged and can run in Ring 0 only.

AESKLE AES Key Locker instructions.	`ENCODEKEY128 r32,r32`	`F3 0F 38 FA /r`	Wrap a 128-bit AES key from `XMM0` into a 384-bit key handle and output this handle to `XMM0-2`.	Source operand specifies handle restrictions to build into the handle.^[b] Destination operand is initialized with information about the source and attributes of the key (this matches the value that was provided in EAX for the most recent invocation of `LOADIWKEY`). These instructions may also modify `XMM4-6` (zeroed out in existing implementations, but this should not be relied on).
	`ENCODEKEY256 r32,r32`	`F3 0F 38 FB /r`	Wrap a 256-bit AES key from `XMM1:XMM0` into a 512-bit key handle and output this handle to `XMM0-3`.

	`AESENC128KL xmm,m384`	`F3 0F 38 DC /r`	Encrypt xmm using 128-bit AES key indicated by handle at `m384` and store result in xmm.^[c]
	`AESDEC128KL xmm,m384`	`F3 0F 38 DD /r`	Decrypt xmm using 128-bit AES key indicated by handle at `m384` and store result in xmm.^[c]
	`AESENC256KL xmm,m512`	`F3 0F 38 DE /r`	Encrypt xmm using 256-bit AES key indicated by handle at `m512` and store result in xmm.^[c]
	`AESDEC256KL xmm,m512`	`F3 0F 38 DF /r`	Decrypt xmm using 256-bit AES key indicated by handle at `m512` and store result in xmm.^[c]

AESKLE+WIDE_KL AES Wide Key Locker instructions. Perform encryption or decryption for eight 128-bit AES blocks at once.	`AESENCWIDE128KL m384`	`F3 0F 38 D8 /0`	Encrypt `XMM0-7` using 128-bit AES key indicated by handle at `m384` and store each resultant block back to its corresponding register.^[c]
	`AESDECWIDE128KL m384`	`F3 0F 38 D8 /1`	Decrypt `XMM0-7` using 128-bit AES key indicated by handle at `m384` and store each resultant block back to its corresponding register.^[c]
	`AESENCWIDE256KL m512`	`F3 0F 38 D8 /2`	Encrypt `XMM0-7` using 256-bit AES key indicated by handle at `m512` and store each resultant block back to its corresponding register.^[c]
	`AESDECWIDE256KL m512`	`F3 0F 38 D8 /3`	Decrypt `XMM0-7` using 256-bit AES key indicated by handle at `m512` and store each resultant block back to its corresponding register.^[c]

↑ The flags available for the LOADIWKEY instruction in the EAX register are:

Bits	Flags
0	1=Do not permit the wrapping key to be backed up to platform-scoped storage
4:1	KeySource field. The following values are supported: 0 = use key input operands directly; 1 = XOR the key input operands with 384 bits from hardware RNG
31:5	Reserved, must be set to 0
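As a sketch of how the bit layout above combines into a single EAX value, the helper below packs the two documented fields. The function name `loadiwkey_flags` and its parameter names are hypothetical; only the bit positions come from the table.

```python
def loadiwkey_flags(no_backup: bool, key_source: int) -> int:
    """Build the EAX flags value for LOADIWKEY from the two documented fields."""
    if key_source not in (0, 1):
        # Only KeySource values 0 and 1 are listed as supported.
        raise ValueError("unsupported KeySource value")
    flags = 1 if no_backup else 0   # bit 0: forbid backup of the wrapping key
    flags |= key_source << 1        # bits 4:1: KeySource field
    return flags                    # bits 31:5 stay 0 (reserved)
```

For example, `loadiwkey_flags(True, 1)` yields 3: backup is disallowed and the key operands are XORed with hardware random bits.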

↑ The handle restrictions available for the explicit source argument to ENCODEKEY128 and ENCODEKEY256 are:

Bits	Flags
0	CPL0-only restriction
1	No-encrypt restriction
2	No-decrypt restriction
31:3	Reserved, must be set to 0

↑ All of the AES Key Locker encode/decode instructions will check whether the handle is valid for the current IWKey and encode/decode data only if the handle is valid. These instructions will set the ZF flag to indicate whether the provided handle was valid (ZF=0) or not (ZF=1).
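The handle-restriction bits and the ZF convention from the two notes above can be modelled as below. This is a sketch with hypothetical names (`handle_restrictions`, `handle_was_valid`); it only encodes the bit positions and the flag polarity stated in the text.

```python
CPL0_ONLY = 1 << 0    # bit 0: handle usable at CPL0 only
NO_ENCRYPT = 1 << 1   # bit 1: handle may not be used for encryption
NO_DECRYPT = 1 << 2   # bit 2: handle may not be used for decryption


def handle_restrictions(cpl0_only: bool = False,
                        no_encrypt: bool = False,
                        no_decrypt: bool = False) -> int:
    """Build the source-operand value for ENCODEKEY128/ENCODEKEY256."""
    value = 0
    if cpl0_only:
        value |= CPL0_ONLY
    if no_encrypt:
        value |= NO_ENCRYPT
    if no_decrypt:
        value |= NO_DECRYPT
    return value          # bits 31:3 stay 0 (reserved)


def handle_was_valid(zf: int) -> bool:
    """Interpret ZF after a Key Locker AES instruction: ZF=0 means valid."""
    return zf == 0
```

Note that the polarity is the opposite of a typical "success flag": a set ZF signals that the handle was rejected and no data was processed.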

VIA/Zhaoxin PadLock instructions

The VIA/Zhaoxin PadLock instructions are instructions designed to apply cryptographic primitives in bulk, similar to the 8086 repeated string instructions. As such, unless otherwise specified, they take, as applicable, pointers to source data in ES:rSI and destination data in ES:rDI, and a data-size or count in rCX. Like the old string instructions, they are all designed to be interruptible.^[2]^[3]

PadLock subset	Instruction mnemonics^[a]	Encoding	Description	Added in

RNG Random Number Generation.	`XSTORE`, `XSTORE-RNG`	`NFx 0F A7 C0`	Store random bytes to ES:[rDI], and increment ES:rDI accordingly. `XSTORE` will store currently-available bytes, which may be from 0 to 8 bytes. `REP XSTORE` and `REP XRNG2` will write the number of random bytes specified by rCX, waiting for the random number generator when needed.^[b] EDX specifies a "quality factor".^[c]	Nehemiah (stepping 3)
	`REP XSTORE`, `REP XSTORE-RNG`	`F3 0F A7 C0`		Nehemiah (stepping 3)

	`REP XRNG2`	`F3 0F A7 F8`		ZhangJiang ^[d]

ACE Advanced Cryptography Engine.	`REP XCRYPT-ECB`	`F3 0F A7 C8`	Encrypt/Decrypt data, using the AES cipher in various block modes (ECB, CBC, CFB, OFB and CTR, respectively). rCX contains the number of 16-byte blocks to encrypt/decrypt, rBX contains a pointer to an encryption key, ES:rAX a pointer to an initialization vector for block modes that need it, and ES:rDX a pointer to a control word.^[e]	Nehemiah (stepping 8)
	`REP XCRYPT-CBC`	`F3 0F A7 D0`
	`REP XCRYPT-CFB`	`F3 0F A7 E0`
	`REP XCRYPT-OFB`	`F3 0F A7 E8`

ACE2^[f]	`REP XCRYPT-CTR`	`F3 0F A7 D8`		C7 "Esther" ^[7]

PHE Hash Engine.	`REP XSHA1`	`F3 0F A6 C8`	Compute a cryptographic hash (using the SHA-1 and SHA-256 functions, respectively). ES:rSI points to data to compute a hash for, ES:rDI points to a message digest and rCX specifies the number of bytes. rAX should be set to 0 at the start of a calculation.^[g]	Esther
	`REP XSHA256`	`F3 0F A6 D0`		Esther

	`REP XSHA384`	`F3 0F A6 D8`	Perform computation of a SHA-384/SHA-512 cryptographic hash. ES:rSI points to a series of 128-byte data chunks to perform hash computation for, ES:rDI points to a 64-byte digest to update, and ECX specifies the number of chunks to process.^[h]	ZhangJiang ^[d]
	`REP XSHA512`	`F3 0F A6 E0`		ZhangJiang ^[d]

PMM Montgomery/Modular Multiplication.	`REP MONTMUL`	`F3 0F A6 C0`^[i]	Perform Montgomery Multiplication. Takes an operand width in ECX (given as a number of bits – must be in range 256..32768 and divisible by 128) and pointer to a data structure in ES:ESI.^[j] When starting a new Montgomery Multiplication, EAX and the result buffer in memory must be filled with all-0s before executing the `REP MONTMUL` instruction. (Nonzero values are used to help resume the calculation if the instruction was interrupted.)	Esther

	`REP MONTMUL2`	`F3 0F A6 F0`	Perform modular multiplication/exponentiation. Takes pointers (all using the ES: segment) to bignum integers $A, B, M, R$ in registers rAX, rBX, rDX, rDI, respectively, where $A$ and $B$ are input numbers, $M$ is a modulus, and $R$ will be overwritten with the result. The operation performed is $R := (A \cdot B) \bmod M$ for `REP MONTMUL2` and $R := A^{B} \bmod M$ for `REP XMODEXP`. ECX provides the size of the bignums in bits (256..32768, must be divisible by 128), and ES:rSI provides a pointer to a scratchpad area to use during the calculation.^[k]	ZhangJiang ^[d]
	`REP XMODEXP`	`F3 0F A6 F8`		ZhangJiang ^[d]

GMI^[9]^[10]^[11] Chinese national cryptographic algorithms. (Zhaoxin only.)	`CCS_HASH`, `CCS_SM3`^[l]	`F3 0F A6 E8`	Compute SM3 hash, similar to the `REP XSHA*` instructions. The rBX register is used to specify hash function (`20h` for SM3 being the only documented value).	ZhangJiang
	`CCS_ENCRYPT`, `CCS_SM4`^[l]	`F3 0F A7 F0`	Encrypt/Decrypt data, using the SM4 cipher in various block modes. rCX contains the number of 16-byte blocks to encrypt/decrypt, rBX contains a pointer to an encryption key, rDX a pointer to an initialization vector for block modes that need it, and rAX contains a control word.^[m]	ZhangJiang
	`SM2`^[12]	`F2 0F A6 C0`	Perform SM2 (public key cryptographic algorithm) function. The function to perform is specified in bits 5:0 of EDX^[n] - depending on function, rAX/rBX/rCX/rSI/rDI may provide additional input arguments. The instruction returns a status bit in EDX bit 6 (0=success, 1=failure) - depending on function, rAX, rCX and rDI may be modified as well.	KX-6000G

Footnotes

↑ For instruction mnemonics that are listed with a hyphen, different VIA PadLock documents differ with respect to whether the instruction names have a hyphen or not (e.g. version 1.0 of the ACE programming guide uses the hyphens,^[4] while v1.66 does not.^[2]) and assemblers may accept instruction mnemonics with or without the hyphen - e.g. GNU Binutils rev 2.17 and later accepts both.
Some assemblers may also consider the REP prefix optional for instructions other than XSTORE - with such assemblers, the PadLock instructions will be assembled with one F3 (REP) prefix byte regardless of whether the assembly instruction is written with REP or not. (The F3 prefix is mandatory for all PadLock instructions except XSTORE.)
↑ On some processors that support PadLock, the REP XSTORE instruction (but not REP XRNG2) may write not just the number of bytes specified in ECX, but up to 7 additional bytes as well.^[5]
↑ For the REP XRNG2 instruction, bits 1:0 of EDX are used to indicate whether the instruction should return hardware random numbers directly (EDX[1:0]==0) or return postprocessed numbers (EDX[1:0] ≠ 0).
↑ As of 2024, the REP XRNG2, REP XSHA384, REP XSHA512, REP MONTMUL2 and REP XMODEXP instructions exist as documented instructions only on Zhaoxin processors.^[3]
A VIA-provided OpenSSL patch from 2011^[6] indicates that these instructions were present on the VIA Nano; however, VIA has not published documentation for them.

↑ The control word for REP XCRYPT* is a 16-byte (128-bit) data structure with the following layout:

Bits	Usage
3:0	AES round count
4	Digest mode enable (ACE2 only)
5	1=allow data that are not 16-byte aligned (ACE2 only)
6	Cipher: 0=AES, 1=undefined
7	Key schedule: 0=compute (128-bit key only), 1=load from memory
8	0=normal, 1=intermediate-result
9	0=encrypt, 1=decrypt
11:10	Key size: 00=128-bit, 01=192-bit, 10=256-bit, 11=reserved
127:12	Reserved, must be set to 0

If bit 5 is set in order to allow unaligned data, then the REP XCRYPT* instructions will use the 112 bytes directly after the control word as a scratchpad memory area for data realignment.
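The control-word layout above can be illustrated with a small packing helper. This is a sketch under stated assumptions: the function and parameter names are hypothetical, the 128-bit structure is serialized in little-endian byte order (so bits 3:0 land in the first byte), and the check on bit 7 simply follows the table's remark that hardware key-schedule computation applies to 128-bit keys only.

```python
def xcrypt_control_word(rounds: int, key_bits: int, *,
                        decrypt: bool = False,
                        load_key_schedule: bool = False,
                        intermediate: bool = False,
                        digest: bool = False,
                        unaligned: bool = False) -> bytes:
    """Pack the 16-byte REP XCRYPT* control word from its documented fields."""
    key_size_field = {128: 0b00, 192: 0b01, 256: 0b10}[key_bits]
    if not 0 <= rounds <= 0xF:
        raise ValueError("round count must fit in bits 3:0")
    if key_bits != 128 and not load_key_schedule:
        raise ValueError("key schedule can only be computed for 128-bit keys")
    word = rounds                           # bits 3:0: AES round count
    word |= int(digest) << 4                # bit 4: digest mode (ACE2 only)
    word |= int(unaligned) << 5             # bit 5: allow unaligned data (ACE2 only)
    # bit 6 stays 0: cipher = AES
    word |= int(load_key_schedule) << 7     # bit 7: 1 = load key schedule from memory
    word |= int(intermediate) << 8          # bit 8: intermediate-result mode
    word |= int(decrypt) << 9               # bit 9: 0 = encrypt, 1 = decrypt
    word |= key_size_field << 10            # bits 11:10: key size
    return word.to_bytes(16, "little")      # bits 127:12 stay 0 (reserved)
```

When `unaligned` is set, a caller would also need to reserve the 112 bytes following the control word for the scratchpad mentioned above; the helper does not allocate that area.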

↑ In addition to the new REP XCRYPT-CTR instruction, ACE2 also adds extra features to the other REP XCRYPT instructions: a digest mode for the CBC and CFB instructions, and the ability to use input/output data that are not 16-byte aligned for the non-ECB instructions.
↑ On VIA Nano and later processors, setting rAX to an all-1s value for the REP XSHA* instructions will enable an alternate operation mode, where rCX specifies the number of 64-byte blocks, and where the standard FIPS-180-2 length extension procedure at the end of the hash calculation is omitted. This makes for a variant more suitable for data streaming than the original EAX=0 variant.^[8] This functionality also exists for CCS_HASH.
↑ The per-chunk calculation is identical for SHA-384 and SHA-512 - as a result of this, the REP XSHA384 and REP XSHA512 instructions perform identical operations.
↑ The REP MONTMUL instruction is only supported with an AddressSize of 32 bits - for this reason, the address-size override prefix (67h) is required in 16-bit and 64-bit modes, but disallowed in 32-bit mode.
↑ The data structure to REP MONTMUL contains six 32-bit elements, where the first one is a negated modular inverse of the bottom 32 bits of the modulus and the remaining 5 are pointers to various memory buffers (each of which uses the ES segment and must be 16-byte aligned):

Offset	Data item
0	Negated modular inverse
4	Pointer to first multiplicand
8	Pointer to second multiplicand
12	Pointer to result buffer
16	Pointer to modulus
20	Pointer to 32-byte scratchpad

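As an illustration of the structure above, the sketch below computes the negated modular inverse and packs the six 32-bit elements. The function names are hypothetical, little-endian layout is assumed, the pointer arguments are plain integers standing in for ES-relative addresses, and the requirement that the modulus be odd is an assumption that follows from needing an inverse modulo 2^32. The three-argument `pow` with a negative exponent requires Python 3.8 or later.

```python
import struct


def negated_modular_inverse32(modulus: int) -> int:
    """Return -(m^-1) mod 2^32, where m is the bottom 32 bits of the modulus."""
    low = modulus & 0xFFFFFFFF
    if low % 2 == 0:
        raise ValueError("an even modulus has no inverse modulo 2^32")
    return (-pow(low, -1, 1 << 32)) % (1 << 32)


def montmul_descriptor(modulus: int, a_ptr: int, b_ptr: int,
                       result_ptr: int, modulus_ptr: int,
                       scratch_ptr: int) -> bytes:
    """Pack the 24-byte REP MONTMUL data structure (offsets 0, 4, ..., 20)."""
    pointers = (a_ptr, b_ptr, result_ptr, modulus_ptr, scratch_ptr)
    for ptr in pointers:
        if ptr % 16:
            raise ValueError("buffers must be 16-byte aligned")
    return struct.pack("<6I", negated_modular_inverse32(modulus), *pointers)
```

The defining property of the first element is that multiplying it by the low 32 bits of the modulus gives -1 modulo 2^32, which is the quantity Montgomery reduction uses to cancel the low word.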
↑ Given a bignum size of N bits, the scratchpad memory area pointed to by ES:rSI for the REP MONTMUL2 and REP XMODEXP instructions must have a size of at least $3 \cdot (N/8) + 40$ bytes (e.g. for a 2048-bit bignum size, the scratchpad must be at least 808 bytes). Also, before starting either of these instructions, the first 8 bytes of this scratchpad must be zeroed out, and the bignum size given in ECX must also be written as a 64-bit integer to the next 8 bytes.
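The size formula and the required scratchpad header can be sketched as follows. The function name `modexp_scratchpad` is hypothetical, the 64-bit size field is assumed to be stored in little-endian order, and the range check reuses the 256..32768, multiple-of-128 constraint stated for ECX in the instruction table.

```python
def modexp_scratchpad(bits: int) -> bytearray:
    """Allocate and initialise a scratchpad for REP MONTMUL2 / REP XMODEXP."""
    if not 256 <= bits <= 32768 or bits % 128:
        raise ValueError("bignum size must be 256..32768 bits and divisible by 128")
    size = 3 * (bits // 8) + 40            # minimum size in bytes
    pad = bytearray(size)                  # bytes 0..7 are already zero
    pad[8:16] = bits.to_bytes(8, "little") # bignum size as a 64-bit integer
    return pad
```

For a 2048-bit bignum this gives 3 * 256 + 40 = 808 bytes, matching the example in the note.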
↑ The CCS instructions are listed with different mnemonics in different Zhaoxin sources - e.g. the CCS_SM3/CCS_SM4 mnemonics are used in a 2019 article,^[11] while CCS_HASH/CCS_ENCRYPT are used in a 2020 article.^[9]
↑ The CCS_ENCRYPT control word in rAX has the following format:

Bits	Usage
0	0=Encrypt, 1=Decrypt
5:1	Must be 10000b for SM4.
6	ECB block mode
7	CBC block mode
8	CFB block mode
9	OFB block mode
10	CTR block mode
11	Digest enable

Remaining bits in rAX must be set to all-0s. Of bits 10:6 in rAX (block mode selection), exactly one bit must be set, or else behavior is undefined.
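A minimal sketch of assembling this control word is shown below. The names `MODE_BITS` and `ccs_encrypt_control` are hypothetical; selecting the mode through a dictionary lookup is one way to guarantee that exactly one of bits 10:6 is set.

```python
MODE_BITS = {"ECB": 6, "CBC": 7, "CFB": 8, "OFB": 9, "CTR": 10}


def ccs_encrypt_control(mode: str, decrypt: bool = False,
                        digest: bool = False) -> int:
    """Build the rAX control word for CCS_ENCRYPT with the SM4 cipher."""
    word = 0b10000 << 1               # bits 5:1: must be 10000b for SM4
    word |= 1 << MODE_BITS[mode]      # bits 10:6: exactly one block mode bit
    if decrypt:
        word |= 1                     # bit 0: 0 = encrypt, 1 = decrypt
    if digest:
        word |= 1 << 11               # bit 11: digest enable
    return word                       # all remaining bits stay 0
```

For example, `ccs_encrypt_control("CBC", decrypt=True)` evaluates to `0xA1`.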

↑ The supported functions in bits 5:0 of EDX for the SM2 instruction are:

Value	Meaning
0x01	Encryption
0x02	Decryption
0x04	Signature
0x08	Verify signature
0x10	Key exchange 1
0x11	Key exchange 2 without hash
0x12	Key exchange 3 without hash
0x15	Key exchange 2 with hash
0x16	Key exchange 3 with hash
0x20	Preprocess1 to calculate hash value Z of user’s identification
0x21	Preprocess2 to calculate hash value e of hash value Z and message M
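The function codes above, together with the status bit returned in EDX bit 6, can be captured in a small lookup model. This is a sketch with hypothetical names (`SM2_FUNCTIONS`, `sm2_function_code`, `sm2_failed`); it does not describe the per-function register arguments, which this article leaves unspecified.

```python
SM2_FUNCTIONS = {
    0x01: "Encryption",
    0x02: "Decryption",
    0x04: "Signature",
    0x08: "Verify signature",
    0x10: "Key exchange 1",
    0x11: "Key exchange 2 without hash",
    0x12: "Key exchange 3 without hash",
    0x15: "Key exchange 2 with hash",
    0x16: "Key exchange 3 with hash",
    0x20: "Preprocess1",
    0x21: "Preprocess2",
}

SM2_STATUS_BIT = 1 << 6   # EDX bit 6 after the instruction: 0 = success, 1 = failure


def sm2_function_code(function: int) -> int:
    """Validate a function code for bits 5:0 of EDX."""
    if function not in SM2_FUNCTIONS:
        raise ValueError("not a listed SM2 function code")
    return function


def sm2_failed(edx_after: int) -> bool:
    """Interpret the status bit that the SM2 instruction returns in EDX."""
    return bool(edx_after & SM2_STATUS_BIT)
```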

References

[1] Intel, Digital Random Number Generator (DRNG) Software Implementation Guide rev 2.1, 17 Oct 2018, sections 5.2 and 5.3. Archived on 19 Nov 2021.
[2] VIA, PadLock Programming Guide, rev 1.66, 4 Aug 2005. Archived from the original on 26 May 2010.
[3] Binutils mailing list, (PATCH v1) x86: Support ZHAOXIN padlock instructions, 13 Dec 2024, see "padlock instruction set reference.pdf" attachment for Zhaoxin-provided documentation of the PadLock instructions. Archived on 19 Dec 2024; attachment archived on 19 Dec 2024.
[4] VIA, Nehemiah Advanced Cryptography Engine Programming Guide, v1.0, 2004. Archived from the original on 17 Sep 2004.
[5] VIA, Nehemiah Random Number Generator Programming Guide, v1.0, 2003, page 9. Archived from the original on 17 Sep 2004.
[6] openssl-dev mailing list, (PATCH) Update PadLock engine for VIA C7 and Nano CPUs, 10 Jun 2011. Archived on 30 Jan 2022.
[7] Michal Ludvig, VIA PadLock—Wicked Fast Encryption, Linux Journal, 6 Apr 2005. Archived on 20 Jun 2005.
[8] Stack Overflow, Streaming SHA calculation using VIA's Padlock Hashing Engine?, 11 Aug 2014. Archived on 14 Jun 2019. The PadLock SDK (v3.1) referenced in the Stack Overflow answer can be downloaded from the Crypto++ wiki (accessed on 11 Aug 2023) or the Wayback Machine.
[9] Zhaoxin, Core Technology | Instructions for the use of accelerated instructions for national encryption algorithm based on Zhaoxin processor (in Chinese), 8 Aug 2020. Archived on 5 Jan 2022.
[10] Zhaoxin, GMI User Manual v1.0 (in Chinese), 23 May 2016. Archived on 28 Feb 2022.
[11] Zhaoxin, Research on hardware acceleration and application of national cryptographic algorithms based on Zhaoxin CPU (in Chinese), 3 Sep 2019. Archived on 11 Aug 2020.
[12] Binutils mailing list, (PATCH v1) x86: Support ZHAOXIN GMI instructions, 14 Oct 2024, see "ZX_GMI_Reference.docx" attachment for Zhaoxin-provided documentation of the SM2 instruction. Archived on 9 Nov 2024; attachment archived on 9 Nov 2024.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[rdrand_retry-2] 1 2 The RDRAND and RDSEED instructions may fail to obtain and return a random number if the CPU's random number generators cannot keep up with the issuing of these instructions – if this happens, then software may retry the instructions (although the number of retries should be limited, in order to ensure forward progress^[1]). The instructions set EFLAGS.CF to 1 if a random number was successfully obtained and 0 otherwise. Failure to obtain a random number will also set the instruction's destination register to 0.

[3] The flags available for the LOADIWKEY instruction in the EAX register are:
Bits Flags
0 1=Do not permit the wrapping key to be backed up to platform-scoped storage
4:1 KeySource field. The following values are supported:
0: use key input operands directly
1: XOR the key input operands with 384 bits from hardware RNG
31:5 Reserved, must be set to 0

[mwAYw] 0: use key input operands directly

[mwAY0] 1: XOR the key input operands with 384 bits from hardware RNG

[4] The handle restrictions available for the explicit source argument to ENCODEKEY128 and ENCODEKEY256 are:
Bits Flags
0 CPL0-only restriction
1 No-encrypt restriction
2 No-decrypt restriction
31:3 Reserved, must be set to 0

[aes_kl_check-5] 1 2 3 4 5 6 7 8 All of the AES Key Locker encode/decode instructions will check whether the handle is valid for the current IWKey and encode/decode data only if the handle is valid. These instructions will set the ZF flag to indicate whether the provided handle was valid (ZF=0) or not (ZF=1).

[9] For instruction mnemonics that are listed with a hyphen, different VIA PadLock documents differ with respect to whether the instruction names have a hyphen or not (e.g. version 1.0 of the ACE programming guide uses the hyphens,^[4] while v1.66 does not.^[2]) and assemblers may accept instruction mnemonics with or without the hyphen - e.g. GNU Binutils rev 2.17 and later accepts both.
Some assemblers may also consider the REP prefix optional for instructions other than XSTORE - with such assemblers, the PadLock instructions will be assembled with one F3 (REP) prefix byte regardless of whether the assembly instruction is written with REP or not. (The F3 prefix is mandatory for all PadLock instructions except XSTORE.)

[11] On some processors that support PadLock, the REP XSTORE instruction (but not REP XRNG2) may write not just the number of bytes specified in ECX, but up to 7 additional bytes as well.^[5]

[12] For the REP XRNG2 instruction, bits 1:0 of EDX are used to indicate whether the instruction should return hardware random numbers directly (EDX[1:0]==0) or return postprocessed numbers (EDX[1:0] ≠ 0).

[zx_padlock-14] 1 2 3 As of 2024, the REP XRNG2, REP XSHA384, REP XSHA512, REP MONTMUL2 and REP XMODEXP instructions exist as documented instructions only on Zhaoxin processors.^[3]
A VIA-provided OpenSSL patch from 2011^[6] indicates that these instructions were present on the VIA Nano, however VIA has not published documentation for these instructions.

[15] The control word for REP XCRYPT* is a 16-byte (128-bit) data structure with the following layout:
Bits Usage
3:0 AES round count
4 Digest mode enable (ACE2 only)
5 1=allow data that are not 16-byte aligned (ACE2 only)
6 Cipher: 0=AES, 1=undefined
7 Key schedule: 0=compute (128-bit key only), 1=load from memory
8 0=normal, 1=intermediate-result
9 0=encrypt, 1=decrypt
11:10 Key size: 00=128-bit,01=192-bit,10=256-bit, 11=reserved
127:12 Reserved, must be set to 0
If bit 5 is set in order to allow unaligned data, then the REP XCRYPT* instructions will use the 112 bytes directly after the control word as a scratchpad memory area for data realignment.

[16] In addition to the new REP XCRYPT-CTR instruction, ACE2 also adds extra features to the other REP XCRYPT instructions: a digest mode for the CBC and CFB instructions, and the ability to use input/output data that are not 16-byte aligned for the non-ECB instructions.

[19] On VIA Nano and later processors, setting rAX to an all-1s value for the REP XSHA* instructions will enable an alternate operation mode, where rCX specifies the number of 64-byte blocks, and where the standard FIPS-180-2 length extension procedure at the end of the hash calculation is omitted. This makes for a variant more suitable for data streaming than the original EAX=0 variant.^[8] This functionality also exists for CCS_HASH.

[20] The per-chunk calculation is identical for SHA-384 and SHA-512; as a result, the REP XSHA384 and REP XSHA512 instructions perform identical operations.

[21] The REP MONTMUL instruction is only supported with an AddressSize of 32 bits. For this reason, the address-size override prefix (67h) is required in 16-bit and 64-bit modes, but disallowed in 32-bit mode.
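The prefix rule follows from how the 67h prefix changes the effective address size in each operating mode. The sketch below encodes that mapping; the table and function names are hypothetical and serve only to illustrate the reasoning.

```python
# Effective address size (in bits) per operating mode, without and with
# the 67h address-size override prefix.
ADDRESS_SIZE = {
    16: {"default": 16, "with_67h": 32},
    32: {"default": 32, "with_67h": 16},
    64: {"default": 64, "with_67h": 32},
}


def montmul_needs_67h(mode_bits):
    """Return True if 67h is needed to reach a 32-bit AddressSize."""
    sizes = ADDRESS_SIZE[mode_bits]
    if sizes["default"] == 32:
        return False  # the prefix would switch to 16-bit addressing
    if sizes["with_67h"] == 32:
        return True
    raise ValueError("no encoding yields a 32-bit address size")
```

In 32-bit mode the prefix would select 16-bit addressing, which is why it is disallowed there.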

[22] The data structure passed to REP MONTMUL contains six 32-bit elements. The first is a negated modular inverse of the bottom 32 bits of the modulus, and the remaining five are pointers to memory buffers (each of which uses the ES segment and must be 16-byte aligned):
Offset	Data item
0	Negated modular inverse
4	Pointer to first multiplicand
8	Pointer to second multiplicand
12	Pointer to result buffer
16	Pointer to modulus
20	Pointer to 32-byte scratchpad
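As an illustration of this layout, the following sketch packs the six fields into a 24-byte structure. The helper names are hypothetical. It assumes that the inverse is taken modulo 2^32 (the usual Montgomery constant), that fields are stored little-endian, and that the pointer values are plain 32-bit offsets supplied by the caller. The three-argument `pow` with a negative exponent requires Python 3.8 or later.

```python
import struct


def negated_inverse32(modulus):
    """Return -(m^-1) mod 2^32 for the bottom 32 bits m of the modulus."""
    low = modulus & 0xFFFFFFFF
    if low % 2 == 0:
        raise ValueError("modulus must be odd for the inverse to exist")
    inverse = pow(low, -1, 1 << 32)
    return (-inverse) & 0xFFFFFFFF


def montmul_descriptor(modulus, a_ptr, b_ptr, result_ptr, modulus_ptr, scratch_ptr):
    """Pack the six 32-bit elements in the order given in the table."""
    pointers = (a_ptr, b_ptr, result_ptr, modulus_ptr, scratch_ptr)
    for ptr in pointers:
        if ptr % 16 != 0:
            raise ValueError("buffers must be 16-byte aligned")
        if not 0 <= ptr < 1 << 32:
            raise ValueError("pointers must fit in 32 bits")
    return struct.pack("<6I", negated_inverse32(modulus), *pointers)
```

The defining property of the first field can be checked directly: multiplying it by the low 32 bits of the modulus and adding 1 gives a multiple of 2^32.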

[23] Given a bignum size of N bits, the scratchpad memory area pointed to by ES:rSI for the REP MONTMUL2 and REP XMODEXP instructions must have a size of at least $3 \cdot (N/8) + 40$ bytes (e.g. for a 2048-bit bignum size, the scratchpad must be at least 808 bytes). Also, before starting either of these instructions, the first 8 bytes of this scratchpad must be zeroed out, and the bignum size given in ECX must also be written as a 64-bit integer to the next 8 bytes.
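The sizing and initialization rules can be sketched as follows. The function name is hypothetical, the 64-bit integer is assumed to be stored little-endian, and the value to place in ECX is taken as a separate parameter because its unit is not restated here.

```python
def modexp_scratchpad(bignum_bits, ecx_value):
    """Allocate and initialize a minimum-size scratchpad (illustrative only)."""
    if bignum_bits % 8 != 0:
        raise ValueError("bignum size must be a whole number of bytes")
    size = 3 * (bignum_bits // 8) + 40
    pad = bytearray(size)                         # first 8 bytes start zeroed
    pad[8:16] = ecx_value.to_bytes(8, "little")   # bignum size as given in ECX
    return pad
```

For a 2048-bit bignum this allocates 3 * 256 + 40 = 808 bytes, matching the figure quoted above.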

[zhaoxin_multi-27] The CCS instructions are listed with different mnemonics in different Zhaoxin sources: for example, the CCS_SM3/CCS_SM4 mnemonics are used in a 2019 article,^[11] while CCS_HASH/CCS_ENCRYPT are used in a 2020 article.^[9]

[28] The CCS_ENCRYPT control word in rAX has the following format:
Bits	Usage
0	0=Encrypt, 1=Decrypt
5:1	Must be 10000b for SM4
6	ECB block mode
7	CBC block mode
8	CFB block mode
9	OFB block mode
10	CTR block mode
11	Digest enable
All remaining bits in rAX must be set to 0. Exactly one of bits 10:6 in rAX (block mode selection) must be set; otherwise, behavior is undefined.
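A minimal sketch of composing and checking this control word is shown below. The constant and function names are hypothetical; only the bit positions come from the table.

```python
BLOCK_MODE_BITS = {"ECB": 6, "CBC": 7, "CFB": 8, "OFB": 9, "CTR": 10}
SM4_ALGORITHM = 0b10000  # required value of bits 5:1


def ccs_encrypt_control(mode, decrypt=False, digest=False):
    """Build the rAX control word for a single block mode."""
    if mode not in BLOCK_MODE_BITS:
        raise ValueError("unknown block mode: %r" % (mode,))
    word = int(decrypt)                  # bit 0
    word |= SM4_ALGORITHM << 1           # bits 5:1
    word |= 1 << BLOCK_MODE_BITS[mode]   # exactly one of bits 10:6
    word |= int(digest) << 11            # bit 11
    return word


def has_single_block_mode(word):
    """Check that exactly one of bits 10:6 is set."""
    modes = (word >> 6) & 0x1F
    return modes != 0 and modes & (modes - 1) == 0
```

Building the word from a single mode name makes it impossible to set two block mode bits at once, which avoids the undefined case by construction.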

[30] The supported functions in bits 5:0 of EDX for the SM2 instruction are:
Value	Meaning
0x01	Encryption
0x02	Decryption
0x04	Signature
0x08	Verify signature
0x10	Key exchange 1
0x11	Key exchange 2 without hash
0x12	Key exchange 3 without hash
0x15	Key exchange 2 with hash
0x16	Key exchange 3 with hash
0x20	Preprocess1 to calculate hash value Z of user’s identification
0x21	Preprocess2 to calculate hash value e of hash value Z and message M
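The function codes can be represented as an enumeration, as in the sketch below. The member names and the decoding helper are hypothetical; the numeric values are those listed in the table, and masking EDX to bits 5:0 is the only processing applied.

```python
from enum import IntEnum


class Sm2Function(IntEnum):
    ENCRYPT = 0x01
    DECRYPT = 0x02
    SIGN = 0x04
    VERIFY = 0x08
    KEY_EXCHANGE_1 = 0x10
    KEY_EXCHANGE_2_NO_HASH = 0x11
    KEY_EXCHANGE_3_NO_HASH = 0x12
    KEY_EXCHANGE_2_HASH = 0x15
    KEY_EXCHANGE_3_HASH = 0x16
    PREPROCESS_1 = 0x20  # hash value Z of the user's identification
    PREPROCESS_2 = 0x21  # hash value e of Z and message M


def sm2_function_from_edx(edx):
    """Decode bits 5:0 of EDX; raises ValueError for unlisted values."""
    return Sm2Function(edx & 0x3F)
```

Values that do not appear in the table, such as 0x03, are rejected with a `ValueError` rather than being mapped to a function.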

[1] Intel, Digital Random Number Generator (DRNG) Software Implementation Guide rev 2.1, Oct 17, 2018, sections 5.2 and 5.3. Archived on Nov 19, 2021.

[padlock_consolidated-6] VIA, PadLock Programming Guide, rev 1.66, 4 Aug 2005. Archived from the original on 26 May 2010.

[zhaoxin_padlock-7] Binutils mailing list, (PATCH v1) x86: Support ZHAOXIN padlock instructions, 13 Dec 2024, see "padlock instruction set reference.pdf" attachment for Zhaoxin-provided documentation of the PadLock instructions. Archived on 19 Dec 2024; attachment archived on 19 Dec 2024.

[padlock_ace-8] VIA, Nehemiah Advanced Cryptography Engine Programming Guide, v1.0, 2004. Archived from the original on 17 Sep 2004.

[10] VIA, Nehemiah Random Number Generator Programming Guide, v1.0, 2003, page 9. Archived from the original on 17 Sep 2004.

[13] ssl-dev mailing list, (PATCH) Update PadLock engine for VIA C7 and Nano CPUs, 10 Jun 2011. Archived on 30 Jan 2022.

[17] Michal Ludvig, VIA PadLock—Wicked Fast Encryption, Linux Journal, Apr 6, 2005. Archived on Jun 20, 2005.

[18] Stack Overflow, Streaming SHA calculation using VIA's Padlock Hashing Engine?, Aug 11, 2014. Archived on Jun 14, 2019.
The PadLock SDK (v3.1) referenced in the Stack Overflow answer can be downloaded from the Crypto++ wiki (accessed on Aug 11, 2023) or the Wayback Machine.

[zhaoxin_gmi-24] Zhaoxin, Core Technology | Instructions for the use of accelerated instructions for national encryption algorithm based on Zhaoxin processor (in Chinese), 8 Aug 2020. Archived on Jan 5, 2022.

[zhaoxin_gmi2-25] Zhaoxin, GMI User Manual v1.0 (in Chinese), 23 May 2016. Archived on Feb 28, 2022.

[zhaoxin_gmi3-26] Zhaoxin, Research on hardware acceleration and application of national cryptographic algorithms based on Zhaoxin CPU (in Chinese), 3 Sep 2019. Archived on 11 Aug 2020.

[29] Binutils mailing list, (PATCH v1) x86: Support ZHAOXIN GMI instructions, 14 Oct 2024, see "ZX_GMI_Reference.docx" attachment for Zhaoxin-provided documentation of the SM2 instruction. Archived on 9 Nov 2024; attachment archived on 9 Nov 2024.

Bits	Flags
0	CPL0-only restriction
1	No-encrypt restriction
2	No-decrypt restriction
31:3	Reserved, must be set to 0
