Bit manipulation instructions are instructions that perform bit manipulation operations in hardware, rather than requiring several instructions for those operations as illustrated with examples in software. [1] Several leading as well as historic architectures have bit manipulation instructions including ARM, WDC 65C02, the TX-2 and the Power ISA. [2]
Bit manipulation is usually divided into subsets as individual instructions can be costly to implement in hardware when the target application has no justification. Conversely, if there is a justification then performance may suffer if the instruction is excluded. Carrying out the cost-benefit analysis is a complex task: one of the most comprehensive efforts in bit manipulation was a collaboration headed by Clare Wolfe, providing justifications, use-cases, c code, proofs and Verilog for each proposed RISC-V instruction. [3] [4]
Particular practical examples include Bit banging of GPIO using a low-cost Embedded controller such as the WDC 65C02, 8051 and Atmel PIC. At the slow clock rate of these CPUs, if bit-set/clear/test bit manipulation were not available the use of that low-cost CPU would, self-evidently, not be viable for the target application.
All the architectures below have instruction subsets and groups where the bit manipulation is provided in hardware. From the list it can be seen that DSPs and Embedded Microcontrollers have at least test/set/clear bit, however there are much more comprehensive instructions such as Count leading zeros, Popcount, Galois field arithmetic, Binary-coded decimal, bit-matrix multiply and transpose, byte-permute, bit permute including bit-reversal, specialised cryptographic instructions and many more.
BSR
Bit Scan Reverse - Returns bit index of highest set bit in input, effectively backwards count leading zeros, not defined for 0.BSF
Bit Scan Forward - Returns bit index of lowest set bit in input, effectively count trailing zeros, but not defined for 0.lzcnt
tzcnt
popcnt
pext
/pdep
ptest
and vptest
, given two inputs, do both an AND
operation and an ANDN
operation between them, and set the ZF and CF EFLAGS bits on whether the results of the AND and ANDN, respectively, are 0. This can be used to test if all masked bits are zero, all masked bits are set, or a mix.vpternlog
. Also noteworthy is a conflict detection instruction. VPCONFLICTD
GF2P8AFFINEQB
is effectively an 8x8 bit-matrix multiply in the Galois field GF(2^8). [5] Power ISA has a large range of bit manipulation instructions, [7] largely due to its history and relationship with IBM mainframes and the z/Architecture:
popcntb
is SWAR byte-level 8x8-bit but there is no 4x16-bit popcnth
yet there is 2x32-bit popcntw
and 64-bit scalar popcntd
. Likewise, prtyw
is SWAR half-word 4x16-bit but there is no prtyb
pextd
and bit-deposit pdepd
these drop and distribute bits in place according to a mask instead of the more usual technique of a offset and a length. [10] ; An unusual centrifuge instruction which moves masked-bits to the left and unmasked bits to the right, preserving their relative order in both instances. Most ISAs would have an operand expressing the number of sequential bits to extract, plus the length: cfuged
combines these into one general-purpose bitmask. [10] vgbbd
[11] which treats a 64-bit quantity as an 8x8 2D matrix, and performs a matrix transpose operation. Each bit 0 of each byte therefore becomes the first byte, each bit 1 of each byte becomes the second and so on.bpermd
) [12] which allows selection of up to eight individual bits from a 64-bit source, by treating each byte of a second 64-bit register as bit-indices into the first.xxeval
[13] similar to AVX-512 Cray patented BMM (Bit matrix multiply) in 1990 which could cope with up to 64x64-bit operands. [15] The closest equivalent today is the 8x8 GF(2) Affine Transform instruction of AVX512
The IBM 3090 introduced an optional vector facility [16] to the System/370-XA and Enterprise Systems Architecture/370 instruction sets. In addition to integer and floating-point vector arithmetic and logical operations on multiple integer and floating-point values, it introduced vector bit manipulation operations count leading zeros vczvm
and population count vcovm
. [17]
z/Architecture did not support the previous vector facility. [18] However, starting with the 11th edition of the z/Architecture Principles of Operation: [19] it supported the following instructions:
vclz
, count trailing zeros vctz
[20] [21] and vector population count vpopct
[22] vtm
[23] - sets a Condition Code based on comparing all elements of one register against a second vector as a mask: if all masked-comparisons are all-zero, if all are all-ones or a mix of both.vgfm
, [24] known as carryless multiply The DEC PDP-6 and PDP-10 had logical operations covering the full suite of 2-operand hardware lookup table (LUT2) Boolean functions [31] (rather than the 3-operand functions that AVX512 and Power ISA have).
Later models of the PDP-10 had instructions to convert between packed BCD and binary. [32]
Also present is unusual (variable-bit-length) byte load and store instructions that use byte pointers for memory operands: in modern terminology these are bit-field insert and extract. In addition to a word address, the bit length (S) and the bit offset (P) of the byte from which to load or into which to store are specified. These instructions can specify a byte size of 0-36, but a byte may not straddle a word boundary. [33] The string manipulation, [34] BCD/binary conversion, [35] and string editing [36] instructions in later models use byte pointers and have the same restrictions.
The GE-600 series and its successors had Gray-to-binary conversion; without such an instruction, converting from Gray code requires multiple steps. Binary-to-Gray is simply x^(x>>1)
and does not justify a dedicated instruction. Gray coding has significant practical applications.
In the standard extensions RISC-V has scalar bitwise operations including shift and arithmetic shift, but no rotate. The omissions are compensated for with additional extensions.
BIT
, RES
, and SET
instructions. These test, reset, and set individual bits in registers or memory pointed to by HL. [47]