Bitwise ternary logic instructions can implement every possible bitwise operation on three inputs (256 functions, one per value of an 8-bit immediate). They take three registers as input and an 8-bit immediate field. Each bit of the output is generated by using the three corresponding bits of the inputs as an index that selects one of the 8 positions in the immediate, which acts as an 8-bit lookup table. Since only 8 combinations are possible using three bits, this allows all possible 3-input bitwise operations to be performed. In mathematical terminology, each group of three corresponding input bits is the argument of a ternary Boolean function with a Hasse diagram of order n=8. [1] The eight input combinations are also known as minterms.
A full table showing all 256 possible 3-operand bitwise logical instructions may be found in the Power ISA description of xxeval. [2] An additional insight is that if the 8-bit immediate were an operand (register) then, in FPGA terminology, bitwise ternary logical instructions would implement an array of hardware LUT3s.
In pseudocode, the output from three single-bit inputs is found by using r2, r1 and r0 as the three binary digits of a 3-bit index, treating the 8-bit immediate as a lookup table, and simply returning the indexed bit:

    result := imm8[(r2 << 2) + (r1 << 1) + r0]
A readable implementation in Python for three single-bit inputs (r0, r1 and r2) is shown below:

    def ternlut8(r0, r1, r2, imm8):
        """Implementation of a LUT3 (ternary lookup)"""
        # index will be in range 0 to 7
        lut_index = 0
        # r0 sets bit0, r1 bit1, and r2 bit2
        if r0: lut_index |= 1 << 0
        if r1: lut_index |= 1 << 1
        if r2: lut_index |= 1 << 2
        # return the requested indexed bit of imm8
        return (imm8 & (1 << lut_index)) != 0

If the input registers are 64-bit then the output is correspondingly 64-bit, and is constructed by selecting bit i of each of the three inputs to create bit i of the output:

    def ternlut8_64bit(R0, R1, R2, imm8):
        """Implementation of a 64-bit ternary lookup instruction"""
        result = 0
        for i in range(64):
            m = 1 << i  # single mask bit of inputs
            r0, r1, r2 = (R0 & m), (R1 & m), (R2 & m)
            result |= ternlut8(r0, r1, r2, imm8) << i
        return result

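The per-bit loop above is easy to read but does 64 separate lookups. As a rough sketch of an alternative, the same result can be built with whole-word operations by OR-ing together one AND term (minterm) per set bit of imm8. The function name `ternlut8_parallel` and the constant `MASK64` are hypothetical names chosen for this illustration; the index convention (r0 is bit 0, r2 is bit 2) is assumed to match the single-bit code above.

```python
MASK64 = (1 << 64) - 1  # assumed 64-bit register width


def ternlut8_parallel(R0, R1, R2, imm8):
    """Bit-parallel sketch: OR together the minterms selected by imm8."""
    result = 0
    for index in range(8):
        if not (imm8 >> index) & 1:
            continue  # this input combination produces 0
        # Build a mask of the bit positions whose inputs match `index`:
        # use the input itself where the index bit is 1, its inverse where 0.
        term = MASK64
        term &= R0 if index & 1 else ~R0
        term &= R1 if index & 2 else ~R1
        term &= R2 if index & 4 else ~R2
        result |= term
    return result & MASK64
```

This takes at most 8 iterations regardless of register width, which is closer in spirit to how the operation would be expressed with ordinary AND, OR and NOT instructions on a machine without a ternary logic instruction.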
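An example table of just three of the 256 possible values of the 8-bit immediate is shown below: Double-AND, Double-OR and Bitwise-blend. The immediate (the 8-bit lookup table) is named imm8. The rows are ordered with A0 as the most significant bit of the lookup index and A2 as the least significant, so the first row corresponds to bit 0 of imm8 and the last row to bit 7. Each column, read from the bottom row to the top, therefore spells out the binary value in its header: imm8=0xCA is binary 11001010 in the "Bitwise blend" column: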
A0 | A1 | A2 | Double AND (imm8=0x80) | Double OR (imm8=0xFE) | Bitwise blend (imm8=0xCA) |
---|---|---|---|---|---|
0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 1 | 0 | 1 | 1 |
0 | 1 | 0 | 0 | 1 | 0 |
0 | 1 | 1 | 0 | 1 | 1 |
1 | 0 | 0 | 0 | 1 | 0 |
1 | 0 | 1 | 0 | 1 | 0 |
1 | 1 | 0 | 0 | 1 | 1 |
1 | 1 | 1 | 1 | 1 | 1 |
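
The immediates in the table can be derived mechanically by evaluating the desired function on all eight input combinations. The following is a minimal sketch, assuming the table's row ordering (A0 as the most significant index bit); `imm8_from_function` is a hypothetical helper name, not part of any instruction set or library.

```python
def imm8_from_function(f):
    """Build the 8-bit lookup table for a 3-input Boolean function f(a0, a1, a2)."""
    imm8 = 0
    for index in range(8):
        # Split the row index into its three input bits, A0 most significant.
        a0 = (index >> 2) & 1
        a1 = (index >> 1) & 1
        a2 = index & 1
        if f(a0, a1, a2):
            imm8 |= 1 << index
    return imm8


double_and = imm8_from_function(lambda a0, a1, a2: a0 & a1 & a2)
double_or = imm8_from_function(lambda a0, a1, a2: a0 | a1 | a2)
# Blend: A0 selects between A1 (when A0 is 1) and A2 (when A0 is 0).
blend = imm8_from_function(lambda a0, a1, a2: a1 if a0 else a2)

print(hex(double_and), hex(double_or), hex(blend))  # 0x80 0xfe 0xca
```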
The number of uses is significant: anywhere that an algorithm combines three operands with bitwise logical operations. Examples include carry-save addition, SHA-1, SHA-2, MD5, and the exactly-one and exactly-two bit counting used in Harley-Seal popcount. [3] vpternlog speeds up MD5 by 20%. [4]
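
As an illustration of the carry-save case, a carry-save adder needs two three-input functions per bit: a three-way XOR for the sum and a majority function for the carry (majority is also the Maj function of SHA-2). The sketch below uses a small reference model, `ternlog`, which is a hypothetical name for this example; the immediates 0x96 and 0xE8 follow from evaluating each function on all eight input combinations, and because both functions are symmetric the operand order does not matter.

```python
def ternlog(a, b, c, imm8, width=64):
    """Reference model: a, b, c supply index bits 2, 1, 0 at each bit position."""
    result = 0
    for i in range(width):
        index = (((a >> i) & 1) << 2) | (((b >> i) & 1) << 1) | ((c >> i) & 1)
        result |= ((imm8 >> index) & 1) << i
    return result


XOR3 = 0x96      # 1 where an odd number of the three inputs are 1
MAJORITY = 0xE8  # 1 where at least two of the three inputs are 1


def carry_save_add(a, b, c):
    """Reduce three addends to a (sum, carry) pair with two ternary operations."""
    partial_sum = ternlog(a, b, c, XOR3)
    carry = ternlog(a, b, c, MAJORITY) << 1  # carries move one place left
    return partial_sum, carry


s, cy = carry_save_add(13, 7, 9)
print(s, cy, s + cy)  # 3 26 29
```

Without a ternary instruction, the sum would take two XORs and the carry several ANDs and ORs, which is where the saving comes from.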
Although unusual due to its high cost in hardware, this instruction is found in a number of instruction set architectures:
vpternlog [6]
xxeval [7]
vpternlog: Tom Forsyth explains, amusingly, that the Intel test engineers were happy to have one instruction to test rather than 256. [8] [9] [10] [11]