Sign extension

Last updated June 04, 2024

Sign extension (sometimes abbreviated as sext, particularly in mnemonics) is the operation, in computer arithmetic, of increasing the number of bits of a binary number while preserving the number's sign (positive/negative) and value. This is done by appending digits to the most significant side of the number, following a procedure dependent on the particular signed number representation used.

For example, if six bits are used to represent the number "00 1010" (decimal positive 10) and the sign extends operation increases the word length to 16 bits, then the new representation is simply "0000 0000 0000 1010". Thus, both the value and the fact that the value was positive are maintained.

If ten bits are used to represent the value "11 1111 0001" (decimal negative 15) using two's complement, and this is sign extended to 16 bits, the new representation is "1111 1111 1111 0001". Thus, by padding the left side with ones, the negative sign and the value of the original number are maintained.

In the Intel x86 instruction set, for example, there are two ways of doing sign extension:

using the instructions cbw, cwd, cwde, and cdq: convert the byte to word, word to doubleword, word to extended doubleword, and doubleword to quadword, respectively (in the x86 context a byte has 8 bits, a word 16 bits, a doubleword and extended doubleword 32 bits, and a quadword 64 bits);
using one of the sign extended moves, accomplished by the movsx ("move with sign extension") family of instructions.

Zero extension

A similar concept is zero extension (sometimes abbreviated as zext). In a move or convert operation, zero extension refers to setting the high bits of the destination to zero, rather than setting them to a copy of the most significant bit of the source. If the source of the operation is an unsigned number, then zero extension is usually the correct way to move it to a larger field while preserving its numeric value, while sign extension is correct for signed numbers.

In the x86 and x64 instruction sets, the movzx instruction ("move with zero extension") performs this function. For example, movzx ebx, al copies a byte from the al register to the low-order byte of ebx and then fills the remaining bytes of ebx with zeroes.

On x64, most instructions that write to the entirety of lower 32 bits of any of the general-purpose registers will zero the upper half of the destination register. For example, the instruction mov eax, 1234 will clear the upper 32 bits of the rax^{[lower-alpha 1]} register.

Related Research Articles

In computing and electronic systems, binary-coded decimal (BCD) is a class of binary encodings of decimal numbers where each digit is represented by a fixed number of bits, usually four or eight. Sometimes, special bit patterns are used for a sign or other indications.

In computer science, an integer is a datum of integral data type, a data type that represents some range of mathematical integers. Integral data types may be of different sizes and may or may not be allowed to contain negative values. Integers are commonly represented in a computer as a group of binary digits (bits). The size of the grouping varies so the set of integer sizes available varies between different types of computers. Computer hardware nearly always provides a way to represent a processor register or memory address as an integer.

x86 is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel based on the 8086 microprocessor and its 8088 variant. The 8086 was introduced in 1978 as a fully 16-bit extension of 8-bit Intel's 8080 microprocessor, with memory segmentation as a solution for addressing more memory than can be covered by a plain 16-bit address. The term "x86" came into being because the names of several successors to Intel's 8086 processor end in "86", including the 80186, 80286, 80386 and 80486. Colloquially, their names were "186", "286", "386" and "486".

In computer programming, an arithmetic shift is a shift operator, sometimes termed a signed shift. The two basic types are the arithmetic left shift and the arithmetic right shift. For binary numbers it is a bitwise operation that shifts all of the bits of its operand; every bit in the operand is simply moved a given number of bit positions, and the vacant bit-positions are filled in. Instead of being filled with all 0s, as in logical shift, when shifting to the right, the leftmost bit is replicated to fill in all the vacant positions.

x86 assembly language is the name for the family of assembly languages which provide some level of backward compatibility with CPUs back to the Intel 8008 microprocessor, which was launched in April 1972. It is used to produce object code for the x86 class of processors.

In computer programming, a bitwise operation operates on a bit string, a bit array or a binary numeral at the level of its individual bits. It is a fast and simple action, basic to the higher-level arithmetic operations and directly supported by the processor. Most bitwise operations are presented as two-operand instructions where the result replaces one of the input operands.

Two's complement is the most common method of representing signed integers on computers, and more generally, fixed point binary values. Two's complement uses the binary digit with the greatest value as the sign to indicate whether the binary number is positive or negative; when the most significant bit is 1 the number is signed as negative and when the most significant bit is 0 the number is signed as positive. As a result, positive numbers are represented as themselves: 6 is 0110, zero is 1000, and -6 is 1010. Note that while the number of binary bits is fixed throughout a computation it is otherwise arbitrary.

A power of two is a number of the form $2 n$ where $n$ is an integer, that is, the result of exponentiation with number two as the base and integer $n$ as the exponent.

Hexadecimal floating point is a format for encoding floating-point numbers first introduced on the IBM System/360 computers, and supported on subsequent machines based on that architecture, as well as machines which were intended to be application-compatible with System/360.

The x86 instruction set refers to the set of instructions that x86-compatible microprocessors support. The instructions are usually part of an executable program, often stored as a computer file and executed on the processor.

In computing, a word is the natural unit of data used by a particular processor design. A word is a fixed-sized datum handled as a unit by the instruction set or the hardware of the processor. The number of bits or digits in a word is an important characteristic of any specific processor design or computer architecture.

The Intel BCD opcodes are a set of six x86 instructions that operate with binary-coded decimal numbers. The radix used for the representation of numbers in the x86 processors is 2. This is called a binary numeral system. However, the x86 processors do have limited support for the decimal numeral system.

Bit manipulation is the act of algorithmically manipulating bits or other pieces of data shorter than a word. Computer programming tasks that require bit manipulation include low-level device control, error detection and correction algorithms, data compression, encryption algorithms, and optimization. For most other tasks, modern programming languages allow the programmer to work directly with abstractions instead of bits that represent those abstractions.

In the x86 architecture, the CPUID instruction is a processor supplementary instruction allowing software to discover details of the processor. It was introduced by Intel in 1993 with the launch of the Pentium and SL-enhanced 486 processors.

This article describes the calling conventions used when programming x86 architecture microprocessors.

The PDP-11 architecture is a 16-bit CISC instruction set architecture (ISA) developed by Digital Equipment Corporation (DEC). It is implemented by central processing units (CPUs) and microprocessors used in PDP-11 minicomputers. It was in wide use during the 1970s, but was eventually overshadowed by the more powerful VAX architecture in the 1980s.

In computing, quadruple precision is a binary floating-point–based computer number format that occupies 16 bytes with precision at least twice the 53-bit double precision.

The IBM System/360 architecture is the model independent architecture for the entire S/360 line of mainframe computers, including but not limited to the instruction set architecture. The elements of the architecture are documented in the IBM System/360 Principles of Operation and the IBM System/360 I/O Interface Channel to Control Unit Original Equipment Manufacturers' Information manuals.

The ones' complement of a binary number is the value obtained by inverting (flipping) all the bits in the binary representation of the number. The name "ones' complement" refers to the fact that such an inverted value, if added to the original, would always produce an "all ones" number. This mathematical operation is primarily of interest in computer science, where it has varying effects depending on how a specific computer represents numbers.

AVX-512 are 512-bit extensions to the 256-bit Advanced Vector Extensions SIMD instructions for x86 instruction set architecture (ISA) proposed by Intel in July 2013, and first implemented in the 2016 Intel Xeon Phi x200, and then later in a number of AMD and other Intel CPUs. AVX-512 consists of multiple extensions that may be implemented independently. This policy is a departure from the historical requirement of implementing the entire instruction block. Only the core extension AVX-512F is required by all AVX-512 implementations.

References

Mano, Morris M.; Kime, Charles R. (2004). Logic and Computer Design Fundamentals (3rd ed.), pp 453. Pearson Prentice Hall. ISBN 0-13-140539-X.

Notes

↑ RAX - 64 bit accumulator

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[1] RAX - 64 bit accumulator

[lower-alpha 1]

Sign extension

Contents

Zero extension

See also

Related Research Articles

References

Notes