Sign bit

Last updated

In computer science, the sign bit is a bit in a signed number representation that indicates the sign of a number. Although only signed numeric data types have a sign bit, it is invariably located in the most significant bit position, [1] so the term may be used interchangeably with "most significant bit" in some contexts.

Contents

Almost always, if the sign bit is 0, the number is non-negative (positive or zero). [1] If the sign bit is 1 then the number is negative. Formats other than two's complement integers allow a signed zero: distinct "positive zero" and "negative zero" representations, the latter of which does not correspond to the mathematical concept of a negative number.

When using a complement representation, to convert a signed number to a wider format the additional bits must be filled with copies of the sign bit in order to preserve its numerical value, [2] :61–62 a process called sign extension or sign propagation. [3]

Sign bit weight in Two's Compliment

BitsValue using Two's Compliment
00000
00011
01117
1000-8
1001-7
1111-1

Two's Compliment is by far the most common format for signed integers. In Two's Compliment, the sign bit has the weight -2w-1 where w is equal to the bits position in the number. [1] With an 8-bit integer, the sign bit would have the value of -28-1, or -128. Due to this value being larger than all the other bits combined, having this bit set would ultimately make the number negative, thus changing the sign.

Sign bit weight in Ones' Compliment

BitsValue using One's Compliment
00000
00011
01117
1000-7
1001-6
1111-0

Ones' compliment is similar to Two's Compliment, but the sign bit has the weight -(2w-1 +1) where w is equal to the bits position in the number.[ citation needed ] With an 8-bit integer, the sign bit would have a value of -(28-1 +1), or -127. This allows for two types of zero: positive and negative, which is not possible with Two's compliment.

Sign bit in Sign Magnitude Integers

BitsValue using Sign Magnitude
00000
00011
01117
1000-0
1001-1
1111-7

Using Sign Magnitude, the sign bit directly determines the sign. If the sign bit is 0, the number is positive; if the sign bit is 1, the number is negative. [2] :52–54 Similarly with Ones' Compliment, this allows for both a positive and a negative zero.

Sign bit in Floating Point Numbers

Floating-point numbers, such as IEEE format, IBM format, VAX format, and even the format used by the Zuse Z1 and Z3 use a Sign and magnitude representation.[ citation needed ]

Related Research Articles

<span class="mw-page-title-main">Binary-coded decimal</span> System of digitally encoding numbers

In computing and electronic systems, binary-coded decimal (BCD) is a class of binary encodings of decimal numbers where each digit is represented by a fixed number of bits, usually four or eight. Sometimes, special bit patterns are used for a sign or other indications.

<span class="mw-page-title-main">Floating-point arithmetic</span> Computer approximation for real numbers

In computing, floating-point arithmetic (FP) is arithmetic that represents subsets of real numbers using an integer with a fixed precision, called the significand, scaled by an integer exponent of a fixed base. Numbers of this form are called floating-point numbers. For example, 12.345 is a floating-point number in base ten with five digits of precision:

In computer science, an integer is a datum of integral data type, a data type that represents some range of mathematical integers. Integral data types may be of different sizes and may or may not be allowed to contain negative values. Integers are commonly represented in a computer as a group of binary digits (bits). The size of the grouping varies so the set of integer sizes available varies between different types of computers. Computer hardware nearly always provides a way to represent a processor register or memory address as an integer.

IEEE 754-1985 is a historic industry standard for representing floating-point numbers in computers, officially adopted in 1985 and superseded in 2008 by IEEE 754-2008, and then again in 2019 by minor revision IEEE 754-2019. During its 23 years, it was the most widely used format for floating-point computation. It was implemented in software, in the form of floating-point libraries, and in hardware, in the instructions of many CPUs and FPUs. The first integrated circuit to implement the draft of what was to become IEEE 754-1985 was the Intel 8087.

<span class="mw-page-title-main">Arithmetic shift</span> Shift operator in computer programming

In computer programming, an arithmetic shift is a shift operator, sometimes termed a signed shift. The two basic types are the arithmetic left shift and the arithmetic right shift. For binary numbers it is a bitwise operation that shifts all of the bits of its operand; every bit in the operand is simply moved a given number of bit positions, and the vacant bit-positions are filled in. Instead of being filled with all 0s, as in logical shift, when shifting to the right, the leftmost bit is replicated to fill in all the vacant positions.

A computer number format is the internal representation of numeric values in digital device hardware and software, such as in programmable computers and calculators. Numerical values are stored as groupings of bits, such as bytes and words. The encoding between numerical values and bit patterns is chosen for convenience of the operation of the computer; the encoding used by the computer's instruction set generally requires conversion for external use, such as for printing and display. Different types of processors may have different internal representations of numerical values and different conventions are used for integer and real numbers. Most calculations are carried out with number formats that fit into a processor register, but some software systems allow representation of arbitrarily large numbers using multiple words of memory.

In computing, NaN, standing for Not a Number, is a particular value of a numeric data type which is undefined as a number, such as the result of 0/0. Systematic use of NaNs was introduced by the IEEE 754 floating-point standard in 1985, along with the representation of other non-finite quantities such as infinities.

Two's complement is the most common method of representing signed integers on computers, and more generally, fixed point binary values. Two's complement uses the binary digit with the greatest place value as the sign to indicate whether the binary number is positive or negative. When the most significant bit is 1, the number is signed as negative; and when the most significant bit is 0 the number is signed as positive.

<span class="mw-page-title-main">Method of complements</span> Method of subtraction

In mathematics and computing, the method of complements is a technique to encode a symmetric range of positive and negative integers in a way that they can use the same algorithm for addition throughout the whole range. For a given number of places half of the possible representations of numbers encode the positive numbers, the other half represents their respective additive inverses. The pairs of mutually additive inverse numbers are called complements. Thus subtraction of any number is implemented by adding its complement. Changing the sign of any number is encoded by generating its complement, which can be done by a very simple and efficient algorithm. This method was commonly used in mechanical calculators and is still used in modern computers. The generalized concept of the radix complement is also valuable in number theory, such as in Midy's theorem.

The IEEE Standard for Floating-Point Arithmetic is a technical standard for floating-point arithmetic established in 1985 by the Institute of Electrical and Electronics Engineers (IEEE). The standard addressed many problems found in the diverse floating-point implementations that made them difficult to use reliably and portably. Many hardware floating-point units use the IEEE 754 standard.

In computing, fixed-point is a method of representing fractional (non-integer) numbers by storing a fixed number of digits of their fractional part. Dollar amounts, for example, are often stored with exactly two fractional digits, representing the cents. More generally, the term may refer to representing fractional values as integer multiples of some fixed small unit, e.g. a fractional amount of hours as an integer multiple of ten-minute intervals. Fixed-point number representation is often contrasted to the more complicated and computationally demanding floating-point representation.

In computing, signed number representations are required to encode negative numbers in binary number systems.

In computing, signedness is a property of data types representing numbers in computer programs. A numeric variable is signed if it can represent both positive and negative numbers, and unsigned if it can only represent non-negative numbers.

In the C programming language, data types constitute the semantics and characteristics of storage of data elements. They are expressed in the language syntax in form of declarations for memory locations or variables. Data types also determine the types of operations or methods of processing of data elements.

Signed zero is zero with an associated sign. In ordinary arithmetic, the number 0 does not have a sign, so that −0, +0 and 0 are equivalent. However, in computing, some number representations allow for the existence of two zeros, often denoted by −0 and +0, regarded as equal by the numerical comparison operations but with possible different behaviors in particular operations. This occurs in the sign-magnitude and ones' complement signed number representations for integers, and in most floating-point number representations. The number 0 is usually encoded as +0, but can still be represented by +0, −0, or 0.

Extended precision refers to floating-point number formats that provide greater precision than the basic floating-point formats. Extended precision formats support a basic format by minimizing roundoff and overflow errors in intermediate values of expressions on the base format. In contrast to extended precision, arbitrary-precision arithmetic refers to implementations of much larger numeric types using special software.

<span class="mw-page-title-main">Sign (mathematics)</span> Number property of being positive or negative

In mathematics, the sign of a real number is its property of being either positive, negative, or 0.

Offset binary, also referred to as excess-K, excess-N, excess-e, excess code or biased representation, is a method for signed number representation where a signed number n is represented by the bit pattern corresponding to the unsigned number n+K, K being the biasing value or offset. There is no standard for offset binary, but most often the K for an n-bit binary word is K = 2n−1 (for example, the offset for a four-digit binary number would be 23=8). This has the consequence that the minimal negative value is represented by all-zeros, the "zero" value is represented by a 1 in the most significant bit and zero in all other bits, and the maximal positive value is represented by all-ones (conveniently, this is the same as using two's complement but with the most significant bit inverted). It also has the consequence that in a logical comparison operation, one gets the same result as with a true form numerical comparison operation, whereas, in two's complement notation a logical comparison will agree with true form numerical comparison operation if and only if the numbers being compared have the same sign. Otherwise the sense of the comparison will be inverted, with all negative values being taken as being larger than all positive values.

The ones' complement of a binary number is the value obtained by inverting (flipping) all the bits in the binary representation of the number. The name "ones' complement" refers to the fact that such an inverted value, if added to the original, would always produce an "all ones" number. This mathematical operation is primarily of interest in computer science, where it has varying effects depending on how a specific computer represents numbers.

The bfloat16 floating-point format is a computer number format occupying 16 bits in computer memory; it represents a wide dynamic range of numeric values by using a floating radix point. This format is a shortened (16-bit) version of the 32-bit IEEE 754 single-precision floating-point format (binary32) with the intent of accelerating machine learning and near-sensor computing. It preserves the approximate dynamic range of 32-bit floating-point numbers by retaining 8 exponent bits, but supports only an 8-bit precision rather than the 24-bit significand of the binary32 format. More so than single-precision 32-bit floating-point numbers, bfloat16 numbers are unsuitable for integer calculations, but this is not their intended use. Bfloat16 is used to reduce the storage requirements and increase the calculation speed of machine learning algorithms.

References

  1. 1 2 3 "Intel® 64 and IA-32 Architectures Software Developer's Manual Combined Volumes: 1, 2A, 2B, 2C, 2D, 3A, 3B, 3C, 3D, and 4". Intel. Retrieved 2024-03-13.
  2. 1 2 Bryant, Randal E.; O'Hallaron, David R. (2003). "Chapter 2: Representing and Manipulating Information". Computer Systems: a Programmer's Perspective. Upper Saddle River, New Jersey: Prentice Hall. ISBN   0-13-034074-X.
  3. "Data Dictionary (Glossary and Algorithms)". Adroit Data Recovery Centre Pte Ltd. Archived from the original on 2017-04-19. Retrieved 2014-12-15.