Significand

The significand[1] (also coefficient,[1] sometimes also argument, or more ambiguously mantissa,[2] fraction,[3][4][nb 1] or characteristic[5][2]) is the first (left) part of a number in scientific notation or related concepts in floating-point representation, consisting of its significant digits. Depending on the interpretation of the exponent, the significand may represent an integer or a fractional number.

Example

The number 123.45 can be represented as a decimal floating-point number with the integer 12345 as the significand and a 10^−2 power term, also called the characteristic,[6][7][8] where −2 is the exponent (and 10 is the base). Its value is given by the following arithmetic:

123.45 = 12345 × 10^−2.

The same value can also be represented in scientific notation with the significand 1.2345 as a fractional coefficient, and +2 as the exponent (and 10 as the base):

123.45 = 1.2345 × 10^+2.

Schmid, however, called this representation with a significand ranging between 1.0 and 10 a modified normalized form. [7] [8]

For base 2, this 1.xxxx form is also called a normalized significand.

Finally, the value can be represented in the format given by the Language Independent Arithmetic standard and several programming language standards, including Ada, C, Fortran and Modula-2, as

123.45 = 0.12345 × 10^+3.

Schmid called this representation with a significand ranging between 0.1 and 1.0 the true normalized form. [7] [8]
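
As a minimal illustration (the code below is a sketch written for this example; the names and formatting are not taken from the cited sources), the following C program prints 123.45 in all three decompositions described above:

```c
#include <math.h>
#include <stdio.h>

int main(void) {
    double x = 123.45;

    /* Integer significand with a negative exponent: 12345 x 10^-2 */
    printf("%g = %d x 10^%d\n", x, 12345, -2);

    /* Scientific notation ("modified normalized"): significand in [1.0, 10) */
    int e1 = (int)floor(log10(x));                 /* e1 = 2 */
    printf("%g = %.4f x 10^%+d\n", x, x / pow(10, e1), e1);

    /* "True normalized" form: significand in [0.1, 1.0) */
    int e2 = e1 + 1;                               /* e2 = 3 */
    printf("%g = %.5f x 10^%+d\n", x, x / pow(10, e2), e2);

    return 0;
}
```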

The hidden bit in floating point

For a normalized number, the most significant digit is always non-zero. When working in binary, this constraint means that this digit is always 1. As such, it is not explicitly stored, and is called the hidden bit.

The significand is characterized by its width in (binary) digits, and depending on the context, the hidden bit may or may not be counted toward the width. For example, the same IEEE 754 double-precision format is commonly described as having either a 53-bit significand, including the hidden bit, or a 52-bit significand,[citation needed] excluding the hidden bit. IEEE 754 defines the precision p as the number of digits in the significand, including any implicit leading bit (e.g. p = 53 for the double-precision format), which makes it independent of the encoding; the term for what is actually encoded (the significand without its leading bit) is the trailing significand field.
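
The following C sketch (an illustration assuming the binary64 layout above and a normal, non-subnormal value; variable names are chosen only for this example) extracts the 52-bit trailing significand field and the biased exponent from a double, prepends the hidden bit, and reconstructs the value:

```c
#include <math.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    double x = 123.45;
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);                     /* view the 64-bit encoding */

    uint64_t trailing   = bits & ((1ULL << 52) - 1);    /* 52-bit trailing significand field */
    int      biased_exp = (int)((bits >> 52) & 0x7FF);  /* 11-bit biased exponent */

    /* For a normal number, the 53-bit significand (p = 53) is the hidden,
       implicit 1 followed by the 52 explicitly stored bits. */
    uint64_t significand53 = (1ULL << 52) | trailing;

    /* Rebuild the value: significand53 * 2^(unbiased exponent - 52). */
    double reconstructed = ldexp((double)significand53, biased_exp - 1023 - 52);

    printf("trailing field = 0x%013llx\n", (unsigned long long)trailing);
    printf("reconstructed  = %.17g\n", reconstructed);  /* equal to x */
    return 0;
}
```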

Floating-point mantissa

In 1946, Arthur Burks used the terms mantissa and characteristic to describe the two parts of a floating-point number (Burks et al.[6]), by analogy with the then-prevalent common logarithm tables: the characteristic is the integer part of the logarithm (i.e. the exponent), and the mantissa is the fractional part. This usage remains common among computer scientists today.

The term significand was introduced by George Forsythe and Cleve Moler in 1967[9][10][11][4] and is the word used in the IEEE standard[12] for the coefficient in front of the number in scientific notation discussed above. The fractional part of the significand is called the fraction.

To understand both terms, notice that in binary, 1 + mantissa ≈ significand, and the correspondence is exact when storing a power of two. This fact allows for a fast approximation of the base-2 logarithm, leading to algorithms such as the fast square root and fast inverse square root. The implicit leading 1 is nothing but the hidden bit in IEEE 754 floating point, and the bitfield storing the remainder is thus the mantissa.
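
As a rough sketch (written for this example, not drawn from the cited sources), the following C fragment shows the idea: reading a float's bit pattern as an integer yields approximately exponent + mantissa, which after removing the bias approximates log2(x); the well-known fast inverse square root exploits the same correspondence:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Approximate log2(x) for x > 0 by reading the float's bit pattern:
   the integer view is (biased exponent + mantissa) scaled by 2^23. */
static float approx_log2(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    return (float)bits * 0x1p-23f - 127.0f;   /* unscale and remove the bias */
}

/* The classic fast inverse square root uses the same bit-level trick. */
static float fast_rsqrt(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits = 0x5f3759df - (bits >> 1);          /* roughly the bits of x^(-1/2) */
    float y;
    memcpy(&y, &bits, sizeof y);
    return y * (1.5f - 0.5f * x * y * y);     /* one Newton step refines the guess */
}

int main(void) {
    printf("approx_log2(8)  = %f (exact: 3)\n", approx_log2(8.0f));
    printf("approx_log2(10) = %f (exact: ~3.32)\n", approx_log2(10.0f));
    printf("fast_rsqrt(4)   = %f (exact: 0.5)\n", fast_rsqrt(4.0f));
    return 0;
}
```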

However, whether or not the implicit 1 is included is a major point of confusion with both terms—and especially so with mantissa. In keeping with the original usage in the context of log tables, it should not be present.

In contexts where the 1 is considered included, William Kahan,[1] lead creator of IEEE 754, and Donald E. Knuth, prominent computer programmer and author of The Art of Computer Programming,[5] condemn the use of mantissa. This has led to declining use of the term mantissa in all contexts. In particular, the current IEEE 754 standard does not mention it.

Notes

  1. The term fraction is used in IEEE 754-1985 with a different meaning: it is the fractional part of the significand, i.e. the significand without its explicit or implicit leading bit.

References

  1. Kahan, William Morton (2002-04-19). "Names for Standardized Floating-Point Formats" (PDF). Archived (PDF) from the original on 2023-12-27. Retrieved 2023-12-27. […] m is the significand or coefficient or (wrongly) mantissa […] (8 pages)
  2. Gosling, John B. (1980). "6.1 Floating-Point Notation / 6.8.5 Exponent Representation". In Sumner, Frank H. (ed.). Design of Arithmetic Units for Digital Computers. Macmillan Computer Science Series (1 ed.). Department of Computer Science, University of Manchester, Manchester, UK: The Macmillan Press Ltd. pp. 74, 91, 137–138. ISBN 0-333-26397-9. […] In floating-point representation, a number x is represented by two signed numbers m and e such that x = m·b^e, where m is the mantissa, e the exponent and b the base. […] The mantissa is sometimes termed the characteristic and a version of the exponent also has this title from some authors. It is hoped that the terms here will be unambiguous. […] [w]e use a[n exponent] value which is shifted by half the binary range of the number. […] This special form is sometimes referred to as a biased exponent, since it is the conventional value plus a constant. Some authors have called it a characteristic, but this term should not be used, since CDC and others use this term for the mantissa. It is also referred to as an 'excess -' representation, where, for example, - is 64 for a 7-bit exponent (2^(7−1) = 64). […] (NB. Gosling does not mention the term significand at all.)
  3. English Electric KDF9: Very high speed data processing system for Commerce, Industry, Science (PDF) (Product flyer). English Electric. c. 1961. Publication No. DP/103. 096320WP/RP0961. Archived (PDF) from the original on 2020-07-27. Retrieved 2020-07-27.
  4. Savard, John J. G. (2018) [2005]. "Floating-Point Formats". quadibloc. A Note on Field Designations. Archived from the original on 2018-07-03. Retrieved 2018-07-16.
  5. Knuth, Donald E. The Art of Computer Programming. Vol. 2. p. 214. ISBN 0-201-89684-2. […] Other names are occasionally used for this purpose, notably 'characteristic' and 'mantissa'; but it is an abuse of terminology to call the fraction part a mantissa, since that term has quite a different meaning in connection with logarithms. Furthermore the English word mantissa means 'a worthless addition.' […]
  6. Burks, Arthur Walter; Goldstine, Herman H.; von Neumann, John (1963) [1946]. "5.3.". In Taub, A. H. (ed.). Preliminary discussion of the logical design of an electronic computing instrument (PDF) (Technical report, Institute for Advanced Study, Princeton, New Jersey, USA). Collected Works of John von Neumann. Vol. 5. New York, USA: The Macmillan Company. p. 42. Retrieved 2016-02-07. […] Several of the digital computers being built or planned in this country and England are to contain a so-called "floating decimal point". This is a mechanism for expressing each word as a characteristic and a mantissa—e.g. 123.45 would be carried in the machine as (0.12345,03), where the 3 is the exponent of 10 associated with the number. […]
  7. Schmid, Hermann (1974). Decimal Computation (1 ed.). Binghamton, New York, USA: John Wiley & Sons, Inc. pp. 204–205. ISBN 0-471-76180-X. Retrieved 2016-01-03.
  8. Schmid, Hermann (1983) [1974]. Decimal Computation (1 (reprint) ed.). Malabar, Florida, USA: Robert E. Krieger Publishing Company. pp. 204–205. ISBN 0-89874-318-4. Retrieved 2016-01-03. (NB. At least some batches of this reprint edition were misprints with defective pages 115–146.)
  9. Forsythe, George Elmer; Moler, Cleve Barry (September 1967). Computer Solution of Linear Algebraic Systems. Automatic Computation (1st ed.). Englewood Cliffs, New Jersey, USA: Prentice-Hall. ISBN 0-13-165779-8.
  10. Sterbenz, Pat H. (1974-05-01). Floating-Point Computation. Prentice-Hall Series in Automatic Computation (1 ed.). Englewood Cliffs, New Jersey, USA: Prentice Hall. ISBN 0-13-322495-3.
  11. Goldberg, David (March 1991). "What Every Computer Scientist Should Know About Floating-Point Arithmetic" (PDF). Computing Surveys. 23 (1). Xerox Palo Alto Research Center (PARC), Palo Alto, California, USA: Association for Computing Machinery, Inc.: 7. Archived (PDF) from the original on 2016-07-13. Retrieved 2016-07-13. […] This term was introduced by Forsythe and Moler [1967], and has generally replaced the older term mantissa. […]
  12. 754-2019 - IEEE Standard for Floating-Point Arithmetic. IEEE. 2019. doi:10.1109/IEEESTD.2019.8766229. ISBN 978-1-5044-5924-2.