Computer arithmetic

Last updated

Computer arithmetic is the scientific field that deals with representation of numbers on computers and corresponding implementations of the arithmetic operations. [1]

It includes:

In the cases where the size of the representation of a number is fixed (fixed-point, floating-point and interval arithmetic), the main concern is the control the computational error, as far as possible; see, for example IEEE 754.

In the other cases, where an exact result should be provided, the main concern is the practical efficiency, which is optimized by combining improvements of computational complexity with hardware specificities.

ARITH Symposium on Computer Arithmetic is an international symposium devoted to computer arithmetic.

Related Research Articles

<span class="mw-page-title-main">Floating-point arithmetic</span> Computer approximation for real numbers

In computing, floating-point arithmetic (FP) is arithmetic that represents subsets of real numbers using an integer with a fixed precision, called the significand, scaled by an integer exponent of a fixed base. Numbers of this form are called floating-point numbers. For example, 12.345 is a floating-point number in base ten with five digits of precision:

IEEE 754-1985 is a historic industry standard for representing floating-point numbers in computers, officially adopted in 1985 and superseded in 2008 by IEEE 754-2008, and then again in 2019 by minor revision IEEE 754-2019. During its 23 years, it was the most widely used format for floating-point computation. It was implemented in software, in the form of floating-point libraries, and in hardware, in the instructions of many CPUs and FPUs. The first integrated circuit to implement the draft of what was to become IEEE 754-1985 was the Intel 8087.

In computing, NaN, standing for Not a Number, is a particular value of a numeric data type which is undefined as a number, such as the result of 0/0. Systematic use of NaNs was introduced by the IEEE 754 floating-point standard in 1985, along with the representation of other non-finite quantities such as infinities.

Double-precision floating-point format is a floating-point number format, usually occupying 64 bits in computer memory; it represents a wide dynamic range of numeric values by using a floating radix point.

<span class="mw-page-title-main">Rounding</span> Replacing a number with a simpler value

Rounding or rounding off means replacing a number with an approximate value that has a shorter, simpler, or more explicit representation. For example, replacing $23.4476 with $23.45, the fraction 312/937 with 1/3, or the expression √2 with 1.414.

The IEEE Standard for Floating-Point Arithmetic is a technical standard for floating-point arithmetic established in 1985 by the Institute of Electrical and Electronics Engineers (IEEE). The standard addressed many problems found in the diverse floating-point implementations that made them difficult to use reliably and portably. Many hardware floating-point units use the IEEE 754 standard.

The significand is the first (left) part of a number in scientific notation or related concepts in floating-point representation, consisting of its significant digits. Depending on the interpretation of the exponent, the significand may represent an integer or a fractional number.

In computing, fixed-point is a method of representing fractional (non-integer) numbers by storing a fixed number of digits of their fractional part. Dollar amounts, for example, are often stored with exactly two fractional digits, representing the cents. More generally, the term may refer to representing fractional values as integer multiples of some fixed small unit, e.g. a fractional amount of hours as an integer multiple of ten-minute intervals. Fixed-point number representation is often contrasted to the more complicated and computationally demanding floating-point representation.

In computer science, arbitrary-precision arithmetic, also called bignum arithmetic, multiple-precision arithmetic, or sometimes infinite-precision arithmetic, indicates that calculations are performed on numbers whose digits of precision are potentially limited only by the available memory of the host system. This contrasts with the faster fixed-precision arithmetic found in most arithmetic logic unit (ALU) hardware, which typically offers between 8 and 64 bits of precision.

A residue numeral system (RNS) is a numeral system representing integers by their values modulo several pairwise coprime integers called the moduli. This representation is allowed by the Chinese remainder theorem, which asserts that, if M is the product of the moduli, there is, in an interval of length M, exactly one integer having any given set of modular values. The arithmetic of a residue numeral system is also called multi-modular arithmetic.

<span class="mw-page-title-main">GNU MPFR</span> C library for arbitrary-precision floating-point arithmetic

The GNU Multiple Precision Floating-Point Reliable Library is a GNU portable C library for arbitrary-precision binary floating-point computation with correct rounding, based on GNU Multi-Precision Library.

The term arithmetic underflow is a condition in a computer program where the result of a calculation is a number of more precise absolute value than the computer can actually represent in memory on its central processing unit (CPU).

Affine arithmetic (AA) is a model for self-validated numerical analysis. In AA, the quantities of interest are represented as affine combinations of certain primitive variables, which stand for sources of uncertainty in the data or approximations made during the computation.

Decimal floating-point (DFP) arithmetic refers to both a representation and operations on decimal floating-point numbers. Working directly with decimal (base-10) fractions can avoid the rounding errors that otherwise typically occur when converting between decimal fractions and binary (base-2) fractions.

IEEE 754-2008 is a revision of the IEEE 754 standard for floating-point arithmetic. It was published in August 2008 and is a significant revision to, and replaces, the IEEE 754-1985 standard. The 2008 revision extended the previous standard where it was necessary, added decimal arithmetic and formats, tightened up certain areas of the original standard which were left undefined, and merged in IEEE 854 . In a few cases, where stricter definitions of binary floating-point arithmetic might be performance-incompatible with some existing implementation, they were made optional. In 2019, it was updated with a minor revision IEEE 754-2019.

In software engineering and numerical analysis, a binade is a set of numbers in a binary floating-point format that all have the same sign and exponent. In other words, a binade is the interval or for some integer value , that is, the set of real numbers or floating-point numbers of the same sign such that .

Unums are a family of number formats and arithmetic for implementing real numbers on a computer, proposed by John L. Gustafson in 2015. They are designed as an alternative to the ubiquitous IEEE 754 floating-point standard. The latest version is known as posits.

Floating-point error mitigation is the minimization of errors caused by the fact that real numbers cannot, in general, be accurately represented in a fixed space. By definition, floating-point error cannot be eliminated, and, at best, can only be managed.

In computing, tapered floating point (TFP) is a format similar to floating point, but with variable-sized entries for the significand and exponent instead of the fixed-length entries found in normal floating-point formats. In addition to this, tapered floating-point formats provide a fixed-size pointer entry indicating the number of digits in the exponent entry. The number of digits of the significand entry results from the difference of the fixed total length minus the length of the exponent and pointer entries.

Validated numerics, or rigorous computation, verified computation, reliable computation, numerical verification is numerics including mathematically strict error evaluation, and it is one field of numerical analysis. For computation, interval arithmetic is used, and all results are represented by intervals. Validated numerics were used by Warwick Tucker in order to solve the 14th of Smale's problems, and today it is recognized as a powerful tool for the study of dynamical systems.

References

  1. Parhami, Behrooz (2003), "Number Representation and Computer Arithmetic" (PDF), Encyclopedia of Information Systems, Elsevier, pp. 217–333