2Sum

2Sum [1] is a floating-point algorithm for computing the exact round-off error in a floating-point addition operation.

2Sum and its variant Fast2Sum were first published by Ole Møller in 1965. [2] Fast2Sum is often used implicitly in other algorithms such as compensated summation algorithms; [1] Kahan's summation algorithm was published first in 1965, [3] and Fast2Sum was later factored out of it by Dekker in 1971 for double-double arithmetic algorithms. [4] The names 2Sum and Fast2Sum appear to have been applied retroactively by Shewchuk in 1997. [5]

Algorithm

Given two floating-point numbers a and b, 2Sum computes the floating-point sum s = a ⊕ b rounded to nearest and the floating-point error t = (a + b) − (a ⊕ b) so that a + b = s + t, where ⊕ and ⊖ respectively denote the addition and subtraction rounded to nearest. The error t is itself a floating-point number.

Inputs: floating-point numbers a, b
Outputs: rounded sum s = a ⊕ b and exact error t = (a + b) − s
  1. s ← a ⊕ b
  2. a′ ← s ⊖ b
  3. b′ ← s ⊖ a′
  4. δa ← a ⊖ a′
  5. δb ← b ⊖ b′
  6. t ← δa ⊕ δb
  7. return (s, t)
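For concreteness, here is a minimal sketch of 2Sum in Python (whose floats are IEEE 754 binary64 with round-to-nearest); the function and variable names are illustrative and not part of the original formulation.

    def two_sum(a: float, b: float) -> tuple[float, float]:
        """Return (s, t) with s = fl(a + b) and t the exact rounding error a + b - s."""
        s = a + b                 # rounded floating-point sum
        a_approx = s - b          # the part of s attributable to a
        b_approx = s - a_approx   # the part of s attributable to b
        delta_a = a - a_approx    # rounding error committed on a
        delta_b = b - b_approx    # rounding error committed on b
        t = delta_a + delta_b     # exact error of the rounded sum
        return s, t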

Provided the floating-point arithmetic is correctly rounded to nearest (with ties resolved any way), as is the default in IEEE 754, and provided the sum does not overflow and, if it underflows, underflows gradually, it can be proven that a + b = s + t. [1] [6] [2]
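As an illustrative check of this identity (assuming the two_sum sketch above), one can compare s + t against a + b in exact rational arithmetic, since every finite binary64 value is exactly representable as a fraction:

    from fractions import Fraction
    import random

    for _ in range(10_000):
        a = random.uniform(-1.0, 1.0)
        b = random.uniform(-1e-20, 1e-20)
        s, t = two_sum(a, b)
        # The rounded sum plus the computed error must reproduce a + b exactly.
        assert Fraction(s) + Fraction(t) == Fraction(a) + Fraction(b)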

A variant of 2Sum called Fast2Sum uses only three floating-point operations, for floating-point arithmetic in radix 2 or radix 3, under the assumption that the exponent of a is at least as large as the exponent of b, such as when |a| ≥ |b|: [1] [6] [7] [4]

Inputs: radix-2 or radix-3 floating-point numbers a and b, of which at least one is zero, or which respectively have normalized exponents e_a ≥ e_b
Outputs: rounded sum s = a ⊕ b and exact error t = (a + b) − s
  1. s ← a ⊕ b
  2. z ← s ⊖ a
  3. t ← b ⊖ z
  4. return (s, t)
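A corresponding Python sketch of Fast2Sum, under the stated assumption that the exponent of a is at least that of b (for instance when |a| ≥ |b|); Python's binary64 floats are radix 2, and the names are again illustrative.

    def fast_two_sum(a: float, b: float) -> tuple[float, float]:
        """Fast2Sum: valid when exponent(a) >= exponent(b), e.g. abs(a) >= abs(b)."""
        s = a + b    # rounded floating-point sum
        z = s - a    # the portion of b actually absorbed into s
        t = b - z    # exact error, provided the exponent condition holds
        return s, t

If the exponent condition cannot be guaranteed, a caller can compare abs(a) and abs(b) and swap the arguments first, at the cost of one comparison.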

Even if the conditions are not satisfied, 2Sum and Fast2Sum often provide reasonable approximations to the error, i.e. t ≈ (a + b) − s, which enables algorithms for compensated summation, dot-product, etc., to have low error even if the inputs are not sorted or the rounding mode is unusual. [1] [2] More complicated variants of 2Sum and Fast2Sum also exist for rounding modes other than round-to-nearest. [1]
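As one example of such use, a compensated summation in the spirit of Kahan and Neumaier can accumulate the per-addition errors returned by 2Sum and fold them back in at the end; this sketch assumes the two_sum function above and is not taken from the cited sources.

    def compensated_sum(values) -> float:
        """Sum floats while tracking the rounding error of each addition."""
        s = 0.0  # running floating-point sum
        c = 0.0  # running compensation: accumulated rounding errors
        for x in values:
            s, err = two_sum(s, x)
            c += err      # the error term is itself a float and is summed separately
        return s + c      # apply the accumulated correction once at the end

    data = [1e16, 1.0, -1e16, 1.0]
    print(sum(data))              # 1.0 with naive left-to-right addition
    print(compensated_sum(data))  # 2.0, the mathematically exact result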

See also

- Floating-point arithmetic
- IEEE 754
- IEEE 754-1985
- Double-precision floating-point format
- Rounding
- Round-off error
- Kahan summation algorithm
- Pairwise summation
- Significance arithmetic
- Arithmetic underflow
- Unit in the last place
- Machine epsilon
- Extended precision
- Decimal floating point
- Pythagorean addition
- Digital differential analyzer
- Catastrophic cancellation
- Floating-point error mitigation
- Tapered floating point
- Sterbenz lemma

References

  1. Muller, Jean-Michel; Brunie, Nicolas; de Dinechin, Florent; Jeannerod, Claude-Pierre; Joldes, Mioara; Lefèvre, Vincent; Melquiond, Guillaume; Revol, Nathalie; Torres, Serge (2018). Handbook of Floating-Point Arithmetic (2nd ed.). Cham, Switzerland: Birkhäuser. pp. 104–111. doi:10.1007/978-3-319-76526-6. ISBN 978-3-319-76525-9. Archived from the original on 2023-04-28. Retrieved 2020-09-20.
  2. Møller, Ole (March 1965). "Quasi double-precision in floating point addition". BIT Numerical Mathematics. 5: 37–50. doi:10.1007/BF01975722. S2CID 119991676.
  3. Kahan, W. (January 1965). "Further remarks on reducing truncation errors". Communications of the ACM. Association for Computing Machinery. 8 (1): 40. doi:10.1145/363707.363723. ISSN 0001-0782. S2CID 22584810.
  4. Dekker, T. J. (June 1971). "A floating-point technique for extending the available precision". Numerische Mathematik. 18 (3): 224–242. doi:10.1007/BF01397083. S2CID 63218464. Archived from the original on 2020-07-19. Retrieved 2020-09-24.
  5. Shewchuk, Jonathan Richard (October 1997). "Adaptive Precision Floating-Point Arithmetic and Fast Robust Geometric Predicates". Discrete & Computational Geometry. 18 (3): 305–363. doi:10.1007/PL00009321.
  6. Knuth, Donald E. (1998). The Art of Computer Programming, Volume II: Seminumerical Algorithms (3rd ed.). Addison–Wesley. p. 236. ISBN 978-0-201-89684-8. Archived from the original on 2017-07-16. Retrieved 2020-09-20.
  7. Sterbenz, Pat H. (1974). Floating-Point Computation. Englewood Cliffs, NJ, United States: Prentice-Hall. pp. 138–143. ISBN 0-13-322495-3.