Kahan summation algorithm

In numerical analysis, the Kahan summation algorithm, also known as compensated summation, [1] significantly reduces the numerical error in the total obtained by adding a sequence of finite-precision floating-point numbers, compared to the obvious approach. This is done by keeping a separate running compensation (a variable to accumulate small errors), in effect extending the precision of the sum by the precision of the compensation variable.

In particular, simply summing $n$ numbers in sequence has a worst-case error that grows proportional to $n$, and a root mean square error that grows as $\sqrt{n}$ for random inputs (the roundoff errors form a random walk). [2] With compensated summation, using a compensation variable with sufficiently high precision, the worst-case error bound is effectively independent of $n$, so a large number of values can be summed with an error that only depends on the floating-point precision of the result. [2]

The algorithm is attributed to William Kahan; [3] Ivo Babuška seems to have come up with a similar algorithm independently (hence Kahan–Babuška summation). [4] Similar, earlier techniques include Bresenham's line algorithm, which keeps track of the accumulated error in integer operations (although first documented around the same time [5] ), and delta-sigma modulation. [6]

The algorithm

In pseudocode, the algorithm is:

function KahanSum(input)
    // Prepare the accumulator.
    var sum = 0.0
    // A running compensation for lost low-order bits.
    var c = 0.0
    // The array input has elements indexed input[1] to input[input.length].
    for i = 1 to input.length do
        // c is zero the first time around.
        var y = input[i] - c
        // Alas, sum is big, y small, so low-order digits of y are lost.
        var t = sum + y
        // (t - sum) cancels the high-order part of y;
        // subtracting y recovers negative (low part of y).
        c = (t - sum) - y
        // Algebraically, c should always be zero. Beware
        // overly-aggressive optimizing compilers!
        sum = t
        // Next time around, the lost low part will be added to y in a fresh attempt.
    next i

    return sum
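
For readers who prefer runnable code, the following is a minimal sketch of the same procedure in Python; the function name kahan_sum and the use of a plain iterable as input are illustrative choices, not part of the original formulation:

def kahan_sum(values):
    """Sum an iterable of floats using Kahan (compensated) summation."""
    total = 0.0  # The accumulator.
    c = 0.0      # A running compensation for lost low-order bits.
    for x in values:
        y = x - c              # c is zero the first time around.
        t = total + y          # total is big, y small: low-order digits of y are lost.
        c = (t - total) - y    # Recovers the (negated) lost low-order part of y.
        total = t
    return total

CPython evaluates floating-point expressions as written and does not re-associate them, so the compensation step is not optimized away (compare the section on compiler optimization below).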

This algorithm can also be rewritten to use the Fast2Sum algorithm: [7]

function KahanSum2(input)
    // Prepare the accumulator.
    var sum = 0.0
    // A running compensation for lost low-order bits.
    var c = 0.0
    // The array input has elements indexed input[1] to input[input.length].
    for i = 1 to input.length do
        // c is zero the first time around.
        var y = input[i] + c
        // sum + c is an approximation to the exact sum.
        (sum, c) = Fast2Sum(sum, y)
        // Next time around, the lost low part will be added to y in a fresh attempt.
    next i
    return sum
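
A sketch of this variant in Python, with a helper implementing Fast2Sum; note that Fast2Sum returns the exact rounding error of the addition only when the first argument is at least as large in magnitude as the second, and the names below are illustrative:

def fast_two_sum(a, b):
    """Fast2Sum: return (s, err) with s = fl(a + b) and err the rounding error.
    The error term is exact only when |a| >= |b|."""
    s = a + b
    z = s - a
    err = b - z
    return s, err

def kahan_sum2(values):
    total = 0.0  # The accumulator.
    c = 0.0      # A running compensation for lost low-order bits.
    for x in values:
        y = x + c                          # Fold the compensation into the next term.
        total, c = fast_two_sum(total, y)  # total + c approximates the exact sum.
    return total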

Worked example

The algorithm does not mandate any specific choice of radix, only that the arithmetic "normalize floating-point sums before rounding or truncating". [3] Computers typically use binary arithmetic, but to make the example easier to read, it will be given in decimal. Suppose we are using six-digit decimal floating-point arithmetic, sum has attained the value 10000.0, and the next two values of input[i] are 3.14159 and 2.71828. The exact result is 10005.85987, which rounds to 10005.9. With plain summation, each incoming value would be aligned with sum, and many low-order digits would be lost (by truncation or rounding). The first result, after rounding, would be 10003.1. The second result would be 10005.81828 before rounding and 10005.8 after rounding. This is not correct.

However, with compensated summation, we get the correctly rounded result of 10005.9.

Assume that c has the initial value zero. Trailing zeros are shown where they are significant for the six-digit floating-point numbers.

  y = 3.14159 - 0.00000             y = input[i] - c
  t = 10000.0 + 3.14159             t = sum + y
    = 10003.14159                   Normalization done, next round off to six digits.
    = 10003.1                       Few digits from input[i] met those of sum. Many digits have been lost!
  c = (10003.1 - 10000.0) - 3.14159 c = (t - sum) - y  (Note: Parenthesis must be evaluated first!)
    = 3.10000 - 3.14159             The assimilated part of y minus the original full y.
    = -0.0415900                    Because c is close to zero, normalization retains many digits after the floating point.
sum = 10003.1                       sum = t

The sum is so large that only the high-order digits of the input numbers are being accumulated. But on the next step, c, an approximation of the running error, counteracts the problem.

  y = 2.71828 - (-0.0415900)        Most digits meet, since c is of a size similar to y.
    = 2.75987                       The shortfall (low-order digits lost) of previous iteration successfully reinstated.
  t = 10003.1 + 2.75987             But still only few meet the digits of sum.
    = 10005.85987                   Normalization done, next round to six digits.
    = 10005.9                       Again, many digits have been lost, but c helped nudge the round-off.
  c = (10005.9 - 10003.1) - 2.75987 Estimate the accumulated error, based on the adjusted y.
    = 2.80000 - 2.75987             As expected, the low-order parts can be retained in c with no or minor round-off effects.
    = 0.0401300                     In this iteration, t was a bit too high, the excess will be subtracted off in next iteration.
sum = 10005.9                       Exact result is 10005.85987, sum is correct, rounded to 6 digits.
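
The worked example can be replayed with Python's decimal module, whose context precision can be limited to six significant digits; this is an illustrative sketch, not part of the original presentation:

from decimal import Decimal, getcontext

getcontext().prec = 6  # Six-digit decimal arithmetic, as in the example above.

values = [Decimal("10000.0"), Decimal("3.14159"), Decimal("2.71828")]

# Plain summation: low-order digits are lost at each step.
naive = Decimal("0.0")
for x in values:
    naive = naive + x          # ends at 10005.8

# Compensated summation recovers the correctly rounded result.
total = Decimal("0.0")
c = Decimal("0.0")
for x in values:
    y = x - c
    t = total + y
    c = (t - total) - y
    total = t                  # ends at 10005.9

print(naive, total)            # 10005.8 10005.9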

The algorithm performs summation with two accumulators: sum holds the sum, and c accumulates the parts not assimilated into sum, to nudge the low-order part of sum the next time around. Thus the summation proceeds with "guard digits" in c, which is better than not having any, but is not as good as performing the calculations with double the precision of the input. However, simply increasing the precision of the calculations is not practical in general; if input is already in double precision, few systems supply quadruple precision, and if they did, input could then be in quadruple precision.

Accuracy

A careful analysis of the errors in compensated summation is needed to appreciate its accuracy characteristics. While it is more accurate than naive summation, it can still give large relative errors for ill-conditioned sums.

Suppose that one is summing $n$ values $x_i$, for $i = 1, \ldots, n$. The exact sum is

$$S_n = \sum_{i=1}^{n} x_i$$

(computed with infinite precision).

With compensated summation, one instead obtains $S_n + E_n$, where the error $E_n$ is bounded by [2]

$$|E_n| \le \left[\, 2\varepsilon + O(n\varepsilon^2) \,\right] \sum_{i=1}^{n} |x_i|,$$

where $\varepsilon$ is the machine precision of the arithmetic being employed (e.g. $\varepsilon \approx 10^{-16}$ for IEEE standard double-precision floating point). Usually, the quantity of interest is the relative error $|E_n| / |S_n|$, which is therefore bounded above by

$$\frac{|E_n|}{|S_n|} \le \left[\, 2\varepsilon + O(n\varepsilon^2) \,\right] \frac{\sum_{i=1}^{n} |x_i|}{\left| \sum_{i=1}^{n} x_i \right|}.$$

In the expression for the relative error bound, the fraction $\sum |x_i| / |\sum x_i|$ is the condition number of the summation problem. Essentially, the condition number represents the intrinsic sensitivity of the summation problem to errors, regardless of how it is computed. [8] The relative error bound of every (backwards stable) summation method by a fixed algorithm in fixed precision (i.e. not those that use arbitrary-precision arithmetic, nor algorithms whose memory and time requirements change based on the data) is proportional to this condition number. [2] An ill-conditioned summation problem is one in which this ratio is large, and in this case even compensated summation can have a large relative error. For example, if the summands are uncorrelated random numbers with zero mean, the sum is a random walk, and the condition number will grow proportional to $\sqrt{n}$. On the other hand, for random inputs with nonzero mean the condition number asymptotes to a finite constant as $n \to \infty$. If the inputs are all non-negative, then the condition number is 1.

Given a condition number, the relative error of compensated summation is effectively independent of $n$. In principle, there is the $O(n\varepsilon^2)$ term that grows linearly with $n$, but in practice this term is effectively zero: since the final result is rounded to a precision $\varepsilon$, the $n\varepsilon^2$ term rounds to zero unless $n$ is roughly $1/\varepsilon$ or larger. [2] In double precision, this corresponds to an $n$ of roughly $10^{16}$, much larger than most sums. So, for a fixed condition number, the errors of compensated summation are effectively $O(\varepsilon)$, independent of $n$.

In comparison, the relative error bound for naive summation (simply adding the numbers in sequence, rounding at each step) grows as $O(\varepsilon n)$ multiplied by the condition number. [2] This worst-case error is rarely observed in practice, however, because it only occurs if the rounding errors are all in the same direction. In practice, it is much more likely that the rounding errors have a random sign, with zero mean, so that they form a random walk; in this case, naive summation has a root mean square relative error that grows as $O(\varepsilon \sqrt{n})$ multiplied by the condition number. [9] This is still much worse than compensated summation. But if the sum can be performed in twice the working precision, then $\varepsilon$ is replaced by $\varepsilon^2$, and naive summation has a worst-case error comparable to the $O(n\varepsilon^2)$ term in compensated summation at the original precision.

By the same token, the $\sum |x_i|$ that appears in the bound for $E_n$ above is a worst-case bound that occurs only if all the rounding errors have the same sign (and are of maximal possible magnitude). [2] In practice, it is more likely that the errors have random sign, in which case the terms in $\sum |x_i|$ are replaced by a random walk; then, even for random inputs with zero mean, the error $E_n$ grows only as $O(\varepsilon \sqrt{n})$ (ignoring the $n\varepsilon^2$ term), the same rate at which the sum $S_n$ grows, canceling the $\sqrt{n}$ factors when the relative error is computed. So, even for asymptotically ill-conditioned sums, the relative error for compensated summation can often be much smaller than a worst-case analysis might suggest.
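
A quick, informal way to observe this behaviour is to compare a naive left-to-right loop and compensated summation against math.fsum, used here as a correctly rounded reference; the data (uniform positive random numbers, so the sum is well-conditioned) and the sizes are arbitrary illustrative choices:

import math
import random

def naive_sum(values):
    total = 0.0
    for x in values:
        total += x        # plain left-to-right accumulation
    return total

def kahan_sum(values):
    total, c = 0.0, 0.0
    for x in values:
        y = x - c
        t = total + y
        c = (t - total) - y
        total = t
    return total

random.seed(0)
for n in (10**4, 10**6):
    data = [random.uniform(0.0, 1.0) for _ in range(n)]
    exact = math.fsum(data)                     # correctly rounded reference
    rel_naive = abs(naive_sum(data) - exact) / abs(exact)
    rel_kahan = abs(kahan_sum(data) - exact) / abs(exact)
    print(n, rel_naive, rel_kahan)

Typically the compensated error stays at zero or a few units in the last place as n grows, while the naive error increases roughly like a random walk.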

Further enhancements

Neumaier [10] introduced an improved version of the Kahan algorithm, which he calls an "improved Kahan–Babuška algorithm", and which also covers the case when the next term to be added is larger in absolute value than the running sum, effectively swapping the roles of what is large and what is small. In pseudocode, the algorithm is:

function KahanBabushkaNeumaierSum(input)
    var sum = 0.0
    var c = 0.0                       // A running compensation for lost low-order bits.

    for i = 1 to input.length do
        var t = sum + input[i]
        if |sum| >= |input[i]| then
            c += (sum - t) + input[i] // If sum is bigger, low-order digits of input[i] are lost.
        else
            c += (input[i] - t) + sum // Else low-order digits of sum are lost.
        endif
        sum = t
    next i

    return sum + c                    // Correction only applied once in the very end.

This enhancement is similar to the replacement of Fast2Sum by 2Sum in Kahan's algorithm rewritten with Fast2Sum.

For many sequences of numbers, both algorithms agree, but a simple example due to Peters [11] shows how they can differ: when summing $[1.0, 10^{100}, 1.0, -10^{100}]$ in double precision, Kahan's algorithm yields 0.0, whereas Neumaier's algorithm yields the correct value 2.0.
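
A short, self-contained Python sketch reproducing this example (the function names are illustrative):

def kahan_sum(values):
    total, c = 0.0, 0.0
    for x in values:
        y = x - c
        t = total + y
        c = (t - total) - y
        total = t
    return total

def neumaier_sum(values):
    total, c = 0.0, 0.0
    for x in values:
        t = total + x
        if abs(total) >= abs(x):
            c += (total - t) + x    # low-order digits of x are lost
        else:
            c += (x - t) + total    # low-order digits of total are lost
        total = t
    return total + c                # correction applied once at the end

data = [1.0, 1e100, 1.0, -1e100]
print(kahan_sum(data))     # 0.0
print(neumaier_sum(data))  # 2.0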

Higher-order modifications of better accuracy are also possible. For example, Klein [12] suggested a variant that he called a second-order "iterative Kahan–Babuška algorithm". In pseudocode, the algorithm is:

function KahanBabushkaKleinSum(input)
    var sum = 0.0
    var cs  = 0.0
    var ccs = 0.0
    var c   = 0.0
    var cc  = 0.0

    for i = 1 to input.length do
        var t = sum + input[i]
        if |sum| >= |input[i]| then
            c = (sum - t) + input[i]
        else
            c = (input[i] - t) + sum
        endif
        sum = t
        t = cs + c
        if |cs| >= |c| then
            cc = (cs - t) + c
        else
            cc = (c - t) + cs
        endif
        cs = t
        ccs = ccs + cc
    end loop

    return sum + cs + ccs
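
A Python sketch of Klein's second-order variant, following the pseudocode above (the function name is an illustrative choice):

def kbk_sum(values):
    """Second-order iterative Kahan-Babuska summation (Klein's variant)."""
    total = 0.0
    cs = 0.0    # first-order compensation
    ccs = 0.0   # second-order compensation
    for x in values:
        t = total + x
        if abs(total) >= abs(x):
            c = (total - t) + x
        else:
            c = (x - t) + total
        total = t
        t = cs + c
        if abs(cs) >= abs(c):
            cc = (cs - t) + c
        else:
            cc = (c - t) + cs
        cs = t
        ccs += cc
    return total + cs + ccs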

Alternatives

Although Kahan's algorithm achieves $O(\varepsilon)$ error growth for summing $n$ numbers, only slightly worse $O(\varepsilon \log n)$ growth can be achieved by pairwise summation: one recursively divides the set of numbers into two halves, sums each half, and then adds the two sums. [2] This has the advantage of requiring the same number of arithmetic operations as naive summation (unlike Kahan's algorithm, which requires four times the arithmetic and has a latency of four times a simple summation) and can be calculated in parallel. The base case of the recursion could in principle be the sum of only one (or zero) numbers, but to amortize the overhead of recursion, one would normally use a larger base case. The equivalent of pairwise summation is used in many fast Fourier transform (FFT) algorithms and is responsible for the logarithmic growth of roundoff errors in those FFTs. [13] In practice, with roundoff errors of random signs, the root mean square errors of pairwise summation actually grow as $O(\varepsilon \sqrt{\log n})$. [9]
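
For comparison, a minimal recursive sketch of pairwise summation in Python; the base-case block size of 128 is an arbitrary illustrative choice made to amortize the recursion overhead:

def pairwise_sum(values, lo=0, hi=None):
    """Sum values[lo:hi] by recursively splitting the range in half."""
    if hi is None:
        hi = len(values)
    n = hi - lo
    if n <= 128:                 # small base case: fall back to naive summation
        total = 0.0
        for i in range(lo, hi):
            total += values[i]
        return total
    mid = lo + n // 2
    return pairwise_sum(values, lo, mid) + pairwise_sum(values, mid, hi)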

Another alternative is to use arbitrary-precision arithmetic, which in principle needs no rounding at all, at the cost of much greater computational effort. A way of performing correctly rounded sums using arbitrary precision is to extend it adaptively using multiple floating-point components. This minimizes the computational cost in common cases where high precision is not needed. [14] [11] Another method that uses only integer arithmetic, but a large accumulator, was described by Kirchner and Kulisch; [15] a hardware implementation was described by Müller, Rüb and Rülling. [16]
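
As a crude illustration of the exact-arithmetic idea (not Shewchuk's adaptive algorithm, nor the Kirchner–Kulisch long accumulator), Python's fractions module can accumulate the sum as an exact rational and round only once at the end:

from fractions import Fraction

def exact_rounded_sum(values):
    """Correctly rounded sum: accumulate exactly as rationals, round once."""
    total = sum(Fraction(x) for x in values)  # exact, but potentially slow
    return float(total)                       # single, correctly rounded conversion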

Possible invalidation by compiler optimization

In principle, a sufficiently aggressive optimizing compiler could destroy the effectiveness of Kahan summation: for example, if the compiler simplified expressions according to the associativity rules of real arithmetic, it might "simplify" the second step in the sequence

t = sum + y;
c = (t - sum) - y;

to

c = ((sum + y) - sum) - y;

and then to

c = 0;

thus eliminating the error compensation. [17] In practice, many compilers do not use associativity rules (which are only approximate in floating-point arithmetic) in simplifications, unless explicitly directed to do so by compiler options enabling "unsafe" optimizations, [18] [19] [20] [21] although the Intel C++ Compiler is one example that allows associativity-based transformations by default. [22] The original K&R C version of the C programming language allowed the compiler to re-order floating-point expressions according to real-arithmetic associativity rules, but the subsequent ANSI C standard prohibited re-ordering in order to make C better suited for numerical applications (and more similar to Fortran, which also prohibits re-ordering), [23] although in practice compiler options can re-enable re-ordering, as mentioned above.

A portable way to inhibit such optimizations locally is to break one of the lines in the original formulation into two statements, and make two of the intermediate products volatile:

function KahanSum(input)
    var sum = 0.0
    var c = 0.0

    for i = 1 to input.length do
        var y = input[i] - c
        volatile var t = sum + y
        volatile var z = t - sum
        c = z - y
        sum = t
    next i

    return sum

Support by libraries

In general, built-in "sum" functions in computer languages typically provide no guarantees that a particular summation algorithm will be employed, much less Kahan summation.[ citation needed ] The BLAS standard for linear algebra subroutines explicitly avoids mandating any particular computational order of operations for performance reasons, [24] and BLAS implementations typically do not use Kahan summation.

The standard library of the Python computer language specifies an fsum function for accurate summation. Starting with Python 3.12, the built-in sum() function uses Neumaier summation. [25]
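
For example (an illustrative snippet; the naive result may vary by platform and Python version):

import math

data = [0.1] * 10

# A plain left-to-right loop accumulates visible rounding error.
naive = 0.0
for x in data:
    naive += x
print(naive)            # 0.9999999999999999

print(math.fsum(data))  # 1.0 (correctly rounded)
print(sum(data))        # 1.0 on Python >= 3.12 (Neumaier); may differ on older versions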

In the Julia language, the default implementation of the sum function does pairwise summation for high accuracy with good performance, [26] but an external library provides an implementation of Neumaier's variant named sum_kbn for the cases when higher accuracy is needed. [27]

In the C# language, the HPCsharp NuGet package implements the Neumaier variant and pairwise summation, both as scalar code and as data-parallel code using SIMD processor instructions, as well as multi-core parallel versions. [28]


References

  1. Strictly, there exist other variants of compensated summation as well: see Higham, Nicholas (2002). Accuracy and Stability of Numerical Algorithms (2nd ed.). SIAM. pp. 110–123. ISBN 978-0-89871-521-7.
  2. Higham, Nicholas J. (1993), "The accuracy of floating point summation", SIAM Journal on Scientific Computing, 14 (4): 783–799, Bibcode:1993SJSC...14..783H, CiteSeerX 10.1.1.43.3535, doi:10.1137/0914050, S2CID 14071038.
  3. Kahan, William (January 1965), "Further remarks on reducing truncation errors" (PDF), Communications of the ACM, 8 (1): 40, doi:10.1145/363707.363723, S2CID 22584810, archived from the original (PDF) on 9 February 2018.
  4. Babuška, I.: Numerical stability in mathematical analysis. Inf. Proc. 68, 11–23 (1969)
  5. Bresenham, Jack E. (January 1965). "Algorithm for computer control of a digital plotter" (PDF). IBM Systems Journal. 4 (1): 25–30. doi:10.1147/sj.41.0025. S2CID   41898371.
  6. Inose, H.; Yasuda, Y.; Murakami, J. (September 1962). "A Telemetering System by Code Manipulation – ΔΣ Modulation". IRE Transactions on Space Electronics and Telemetry. SET-8: 204–209. doi:10.1109/IRET-SET.1962.5008839. S2CID   51647729.
  7. Muller, Jean-Michel; Brunie, Nicolas; de Dinechin, Florent; Jeannerod, Claude-Pierre; Joldes, Mioara; Lefèvre, Vincent; Melquiond, Guillaume; Revol, Nathalie; Torres, Serge (2018) [2010]. Handbook of Floating-Point Arithmetic (2 ed.). Birkhäuser. p. 179. doi:10.1007/978-3-319-76526-6. ISBN   978-3-319-76525-9. LCCN   2018935254.
  8. Trefethen, Lloyd N.; Bau, David (1997). Numerical Linear Algebra. Philadelphia: SIAM. ISBN   978-0-89871-361-9.
  9. Manfred Tasche and Hansmartin Zeuner, Handbook of Analytic-Computational Methods in Applied Mathematics, Boca Raton, FL: CRC Press, 2000.
  10. Neumaier, A. (1974). "Rundungsfehleranalyse einiger Verfahren zur Summation endlicher Summen" [Rounding Error Analysis of Some Methods for Summing Finite Sums] (PDF). Zeitschrift für Angewandte Mathematik und Mechanik (in German). 54 (1): 39–51. Bibcode:1974ZaMM...54...39N. doi:10.1002/zamm.19740540106.
  11. Hettinger, R. "Improve accuracy of builtin sum() for float inputs · Issue #100425 · python/cpython". GitHub - CPython v3.12 Added Features. Retrieved 7 October 2023.
  12. A., Klein (2006). "A generalized Kahan–Babuška-Summation-Algorithm". Computing. Springer-Verlag. 76 (3–4): 279–293. doi:10.1007/s00607-005-0139-x. S2CID   4561254.
  13. Johnson, S.G.; Frigo, M. C. Sidney Burns (ed.). "Fast Fourier Transforms: Implementing FFTs in Practice". Archived from the original on Dec 20, 2008.
  14. Richard Shewchuk, Jonathan (October 1997). "Adaptive Precision Floating-Point Arithmetic and Fast Robust Geometric Predicates" (PDF). Discrete & Computational Geometry. 18 (3): 305–363. doi:10.1007/PL00009321. S2CID   189937041.
  15. Kirchner, R.; Kulisch, U. (June 1988). "Accurate arithmetic for vector processors". Journal of Parallel and Distributed Computing. 5 (3): 250–270. doi:10.1016/0743-7315(88)90020-2.
  16. Muller, M.; Rub, C.; Rulling, W. (1991). Exact accumulation of floating-point numbers. Proceedings 10th IEEE Symposium on Computer Arithmetic. pp. 64–69. doi:10.1109/ARITH.1991.145535.
  17. Goldberg, David (March 1991), "What every computer scientist should know about floating-point arithmetic" (PDF), ACM Computing Surveys , 23 (1): 5–48, doi:10.1145/103162.103163, S2CID   222008826 .
  18. GNU Compiler Collection manual, version 4.4.3: 3.10 Options That Control Optimization, -fassociative-math (Jan. 21, 2010).
  19. Compaq Fortran User Manual for Tru64 UNIX and Linux Alpha Systems Archived 2011-06-07 at the Wayback Machine , section 5.9.7 Arithmetic Reordering Optimizations (retrieved March 2010).
  20. Börje Lindh, Application Performance Optimization, Sun BluePrints OnLine (March 2002).
  21. Eric Fleegal, "Microsoft Visual C++ Floating-Point Optimization", Microsoft Visual Studio Technical Articles (June 2004).
  22. Martyn J. Corden, "Consistency of floating-point results using the Intel compiler", Intel technical report (Sep. 18, 2009).
  23. MacDonald, Tom (1991). "C for Numerical Computing". Journal of Supercomputing. 5 (1): 31–48. doi:10.1007/BF00155856. S2CID   27876900.
  24. BLAS Technical Forum, section 2.7 (August 21, 2001), Archived on Wayback Machine.
  25. What's New in Python 3.12.
  26. RFC: use pairwise summation for sum, cumsum, and cumprod, github.com/JuliaLang/julia pull request #4039 (August 2013).
  27. KahanSummation library in Julia.
  28. HPCsharp nuget package of high performance algorithms.