Pairwise summation

In numerical analysis, pairwise summation, also called cascade summation, is a technique to sum a sequence of finite-precision floating-point numbers that substantially reduces the accumulated round-off error compared to naively accumulating the sum in sequence. [1] Although there are other techniques such as Kahan summation that typically have even smaller round-off errors, pairwise summation is nearly as good (differing only by a logarithmic factor) while having much lower computational cost: it can be implemented so as to have nearly the same cost (and exactly the same number of arithmetic operations) as naive summation.

In particular, pairwise summation of a sequence of n numbers x1, ..., xn works by recursively breaking the sequence into two halves, summing each half, and adding the two sums: a divide and conquer algorithm. Its worst-case roundoff errors grow asymptotically as at most O(ε log n), where ε is the machine precision (assuming a fixed condition number, as discussed below). [1] In comparison, the naive technique of accumulating the sum in sequence (adding each xi one at a time for i = 1, ..., n) has roundoff errors that grow at worst as O(εn). [1] Kahan summation has a worst-case error of roughly O(ε), independent of n, but requires several times more arithmetic operations. [1] If the roundoff errors are random, and in particular have random signs, then they form a random walk and the error growth is reduced to an average of O(ε√(log n)) for pairwise summation. [2]
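For example, for n = 8 the recursion groups the additions as

    \bigl((x_1 + x_2) + (x_3 + x_4)\bigr) + \bigl((x_5 + x_6) + (x_7 + x_8)\bigr),

so each input participates in only log2 8 = 3 levels of rounded additions, whereas in sequential accumulation the first input passes through all n − 1 = 7 additions.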

A very similar recursive structure of summation is found in many fast Fourier transform (FFT) algorithms, and is responsible for the same slow roundoff accumulation of those FFTs. [2] [3]

The algorithm

In pseudocode, the pairwise summation algorithm for an array x of length n ≥ 0 can be written:

s = pairwise(x[1...n])
    if n ≤ N                  base case: naive summation for a sufficiently small array
        s = 0
        for i = 1 to n
            s = s + x[i]
    else                      divide and conquer: recursively sum two halves of the array
        m = floor(n / 2)
        s = pairwise(x[1...m]) + pairwise(x[m+1...n])
    end if

For some sufficiently small N, this algorithm switches to a naive loop-based summation as a base case, whose error bound is O(Nε). [4] The entire sum has a worst-case error that grows asymptotically as O(ε log n) for large n, for a given condition number (see below).

In an algorithm of this sort (as for divide and conquer algorithms in general [5] ), it is desirable to use a larger base case in order to amortize the overhead of the recursion. If N = 1, then there is roughly one recursive subroutine call for every input, but more generally there is one recursive call for (roughly) every N/2 inputs if the recursion stops at exactly n = N. By making N sufficiently large, the overhead of recursion can be made negligible (precisely this technique of a large base case for recursive summation is employed by high-performance FFT implementations [3] ).
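As an illustration, the pseudocode above translates into the following minimal Python sketch (the name pairwise_sum and the default base-case size N = 128 are illustrative choices, not prescribed by the sources):

    def pairwise_sum(x, N=128):
        """Recursively sum the sequence x by pairwise (cascade) summation.

        N is the base-case size below which a naive loop is used; a moderately
        large N amortizes the recursion overhead, as discussed above.
        """
        n = len(x)
        if n <= N:
            # base case: naive sequential summation of a small block
            s = 0.0
            for v in x:
                s = s + v
            return s
        # divide and conquer: sum the two halves recursively and add the results
        m = n // 2
        return pairwise_sum(x[:m], N) + pairwise_sum(x[m:], N)

The slices here copy data for brevity; an implementation concerned with performance would pass index ranges instead, which does not change the rounding behaviour.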

Regardless of N, exactly n − 1 additions are performed in total, the same as for naive summation, so if the recursion overhead is made negligible then pairwise summation has essentially the same computational cost as naive summation.

A variation on this idea is to break the sum into b blocks at each recursive stage, summing each block recursively, and then summing the results, which was dubbed a "superblock" algorithm by its proposers. [6] The above pairwise algorithm corresponds to b = 2 for every stage except for the last stage which is b = N.
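A hedged sketch of the superblock idea, splitting the data into b roughly equal blocks at each stage (the name superblock_sum and the choices b = 8, N = 128 are assumptions for illustration, not values from the paper):

    def superblock_sum(x, b=8, N=128):
        # sum x by splitting it into b roughly equal blocks at each stage,
        # summing each block recursively, then adding up the b partial sums
        n = len(x)
        if n <= N:
            s = 0.0
            for v in x:              # naive summation at the base case
                s = s + v
            return s
        step = -(-n // b)            # ceiling division: length of each block
        s = 0.0
        for i in range(0, n, step):
            s = s + superblock_sum(x[i:i + step], b, N)
        return s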

Accuracy

Suppose that one is summing n values xi, for i = 1, ..., n. The exact sum is:

    S_n = \sum_{i=1}^n x_i

(computed with infinite precision).

With pairwise summation for a base case N = 1, one instead obtains S_n + E_n, where the error E_n is bounded above by: [1]

    |E_n| \le \frac{\varepsilon \log_2 n}{1 - \varepsilon \log_2 n} \sum_{i=1}^n |x_i|

where ε is the machine precision of the arithmetic being employed (e.g. ε ≈ 10^−16 for standard double-precision floating point). Usually, the quantity of interest is the relative error |E_n|/|S_n|, which is therefore bounded above by:

    \frac{|E_n|}{|S_n|} \le \frac{\varepsilon \log_2 n}{1 - \varepsilon \log_2 n} \cdot \frac{\sum_{i=1}^n |x_i|}{\left| \sum_{i=1}^n x_i \right|}
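As a concrete illustration (a worst-case bound, not a typical error), for n = 10^6 summands in double precision with ε ≈ 1.1 × 10^−16, the growth factor in the pairwise bound is

    \varepsilon \log_2 n \approx 1.1 \times 10^{-16} \times 20 \approx 2 \times 10^{-15},

whereas the corresponding factor εn for naive summation (discussed below) is about 10^−10.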

In the expression for the relative error bound, the fraction (Σ|xi| / |Σxi|) is the condition number of the summation problem. Essentially, the condition number represents the intrinsic sensitivity of the summation problem to errors, regardless of how it is computed. [7] The relative error bound of every (backwards stable) summation method by a fixed algorithm in fixed precision (i.e. not those that use arbitrary-precision arithmetic, nor algorithms whose memory and time requirements change based on the data) is proportional to this condition number. [1] An ill-conditioned summation problem is one in which this ratio is large, and in this case even pairwise summation can have a large relative error. For example, if the summands xi are uncorrelated random numbers with zero mean, the sum is a random walk and the condition number will grow proportional to √n. On the other hand, for random inputs with nonzero mean the condition number asymptotes to a finite constant as n → ∞. If the inputs are all non-negative, then the condition number is 1.

Note that the denominator is effectively 1 in practice, since ε log2 n is much smaller than 1 until n becomes of order 2^(1/ε), which is roughly 10^(10^15) in double precision.

In comparison, the relative error bound for naive summation (simply adding the numbers in sequence, rounding at each step) grows as εn multiplied by the condition number. [1] In practice, it is much more likely that the rounding errors have a random sign, with zero mean, so that they form a random walk; in this case, naive summation has a root mean square relative error that grows as ε√n and pairwise summation has an error that grows as ε√(log n) on average. [2]
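As a rough empirical illustration (a sketch assuming uniform random double-precision inputs; actual errors depend on the data and on rounding details), one can compare a naive left-to-right loop against Python's correctly rounded math.fsum:

    import math
    import random

    random.seed(0)
    for n in (10**4, 10**5, 10**6):
        x = [random.random() for _ in range(n)]
        exact = math.fsum(x)              # correctly rounded reference sum
        s = 0.0
        for v in x:                       # naive sequential accumulation
            s = s + v
        print(n, abs(s - exact) / exact)  # relative error typically grows slowly with n

On the same data, the pairwise_sum sketch from the algorithm section typically keeps the relative error within a few multiples of ε.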

Software implementations

Pairwise summation is the default summation algorithm in NumPy [8] and the Julia technical-computing language, [9] where in both cases it was found to have comparable speed to naive summation (thanks to the use of a large base case).
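As an illustration in single precision, where the difference is easy to observe, the following sketch (an illustrative comparison, not taken from the cited sources) contrasts a naive float32 loop with np.sum, which the NumPy pull request above made pairwise by default:

    import math
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.random(10**6, dtype=np.float32)

    exact = math.fsum(float(v) for v in x)      # high-precision reference sum
    naive = np.float32(0.0)
    for v in x:                                 # naive sequential float32 accumulation
        naive = naive + v

    print(abs(float(naive) - exact) / exact)    # typically around 1e-5 to 1e-4
    print(abs(float(x.sum()) - exact) / exact)  # np.sum (pairwise): typically near float32 ε ≈ 1e-7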

Other software implementations include the HPCsharp library [10] for the C# language and the standard library summation [11] in D.

References

  1. Higham, Nicholas J. (1993), "The accuracy of floating point summation", SIAM Journal on Scientific Computing, 14 (4): 783–799, Bibcode:1993SJSC...14..783H, CiteSeerX 10.1.1.43.3535, doi:10.1137/0914050.
  2. Manfred Tasche and Hansmartin Zeuner, Handbook of Analytic-Computational Methods in Applied Mathematics (Boca Raton, FL: CRC Press, 2000).
  3. S. G. Johnson and M. Frigo, "Implementing FFTs in practice," in Fast Fourier Transforms, edited by C. Sidney Burrus (2008).
  4. Higham, Nicholas (2002). Accuracy and Stability of Numerical Algorithms (2nd ed.). SIAM. pp. 81–82.
  5. Radu Rugina and Martin Rinard, "Recursion unrolling for divide and conquer programs," in Languages and Compilers for Parallel Computing, chapter 3, pp. 34–48. Lecture Notes in Computer Science vol. 2017 (Berlin: Springer, 2001).
  6. Anthony M. Castaldo, R. Clint Whaley, and Anthony T. Chronopoulos, "Reducing floating-point error in dot product using the superblock family of algorithms," SIAM J. Sci. Comput., vol. 32, pp. 1156–1174 (2008).
  7. L. N. Trefethen and D. Bau, Numerical Linear Algebra (SIAM: Philadelphia, 1997).
  8. ENH: implement pairwise summation, github.com/numpy/numpy pull request #3685 (September 2013).
  9. RFC: use pairwise summation for sum, cumsum, and cumprod, github.com/JuliaLang/julia pull request #4039 (August 2013).
  10. HPCsharp NuGet package of high performance C# algorithms, https://github.com/DragonSpit/HPCsharp
  11. "std.algorithm.iteration - D Programming Language". dlang.org. Retrieved 2021-04-23.