Catastrophic cancellation

In numerical analysis, catastrophic cancellation [1] [2] is the phenomenon that subtracting good approximations to two nearby numbers may yield a very bad approximation to the difference of the original numbers.

For example, if there are two studs, one 254.5 cm long and the other 253.5 cm long, and they are measured with a ruler that is good only to the centimeter, then the approximations could come out to be 255 cm and 253 cm. These may be good approximations, in relative error, to the true lengths: each approximation is in error by less than 2% of the true length.

However, if the approximate lengths are subtracted, the difference will be 255 cm − 253 cm = 2 cm, even though the true difference between the lengths is 254.5 cm − 253.5 cm = 1 cm. The difference of the approximations, 2 cm, is in error by 1 cm, essentially 100% of the magnitude of the difference of the true values, 1 cm.
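Written out with the numbers above, the relative errors of the inputs are tiny while the error of the difference is total:

$$\frac{|255 - 254.5|}{254.5} \approx 0.2\%, \qquad \frac{|253 - 253.5|}{253.5} \approx 0.2\%, \qquad \frac{|(255 - 253) - (254.5 - 253.5)|}{|254.5 - 253.5|} = 100\%.$$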

Catastrophic cancellation is not affected by how large the inputs are; it applies just as much to large and small inputs. It depends only on how large the difference is, and on the error of the inputs. Exactly the same error would arise by subtracting 2.53 from 2.55 as approximations to 2.535 and 2.545, or by subtracting 253000 from 255000 as approximations to 253500 and 254500.

Catastrophic cancellation may happen even if the difference is computed exactly, as in the example above: it is not a property of any particular kind of arithmetic like floating-point arithmetic; rather, it is inherent to subtraction whenever the inputs are themselves approximations. Indeed, in floating-point arithmetic, when the inputs are close enough, the floating-point difference is computed exactly, by the Sterbenz lemma; there is no rounding error introduced by the floating-point subtraction operation itself.

Formal analysis

Formally, catastrophic cancellation happens because subtraction is ill-conditioned at nearby inputs: even if approximations $\tilde x = x(1 + \delta_x)$ and $\tilde y = y(1 + \delta_y)$ have small relative errors $\delta_x$ and $\delta_y$ from true values $x$ and $y$, respectively, the relative error of the difference $\tilde x - \tilde y$ of the approximations from the difference $x - y$ of the true values is inversely proportional to the difference of the true values:

$$\tilde x - \tilde y = x(1 + \delta_x) - y(1 + \delta_y) = x - y + x\delta_x - y\delta_y = (x - y)\left(1 + \frac{x\delta_x - y\delta_y}{x - y}\right).$$

Thus, the relative error of the exact difference $\tilde x - \tilde y$ of the approximations from the difference $x - y$ of the true values is

$$\left|\frac{x\delta_x - y\delta_y}{x - y}\right|,$$

which can be arbitrarily large if the true values $x$ and $y$ are close.
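As a brief aside (not part of the statement above), bounding the input errors by $|\delta_x|, |\delta_y| \le \delta$ gives the familiar condition-number bound for subtraction:

$$\left|\frac{x\delta_x - y\delta_y}{x - y}\right| \;\le\; \frac{|x| + |y|}{|x - y|}\,\delta,$$

so the input error can be amplified by a factor of up to $(|x| + |y|)/|x - y|$, which blows up as $x$ approaches $y$.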

In numerical algorithms

Subtracting nearby numbers in floating-point arithmetic does not always cause catastrophic cancellation, or even any error: by the Sterbenz lemma, if the numbers are close enough the floating-point difference is exact. But cancellation may amplify errors in the inputs that arose from rounding in other floating-point arithmetic.

Example: Difference of squares

Given numbers $x$ and $y$, the naive attempt to compute the mathematical function $x^2 - y^2$ by the floating-point arithmetic $\operatorname{fl}\bigl(\operatorname{fl}(x^2) - \operatorname{fl}(y^2)\bigr)$ is subject to catastrophic cancellation when $x$ and $y$ are close in magnitude, because the subtraction can expose the rounding errors in the squaring. The alternative factoring $(x + y)(x - y)$, evaluated by the floating-point arithmetic $\operatorname{fl}\bigl(\operatorname{fl}(x + y)\cdot\operatorname{fl}(x - y)\bigr)$, avoids catastrophic cancellation because it avoids introducing rounding error leading into the subtraction. [2]

For example, if $x = 1 + 2^{-29}$ and $y = 1 + 2^{-30}$, then the true value of the difference $x^2 - y^2$ is $2^{-29} + 2^{-59} + 2^{-60} \approx 1.8626451518330422\times 10^{-9}$. In IEEE 754 binary64 arithmetic, evaluating the alternative factoring $(x + y)(x - y)$ gives the correct result exactly (with no rounding), but evaluating the naive expression $x^2 - y^2$ gives the floating-point number $2^{-29} \approx 1.8626451492309570\times 10^{-9}$, of which less than half the digits are correct and the rest reflect the missing low-order terms $2^{-59} + 2^{-60}$, lost due to rounding when calculating the intermediate squared values.
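This comparison can be reproduced with a short C program (an illustrative sketch, not part of the original article); ldexp is used to construct the powers of two exactly:

#include <math.h>
#include <stdio.h>

int main(void)
{
    double x = 1.0 + ldexp(1.0, -29);    /* x = 1 + 2^-29 */
    double y = 1.0 + ldexp(1.0, -30);    /* y = 1 + 2^-30 */

    double naive    = x * x - y * y;      /* squares are rounded before the subtraction */
    double factored = (x + y) * (x - y);  /* every operation here happens to be exact */

    printf("naive:    %.17g\n", naive);
    printf("factored: %.17g\n", factored);
    return 0;
}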

Example: Complex arcsine

When computing the complex arcsine function, one may be tempted to use the logarithmic formula directly:

$$\arcsin(z) = -i \log\bigl(iz + \sqrt{1 - z^2}\bigr).$$

However, suppose $z = iy$ for large $y > 0$. Then $iz = -y$ and $\sqrt{1 - z^2} = \sqrt{1 + y^2}$, so the sum $iz + \sqrt{1 - z^2}$ is really the difference of the two nearby numbers $\sqrt{1 + y^2}$ and $y$; call that difference

$$\varepsilon = \sqrt{1 + y^2} - y,$$

a very small difference, nearly zero. If $\sqrt{1 - z^2}$ is evaluated in floating-point arithmetic giving

$$\operatorname{fl}\bigl(\sqrt{1 - z^2}\bigr) = \sqrt{1 + y^2}\,(1 + \delta)$$

with any error $\delta \ne 0$, where $\operatorname{fl}(\cdot)$ denotes floating-point rounding, then computing the difference

$$\sqrt{1 + y^2}\,(1 + \delta) - y$$

of two nearby numbers, both very close to $y$, may amplify the error $\delta$ in one input by a factor of about $\sqrt{1 + y^2}/\varepsilon \approx 2y^2$, a very large factor because $\varepsilon$ was nearly zero. For instance, for a large, purely imaginary input such as $z = 1234567i$ (for which $\arcsin(z) \approx 14.7194i$), the naive logarithmic formula evaluated in IEEE 754 binary64 arithmetic may give a result with only about five of its sixteen significant digits correct, the remaining digits being corrupted by the cancellation.

In the case of $z = iy$ for $y > 0$, using the identity $\arcsin(z) = -\arcsin(-z)$ avoids cancellation: for $-z = -iy$ the term $i(-z) = y$ is positive, while $\sqrt{1 - (-z)^2} = \sqrt{1 + y^2}$ is positive as before, so the subtraction is effectively an addition of two numbers with the same sign, which does not cancel.
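The effect can be observed with a short C99 program (an illustrative sketch; the test input is the large imaginary value used above) comparing the naive formula, the reflection arcsin(z) = -arcsin(-z), and the standard library's casin:

#include <complex.h>
#include <math.h>
#include <stdio.h>

/* arcsin(z) = -i log(iz + sqrt(1 - z^2)), evaluated directly. */
static double complex arcsin_naive(double complex z)
{
    return -I * clog(I * z + csqrt(1.0 - z * z));
}

int main(void)
{
    double complex z = 1234567.0 * I;             /* illustrative test input */
    double complex naive     = arcsin_naive(z);   /* cancellation in iz + sqrt(1 - z^2) */
    double complex reflected = -arcsin_naive(-z); /* same-sign addition: no cancellation */
    double complex library   = casin(z);          /* C library reference */

    printf("naive:     %.17g + %.17g i\n", creal(naive), cimag(naive));
    printf("reflected: %.17g + %.17g i\n", creal(reflected), cimag(reflected));
    printf("casin:     %.17g + %.17g i\n", creal(library), cimag(library));
    return 0;
}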

Example: Radix conversion

Numerical constants in software programs are often written in decimal, such as in the C fragment double x = 1.000000000000001; to declare and initialize an IEEE 754 binary64 variable named x. However, 1.000000000000001 is not a binary64 floating-point number; the nearest one, which x will be initialized to in this fragment, is 1 + 5*2^-52 ≈ 1.00000000000000111022. Although the radix conversion from decimal floating-point to binary floating-point only incurs a small relative error, catastrophic cancellation may amplify it into a much larger one:

double x = 1.000000000000001;  // rounded to 1 + 5*2^{-52}
double y = 1.000000000000002;  // rounded to 1 + 9*2^{-52}
double z = y - x;              // difference is exactly 4*2^{-52}

The difference z is exactly 4*2^{-52} = 2^{-50} ≈ 8.88×10^-16. The relative errors of x from 1.000000000000001 and of y from 1.000000000000002 are both below 2^{-53} ≈ 1.11×10^-16, and the floating-point subtraction y - x is computed exactly by the Sterbenz lemma.

But even though the inputs are good approximations, and even though the subtraction is computed exactly, the difference of the approximations has a relative error of over 11% from the difference 10^-15 of the original values as written in decimal: catastrophic cancellation amplified a tiny error in radix conversion into a large error in the output.
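For reference, a complete, minimal version of the fragment above (a sketch, not part of the original article) that also prints the resulting relative error:

#include <stdio.h>

int main(void)
{
    double x = 1.000000000000001;   /* rounded to 1 + 5*2^-52 */
    double y = 1.000000000000002;   /* rounded to 1 + 9*2^-52 */
    double z = y - x;               /* exactly 4*2^-52, by the Sterbenz lemma */

    /* The difference of the constants as written in decimal is 1e-15;
       z misses it by more than 11%. */
    printf("z              = %.17g\n", z);
    printf("relative error = %.3g\n", (1e-15 - z) / 1e-15);
    return 0;
}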

Benign cancellation

Cancellation is sometimes useful and desirable in numerical algorithms. For example, the 2Sum and Fast2Sum algorithms both rely on such cancellation after a rounding error in order to exactly compute what the error was in a floating-point addition operation as a floating-point number itself.
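As an illustration (a minimal sketch, not from the original article, assuming round-to-nearest IEEE 754 binary64 arithmetic and |a| >= |b|), Fast2Sum recovers the rounding error of an addition through exactly this kind of benign cancellation:

#include <stdio.h>

/* Returns the rounded sum in *s and the rounding error in *t,
   so that a + b == *s + *t exactly (requires |a| >= |b|). */
static void fast_two_sum(double a, double b, double *s, double *t)
{
    *s = a + b;
    double z = *s - a;   /* exact: benign cancellation of two nearby numbers */
    *t = b - z;          /* exact: the rounding error of the addition */
}

int main(void)
{
    double s, t;
    fast_two_sum(1.0, 1e-20, &s, &t);
    printf("s = %.17g, t = %.17g\n", s, t);  /* s + t reconstructs 1.0 + 1e-20 exactly */
    return 0;
}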

The function $f(x) = \log(1 + x)$, if evaluated naively at points $0 < x \ll 1$, will lose most of the digits of $x$ in the rounding of $1 + x$. However, the function $f$ itself is well-conditioned at inputs near zero. Rewriting it as

$$\log(1 + x) = x \cdot \frac{\log(1 + x)}{(1 + x) - 1},$$

and evaluating the second factor at the rounded point $\operatorname{fl}(1 + x)$, exploits cancellation in the denominator $\operatorname{fl}(1 + x) - 1$ to avoid the error from $\log(\operatorname{fl}(1 + x))$ evaluated directly. [2] This works because the cancellation in the numerator $\log(\operatorname{fl}(1 + x))$ and the cancellation in the denominator $\operatorname{fl}(1 + x) - 1$ counteract each other; the function $g(u) = \log(1 + u)/u$ is well-enough conditioned near zero that evaluating it at $\operatorname{fl}(1 + x) - 1$ gives a good approximation to $g(x)$, and thus $x \cdot g\bigl(\operatorname{fl}(1 + x) - 1\bigr)$ gives a good approximation to $x \cdot g(x) = \log(1 + x)$.
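A minimal C sketch of this rewriting, essentially the ln(1+x) algorithm in Goldberg's paper [2]; it assumes IEEE 754 round-to-nearest arithmetic and that the compiler does not reassociate floating-point expressions:

#include <math.h>

/* Computes log(1 + x) accurately for small x using the rewriting above:
   the cancellations in log(u) and in u - 1 counteract each other. */
double log1p_benign(double x)
{
    double u = 1.0 + x;             /* rounded: may lose most digits of x */
    if (u == 1.0)
        return x;                   /* x below the rounding threshold: log(1+x) ~= x */
    return x * log(u) / (u - 1.0);  /* u - 1.0 is exact (benign cancellation) */
}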


References

  1. Muller, Jean-Michel; Brunie, Nicolas; de Dinechin, Florent; Jeannerod, Claude-Pierre; Joldes, Mioara; Lefèvre, Vincent; Melquiond, Guillaume; Revol, Nathalie; Torres, Serge (2018). Handbook of Floating-Point Arithmetic (2nd ed.). Cham, Switzerland: Birkhäuser. p. 102. doi:10.1007/978-3-319-76526-6. ISBN 978-3-319-76525-9.
  2. Goldberg, David (March 1991). "What every computer scientist should know about floating-point arithmetic". ACM Computing Surveys. 23 (1). New York, NY: Association for Computing Machinery: 5–48. doi:10.1145/103162.103163. ISSN 0360-0300. S2CID 222008826. Retrieved 2020-09-17.