Numeric precision in Microsoft Excel


As with other spreadsheets, Microsoft Excel works only to limited accuracy because it retains only a certain number of figures to describe numbers (it has limited precision). With some exceptions regarding erroneous values, infinities, and denormalized numbers, Excel calculates in double-precision floating-point format from the IEEE 754 specification [1] (besides numbers, Excel uses a few other data types [2]). Although Excel allows display of up to 30 decimal places, its precision for any specific number is no more than 15 significant figures, and calculations may be less accurate still because of five issues: round-off, [lower-alpha 1] truncation, binary storage, accumulation of the deviations of the operands in calculations, and, worst of all, cancellation in subtraction, which becomes 'catastrophic cancellation' when the values subtracted have similar magnitude.


Accuracy and binary storage

Excel maintains 15 figures in its numbers, but they are not always accurate. Mathematically, the bottom line should be the same as the top line. In floating-point arithmetic, however, the step 1 + 1/9000 rounds up, because the first bit of the 14-bit tail '10111000110010' of the mantissa that is dropped when 1 is added is a '1'. This rounding up is not undone when the 1 is subtracted again, since no information about the origin of the values is kept at that step. Subtracting 1 therefore leaves a mantissa ending in '100000000000000' instead of '010111000110010', representing the value 1.1111111111117289E-4, which Excel rounds to 15 significant digits: 1.11111111111173E-4.
Mathematically, 1 + x − 1 = x; floating-point arithmetic sometimes gives a slightly different result, and that is not to be blamed on Excel. The discrepancy indicates the error. All errors lie beyond the 15th significant digit of the intermediate value 1 + x, yet they appear in high-value digits of the final result; that is the problematic effect of 'cancellation'.

In the top figure the fraction 1/9000 in Excel is displayed. Although this number has a decimal representation that is an infinite string of ones, Excel displays only the leading 15 figures. In the second line, the number one is added to the fraction, and again Excel displays only 15 figures. In the third line, one is subtracted from the sum using Excel. Because the sum has only eleven 1s after the decimal, the true difference when 1 is subtracted is three 0s followed by a string of eleven 1s. However, the difference reported by Excel is three 0s followed by a 15-digit string of thirteen 1s and two extra erroneous digits. Thus, the numbers Excel calculates with are not the numbers that it displays. Moreover, the error in Excel's answer is not simply round-off error; it is an effect in floating-point calculations called 'cancellation'.

The inaccuracy in Excel calculations is more complicated than errors due to a precision of 15 significant figures. Excel's storage of numbers in binary format also affects its accuracy. [3] To illustrate, the lower figure tabulates the simple addition 1 + x − 1 for several values of x. All the values of x begin at the 15th decimal, so Excel must take them into account. Before calculating the sum 1 + x, Excel first approximates x as a binary number. If this binary version of x is a simple power of 2, the 15-digit decimal approximation to x is stored in the sum, and the top two examples of the figure indicate recovery of x without error. In the third example, x is a more complicated binary number, x = 1.110111⋯111 × 2^−49 (15 bits altogether). Here the IEEE 754 double value corresponding to the 15-bit figure is 3.330560653658221E-15, which Excel rounds to 15 digits for the user interface, 3.33056065365822E-15; when displayed with 30 decimal digits, it gets one 'fake zero' appended. Thus the 'binary' and 'decimal' values in the sample are identical only in display; the values associated with the cells are different (1.1101111111111100000000000000000000000000000000000000 × 2^−49 vs. 1.1101111111111011111111111111111111111111111111111101 × 2^−49). Other spreadsheets behave similarly. Handling the varying number of decimal digits that can be stored exactly in the 53-bit mantissa of a double (e.g. 16 digits between 1 and 8, but only 15 between 1/2 and 1 and between 8 and 10) is somewhat difficult and is solved suboptimally. In the fourth example, x is a decimal number not equivalent to a simple binary (although it agrees with the binary of the third example to the precision displayed). The decimal input is approximated by a binary and then that decimal is used. These two middle examples in the figure show that some error is introduced.

The last two examples illustrate what happens if x is a rather small number. In the second from last example, x = 1.110111⋯111 × 2^−50 (15 bits altogether). The binary is replaced very crudely by a single power of 2 (in this example, 2^−49) and its decimal equivalent is used. In the bottom example, a decimal identical with the binary above to the precision shown is nonetheless approximated differently from the binary, and is eliminated by truncation to 15 significant figures, making no contribution to 1 + x − 1 and leading to x = 0. [lower-alpha 2]

For values of x that are not simple powers of 2, a noticeable error in 1 + x − 1 can occur even when x is quite large. For example, if x = 1/1000, then 1 + x − 1 = 9.9999999999989 × 10^−4, an error in the 13th significant figure. In this case, if Excel simply added and subtracted the decimal numbers, avoiding the conversion to binary and back again to decimal, no round-off error would occur and accuracy actually would be better. Excel has the option to "Set precision as displayed". [lower-alpha 3] With this option, depending upon circumstance, accuracy may turn out to be better or worse, but you will know exactly what Excel is doing. (Only the selected precision is retained, and one cannot recover extra digits by reversing this option.) Some similar examples can be found at this link. [4]
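The 1 + x − 1 round trip can be reproduced outside Excel in any IEEE 754 double-precision environment. The following is a minimal Python sketch, not a model of Excel itself: it skips Excel's additional 15-digit rounding of results, and the function name is hypothetical. It also shows, using Python's decimal module as a stand-in for decimal arithmetic, that x = 1/1000 survives the round trip exactly when no binary conversion is involved.

```python
from decimal import Decimal


def add_then_subtract_one(x: float) -> float:
    """Evaluate (1 + x) - 1 in IEEE 754 double precision."""
    return (1.0 + x) - 1.0


# Binary floating point: the low-order bits of x are lost when 1 is added.
r9000 = add_then_subtract_one(1 / 9000)
r1000 = add_then_subtract_one(1 / 1000)

# A power of two of this size fits inside the 53-bit mantissa of 1 + x,
# so it is recovered without error.
r_pow2 = add_then_subtract_one(2.0 ** -40)

# Decimal arithmetic: 0.001 is stored exactly, so nothing is lost.
decimal_result = (Decimal(1) + Decimal("0.001")) - Decimal(1)

for label, value in [("1/9000", r9000), ("1/1000", r1000), ("2^-40", r_pow2)]:
    # 15 significant digits, comparable to what Excel would show
    print(f"x = {label:7s} (1 + x) - 1 = {value:.14e}")
print("decimal 0.001 round trip:", decimal_result)
```

The values printed for 1/9000 and 1/1000 agree with the figures quoted above to 15 significant digits.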

In short, a variety of accuracy behavior is introduced by the combination of representing a number with a limited number of binary digits, along with truncating numbers beyond the fifteenth significant figure. [5] Excel's treatment of numbers beyond 15 significant figures sometimes contributes better accuracy to the final few significant figures of a computation than working directly with only 15 significant figures, and sometimes not.

For the reasoning behind the conversion to binary representation and back to decimal, and for more detail about accuracy in Excel and VBA consult these links. [6]

1. The shortcomings in the = 1 + x - 1 tasks are a combination of 'fp-math weaknesses' and 'how Excel handles it', especially Excel's rounding. Excel applies some rounding and/or 'snap to zero' to most of its results, on average chopping the last 3 bits of the IEEE double representation. This behavior can be switched off by enclosing the formula in parentheses: = ( 1 + 2^-52 - 1 ). You will see that even that small value survives. Smaller values are lost, as there are only 53 bits to represent the value, in this case 1.0000000000 0000000000 0000000000 0000000000 0000000000 01, the first bit representing the 1 and the last the 2^-52.

2. It is not only clean powers of two surviving, but any combination of values constructed of bits which will be within the 53 bits once the decimal 1 is added. As most decimal values do not have a clean finite representation in binary they will suffer from 'round off' and 'cancellation' in tasks like the above.

E.g. decimal 0.1 has the IEEE double representation 0 (1).1001 1001 1001 1001 1001 1001 1001 1001 1001 1001 1001 1001 1010 × 2^(-4); when added to 140737488355328.0 (which is 2^47) it loses all of its bits except the first two. Thus = ( 140737488355328.0 + 0.1 - 140737488355328.0 ) returns 0.09375 instead of 0.1, both when calculated with www.weitz.de/ieee (64 bit) and in Excel with the parentheses around the formula. This effect can mostly be managed by meaningful rounding, which Excel does not apply; it is up to the user.

Needless to say, other spreadsheets have similar problems: LibreOffice Calc uses more aggressive rounding, while Gnumeric tries to keep precision and to make both the precision and the lack of it visible to the user.
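The two parenthesized formulas in these notes can be checked in plain IEEE 754 double arithmetic. The sketch below uses Python and therefore shows only the underlying floating-point behavior, not Excel's extra rounding; the variable names are illustrative.

```python
# 2^-52 is the last bit of the 53-bit mantissa of 1.0, so it survives.
survivor = (1.0 + 2.0 ** -52) - 1.0

# 2^-53 lies exactly halfway between 1.0 and the next double; the tie
# rounds to even (back to 1.0), so the small value is lost entirely.
lost = (1.0 + 2.0 ** -53) - 1.0

# Near 2^47 adjacent doubles are 2^(47-52) = 0.03125 apart, so 0.1 is
# rounded to the nearest multiple of 0.03125, which is 3 * 0.03125.
big = 2.0 ** 47
spacing = 2.0 ** (47 - 52)
recovered = (big + 0.1) - big

print(survivor == 2.0 ** -52)  # the small value comes back intact
print(lost)                    # 0.0
print(recovered)               # 0.09375
print((0.1).hex())             # hexadecimal form of the bit pattern of 0.1
```

The hexadecimal output 0x1.999999999999ap-4 corresponds to the repeating 1001 pattern ending in 1010 that is written out above.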

Examples where precision is no indicator of accuracy

Statistical functions

Error in Excel 2007 calculation of standard deviation. All four columns have the same deviation of 0.5.

Accuracy in Excel-provided functions can be an issue. Altman et al. (2004) provide this example: [7] The population standard deviation of n values, given by:

$$\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$$

is mathematically equivalent to:

$$\sigma = \sqrt{\frac{\sum x^2 - \left(\sum x\right)^2 / n}{n}}$$

However, the first form keeps better numerical accuracy for large values of x, because squaring the differences between x and the mean x̄ leads to less round-off than taking the difference between the much larger numbers Σ(x²) and (Σx)². The built-in Excel function STDEVP, however, uses the less accurate formulation because it is faster computationally. [5]

Both the "compatibility" function STDEVP and the "consistency" function STDEV.P in Excel 2010 return the 0.5 population standard deviation for the given set of values. However, numerical inaccuracy still can be shown using this example by extending the existing figure to include 10^15, whereupon the erroneous standard deviation found by Excel 2010 will be zero.
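The difference between the two formulations can be illustrated with a short Python sketch. It is not Excel's implementation: the function names are hypothetical, and the data set (alternating 0 and 1, shifted by 10^15) is an assumption chosen so that the true population standard deviation is 0.5, not the data from the figure.

```python
import math


def pstdev_two_pass(values):
    """Population standard deviation from squared deviations about the mean."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / n)


def pstdev_one_pass(values):
    """Population standard deviation from sum(x^2) and (sum x)^2 / n."""
    n = len(values)
    total = sum(values)
    total_sq = sum(v * v for v in values)
    variance = (total_sq - total * total / n) / n
    # Round-off can make the computed variance negative; clamp it so that
    # the square root is defined.
    return math.sqrt(max(variance, 0.0))


small = [0.0, 1.0, 0.0, 1.0]
large = [v + 1e15 for v in small]  # same spread, shifted by 10^15

two_pass_small = pstdev_two_pass(small)
one_pass_small = pstdev_one_pass(small)
two_pass_large = pstdev_two_pass(large)
one_pass_large = pstdev_one_pass(large)

print("small values:", two_pass_small, one_pass_small)
print("large values:", two_pass_large, one_pass_large)
```

For the shifted data the two sums in the one-pass form are of order 10^30, where adjacent doubles are far more than 1 apart, so their difference cannot represent the true value n·σ² = 1.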

Subtraction of Subtraction Results

Doing simple subtractions may lead to errors as two cells may display the same numeric value while storing two separate values. An example of this occurs in a sheet where the following cells are set to the following numeric values:

and the following cells contain the following formulas

Both cells and display . However, if cell contains the formula then does not display as would be expected, but displays instead.

The above is not limited to subtractions. Try = 1 + 1.405*2^(-48) in one cell, which Excel rounds in display to 1.00000000000000000000, and = 0.9 + 225179982494413×2^(-51) in another, which gives the same display. [lower-alpha 4] Such different rounding for value and display violates one of the elementary requirements in Goldberg (1991), [8] who states:

... 'it is important to make sure that its use is transparent to the user. For example, on a calculator, if the internal representation of a displayed value is not rounded to the same precision as the display, then the result of further operations will depend on the hidden digits and appear unpredictable to the user' ...

The problem is not limited to Excel; LibreOffice Calc, for example, acts similarly.
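The gap between displayed and stored values described in this section can be sketched in Python. The operands below are assumptions picked for illustration, not the cell values of the example above, and formatting to 15 significant digits only approximates how a spreadsheet displays a number.

```python
a = 1.1 - 1.0  # result of a subtraction; carries a small binary error
b = 0.1        # typed in directly

# Rounded to 15 significant digits, both values look identical.
shown_a = f"{a:.15g}"
shown_b = f"{b:.15g}"

# The stored values differ, so a further subtraction exposes hidden digits.
difference = a - b

print(shown_a, shown_b)  # same text for both
print(a == b)            # False
print(difference)        # small but nonzero
```

This is the situation Goldberg warns about: further operations depend on digits the user cannot see.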

Round-off error

User computations must be carefully organized to ensure round-off error does not become an issue. An example occurs in solving a quadratic equation:

$$a x^2 + b x + c = 0$$

The solutions (the roots) of this equation are exactly determined by the quadratic formula:

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$

When one of these roots is very large compared to the other, that is, when the square root is close to the value b, the evaluation of the root corresponding to subtraction of the two terms becomes very inaccurate due to round-off (more precisely, cancellation).

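It is possible to determine the round-off error by using the Taylor series formula for the square root (taking b > 0): [9]

$$\sqrt{b^2 - 4ac} = b\sqrt{1 - \frac{4ac}{b^2}} \approx b\left(1 - \frac{2ac}{b^2} - \frac{2a^2c^2}{b^4} - \cdots\right)$$

Consequently,

$$b - \sqrt{b^2 - 4ac} \approx \frac{2ac}{b}\left(1 + \frac{ac}{b^2} + \cdots\right)$$

indicating that, as b becomes larger, the first surviving term, say ε:

$$\varepsilon = \frac{2ac}{b}$$

becomes smaller and smaller. The numbers for b and the square root become nearly the same, and the difference becomes small:

$$\sqrt{b^2 - 4ac} \approx b - \varepsilon, \qquad b - \sqrt{b^2 - 4ac} \approx \varepsilon$$

Under these circumstances, all the significant figures go into expressing b. For example, if the precision is 15 figures, and these two numbers, b and the square root, are the same to 15 figures, the difference will be zero instead of the difference ε.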

Better accuracy can be obtained from a different approach, outlined below. [lower-alpha 5] If we denote the two roots by r1 and r2, the quadratic equation ax² + bx + c = 0, divided by a, can be written:

$$(x - r_1)(x - r_2) = x^2 - (r_1 + r_2)\,x + r_1 r_2 = 0$$

When the root r1 >> r2, the sum (r1 + r2) ≈ r1 and comparison of the two forms shows approximately:

$$r_1 \approx -\frac{b}{a}$$

while

$$r_1 r_2 = \frac{c}{a}$$

Thus, we find the approximate form:

$$r_2 = \frac{c}{a\,r_1} \approx -\frac{c}{b}$$

These results are not subject to round-off error, but they are not accurate unless b² is large compared to ac.
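The trade-off between the two evaluations of the small root can be seen in a short Python sketch for x² + bx + c = 0 (that is, a = 1). The function names are hypothetical, and Python doubles stand in for Excel's arithmetic without its display rounding.

```python
import math


def small_root_direct(b, c):
    """Small-magnitude root of x^2 + b*x + c = 0 from the quadratic formula."""
    return (-b + math.sqrt(b * b - 4.0 * c)) / 2.0


def small_root_approx(b, c):
    """Widely-spaced-roots approximation r2 = -c / b."""
    return -c / b


c = 4.0
for b in (1e2, 1e5, 1e9):
    direct = small_root_direct(b, c)
    approx = small_root_approx(b, c)
    print(f"b = {b:g}  direct = {direct:.15g}  approx = {approx:.15g}")
```

At b = 100 the two results differ in the fifth decimal place because the approximation is still crude there. At b = 10^9 the term 4c is lost entirely when subtracted from b², so the direct formula returns 0 while the approximation returns −4 × 10^−9.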

Excel graph of the difference between two evaluations of the smallest root of a quadratic: direct evaluation using the quadratic formula (accurate at smaller b) and an approximation for widely spaced roots (accurate for larger b). The difference reaches a minimum at the large dots, and round-off causes squiggles in the curves beyond this minimum.

The bottom line is that in doing this calculation using Excel, as the roots become farther apart in value, the method of calculation will have to switch from direct evaluation of the quadratic formula to some other method so as to limit round-off error. The point to switch methods varies according to the size of the coefficients.

In the figure, Excel is used to find the smallest root of the quadratic equation x² + bx + c = 0 for c = 4 and c = 4 × 10^5. The difference between direct evaluation using the quadratic formula and the approximation described above for widely spaced roots is plotted vs. b. Initially the difference between the methods declines because the widely spaced root method becomes more accurate at larger b-values. However, beyond some b-value the difference increases because the quadratic formula (good for smaller b-values) becomes worse due to round-off, while the widely spaced root method (good for large b-values) continues to improve. The point to switch methods is indicated by large dots, and is larger for larger c-values. At large b-values, the upward sloping curve is Excel's round-off error in the quadratic formula, whose erratic behavior causes the curves to squiggle.

A different field where accuracy is an issue is the area of numerical computing of integrals and the solution of differential equations. Examples are Simpson's rule, the Runge–Kutta method, and the Numerov algorithm for the Schrödinger equation. [10] Using Visual Basic for Applications, any of these methods can be implemented in Excel. Numerical methods use a grid where functions are evaluated. The functions may be interpolated between grid points or extrapolated to locate adjacent grid points. These formulas involve comparisons of adjacent values. If the grid is spaced very finely, round-off error will occur, and the less the precision used, the worse the round-off error. If spaced widely, accuracy will suffer. If the numerical procedure is thought of as a feedback system, this calculation noise may be viewed as a signal that is applied to the system, which will lead to instability unless the system is carefully designed. [11]
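The competition between grid spacing and round-off can be illustrated with a central-difference derivative, used here only because it is shorter than Simpson's rule or a Runge–Kutta step; it is written in Python rather than VBA, and the function name and the three step sizes are arbitrary choices for the sketch.

```python
import math


def central_difference(f, x, h):
    """Estimate f'(x) from two adjacent grid values spaced 2*h apart."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


exact = math.cos(1.0)  # derivative of sin at x = 1

# Coarse, moderate, and very fine grid spacings.
errors = {
    h: abs(central_difference(math.sin, 1.0, h) - exact)
    for h in (1e-1, 1e-5, 1e-13)
}

for h, err in errors.items():
    print(f"h = {h:g}  error = {err:.3g}")
```

The coarse spacing suffers from the approximation itself, the very fine spacing from cancellation between nearly equal function values, and the moderate spacing gives the smallest error of the three.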

Accuracy within VBA

Although Excel nominally works with 8-byte numbers by default, VBA has a variety of data types. The Double data type is 8 bytes, the Integer data type is 2 bytes, and the general purpose 16 byte Variant data type can be converted to a 12 byte Decimal data type using the VBA conversion function CDec. [12] Choice of variable types in a VBA calculation involves consideration of storage requirements, accuracy and speed.

Footnotes

  1. Round-off is the loss of accuracy when numbers that differ by small amounts are subtracted. Because each number has only fifteen significant digits, their difference is inaccurate when there aren't enough significant digits to express the difference.
  2. To input a number as binary, the number is submitted as a string of powers of 2: 2^(−50)*(2^0 + 2^−1 + ⋯). To input a number as decimal, the decimal number is typed in directly.
  3. This option is found on the "Excel options"
  4. Rounding is different in the range above 1 vs. below 1, which impacts on most decimal or binary magnitude changes.
  5. This approximate method is used often in the design of feedback amplifiers, where the two roots represent the response times of the system. See the article on step response.

References

  1. "Floating-point arithmetic may give inaccurate results in Excel". Microsoft support. June 30, 2010. Revision 8.2; article ID: 78113. Retrieved 2010-07-02.
  2. Dalton, Steve (2007). "Table 2.3: Worksheet data types and limits". Financial Applications Using Excel Add-in Development in C/C++ (2nd ed.). Wiley. pp. 13–14. ISBN 978-0-470-02797-4.
  3. de Levie, Robert (2004). "Algorithmic accuracy". Advanced Excel for scientific data analysis. Oxford University Press. p. 44. ISBN 0-19-515275-1.
  4. "Excel addition strangeness". office-watch.com.
  5. de Levie, Robert (2004). Advanced Excel for scientific data analysis. Oxford University Press. pp. 45–46. ISBN 0-19-515275-1.
  7. Altman, Micah; Gill, Jeff; McDonald, Michael (2004). "§2.1.1 Revealing example: Computing the coefficient standard deviation". Numerical Issues in Statistical Computing for the Social Scientist. Wiley-IEEE. p. 12. ISBN 0-471-23633-0.
  8. Goldberg, David (March 1991). "What every computer scientist should know about floating point". Computing Surveys (edited reprint). E19957-01 / 806-3568 via Sun Microsystems. More or less 'the holy book' of fp-math.
  9. Gradshteyn, I.S.; Ryzhik, I.M.; Geronimus, Yu.V.; Tseytlin, M.Yu.; Jeffrey, A. (2015) [October 2014]. "1.112. Power series". In Zwillinger, Daniel; Moll, Victor Hugo (eds.). Tables of Integrals, Series, and Products. Translated by Scripta Technica, Inc. (8 ed.). Academic Press, Inc. p. 25. ISBN 978-0-12-384933-5. LCCN 2014010276. ISBN 0-12-384933-0.
  10. Blom, Anders (2002). Computer algorithms for solving the Schrödinger and Poisson equations (PDF) (Report). Department of Physics. Lund University.
  11. Hamming, R.W. (1986). "Chapter 21 – Indefinite integrals – feedback". Numerical Methods for Scientists and Engineers (2nd ed.). Courier Dover Publications. p. 357. ISBN 0-486-65241-6. This book discusses round-off, truncation and stability extensively. For example, see chapter 21, page 357.
  12. Walkenbach, John (2010). "Defining data types". Excel 2010 Power Programming with VBA. Wiley. pp. 198 ff & Table 8-1. ISBN 978-0-470-47535-5.