Condition number

Last updated April 19, 2024

In numerical analysis, the condition number of a function measures how much the output value of the function can change for a small change in the input argument. This is used to measure how sensitive a function is to changes or errors in the input, and how much error in the output results from an error in the input. Very frequently, one is solving the inverse problem: given $f(x)=y,$ one is solving for x, and thus the condition number of the (local) inverse must be used.^[1]^[2]

The condition number is derived from the theory of propagation of uncertainty, and is formally defined as the value of the asymptotic worst-case relative change in output for a relative change in input. The "function" is the solution of a problem and the "arguments" are the data in the problem. The condition number is frequently applied to questions in linear algebra, in which case the derivative is straightforward but the error could be in many different directions, and is thus computed from the geometry of the matrix. More generally, condition numbers can be defined for non-linear functions in several variables.

A problem with a low condition number is said to be well-conditioned, while a problem with a high condition number is said to be ill-conditioned. In non-mathematical terms, an ill-conditioned problem is one where, for a small change in the inputs (the independent variables) there is a large change in the answer or dependent variable. This means that the correct solution/answer to the equation becomes hard to find. The condition number is a property of the problem. Paired with the problem are any number of algorithms that can be used to solve the problem, that is, to calculate the solution. Some algorithms have a property called backward stability ; in general, a backward stable algorithm can be expected to accurately solve well-conditioned problems. Numerical analysis textbooks give formulas for the condition numbers of problems and identify known backward stable algorithms.

As a rule of thumb, if the condition number $\kappa (A)=10^{k}$ , then you may lose up to $k$ digits of accuracy on top of what would be lost to the numerical method due to loss of precision from arithmetic methods.^[3] However, the condition number does not give the exact value of the maximum inaccuracy that may occur in the algorithm. It generally just bounds it with an estimate (whose computed value depends on the choice of the norm to measure the inaccuracy).

General definition in the context of error analysis

Given a problem $f$ and an algorithm ${\tilde {f}}$ with an input $x$ and output ${\tilde {f}}(x),$ the error is $\delta f(x):=f(x)-{\tilde {f}}(x),$ the absolute error is $\|\delta f(x)\|=\left\|f(x)-{\tilde {f}}(x)\right\|$ and the relative error is $\|\delta f(x)\|/\|f(x)\|=\left\|f(x)-{\tilde {f}}(x)\right\|/\|f(x)\|.$

In this context, the absolute condition number of a problem $f$ is^{[ clarification needed ]}

\lim _{\varepsilon \rightarrow 0^{+}}\,\sup _{\|\delta x\|\,\leq \,\varepsilon }{\frac {\|\delta f(x)\|}{\|\delta x\|}}

and the relative condition number is^{[ clarification needed ]}

\lim _{\varepsilon \rightarrow 0^{+}}\,\sup _{\|\delta x\|\,\leq \,\varepsilon }{\frac {\|\delta f(x)\|/\|f(x)\|}{\|\delta x\|/\|x\|}}.

Matrices

For example, the condition number associated with the linear equation Ax = b gives a bound on how inaccurate the solution x will be after approximation. Note that this is before the effects of round-off error are taken into account; conditioning is a property of the matrix, not the algorithm or floating-point accuracy of the computer used to solve the corresponding system. In particular, one should think of the condition number as being (very roughly) the rate at which the solution x will change with respect to a change in b. Thus, if the condition number is large, even a small error in b may cause a large error in x. On the other hand, if the condition number is small, then the error in x will not be much bigger than the error in b.

The condition number is defined more precisely to be the maximum ratio of the relative error in x to the relative error in b.

Let e be the error in b. Assuming that A is a nonsingular matrix, the error in the solution A⁻¹b is A⁻¹e. The ratio of the relative error in the solution to the relative error in b is

{\frac {\left\|A^{-1}e\right\|}{\left\|A^{-1}b\right\|}}/{\frac {\|e\|}{\|b\|}}={\frac {\left\|A^{-1}e\right\|}{\|e\|}}{\frac {\|b\|}{\left\|A^{-1}b\right\|}}.

The maximum value (for nonzero b and e) is then seen to be the product of the two operator norms as follows:

{\begin{aligned}\max _{e,b\neq 0}\left\{{\frac {\left\|A^{-1}e\right\|}{\|e\|}}{\frac {\|b\|}{\left\|A^{-1}b\right\|}}\right\}&=\max _{e\neq 0}\left\{{\frac {\left\|A^{-1}e\right\|}{\|e\|}}\right\}\,\max _{b\neq 0}\left\{{\frac {\|b\|}{\left\|A^{-1}b\right\|}}\right\}\\&=\max _{e\neq 0}\left\{{\frac {\left\|A^{-1}e\right\|}{\|e\|}}\right\}\,\max _{x\neq 0}\left\{{\frac {\|Ax\|}{\|x\|}}\right\}\\&=\left\|A^{-1}\right\|\,\|A\|.\end{aligned}}

The same definition is used for any consistent norm, i.e. one that satisfies

\kappa (A)=\left\|A^{-1}\right\|\,\left\|A\right\|\geq \left\|A^{-1}A\right\|=1.

When the condition number is exactly one (which can only happen if A is a scalar multiple of a linear isometry), then a solution algorithm can find (in principle, meaning if the algorithm introduces no errors of its own) an approximation of the solution whose precision is no worse than that of the data.

However, it does not mean that the algorithm will converge rapidly to this solution, just that it will not diverge arbitrarily because of inaccuracy on the source data (backward error), provided that the forward error introduced by the algorithm does not diverge as well because of accumulating intermediate rounding errors.^{[ clarification needed ]}

The condition number may also be infinite, but this implies that the problem is ill-posed (does not possess a unique, well-defined solution for each choice of data; that is, the matrix is not invertible), and no algorithm can be expected to reliably find a solution.

The definition of the condition number depends on the choice of norm, as can be illustrated by two examples.

If $\|\cdot \|$ is the matrix norm induced by the (vector) Euclidean norm (sometimes known as the L² norm and typically denoted as $\|\cdot \|_{2}$ ), then

\kappa (A)={\frac {\sigma _{\text{max}}(A)}{\sigma _{\text{min}}(A)}},

where $\sigma _{\text{max}}(A)$ and $\sigma _{\text{min}}(A)$ are maximal and minimal singular values of $A$ respectively. Hence:

If $A$ is normal, then $\kappa (A)={\frac {\left|\lambda _{\text{max}}(A)\right|}{\left|\lambda _{\text{min}}(A)\right|}},$ where $\lambda _{\text{max}}(A)$ and $\lambda _{\text{min}}(A)$ are maximal and minimal (by moduli) eigenvalues of $A$ respectively.
If $A$ is unitary, then $\kappa (A)=1.$

The condition number with respect to L² arises so often in numerical linear algebra that it is given a name, the condition number of a matrix.

If $\|\cdot \|$ is the matrix norm induced by the $L^{\infty }$ (vector) norm and $A$ is lower triangular non-singular (i.e. $a_{ii}\neq 0$ for all $i$ ), then

\kappa (A)\geq {\frac {\max _{i}{\big (}|a_{ii}|{\big )}}{\min _{i}{\big (}|a_{ii}|{\big )}}}

recalling that the eigenvalues of any triangular matrix are simply the diagonal entries.

The condition number computed with this norm is generally larger than the condition number computed relative to the Euclidean norm, but it can be evaluated more easily (and this is often the only practicably computable condition number, when the problem to solve involves a non-linear algebra^{[ clarification needed ]}, for example when approximating irrational and transcendental functions or numbers with numerical methods).

If the condition number is not significantly larger than one, the matrix is well-conditioned, which means that its inverse can be computed with good accuracy. If the condition number is very large, then the matrix is said to be ill-conditioned. Practically, such a matrix is almost singular, and the computation of its inverse, or solution of a linear system of equations is prone to large numerical errors.

A matrix that is not invertible is often said to have a condition number equal to infinity. Alternatively, it can be defined as $\kappa (A)=\|A\|\|A^{\dagger }\|$ , where $A^{\dagger }$ is the Moore-Penrose pseudoinverse. For square matrices, this unfortunately makes the condition number discontinuous, but it is a useful definition for rectangular matrices, which are never invertible but are still used to define systems of equations.

Nonlinear

Condition numbers can also be defined for nonlinear functions, and can be computed using calculus. The condition number varies with the point; in some cases one can use the maximum (or supremum) condition number over the domain of the function or domain of the question as an overall condition number, while in other cases the condition number at a particular point is of more interest.

One variable

The condition number of a differentiable function $f$ in one variable as a function is $\left|xf'/f\right|$ . Evaluated at a point $x$ , this is

\left|{\frac {xf'(x)}{f(x)}}\right|=\left|{\frac {(\log f)'}{(\log x)'}}\right|.

Note that this is the absolute value of the elasticity of a function in economics.

Most elegantly, this can be understood as (the absolute value of) the ratio of the logarithmic derivative of $f$ , which is $(\log f)'=f'/f$ , and the logarithmic derivative of $x$ , which is $(\log x)'=x'/x=1/x$ , yielding a ratio of $xf'/f$ . This is because the logarithmic derivative is the infinitesimal rate of relative change in a function: it is the derivative $f'$ scaled by the value of $f$ . Note that if a function has a zero at a point, its condition number at the point is infinite, as infinitesimal changes in the input can change the output from zero to positive or negative, yielding a ratio with zero in the denominator, hence infinite relative change.

More directly, given a small change $\Delta x$ in $x$ , the relative change in $x$ is $[(x+\Delta x)-x]/x=(\Delta x)/x$ , while the relative change in $f(x)$ is $[f(x+\Delta x)-f(x)]/f(x)$ . Taking the ratio yields

{\frac {[f(x+\Delta x)-f(x)]/f(x)}{(\Delta x)/x}}={\frac {x}{f(x)}}{\frac {f(x+\Delta x)-f(x)}{(x+\Delta x)-x}}={\frac {x}{f(x)}}{\frac {f(x+\Delta x)-f(x)}{\Delta x}}.

The last term is the difference quotient (the slope of the secant line), and taking the limit yields the derivative.

Condition numbers of common elementary functions are particularly important in computing significant figures and can be computed immediately from the derivative. A few important ones are given below:

Name	Symbol	Condition number
Addition / subtraction	$x+a$	$\left\|{\frac {x}{x+a}}\right\|$
Scalar multiplication	$ax$	$1$
Division	$1/x$	$1$
Polynomial	$x^{n}$	$\|n\|$
Exponential function	$e^{x}$	$\|x\|$
Natural logarithm function	$\ln(x)$	$\left\|{\frac {1}{\ln(x)}}\right\|$
Sine function	$\sin(x)$	$\|x\cot(x)\|$
Cosine function	$\cos(x)$	$\|x\tan(x)\|$
Tangent function	$\tan(x)$	$\|x(\tan(x)+\cot(x))\|$
Inverse sine function	$\arcsin(x)$	${\frac {x}{{\sqrt {1-x^{2}}}\arcsin(x)}}$
Inverse cosine function	$\arccos(x)$	${\frac {\|x\|}{{\sqrt {1-x^{2}}}\arccos(x)}}$
Inverse tangent function	$\arctan(x)$	${\frac {x}{(1+x^{2})\arctan(x)}}$

Several variables

Condition numbers can be defined for any function $f$ mapping its data from some domain (e.g. an $m$ -tuple of real numbers $x$ ) into some codomain (e.g. an $n$ -tuple of real numbers $f(x)$ ), where both the domain and codomain are Banach spaces. They express how sensitive that function is to small changes (or small errors) in its arguments. This is crucial in assessing the sensitivity and potential accuracy difficulties of numerous computational problems, for example, polynomial root finding or computing eigenvalues.

The condition number of $f$ at a point $x$ (specifically, its relative condition number^[4]) is then defined to be the maximum ratio of the fractional change in $f(x)$ to any fractional change in $x$ , in the limit where the change $\delta x$ in $x$ becomes infinitesimally small:^[4]

\lim _{\varepsilon \to 0^{+}}\sup _{\|\delta x\|\leq \varepsilon }\left[\left.{\frac {\left\|f(x+\delta x)-f(x)\right\|}{\|f(x)\|}}\right/{\frac {\|\delta x\|}{\|x\|}}\right],

where $\|\cdot \|$ is a norm on the domain/codomain of $f$ .

If $f$ is differentiable, this is equivalent to:^[4]

{\frac {\|J(x)\|}{\|f(x)\|/\|x\|}},

where $J(x)$ denotes the Jacobian matrix of partial derivatives of $f$ at $x$ , and $\|J(x)\|$ is the induced norm on the matrix.

Related Research Articles

In numerical analysis, Newton's method is the iterative algorithm for solving a nonlinear equation

In the mathematical subfield of numerical analysis, numerical stability is a generally desirable property of numerical algorithms. The precise definition of stability depends on the context. One is numerical linear algebra and the other is algorithms for solving ordinary and partial differential equations by discrete approximation.

Quantization, in mathematics and digital signal processing, is the process of mapping input values from a large set to output values in a (countable) smaller set, often with a finite number of elements. Rounding and truncation are typical examples of quantization processes. Quantization is involved to some degree in nearly all digital signal processing, as the process of representing a signal in digital form ordinarily involves rounding. Quantization also forms the core of essentially all lossy compression algorithms.

In mathematics, and in particular linear algebra, the Moore–Penrose inverse of a matrix is the most widely known generalization of the inverse matrix. It was independently described by E. H. Moore in 1920, Arne Bjerhammar in 1951, and Roger Penrose in 1955. Earlier, Erik Ivar Fredholm had introduced the concept of a pseudoinverse of integral operators in 1903. When referring to a matrix, the term pseudoinverse, without further specification, is often used to indicate the Moore–Penrose inverse. The term generalized inverse is sometimes used as a synonym for pseudoinverse.

In numerical analysis, one of the most important problems is designing efficient and stable algorithms for finding the eigenvalues of a matrix. These eigenvalue algorithms may also find eigenvectors.

The approximation error in a data value is the discrepancy between an exact value and some approximation to it. This error can be expressed as an absolute error or as a relative error.

In mathematics and computing, the Levenberg–Marquardt algorithm, also known as the damped least-squares (DLS) method, is used to solve non-linear least squares problems. These minimization problems arise especially in least squares curve fitting. The LMA interpolates between the Gauss–Newton algorithm (GNA) and the method of gradient descent. The LMA is more robust than the GNA, which means that in many cases it finds a solution even if it starts very far off the final minimum. For well-behaved functions and reasonable starting parameters, the LMA tends to be slower than the GNA. LMA can also be viewed as Gauss–Newton using a trust region approach.

The Gauss–Newton algorithm is used to solve non-linear least squares problems, which is equivalent to minimizing a sum of squared function values. It is an extension of Newton's method for finding a minimum of a non-linear function. Since a sum of squares must be nonnegative, the algorithm can be viewed as using Newton's method to iteratively approximate zeroes of the components of the sum, and thus minimizing the sum. In this sense, the algorithm is also an effective method for solving overdetermined systems of equations. It has the advantage that second derivatives, which can be challenging to compute, are not required.

In mathematics, the conjugate gradient method is an algorithm for the numerical solution of particular systems of linear equations, namely those whose matrix is positive-semidefinite. The conjugate gradient method is often implemented as an iterative algorithm, applicable to sparse systems that are too large to be handled by a direct implementation or other direct methods such as the Cholesky decomposition. Large sparse systems often arise when numerically solving partial differential equations or optimization problems.

The Kaczmarz method or Kaczmarz's algorithm is an iterative algorithm for solving linear equation systems $. It was first discovered by the Polish mathematician Stefan Kaczmarz, and was rediscovered in the field of image reconstruction from projections by Richard Gordon, Robert Bender, and Gabor Herman in 1970, where it is called the Algebraic Reconstruction Technique (ART). ART includes the positivity constraint, making it nonlinear.$

In mathematics, the Bauer–Fike theorem is a standard result in the perturbation theory of the eigenvalue of a complex-valued diagonalizable matrix. In its substance, it states an absolute upper bound for the deviation of one perturbed matrix eigenvalue from a properly chosen eigenvalue of the exact matrix. Informally speaking, what it says is that the sensitivity of the eigenvalues is estimated by the condition number of the matrix of eigenvectors.

In mathematics, the generalized minimal residual method (GMRES) is an iterative method for the numerical solution of an indefinite nonsymmetric system of linear equations. The method approximates the solution by the vector in a Krylov subspace with minimal residual. The Arnoldi iteration is used to find this vector.

In mathematical optimization, the ellipsoid method is an iterative method for minimizing convex functions over convex sets. The ellipsoid method generates a sequence of ellipsoids whose volume uniformly decreases at every step, thus enclosing a minimizer of a convex function.

Numerical linear algebra, sometimes called applied linear algebra, is the study of how matrix operations can be used to create computer algorithms which efficiently and accurately provide approximate answers to questions in continuous mathematics. It is a subfield of numerical analysis, and a type of linear algebra. Computers use floating-point arithmetic and cannot exactly represent irrational data, so when a computer algorithm is applied to a matrix of data, it can sometimes increase the difference between a number stored in the computer and the true number that it is an approximation of. Numerical linear algebra uses properties of vectors and matrices to develop computer algorithms that minimize the error introduced by the computer, and is also concerned with ensuring that the algorithm is as efficient as possible.

The method of iteratively reweighted least squares (IRLS) is used to solve certain optimization problems with objective functions of the form of a p-norm:

In geometry, the Chebyshev center of a bounded set $having non-empty interior is the center of the minimal-radius ball enclosing the entire set, or alternatively the center of largest inscribed ball of .$

Iterative refinement is an iterative method proposed by James H. Wilkinson to improve the accuracy of numerical solutions to systems of linear equations.

<span class="mw-page-title-main">Matrix completion</span>

Matrix completion is the task of filling in the missing entries of a partially observed matrix, which is equivalent to performing data imputation in statistics. A wide range of datasets are naturally organized in matrix form. One example is the movie-ratings matrix, as appears in the Netflix problem: Given a ratings matrix in which each entry $represents the rating of movie by customer, if customer has watched movie and is otherwise missing, we would like to predict the remaining entries in order to make good recommendations to customers on what to watch next. Another example is the document-term matrix: The frequencies of words used in a collection of documents can be represented as a matrix, where each entry corresponds to the number of times the associated term appears in the indicated document.$

The multiplicative weights update method is an algorithmic technique most commonly used for decision making and prediction, and also widely deployed in game theory and algorithm design. The simplest use case is the problem of prediction from expert advice, in which a decision maker needs to iteratively decide on an expert whose advice to follow. The method assigns initial weights to the experts, and updates these weights multiplicatively and iteratively according to the feedback of how well an expert performed: reducing it in case of poor performance, and increasing it otherwise. It was discovered repeatedly in very diverse fields such as machine learning, optimization, theoretical computer science, and game theory.

The Minimal Residual Method or MINRES is a Krylov subspace method for the iterative solution of symmetric linear equation systems. It was proposed by mathematicians Christopher Conway Paige and Michael Alan Saunders in 1975.

References

↑ Belsley, David A.; Kuh, Edwin; Welsch, Roy E. (1980). "The Condition Number". Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. New York: John Wiley & Sons. pp. 100–104. ISBN 0-471-05856-4.
↑ Pesaran, M. Hashem (2015). "The Multicollinearity Problem". Time Series and Panel Data Econometrics. New York: Oxford University Press. pp. 67–72 [p. 70]. ISBN 978-0-19-875998-0.
↑ Cheney; Kincaid (2008). Numerical Mathematics and Computing. p. 321. ISBN 978-0-495-11475-8.
1 2 3 Trefethen, L. N.; Bau, D. (1997). Numerical Linear Algebra. SIAM. ISBN 978-0-89871-361-9.

External links

Condition Number of a Matrix at Holistic Numerical Methods Institute
MATLAB library function to determine condition number
Condition number – Encyclopedia of Mathematics
Who Invented the Matrix Condition Number? by Nick Higham

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[1] Belsley, David A.; Kuh, Edwin; Welsch, Roy E. (1980). "The Condition Number". Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. New York: John Wiley & Sons. pp. 100–104. ISBN 0-471-05856-4.

[2] Pesaran, M. Hashem (2015). "The Multicollinearity Problem". Time Series and Panel Data Econometrics. New York: Oxford University Press. pp. 67–72 [p. 70]. ISBN 978-0-19-875998-0.

[Numerical_Mathematics_and_Computing,_by_Cheney_and_Kincaid-3] Cheney; Kincaid (2008). Numerical Mathematics and Computing. p. 321. ISBN 978-0-495-11475-8.

[TrefethenBau-4] 1 2 3 Trefethen, L. N.; Bau, D. (1997). Numerical Linear Algebra. SIAM. ISBN 978-0-89871-361-9.

[1]

[2]

[3]

[4]