Woodbury matrix identity

In mathematics (specifically linear algebra), the Woodbury matrix identity, named after Max A. Woodbury, [1] [2] says that the inverse of a rank-k correction of some matrix can be computed by doing a rank-k correction to the inverse of the original matrix. Alternative names for this formula are the matrix inversion lemma, Sherman–Morrison–Woodbury formula or just Woodbury formula. However, the identity appeared in several papers before the Woodbury report. [3] [4]

The Woodbury matrix identity is [5]

$\left(A + UCV\right)^{-1} = A^{-1} - A^{-1}U\left(C^{-1} + VA^{-1}U\right)^{-1}VA^{-1},$

where A, U, C and V are conformable matrices: A is n×n, C is k×k, U is n×k, and V is k×n. This can be derived using blockwise matrix inversion.

While the identity is primarily used on matrices, it holds in a general ring or in an Ab-category.

The Woodbury matrix identity allows cheap computation of inverses and solutions to linear equations. However, little is known about the numerical stability of the formula. There are no published results concerning its error bounds. Anecdotal evidence [6] suggests that it may diverge even for seemingly benign examples (when both the original and modified matrices are well-conditioned).
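As a concrete sanity check, the following minimal NumPy sketch compares the two sides of the identity stated above on a random rank-k correction; the matrix sizes and random test data are illustrative choices, not taken from the article:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 2

# Random, well-conditioned test matrices (sizes as in the article: A is n-by-n, C is k-by-k).
A = rng.standard_normal((n, n)) + n * np.eye(n)
C = rng.standard_normal((k, k)) + k * np.eye(k)
U = rng.standard_normal((n, k))
V = rng.standard_normal((k, n))

Ainv = np.linalg.inv(A)

# Left-hand side: invert the rank-k corrected matrix directly.
lhs = np.linalg.inv(A + U @ C @ V)

# Right-hand side: rank-k correction of A^{-1} via the Woodbury identity.
rhs = Ainv - Ainv @ U @ np.linalg.inv(np.linalg.inv(C) + V @ Ainv @ U) @ V @ Ainv

print(np.allclose(lhs, rhs))  # True (up to floating-point error)
```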

Discussion

To prove this result, we will start by proving a simpler one. Replacing A and C with the identity matrix I, we obtain another identity which is a bit simpler:

$\left(I + UV\right)^{-1} = I - U\left(I + VU\right)^{-1}V.$

To recover the original equation from this reduced identity, write $\left(A + UCV\right)^{-1} = \left(I + A^{-1}UCV\right)^{-1}A^{-1}$ and apply the reduced identity with $A^{-1}U$ in place of $U$ and $CV$ in place of $V$.

This identity itself can be viewed as the combination of two simpler identities. We obtain the first identity from

$I = \left(I + P\right)^{-1}\left(I + P\right) = \left(I + P\right)^{-1} + \left(I + P\right)^{-1}P,$

thus,

$\left(I + P\right)^{-1} = I - \left(I + P\right)^{-1}P,$

and similarly

$\left(I + P\right)^{-1} = I - P\left(I + P\right)^{-1}.$

The second identity is the so-called push-through identity [7]

$\left(I + UV\right)^{-1}U = U\left(I + VU\right)^{-1}$

that we obtain from

$U\left(I + VU\right) = \left(I + UV\right)U$

after multiplying by $\left(I + VU\right)^{-1}$ on the right and by $\left(I + UV\right)^{-1}$ on the left.

Putting all together,

$\left(I + UV\right)^{-1} = I - \left(I + UV\right)^{-1}UV = I - U\left(I + VU\right)^{-1}V,$

where the first and second equality come from the first and second identity, respectively.

Special cases

When $U$ and $V$ are vectors (that is, $k = 1$), the identity reduces to the Sherman–Morrison formula.

In the scalar case, the reduced version is simply

$\frac{1}{1 + uv} = 1 - \frac{uv}{1 + uv}.$
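For illustration, here is a minimal NumPy sketch of the rank-one (k = 1) special case noted above, updating a precomputed inverse after the change A + uv; the test matrices are arbitrary choices of my own:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n)) + n * np.eye(n)
u = rng.standard_normal((n, 1))   # column vector
v = rng.standard_normal((1, n))   # row vector

Ainv = np.linalg.inv(A)

# Sherman–Morrison: the Woodbury identity with k = 1 and C = 1.
Ainv_updated = Ainv - (Ainv @ u @ v @ Ainv) / (1.0 + v @ Ainv @ u)

print(np.allclose(Ainv_updated, np.linalg.inv(A + u @ v)))  # True
```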

Inverse of a sum

If n = k and U = V = In is the identity matrix (writing B for C), then

$\left(A + B\right)^{-1} = A^{-1} - A^{-1}\left(B^{-1} + A^{-1}\right)^{-1}A^{-1}.$

Continuing with the merging of the terms of the far right-hand side of the above equation results in Hua's identity

$\left(A + B\right)^{-1} = A^{-1} - \left(A + AB^{-1}A\right)^{-1}.$

Another useful form of the same identity is

$\left(A + B\right)^{-1} = A^{-1} - A^{-1}B\left(A + B\right)^{-1},$

which, unlike those above, is valid even if $B$ is singular, and has a recursive structure that yields

$\left(A + B\right)^{-1} = \sum_{k=0}^{\infty}\left(-A^{-1}B\right)^{k}A^{-1}$

if the spectral radius of $A^{-1}B$ is less than one. That is, if the above sum converges then it is equal to $\left(A + B\right)^{-1}$.

This form can be used in perturbative expansions where B is a perturbation of A.
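The following NumPy sketch compares a truncated form of the series above with the directly computed inverse when the spectral-radius condition holds; the test matrices, the size of the perturbation, and the truncation length are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
A = rng.standard_normal((n, n)) + n * np.eye(n)
B = 0.1 * rng.standard_normal((n, n))          # small perturbation of A

Ainv = np.linalg.inv(A)
M = -Ainv @ B
assert max(abs(np.linalg.eigvals(M))) < 1      # spectral-radius condition for convergence

# Truncated series: (A + B)^{-1} ≈ sum_{k=0}^{K} (-A^{-1} B)^k A^{-1}
approx = np.zeros_like(A)
term = Ainv.copy()
for _ in range(30):
    approx += term
    term = M @ term

print(np.max(np.abs(approx - np.linalg.inv(A + B))))  # small residual
```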

Variations

Binomial inverse theorem

If A, B, U, V are matrices of sizes n×n, k×k, n×k, k×n, respectively, then

$\left(A + UBV\right)^{-1} = A^{-1} - A^{-1}UB\left(B + BVA^{-1}UB\right)^{-1}BVA^{-1}$

provided A and B + BVA−1UB are nonsingular. Nonsingularity of the latter requires that B−1 exist since it equals B(I + VA−1UB) and the rank of the latter cannot exceed the rank of B. [7]

Since B is invertible, the two B terms flanking the inverted parenthetical quantity on the right-hand side can be replaced with (B−1)−1, which results in the original Woodbury identity.
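A quick numerical check of this form, again with arbitrarily chosen test matrices, can look like:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 6, 2
A = rng.standard_normal((n, n)) + n * np.eye(n)
B = rng.standard_normal((k, k)) + k * np.eye(k)
U = rng.standard_normal((n, k))
V = rng.standard_normal((k, n))

Ainv = np.linalg.inv(A)

# Binomial-inverse-theorem form: B appears instead of B^{-1} inside the correction.
rhs = Ainv - Ainv @ U @ B @ np.linalg.inv(B + B @ V @ Ainv @ U @ B) @ B @ V @ Ainv

print(np.allclose(rhs, np.linalg.inv(A + U @ B @ V)))  # True
```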

A variation for when B is singular and possibly even non-square: [7]

$\left(A + UBV\right)^{-1} = A^{-1} - A^{-1}UB\left(I + VA^{-1}UB\right)^{-1}VA^{-1}.$

Formulas also exist for certain cases in which A is singular. [8]

Pseudoinverse with positive semidefinite matrices

In general Woodbury's identity is not valid if one or more inverses are replaced by (Moore–Penrose) pseudoinverses. However, if $A$ and $C$ are positive semidefinite, and $V = U^{\mathrm H}$ (implying that $A + UCV = A + UCU^{\mathrm H}$ is itself positive semidefinite), then the following formula provides a generalization: [9] [10]

where the positive semidefinite matrices involved can be written in the form $BB^{\mathrm H}$, because any positive semidefinite matrix is equal to $BB^{\mathrm H}$ for some $B$.

Derivations

Direct proof

The formula can be proven by checking that $A + UCV$ times its alleged inverse on the right side of the Woodbury identity gives the identity matrix:

$\begin{aligned}
\left(A + UCV\right)\left[A^{-1} - A^{-1}U\left(C^{-1} + VA^{-1}U\right)^{-1}VA^{-1}\right]
&= I + UCVA^{-1} - \left(U + UCVA^{-1}U\right)\left(C^{-1} + VA^{-1}U\right)^{-1}VA^{-1} \\
&= I + UCVA^{-1} - UC\left(C^{-1} + VA^{-1}U\right)\left(C^{-1} + VA^{-1}U\right)^{-1}VA^{-1} \\
&= I + UCVA^{-1} - UCVA^{-1} = I.
\end{aligned}$

Alternative proofs

Algebraic proof

First consider these useful identities,

$\begin{aligned}
U + UCVA^{-1}U &= UC\left(C^{-1} + VA^{-1}U\right) = \left(A + UCV\right)A^{-1}U, \\
\left(A + UCV\right)^{-1}UC &= A^{-1}U\left(C^{-1} + VA^{-1}U\right)^{-1}.
\end{aligned}$

Now,

$\begin{aligned}
A^{-1} &= \left(A + UCV\right)^{-1}\left(A + UCV\right)A^{-1} \\
&= \left(A + UCV\right)^{-1}\left(I + UCVA^{-1}\right) \\
&= \left(A + UCV\right)^{-1} + \left(A + UCV\right)^{-1}UCVA^{-1} \\
&= \left(A + UCV\right)^{-1} + A^{-1}U\left(C^{-1} + VA^{-1}U\right)^{-1}VA^{-1},
\end{aligned}$

which rearranges to the Woodbury identity.

Derivation via blockwise elimination

Deriving the Woodbury matrix identity is easily done by solving the following block matrix inversion problem

$\begin{bmatrix} A & U \\ V & -C^{-1} \end{bmatrix}\begin{bmatrix} X \\ Y \end{bmatrix} = \begin{bmatrix} I \\ 0 \end{bmatrix}.$

Expanding, we can see that the above reduces to

$\begin{cases} AX + UY = I \\ VX - C^{-1}Y = 0, \end{cases}$

which is equivalent to $\left(A + UCV\right)X = I$ (since the second equation gives $Y = CVX$). Eliminating the first equation, we find that $X = A^{-1}\left(I - UY\right)$, which can be substituted into the second to find $VA^{-1}\left(I - UY\right) = C^{-1}Y$. Expanding and rearranging, we have $VA^{-1} = \left(C^{-1} + VA^{-1}U\right)Y$, or $Y = \left(C^{-1} + VA^{-1}U\right)^{-1}VA^{-1}$. Finally, we substitute into our $AX + UY = I$, and we have $AX + U\left(C^{-1} + VA^{-1}U\right)^{-1}VA^{-1} = I$. Thus,

$\left(A + UCV\right)^{-1} = X = A^{-1} - A^{-1}U\left(C^{-1} + VA^{-1}U\right)^{-1}VA^{-1}.$

We have derived the Woodbury matrix identity.

Derivation from LDU decomposition

We start with the matrix

$\begin{bmatrix} A & U \\ V & C \end{bmatrix}.$

By eliminating the entry under the A (given that A is invertible), we get

$\begin{bmatrix} I & 0 \\ -VA^{-1} & I \end{bmatrix}\begin{bmatrix} A & U \\ V & C \end{bmatrix} = \begin{bmatrix} A & U \\ 0 & C - VA^{-1}U \end{bmatrix}.$

Likewise, eliminating the entry above C gives

$\begin{bmatrix} A & U \\ V & C \end{bmatrix}\begin{bmatrix} I & -A^{-1}U \\ 0 & I \end{bmatrix} = \begin{bmatrix} A & 0 \\ V & C - VA^{-1}U \end{bmatrix}.$

Now combining the above two, we get

$\begin{bmatrix} I & 0 \\ -VA^{-1} & I \end{bmatrix}\begin{bmatrix} A & U \\ V & C \end{bmatrix}\begin{bmatrix} I & -A^{-1}U \\ 0 & I \end{bmatrix} = \begin{bmatrix} A & 0 \\ 0 & C - VA^{-1}U \end{bmatrix}.$

Moving to the right side gives

$\begin{bmatrix} A & U \\ V & C \end{bmatrix} = \begin{bmatrix} I & 0 \\ VA^{-1} & I \end{bmatrix}\begin{bmatrix} A & 0 \\ 0 & C - VA^{-1}U \end{bmatrix}\begin{bmatrix} I & A^{-1}U \\ 0 & I \end{bmatrix},$

which is the LDU decomposition of the block matrix into lower triangular, block-diagonal, and upper triangular factors.

Now inverting both sides gives

$\begin{aligned}
\begin{bmatrix} A & U \\ V & C \end{bmatrix}^{-1}
&= \begin{bmatrix} I & A^{-1}U \\ 0 & I \end{bmatrix}^{-1}\begin{bmatrix} A & 0 \\ 0 & C - VA^{-1}U \end{bmatrix}^{-1}\begin{bmatrix} I & 0 \\ VA^{-1} & I \end{bmatrix}^{-1} \\
&= \begin{bmatrix} I & -A^{-1}U \\ 0 & I \end{bmatrix}\begin{bmatrix} A^{-1} & 0 \\ 0 & \left(C - VA^{-1}U\right)^{-1} \end{bmatrix}\begin{bmatrix} I & 0 \\ -VA^{-1} & I \end{bmatrix} \\
&= \begin{bmatrix} A^{-1} + A^{-1}U\left(C - VA^{-1}U\right)^{-1}VA^{-1} & -A^{-1}U\left(C - VA^{-1}U\right)^{-1} \\ -\left(C - VA^{-1}U\right)^{-1}VA^{-1} & \left(C - VA^{-1}U\right)^{-1} \end{bmatrix}. \qquad (1)
\end{aligned}$

We could equally well have done it the other way (provided that C is invertible), i.e.

$\begin{bmatrix} A & U \\ V & C \end{bmatrix} = \begin{bmatrix} I & UC^{-1} \\ 0 & I \end{bmatrix}\begin{bmatrix} A - UC^{-1}V & 0 \\ 0 & C \end{bmatrix}\begin{bmatrix} I & 0 \\ C^{-1}V & I \end{bmatrix}.$

Now again inverting both sides,

$\begin{aligned}
\begin{bmatrix} A & U \\ V & C \end{bmatrix}^{-1}
&= \begin{bmatrix} I & 0 \\ -C^{-1}V & I \end{bmatrix}\begin{bmatrix} \left(A - UC^{-1}V\right)^{-1} & 0 \\ 0 & C^{-1} \end{bmatrix}\begin{bmatrix} I & -UC^{-1} \\ 0 & I \end{bmatrix} \\
&= \begin{bmatrix} \left(A - UC^{-1}V\right)^{-1} & -\left(A - UC^{-1}V\right)^{-1}UC^{-1} \\ -C^{-1}V\left(A - UC^{-1}V\right)^{-1} & C^{-1}V\left(A - UC^{-1}V\right)^{-1}UC^{-1} + C^{-1} \end{bmatrix}. \qquad (2)
\end{aligned}$

Now comparing elements (1, 1) of the RHS of (1) and (2) above gives the Woodbury formula

$\left(A - UC^{-1}V\right)^{-1} = A^{-1} + A^{-1}U\left(C - VA^{-1}U\right)^{-1}VA^{-1},$

which is the stated identity with $-C^{-1}$ in place of $C$.

Applications

This identity is useful in certain numerical computations where A−1 has already been computed and it is desired to compute (A + UCV)−1. With the inverse of A available, it is only necessary to find the inverse of C−1 + VA−1U in order to obtain the result using the right-hand side of the identity. If C has a much smaller dimension than A, this is more efficient than inverting A + UCV directly. A common case is finding the inverse of a low-rank update A + UCV of A (where U only has a few columns and V only a few rows), or finding an approximation of the inverse of the matrix A + B where the matrix B can be approximated by a low-rank matrix UCV, for example using the singular value decomposition.
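A minimal sketch of this usage pattern in NumPy follows; the helper woodbury_solve is hypothetical (my own name), and a plain dense solve stands in for whatever solver or factorization of A is already available:

```python
import numpy as np

def woodbury_solve(solve_A, U, C, V, b):
    """Solve (A + U C V) x = b, reusing a solver for A.

    solve_A(y) must return A^{-1} y; only the small k-by-k matrix
    C^{-1} + V A^{-1} U is formed and inverted explicitly, which is the point
    of the identity when k is much smaller than n.
    """
    Ainv_b = solve_A(b)
    Ainv_U = solve_A(U)
    small = np.linalg.inv(C) + V @ Ainv_U        # k-by-k system
    return Ainv_b - Ainv_U @ np.linalg.solve(small, V @ Ainv_b)

# Toy usage: a dense A stands in for an operator with an existing factorization.
rng = np.random.default_rng(4)
n, k = 8, 2
A = rng.standard_normal((n, n)) + n * np.eye(n)
U = rng.standard_normal((n, k))
C = np.eye(k)
V = rng.standard_normal((k, n))
b = rng.standard_normal(n)

x = woodbury_solve(lambda y: np.linalg.solve(A, y), U, C, V, b)
print(np.allclose((A + U @ C @ V) @ x, b))  # True
```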

This is applied, e.g., in the Kalman filter and recursive least squares methods, to replace the parametric solution, which requires inverting a matrix the size of the state vector, with a solution based on the condition equations. In the case of the Kalman filter this matrix has the dimensions of the vector of observations, i.e., as small as 1 when only one new observation is processed at a time. This significantly speeds up the often real-time calculations of the filter.
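As an illustration of this kind of update, the sketch below runs a simple recursive least squares loop in which each new observation triggers a rank-one Woodbury (Sherman–Morrison) update of the inverse Gram matrix instead of a fresh inversion; the initialization, data, and tolerances are illustrative assumptions, not taken from the article:

```python
import numpy as np

def rls_step(P, w, x, y):
    """One recursive-least-squares update for a new observation (x, y).

    P is the current inverse Gram matrix; the Sherman–Morrison step updates it
    for the rank-one change (Gram + x x^T) without re-inverting.
    """
    Px = P @ x
    P_new = P - np.outer(Px, Px) / (1.0 + x @ Px)
    w_new = w + P_new @ x * (y - x @ w)
    return P_new, w_new

# Toy usage: fit w on streaming data and compare against the true weights.
rng = np.random.default_rng(5)
d, n = 3, 200
w_true = rng.standard_normal(d)
X = rng.standard_normal((n, d))
Y = X @ w_true

P = np.eye(d) * 1e6          # large prior covariance (tiny regularization)
w = np.zeros(d)
for x, y in zip(X, Y):
    P, w = rls_step(P, w, x, y)

print(np.allclose(w, w_true, atol=1e-3))  # converges to the batch solution
```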

In the case when C is the identity matrix I, the matrix I + VA−1U is known in numerical linear algebra and numerical partial differential equations as the capacitance matrix. [4]

See also

Sherman–Morrison formula
Schur complement
Matrix determinant lemma
Invertible matrix
Moore–Penrose inverse

Notes

  1. Max A. Woodbury, Inverting modified matrices, Memorandum Rept. 42, Statistical Research Group, Princeton University, Princeton, NJ, 1950, 4 pp. MR 38136
  2. Max A. Woodbury, The Stability of Out-Input Matrices. Chicago, Ill., 1949. 5 pp. MR 32564
  3. Guttmann, Louis (1946). "Enlargement methods for computing the inverse matrix". Ann. Math. Statist. 17 (3): 336–343. doi:10.1214/aoms/1177730946.
  4. Hager, William W. (1989). "Updating the inverse of a matrix". SIAM Review. 31 (2): 221–239. doi:10.1137/1031049. JSTOR 2030425. MR 0997457.
  5. Higham, Nicholas (2002). Accuracy and Stability of Numerical Algorithms (2nd ed.). SIAM. p. 258. ISBN 978-0-89871-521-7. MR 1927606.
  6. "MathOverflow discussion". MathOverflow.
  7. Henderson, H. V.; Searle, S. R. (1981). "On deriving the inverse of a sum of matrices" (PDF). SIAM Review. 23 (1): 53–60. doi:10.1137/1023004. hdl:1813/32749. JSTOR 2029838.
  8. Riedel, Kurt S. (1992). "A Sherman–Morrison–Woodbury Identity for Rank Augmenting Matrices with Application to Centering". SIAM Journal on Matrix Analysis and Applications. 13: 659–662. doi:10.1137/0613040. MR 1152773.
  9. Bernstein, Dennis S. (2018). Scalar, Vector, and Matrix Mathematics: Theory, Facts, and Formulas (Revised and expanded ed.). Princeton: Princeton University Press. p. 638. ISBN 9780691151205.
  10. Schott, James R. (2017). Matrix Analysis for Statistics (Third ed.). Hoboken, New Jersey: John Wiley & Sons, Inc. p. 219. ISBN 9781119092483.
