Schur complement


In linear algebra and the theory of matrices, the Schur complement of a block matrix is defined as follows.


Suppose p, q are nonnegative integers, and suppose A, B, C, D are respectively p × p, p × q, q × p, and q × q matrices of complex numbers. Let

M = \begin{bmatrix} A & B \\ C & D \end{bmatrix},

so that M is a (p + q) × (p + q) matrix.

If D is invertible, then the Schur complement of the block D of the matrix M is the p × p matrix defined by

M/D := A - B D^{-1} C.

If A is invertible, the Schur complement of the block A of the matrix M is the q × q matrix defined by

M/A := D - C A^{-1} B.

In the case that A or D is singular, substituting a generalized inverse for the inverses on M/A and M/D yields the generalized Schur complement.
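
As a small numerical illustration of these definitions, the following sketch computes both Schur complements for a concrete block matrix (Python with NumPy is assumed; the block sizes and entries are arbitrary illustrative choices):

    import numpy as np

    # Arbitrary illustrative blocks with p = 2, q = 2
    A = np.array([[4.0, 1.0], [1.0, 3.0]])   # p x p
    B = np.array([[1.0, 0.0], [2.0, 1.0]])   # p x q
    C = np.array([[0.0, 1.0], [1.0, 0.0]])   # q x p
    D = np.array([[2.0, 0.5], [0.5, 2.0]])   # q x q

    M = np.block([[A, B], [C, D]])           # the (p + q) x (p + q) matrix M

    # Schur complement of the block D of M (requires D invertible): M/D = A - B D^{-1} C
    M_over_D = A - B @ np.linalg.solve(D, C)

    # Schur complement of the block A of M (requires A invertible): M/A = D - C A^{-1} B
    M_over_A = D - C @ np.linalg.solve(A, B)

    print(M_over_D)  # a p x p matrix
    print(M_over_A)  # a q x q matrix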

The Schur complement is named after Issai Schur, [1] who used it to prove Schur's lemma, although it had been used previously. [2] Emilie Virginia Haynsworth was the first to call it the Schur complement. [3] The Schur complement is a key tool in the fields of numerical analysis, statistics, and matrix analysis. It is sometimes referred to as the Feshbach map, after the physicist Herman Feshbach. [4]

Background

The Schur complement arises when performing a block Gaussian elimination on the matrix M. In order to eliminate the elements below the block diagonal, one multiplies the matrix M by a block lower triangular matrix on the right as follows:

M Z = \begin{bmatrix} A & B \\ C & D \end{bmatrix} \begin{bmatrix} I_p & 0 \\ -D^{-1}C & I_q \end{bmatrix} = \begin{bmatrix} A - BD^{-1}C & B \\ 0 & D \end{bmatrix},

where I_p denotes a p × p identity matrix. As a result, the Schur complement M/D = A - BD^{-1}C appears in the upper-left p × p block.

Continuing the elimination process beyond this point (i.e., performing a block Gauss–Jordan elimination),

\begin{bmatrix} I_p & -BD^{-1} \\ 0 & I_q \end{bmatrix} M Z = \begin{bmatrix} I_p & -BD^{-1} \\ 0 & I_q \end{bmatrix} \begin{bmatrix} A - BD^{-1}C & B \\ 0 & D \end{bmatrix} = \begin{bmatrix} A - BD^{-1}C & 0 \\ 0 & D \end{bmatrix},

leads to an LDU decomposition of M, which reads

M = \begin{bmatrix} I_p & BD^{-1} \\ 0 & I_q \end{bmatrix} \begin{bmatrix} A - BD^{-1}C & 0 \\ 0 & D \end{bmatrix} \begin{bmatrix} I_p & 0 \\ D^{-1}C & I_q \end{bmatrix}.

Thus, the inverse of M may be expressed in terms of D^{-1} and the inverse of the Schur complement, assuming it exists, as

M^{-1} = \begin{bmatrix} I_p & 0 \\ -D^{-1}C & I_q \end{bmatrix} \begin{bmatrix} (M/D)^{-1} & 0 \\ 0 & D^{-1} \end{bmatrix} \begin{bmatrix} I_p & -BD^{-1} \\ 0 & I_q \end{bmatrix}
        = \begin{bmatrix} (M/D)^{-1} & -(M/D)^{-1} B D^{-1} \\ -D^{-1} C (M/D)^{-1} & D^{-1} + D^{-1} C (M/D)^{-1} B D^{-1} \end{bmatrix}.

The above relationship comes from the elimination operations that involve D^{-1} and M/D. An equivalent derivation can be done with the roles of A and D interchanged. By equating the expressions for M^{-1} obtained in these two different ways, one can establish the matrix inversion lemma, which relates the two Schur complements of M: M/D and M/A (see "Derivation from LDU decomposition" in Woodbury matrix identity § Alternative proofs).
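
As a sanity check of the factorization and inverse formula above, the following sketch (assuming Python with NumPy; the random blocks are arbitrary test data, with D shifted to keep it invertible) verifies both identities numerically:

    import numpy as np

    rng = np.random.default_rng(0)
    p, q = 3, 2
    A = rng.standard_normal((p, p))
    B = rng.standard_normal((p, q))
    C = rng.standard_normal((q, p))
    D = rng.standard_normal((q, q)) + 5 * np.eye(q)   # keep D comfortably invertible

    M = np.block([[A, B], [C, D]])
    Dinv = np.linalg.inv(D)
    S = A - B @ Dinv @ C                              # Schur complement M/D
    Ip, Iq = np.eye(p), np.eye(q)

    # Block LDU factors from the Gauss-Jordan elimination above
    upper = np.block([[Ip, B @ Dinv], [np.zeros((q, p)), Iq]])
    diag  = np.block([[S, np.zeros((p, q))], [np.zeros((q, p)), D]])
    lower = np.block([[Ip, np.zeros((p, q))], [Dinv @ C, Iq]])
    assert np.allclose(M, upper @ diag @ lower)

    # Inverse of M expressed through D^{-1} and (M/D)^{-1}
    Sinv = np.linalg.inv(S)
    Minv = np.block([
        [Sinv,              -Sinv @ B @ Dinv],
        [-Dinv @ C @ Sinv,   Dinv + Dinv @ C @ Sinv @ B @ Dinv],
    ])
    assert np.allclose(Minv, np.linalg.inv(M))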

Properties

If p and q are both 1 (that is, A, B, C and D are all scalars), the inverse of M is given by the familiar formula for the inverse of a 2 × 2 matrix,

M^{-1} = \frac{1}{AD - BC} \begin{bmatrix} D & -B \\ -C & A \end{bmatrix},

provided that AD − BC is non-zero.

In general, if A is invertible, then

M = \begin{bmatrix} I_p & 0 \\ CA^{-1} & I_q \end{bmatrix} \begin{bmatrix} A & 0 \\ 0 & D - CA^{-1}B \end{bmatrix} \begin{bmatrix} I_p & A^{-1}B \\ 0 & I_q \end{bmatrix},
M^{-1} = \begin{bmatrix} A^{-1} + A^{-1} B (M/A)^{-1} C A^{-1} & -A^{-1} B (M/A)^{-1} \\ -(M/A)^{-1} C A^{-1} & (M/A)^{-1} \end{bmatrix},

whenever this inverse exists.

(Schur's formula) When A, respectively D, is invertible, the determinant of M is given by

\det(M) = \det(A) \det(D - CA^{-1}B), respectively
\det(M) = \det(D) \det(A - BD^{-1}C),

which generalizes the determinant formula for 2 × 2 matrices.
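
As a quick numerical check of the determinant identity (a sketch assuming Python with NumPy; the blocks are arbitrary, with A shifted to keep it invertible):

    import numpy as np

    rng = np.random.default_rng(1)
    p, q = 3, 2
    A = rng.standard_normal((p, p)) + 4 * np.eye(p)   # keep A comfortably invertible
    B = rng.standard_normal((p, q))
    C = rng.standard_normal((q, p))
    D = rng.standard_normal((q, q))

    M = np.block([[A, B], [C, D]])
    M_over_A = D - C @ np.linalg.solve(A, B)          # Schur complement M/A

    # det(M) = det(A) * det(D - C A^{-1} B)
    assert np.isclose(np.linalg.det(M), np.linalg.det(A) * np.linalg.det(M_over_A))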

Application to solving linear equations

The Schur complement arises naturally in solving a system of linear equations such as [7]

A x + B y = u
C x + D y = v,

where x, u are p-dimensional column vectors, y, v are q-dimensional column vectors, and A, B, C, D are as above.

Assuming that the submatrix A is invertible, we can eliminate x from the equations, as follows:

x = A^{-1}(u - B y).

Substituting this expression into the second equation yields

(D - C A^{-1} B) y = v - C A^{-1} u.

We refer to this as the reduced equation obtained by eliminating x from the original equation. The matrix appearing in the reduced equation is called the Schur complement of the first block A in M:

S := D - C A^{-1} B.

Solving the reduced equation, we obtain

y = S^{-1}(v - C A^{-1} u).

Substituting this into the first equation yields

x = (A^{-1} + A^{-1} B S^{-1} C A^{-1}) u - A^{-1} B S^{-1} v.

We can express the above two equations as

\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} A^{-1} + A^{-1} B S^{-1} C A^{-1} & -A^{-1} B S^{-1} \\ -S^{-1} C A^{-1} & S^{-1} \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix}.

Therefore, a formulation for the inverse of a block matrix is

\begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1} = \begin{bmatrix} A^{-1} + A^{-1} B S^{-1} C A^{-1} & -A^{-1} B S^{-1} \\ -S^{-1} C A^{-1} & S^{-1} \end{bmatrix} = \begin{bmatrix} I_p & -A^{-1}B \\ 0 & I_q \end{bmatrix} \begin{bmatrix} A^{-1} & 0 \\ 0 & S^{-1} \end{bmatrix} \begin{bmatrix} I_p & 0 \\ -CA^{-1} & I_q \end{bmatrix}.

In particular, we see that the Schur complement S is the inverse of the 2,2 block entry of the inverse of M.

In practice, one needs A to be well-conditioned in order for this algorithm to be numerically accurate.
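
The elimination above translates directly into code. The following sketch (assuming Python with NumPy; the blocks and right-hand sides are arbitrary test data) solves the system through the reduced equation and compares the result with a direct solve of the full system:

    import numpy as np

    rng = np.random.default_rng(2)
    p, q = 3, 2
    A = rng.standard_normal((p, p)) + 4 * np.eye(p)   # A must be invertible (and well-conditioned)
    B = rng.standard_normal((p, q))
    C = rng.standard_normal((q, p))
    D = rng.standard_normal((q, q))
    u = rng.standard_normal(p)
    v = rng.standard_normal(q)

    # Schur complement of A in M, and the reduced equation S y = v - C A^{-1} u
    S = D - C @ np.linalg.solve(A, B)
    y = np.linalg.solve(S, v - C @ np.linalg.solve(A, u))

    # Back-substitution: x = A^{-1} (u - B y)
    x = np.linalg.solve(A, u - B @ y)

    # Compare with solving the full (p + q) x (p + q) system directly
    M = np.block([[A, B], [C, D]])
    assert np.allclose(np.concatenate([x, y]), np.linalg.solve(M, np.concatenate([u, v])))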

This method is useful in electrical engineering to reduce the dimension of a network's equations. It is especially useful when element(s) of the source vector are zero. For example, when u or v is zero, we can eliminate the associated rows of the coefficient matrix without any changes to the rest of the source vector. If v is null then the above equation for x reduces to x = (A^{-1} + A^{-1} B S^{-1} C A^{-1}) u, thus reducing the dimension of the coefficient matrix while leaving u unmodified. This is used to advantage in electrical engineering, where it is referred to as node elimination or Kron reduction.
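
In the node-elimination setting just described, only the coefficient matrix shrinks when v is zero; the source vector u is untouched. A minimal sketch under the same assumptions as the previous example:

    import numpy as np

    # Hypothetical nodal equations  [A B; C D] [x; y] = [u; 0],
    # e.g. y collecting unknowns at nodes with zero injection.
    rng = np.random.default_rng(3)
    p, q = 3, 2
    A = rng.standard_normal((p, p)) + 4 * np.eye(p)
    B = rng.standard_normal((p, q))
    C = rng.standard_normal((q, p))
    D = rng.standard_normal((q, q)) + 4 * np.eye(q)
    u = rng.standard_normal(p)

    # With v = 0 the solution reduces to x = (A^{-1} + A^{-1} B S^{-1} C A^{-1}) u,
    # which equals the solution of the Kron-reduced system (A - B D^{-1} C) x = u
    # by the matrix inversion (Woodbury) lemma.
    Ainv = np.linalg.inv(A)
    S = D - C @ Ainv @ B
    x = (Ainv + Ainv @ B @ np.linalg.solve(S, C @ Ainv)) @ u

    x_reduced = np.linalg.solve(A - B @ np.linalg.solve(D, C), u)
    M = np.block([[A, B], [C, D]])
    x_full = np.linalg.solve(M, np.concatenate([u, np.zeros(q)]))[:p]
    assert np.allclose(x, x_reduced) and np.allclose(x, x_full)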

Applications to probability theory and statistics

Suppose the random column vectors X, Y live in R^n and R^m respectively, and the vector (X, Y) in R^{n+m} has a multivariate normal distribution whose covariance is the symmetric positive-definite matrix

\Sigma = \begin{bmatrix} A & B \\ B^T & C \end{bmatrix},

where A ∈ R^{n×n} is the covariance matrix of X, C ∈ R^{m×m} is the covariance matrix of Y, and B ∈ R^{n×m} is the covariance matrix between X and Y.

Then the conditional covariance of X given Y is the Schur complement of C in \Sigma: [8]

Cov(X \mid Y) = A - B C^{-1} B^T.

If we take the matrix \Sigma above to be, not a covariance of a random vector, but a sample covariance, then it may have a Wishart distribution. In that case, the Schur complement of C in \Sigma also has a Wishart distribution. [citation needed]
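
A minimal sketch of this computation (assuming Python with NumPy; the joint covariance below is an arbitrary positive-definite example):

    import numpy as np

    n, m = 2, 3
    rng = np.random.default_rng(4)
    G = rng.standard_normal((n + m, n + m))
    Sigma = G @ G.T + (n + m) * np.eye(n + m)   # symmetric positive-definite joint covariance

    A = Sigma[:n, :n]    # Cov(X)
    B = Sigma[:n, n:]    # Cov(X, Y)
    C = Sigma[n:, n:]    # Cov(Y)

    # Conditional covariance of X given Y: the Schur complement of C in Sigma
    cond_cov = A - B @ np.linalg.solve(C, B.T)
    print(cond_cov)      # an n x n positive-definite matrix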

Conditions for positive definiteness and semi-definiteness

Let X be a symmetric matrix of real numbers given by

X = \begin{bmatrix} A & B \\ B^T & C \end{bmatrix}.

Then:

1. X is positive definite if and only if C is positive definite and the Schur complement A - B C^{-1} B^T is positive definite.
2. X is positive definite if and only if A is positive definite and the Schur complement C - B^T A^{-1} B is positive definite.
3. If C is positive definite, then X is positive semi-definite if and only if A - B C^{-1} B^T is positive semi-definite.
4. If A is positive definite, then X is positive semi-definite if and only if C - B^T A^{-1} B is positive semi-definite.

The first and third statements can be derived [7] by considering the minimizer of the quantity

u^T A u + 2 v^T B^T u + v^T C v,

as a function of v (for fixed u).

Furthermore, since

\begin{bmatrix} A & B \\ B^T & C \end{bmatrix} \succ 0 \iff \begin{bmatrix} C & B^T \\ B & A \end{bmatrix} \succ 0,

and similarly for positive semi-definite matrices, the second (respectively fourth) statement is immediate from the first (resp. third) statement.
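
These criteria yield a simple definiteness test for a partitioned symmetric matrix. The sketch below (assuming Python with NumPy; is_pd and block_is_pd are hypothetical helper names, and Cholesky factorization is used as the definiteness check) implements the first statement and compares it with a direct test on the assembled matrix:

    import numpy as np

    def is_pd(M):
        """Positive definiteness via Cholesky, which fails for non-positive-definite input."""
        try:
            np.linalg.cholesky(M)
            return True
        except np.linalg.LinAlgError:
            return False

    def block_is_pd(A, B, C):
        """Test X = [[A, B], [B.T, C]] > 0 via the first statement:
        X > 0 iff C > 0 and A - B C^{-1} B^T > 0."""
        return is_pd(C) and is_pd(A - B @ np.linalg.solve(C, B.T))

    # Arbitrary positive-definite example
    rng = np.random.default_rng(5)
    n, m = 3, 2
    G = rng.standard_normal((n + m, n + m))
    X = G @ G.T + 0.1 * np.eye(n + m)
    A, B, C = X[:n, :n], X[:n, n:], X[n:, n:]
    assert block_is_pd(A, B, C) == is_pd(X)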

There is also a necessary and sufficient condition for the positive semi-definiteness of X in terms of a generalized Schur complement. [2] Precisely,

X \succeq 0 \iff A \succeq 0, \; C - B^T A^g B \succeq 0, \; (I - A A^g) B = 0

and

X \succeq 0 \iff C \succeq 0, \; A - B C^g B^T \succeq 0, \; (I - C C^g) B^T = 0,

where A^g denotes a generalized inverse of A (and likewise C^g for C).
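
A sketch of this semi-definiteness test using the Moore–Penrose pseudoinverse as the generalized inverse A^g (assuming Python with NumPy; is_psd and block_is_psd are hypothetical helper names, and the numerical tolerances are illustrative choices):

    import numpy as np

    def is_psd(M, tol=1e-10):
        """Positive semi-definiteness via the smallest eigenvalue of a symmetric matrix."""
        return np.min(np.linalg.eigvalsh(M)) >= -tol

    def block_is_psd(A, B, C, tol=1e-10):
        """Test X = [[A, B], [B.T, C]] >= 0 via the generalized Schur complement:
        X >= 0 iff A >= 0, C - B^T A^g B >= 0 and (I - A A^g) B = 0."""
        Ag = np.linalg.pinv(A)   # Moore-Penrose pseudoinverse as the generalized inverse
        return (is_psd(A, tol)
                and is_psd(C - B.T @ Ag @ B, tol)
                and np.allclose((np.eye(A.shape[0]) - A @ Ag) @ B, 0.0, atol=1e-8))

    # Example with a singular (rank-deficient) X that is still positive semi-definite
    rng = np.random.default_rng(6)
    G = rng.standard_normal((4, 3))
    X = G @ G.T                   # rank 3, hence PSD but not PD
    A, B, C = X[:2, :2], X[:2, 2:], X[2:, 2:]
    assert block_is_psd(A, B, C)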



References

  1. Schur, J. (1917). "Über Potenzreihen die im Inneren des Einheitskreises beschränkt sind". J. reine u. angewandte Mathematik. 147: 205–232. doi:10.1515/crll.1917.147.205.
  2. Zhang, Fuzhen, ed. (2005). The Schur Complement and Its Applications. Numerical Methods and Algorithms. Vol. 4. Springer. doi:10.1007/b105056. ISBN 0-387-24271-6.
  3. Haynsworth, E. V. (June 1968). "On the Schur Complement". Basel Mathematical Notes. #BNB 20. 17 pages.
  4. Feshbach, Herman (1958). "Unified theory of nuclear reactions". Annals of Physics. 5 (4): 357–390. doi:10.1016/0003-4916(58)90007-1.
  5. Crabtree, Douglas E.; Haynsworth, Emilie V. (1969). "An identity for the Schur complement of a matrix". Proceedings of the American Mathematical Society. 22 (2): 364–366. doi:10.1090/S0002-9939-1969-0255573-1. ISSN 0002-9939. S2CID 122868483.
  6. Devriendt, Karel (2022). "Effective resistance is more than distance: Laplacians, Simplices and the Schur complement". Linear Algebra and Its Applications. 639: 24–49. arXiv:2010.04521. doi:10.1016/j.laa.2022.01.002. S2CID 222272289.
  7. Boyd, S.; Vandenberghe, L. (2004). Convex Optimization. Cambridge University Press. (Appendix A.5.5)
  8. von Mises, Richard (1964). "Chapter VIII.9.3". Mathematical Theory of Probability and Statistics. Academic Press. ISBN 978-1483255385.