Block matrix pseudoinverse

Last updated November 04, 2024

In mathematics, a block matrix pseudoinverse is a formula for the pseudoinverse of a partitioned matrix. This is useful for decomposing or approximating many algorithms updating parameters in signal processing, which are based on the least squares method.

Derivation

Consider a column-wise partitioned matrix:

{\begin{bmatrix}\mathbf {A} &\mathbf {B} \end{bmatrix}},\quad \mathbf {A} \in \mathbb {R} ^{m\times n},\quad \mathbf {B} \in \mathbb {R} ^{m\times p},\quad m\geq n+p.

If the above matrix is full column rank, the Moore–Penrose inverse matrices of it and its transpose are

{\begin{aligned}{\begin{bmatrix}\mathbf {A} &\mathbf {B} \end{bmatrix}}^{+}&=\left({\begin{bmatrix}\mathbf {A} &\mathbf {B} \end{bmatrix}}^{\textsf {T}}{\begin{bmatrix}\mathbf {A} &\mathbf {B} \end{bmatrix}}\right)^{-1}{\begin{bmatrix}\mathbf {A} &\mathbf {B} \end{bmatrix}}^{\textsf {T}},\\{\begin{bmatrix}\mathbf {A} ^{\textsf {T}}\\\mathbf {B} ^{\textsf {T}}\end{bmatrix}}^{+}&={\begin{bmatrix}\mathbf {A} &\mathbf {B} \end{bmatrix}}\left({\begin{bmatrix}\mathbf {A} &\mathbf {B} \end{bmatrix}}^{\textsf {T}}{\begin{bmatrix}\mathbf {A} &\mathbf {B} \end{bmatrix}}\right)^{-1}.\end{aligned}}

This computation of the pseudoinverse requires (n + p)-square matrix inversion and does not take advantage of the block form.

To reduce computational costs to n- and p-square matrix inversions and to introduce parallelism, treating the blocks separately, one derives ^[1]

{\begin{aligned}{\begin{bmatrix}\mathbf {A} &\mathbf {B} \end{bmatrix}}^{+}&={\begin{bmatrix}\mathbf {P} _{B}^{\perp }\mathbf {A} \left(\mathbf {A} ^{\textsf {T}}\mathbf {P} _{B}^{\perp }\mathbf {A} \right)^{-1}\\\mathbf {P} _{A}^{\perp }\mathbf {B} \left(\mathbf {B} ^{\textsf {T}}\mathbf {P} _{A}^{\perp }\mathbf {B} \right)^{-1}\end{bmatrix}}={\begin{bmatrix}\left(\mathbf {P} _{B}^{\perp }\mathbf {A} \right)^{+}\\\left(\mathbf {P} _{A}^{\perp }\mathbf {B} \right)^{+}\end{bmatrix}},\\{\begin{bmatrix}\mathbf {A} ^{\textsf {T}}\\\mathbf {B} ^{\textsf {T}}\end{bmatrix}}^{+}&={\begin{bmatrix}\mathbf {P} _{B}^{\perp }\mathbf {A} \left(\mathbf {A} ^{\textsf {T}}\mathbf {P} _{B}^{\perp }\mathbf {A} \right)^{-1},\quad \mathbf {P} _{A}^{\perp }\mathbf {B} \left(\mathbf {B} ^{\textsf {T}}\mathbf {P} _{A}^{\perp }\mathbf {B} \right)^{-1}\end{bmatrix}}={\begin{bmatrix}\left(\mathbf {A} ^{\textsf {T}}\mathbf {P} _{B}^{\perp }\right)^{+}&\left(\mathbf {B} ^{\textsf {T}}\mathbf {P} _{A}^{\perp }\right)^{+}\end{bmatrix}},\end{aligned}}

where orthogonal projection matrices are defined by

{\begin{aligned}\mathbf {P} _{A}^{\perp }&=\mathbf {I} -\mathbf {A} \left(\mathbf {A} ^{\textsf {T}}\mathbf {A} \right)^{-1}\mathbf {A} ^{\textsf {T}},\\\mathbf {P} _{B}^{\perp }&=\mathbf {I} -\mathbf {B} \left(\mathbf {B} ^{\textsf {T}}\mathbf {B} \right)^{-1}\mathbf {B} ^{\textsf {T}}.\end{aligned}}

The above formulas are not necessarily valid if ${\begin{bmatrix}\mathbf {A} &\mathbf {B} \end{bmatrix}}$ does not have full rank – for example, if $\mathbf {A} \neq 0$ , then

{\begin{bmatrix}\mathbf {A} &\mathbf {A} \end{bmatrix}}^{+}={\frac {1}{2}}{\begin{bmatrix}\mathbf {A} ^{+}\\\mathbf {A} ^{+}\end{bmatrix}}\neq {\begin{bmatrix}\left(\mathbf {P} _{A}^{\perp }\mathbf {A} \right)^{+}\\\left(\mathbf {P} _{A}^{\perp }\mathbf {A} \right)^{+}\end{bmatrix}}=0

Application to least squares problems

Given the same matrices as above, we consider the following least squares problems, which appear as multiple objective optimizations or constrained problems in signal processing. Eventually, we can implement a parallel algorithm for least squares based on the following results.

Column-wise partitioning in over-determined least squares

Suppose a solution $\mathbf {x} ={\begin{bmatrix}\mathbf {x} _{1}\\\mathbf {x} _{2}\\\end{bmatrix}}$ solves an over-determined system:

{\begin{bmatrix}\mathbf {A} ,&\mathbf {B} \end{bmatrix}}{\begin{bmatrix}\mathbf {x} _{1}\\\mathbf {x} _{2}\\\end{bmatrix}}=\mathbf {d} ,\quad \mathbf {d} \in \mathbb {R} ^{m\times 1}.

Using the block matrix pseudoinverse, we have

\mathbf {x} ={\begin{bmatrix}\mathbf {A} ,&\mathbf {B} \end{bmatrix}}^{+}\,\mathbf {d} ={\begin{bmatrix}\left(\mathbf {P} _{B}^{\perp }\mathbf {A} \right)^{+}\\\left(\mathbf {P} _{A}^{\perp }\mathbf {B} \right)^{+}\end{bmatrix}}\mathbf {d} .

Therefore, we have a decomposed solution:

\mathbf {x} _{1}=\left(\mathbf {P} _{B}^{\perp }\mathbf {A} \right)^{+}\,\mathbf {d} ,\quad \mathbf {x} _{2}=\left(\mathbf {P} _{A}^{\perp }\mathbf {B} \right)^{+}\,\mathbf {d} .

Row-wise partitioning in under-determined least squares

Suppose a solution $\mathbf {x}$ solves an under-determined system:

{\begin{bmatrix}\mathbf {A} ^{\textsf {T}}\\\mathbf {B} ^{\textsf {T}}\end{bmatrix}}\mathbf {x} ={\begin{bmatrix}\mathbf {e} \\\mathbf {f} \end{bmatrix}},\quad \mathbf {e} \in \mathbb {R} ^{n\times 1},\quad \mathbf {f} \in \mathbb {R} ^{p\times 1}.

The minimum-norm solution is given by

\mathbf {x} ={\begin{bmatrix}\mathbf {A} ^{\textsf {T}}\\\mathbf {B} ^{\textsf {T}}\end{bmatrix}}^{+}\,{\begin{bmatrix}\mathbf {e} \\\mathbf {f} \end{bmatrix}}.

Using the block matrix pseudoinverse, we have

\mathbf {x} ={\begin{bmatrix}\left(\mathbf {A} ^{\textsf {T}}\mathbf {P} _{B}^{\perp }\right)^{+}&\left(\mathbf {B} ^{\textsf {T}}\mathbf {P} _{A}^{\perp }\right)^{+}\end{bmatrix}}{\begin{bmatrix}\mathbf {e} \\\mathbf {f} \end{bmatrix}}=\left(\mathbf {A} ^{\textsf {T}}\mathbf {P} _{B}^{\perp }\right)^{+}\,\mathbf {e} +\left(\mathbf {B} ^{\textsf {T}}\mathbf {P} _{A}^{\perp }\right)^{+}\,\mathbf {f} .

Comments on matrix inversion

Instead of $\mathbf {\left({\begin{bmatrix}\mathbf {A} &\mathbf {B} \end{bmatrix}}^{\textsf {T}}{\begin{bmatrix}\mathbf {A} &\mathbf {B} \end{bmatrix}}\right)} ^{-1}$ , we need to calculate directly or indirectly^{[ citation needed ]}^{[ original research? ]}

\left(\mathbf {A} ^{\textsf {T}}\mathbf {A} \right)^{-1},\quad \left(\mathbf {B} ^{\textsf {T}}\mathbf {B} \right)^{-1},\quad \left(\mathbf {A} ^{\textsf {T}}\mathbf {P} _{B}^{\perp }\mathbf {A} \right)^{-1},\quad \left(\mathbf {B} ^{\textsf {T}}\mathbf {P} _{A}^{\perp }\mathbf {B} \right)^{-1}.

In a dense and small system, we can use singular value decomposition, QR decomposition, or Cholesky decomposition to replace the matrix inversions with numerical routines. In a large system, we may employ iterative methods such as Krylov subspace methods.

Considering parallel algorithms, we can compute $\left(\mathbf {A} ^{\textsf {T}}\mathbf {A} \right)^{-1}$ and $\left(\mathbf {B} ^{\textsf {T}}\mathbf {B} \right)^{-1}$ in parallel. Then, we finish to compute $\left(\mathbf {A} ^{\textsf {T}}\mathbf {P} _{B}^{\perp }\mathbf {A} \right)^{-1}$ and $\left(\mathbf {B} ^{\textsf {T}}\mathbf {P} _{A}^{\perp }\mathbf {B} \right)^{-1}$ also in parallel.

Related Research Articles

In mathematics, a symmetric matrix $with real entries is positive-definite if the real number is positive for every nonzero real column vector where is the row vector transpose of More generally, a Hermitian matrix is positive-definite if the real number is positive for every nonzero complex column vector where denotes the conjugate transpose of$

<span class="mw-page-title-main">Singular value decomposition</span> Matrix decomposition

In linear algebra, the singular value decomposition (SVD) is a factorization of a real or complex matrix into a rotation, followed by a rescaling followed by another rotation. It generalizes the eigendecomposition of a square normal matrix with an orthonormal eigenbasis to any ⁠ $⁠$ matrix. It is related to the polar decomposition.

In mechanics and geometry, the 3D rotation group, often denoted SO(3), is the group of all rotations about the origin of three-dimensional Euclidean space $under the operation of composition.$

In mathematics, particularly in linear algebra, a skew-symmetricmatrix is a square matrix whose transpose equals its negative. That is, it satisfies the condition

In linear algebra, a diagonal matrix is a matrix in which the entries outside the main diagonal are all zero; the term usually refers to square matrices. Elements of the main diagonal can either be zero or nonzero. An example of a 2×2 diagonal matrix is $, while an example of a 3\times3 diagonal matrix is . An identity matrix of any size, or any multiple of it is a diagonal matrix called a scalar matrix, for example, . In geometry, a diagonal matrix may be used as a scaling matrix, since matrix multiplication with it results in changing scale (size) and possibly also shape; only a scalar matrix results in uniform change in scale.$

In linear algebra, an invertible matrix is a square matrix which has an inverse. In other words, if some other matrix is multiplied by the invertible matrix, the result can be multiplied by an inverse to undo the operation. Invertible matrices are the same size as their inverse.

In linear algebra, a square matrix $is called diagonalizable or non-defective if it is similar to a diagonal matrix. That is, if there exists an invertible matrix and a diagonal matrix such that . This is equivalent to . This property exists for any linear map: for a finite-dimensional vector space, a linear map is called diagonalizable if there exists an ordered basis of consisting of eigenvectors of . These definitions are equivalent: if has a matrix representation as above, then the column vectors of form a basis consisting of eigenvectors of, and the diagonal entries of are the corresponding eigenvalues of; with respect to this eigenvector basis, is represented by .$

In linear algebra, a QR decomposition, also known as a QR factorization or QU factorization, is a decomposition of a matrix A into a product A = QR of an orthonormal matrix Q and an upper triangular matrix R. QR decomposition is often used to solve the linear least squares (LLS) problem and is the basis for a particular eigenvalue algorithm, the QR algorithm.

In mathematics, and in particular linear algebra, the Moore–Penrose inverse⁠ $⁠$ of a matrix ⁠ $⁠$ , often called the pseudoinverse, is the most widely known generalization of the inverse matrix. It was independently described by E. H. Moore in 1920, Arne Bjerhammar in 1951, and Roger Penrose in 1955. Earlier, Erik Ivar Fredholm had introduced the concept of a pseudoinverse of integral operators in 1903. The terms pseudoinverse and generalized inverse are sometimes used as synonyms for the Moore–Penrose inverse of a matrix, but sometimes applied to other elements of algebraic structures which share some but not all properties expected for an inverse element.

In linear algebra and functional analysis, a projection is a linear transformation $from a vector space to itself such that . That is, whenever is applied twice to any vector, it gives the same result as if it were applied once. It leaves its image unchanged. This definition of "projection" formalizes and generalizes the idea of graphical projection. One can also consider the effect of a projection on a geometrical object by examining the effect of the projection on points in the object.$

In mathematics, the matrix exponential is a matrix function on square matrices analogous to the ordinary exponential function. It is used to solve systems of linear differential equations. In the theory of Lie groups, the matrix exponential gives the exponential map between a matrix Lie algebra and the corresponding Lie group.

In mathematics, the Kronecker product, sometimes denoted by ⊗, is an operation on two matrices of arbitrary size resulting in a block matrix. It is a specialization of the tensor product from vectors to matrices and gives the matrix of the tensor product linear map with respect to a standard choice of basis. The Kronecker product is to be distinguished from the usual matrix multiplication, which is an entirely different operation. The Kronecker product is also sometimes called matrix direct product.

In linear algebra, a rotation matrix is a transformation matrix that is used to perform a rotation in Euclidean space. For example, using the convention below, the matrix

In linear algebra, an eigenvector or characteristic vector is a vector that has its direction unchanged by a given linear transformation. More precisely, an eigenvector, $, of a linear transformation,, is scaled by a constant factor,, when the linear transformation is applied to it: . The corresponding eigenvalue, characteristic value, or characteristic root is the multiplying factor .$

In geometry and linear algebra, a principal axis is a certain line in a Euclidean space associated with a ellipsoid or hyperboloid, generalizing the major and minor axes of an ellipse or hyperbola. The principal axis theorem states that the principal axes are perpendicular, and gives a constructive procedure for finding them.

In linear algebra, eigendecomposition is the factorization of a matrix into a canonical form, whereby the matrix is represented in terms of its eigenvalues and eigenvectors. Only diagonalizable matrices can be factorized in this way. When the matrix being factorized is a normal or real symmetric matrix, the decomposition is called "spectral decomposition", derived from the spectral theorem.

In statistics, the projection matrix $, sometimes also called the influence matrix or hat matrix, maps the vector of response values to the vector of fitted values. It describes the influence each response value has on each fitted value. The diagonal elements of the projection matrix are the leverages, which describe the influence each response value has on the fitted value for that same observation.$

In mathematics, given a field $, nonnegative integers, and a matrix, a rank decomposition or rank factorization of A is a factorization of A of the form A = CF, where and, where is the rank of .$

Stokes' theorem, also known as the Kelvin–Stokes theorem after Lord Kelvin and George Stokes, the fundamental theorem for curls or simply the curl theorem, is a theorem in vector calculus on $. Given a vector field, the theorem relates the integral of the curl of the vector field over some surface, to the line integral of the vector field around the boundary of the surface. The classical theorem of Stokes can be stated in one sentence:$

In mathematics, the Khatri–Rao product or block Kronecker product of two partitioned matrices $and is defined as$

References

↑ J.K. Baksalary and O.M. Baksalary (2007). "Particular formulae for the Moore–Penrose inverse of a columnwise partitioned matrix". Linear Algebra Appl. 421: 16–23. doi:10.1016/j.laa.2006.03.031.

External links

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[Baksalary-1] J.K. Baksalary and O.M. Baksalary (2007). "Particular formulae for the Moore–Penrose inverse of a columnwise partitioned matrix". Linear Algebra Appl. 421: 16–23. doi:10.1016/j.laa.2006.03.031.

[1]

v t e Numerical linear algebra
Key concepts	Floating point Numerical stability
Problems	System of linear equations Matrix decompositions Matrix multiplication (algorithms) Matrix splitting Sparse problems
Hardware	CPU cache TLB Cache-oblivious algorithm SIMD Multiprocessing
Software	ATLAS MATLAB Basic Linear Algebra Subprograms (BLAS) LAPACK Specialized libraries General purpose software