# Projection (linear algebra)


In linear algebra and functional analysis, a projection is a linear transformation ${\displaystyle P}$ from a vector space to itself (an endomorphism) such that ${\displaystyle P\circ P=P}$. That is, whenever ${\displaystyle P}$ is applied twice to any vector, it gives the same result as if it were applied once (i.e. ${\displaystyle P}$ is idempotent). It leaves its image unchanged. [1] This definition of "projection" formalizes and generalizes the idea of graphical projection. One can also consider the effect of a projection on a geometrical object by examining the effect of the projection on points in the object.

## Definitions

A projection on a vector space ${\displaystyle V}$ is a linear operator ${\displaystyle P:V\to V}$ such that ${\displaystyle P^{2}=P}$.

When ${\displaystyle V}$ has an inner product and is complete (i.e. when ${\displaystyle V}$ is a Hilbert space) the concept of orthogonality can be used. A projection ${\displaystyle P}$ on a Hilbert space ${\displaystyle V}$ is called an orthogonal projection if it satisfies ${\displaystyle \langle P\mathbf {x} ,\mathbf {y} \rangle =\langle \mathbf {x} ,P\mathbf {y} \rangle }$ for all ${\displaystyle \mathbf {x} ,\mathbf {y} \in V}$. A projection on a Hilbert space that is not orthogonal is called an oblique projection.

### Projection matrix

• In the finite-dimensional case, a square matrix ${\displaystyle P}$ is called a projection matrix if it is equal to its square, i.e. if ${\displaystyle P^{2}=P}$. [2] :p. 38
• A square matrix ${\displaystyle P}$ is called an orthogonal projection matrix if ${\displaystyle P^{2}=P=P^{\mathrm {T} }}$ for a real matrix, and respectively ${\displaystyle P^{2}=P=P^{*}}$ for a complex matrix, where ${\displaystyle P^{\mathrm {T} }}$ denotes the transpose of ${\displaystyle P}$ and ${\displaystyle P^{*}}$ denotes the adjoint or Hermitian transpose of ${\displaystyle P}$. [2] :p. 223
• A projection matrix that is not an orthogonal projection matrix is called an oblique projection matrix.

The eigenvalues of a projection matrix must be 0 or 1.
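The eigenvalue claim follows in one line from idempotence. As a short worked derivation, suppose ${\displaystyle \mathbf {v} \neq \mathbf {0} }$ is an eigenvector with ${\displaystyle P\mathbf {v} =\lambda \mathbf {v} }$ (the symbols ${\displaystyle \mathbf {v} }$ and ${\displaystyle \lambda }$ are introduced here only for illustration). Then

${\displaystyle \lambda \mathbf {v} =P\mathbf {v} =P^{2}\mathbf {v} =P(\lambda \mathbf {v} )=\lambda ^{2}\mathbf {v} ,}$

so ${\displaystyle \lambda (\lambda -1)=0}$, which forces ${\displaystyle \lambda =0}$ or ${\displaystyle \lambda =1}$.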

## Examples

### Orthogonal projection

For example, the function which maps the point ${\displaystyle (x,y,z)}$ in three-dimensional space ${\displaystyle \mathbb {R} ^{3}}$ to the point ${\displaystyle (x,y,0)}$ is an orthogonal projection onto the xy-plane. This function is represented by the matrix

${\displaystyle P={\begin{bmatrix}1&0&0\\0&1&0\\0&0&0\end{bmatrix}}.}$

The action of this matrix on an arbitrary vector is

${\displaystyle P{\begin{bmatrix}x\\y\\z\end{bmatrix}}={\begin{bmatrix}x\\y\\0\end{bmatrix}}.}$

To see that ${\displaystyle P}$ is indeed a projection, i.e., ${\displaystyle P=P^{2}}$, we compute

${\displaystyle P^{2}{\begin{bmatrix}x\\y\\z\end{bmatrix}}=P{\begin{bmatrix}x\\y\\0\end{bmatrix}}={\begin{bmatrix}x\\y\\0\end{bmatrix}}=P{\begin{bmatrix}x\\y\\z\end{bmatrix}}.}$

Observing that ${\displaystyle P^{\mathrm {T} }=P}$ shows that the projection is an orthogonal projection.
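The two checks above can also be carried out numerically. The following is a minimal sketch in plain Python; the helper names `matmul`, `transpose` and `matvec` are illustrative and not part of any library.

```python
# Orthogonal projection of R^3 onto the xy-plane, stored as a list of rows.
P = [[1, 0, 0],
     [0, 1, 0],
     [0, 0, 0]]

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    """Return the transpose of A as a list of rows."""
    return [list(col) for col in zip(*A)]

def matvec(A, x):
    """Apply the matrix A to the vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

is_idempotent = matmul(P, P) == P    # P^2 = P, so P is a projection
is_symmetric = transpose(P) == P     # P^T = P, so the projection is orthogonal
projected = matvec(P, [3, -2, 7])    # the z-coordinate is replaced by 0

print(is_idempotent, is_symmetric, projected)
```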

### Oblique projection

A simple example of a non-orthogonal (oblique) projection is

${\displaystyle P={\begin{bmatrix}0&0\\\alpha &1\end{bmatrix}}.}$

Via matrix multiplication, one sees that

${\displaystyle P^{2}={\begin{bmatrix}0&0\\\alpha &1\end{bmatrix}}{\begin{bmatrix}0&0\\\alpha &1\end{bmatrix}}={\begin{bmatrix}0&0\\\alpha &1\end{bmatrix}}=P.}$

showing that ${\displaystyle P}$ is indeed a projection.

The projection ${\displaystyle P}$ is orthogonal if and only if ${\displaystyle \alpha =0}$ because only then ${\displaystyle P^{\mathrm {T} }=P.}$
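The dependence on ${\displaystyle \alpha }$ can be checked for a few sample values. This is a small sketch in plain Python; the function names `oblique`, `is_projection` and `is_symmetric` are hypothetical helpers chosen for this example.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    """Return the transpose of A as a list of rows."""
    return [list(col) for col in zip(*A)]

def oblique(alpha):
    """The 2x2 matrix [[0, 0], [alpha, 1]] from the example."""
    return [[0, 0], [alpha, 1]]

def is_projection(P):
    """True when P is idempotent, i.e. P^2 = P."""
    return matmul(P, P) == P

def is_symmetric(P):
    """True when P equals its transpose."""
    return transpose(P) == P

for alpha in (0, 1, -3):
    P = oblique(alpha)
    # Idempotent for every alpha, symmetric only for alpha = 0.
    print(alpha, is_projection(P), is_symmetric(P))
```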

## Properties and classification

### Idempotence

By definition, a projection ${\displaystyle P}$ is idempotent (i.e. ${\displaystyle P^{2}=P}$).

### Open map

Every projection is an open map, meaning that it maps each open set in the domain to an open set in the subspace topology of the image.[ citation needed ] That is, for any vector ${\displaystyle \mathbf {x} }$ and any ball ${\displaystyle B_{\mathbf {x} }}$ (with positive radius) centered on ${\displaystyle \mathbf {x} }$, there exists a ball ${\displaystyle B_{P\mathbf {x} }}$ (with positive radius) centered on ${\displaystyle P\mathbf {x} }$ that is wholly contained in the image ${\displaystyle P(B_{\mathbf {x} })}$.

### Complementarity of image and kernel

Let ${\displaystyle W}$ be a finite-dimensional vector space and ${\displaystyle P}$ be a projection on ${\displaystyle W}$. Suppose the subspaces ${\displaystyle U}$ and ${\displaystyle V}$ are the image and kernel of ${\displaystyle P}$ respectively. Then ${\displaystyle P}$ has the following properties:

1. ${\displaystyle P}$ is the identity operator ${\displaystyle I}$ on ${\displaystyle U}$:
${\displaystyle \forall \mathbf {x} \in U:P\mathbf {x} =\mathbf {x} .}$
2. We have a direct sum ${\displaystyle W=U\oplus V}$. Every vector ${\displaystyle \mathbf {x} \in W}$ may be decomposed uniquely as ${\displaystyle \mathbf {x} =\mathbf {u} +\mathbf {v} }$ with ${\displaystyle \mathbf {u} =P\mathbf {x} }$ and ${\displaystyle \mathbf {v} =\mathbf {x} -P\mathbf {x} =\left(I-P\right)\mathbf {x} }$, and where ${\displaystyle \mathbf {u} \in U,\mathbf {v} \in V.}$

The image and kernel of a projection are complementary, as are ${\displaystyle P}$ and ${\displaystyle Q=I-P}$. The operator ${\displaystyle Q}$ is also a projection as the image and kernel of ${\displaystyle P}$ become the kernel and image of ${\displaystyle Q}$ and vice versa. We say ${\displaystyle P}$ is a projection along ${\displaystyle V}$ onto ${\displaystyle U}$ (kernel/image) and ${\displaystyle Q}$ is a projection along ${\displaystyle U}$ onto ${\displaystyle V}$.
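The decomposition can be illustrated with the oblique example matrix for ${\displaystyle \alpha =2}$. The sketch below is plain Python; the sample vector `x` and the helper names are assumptions made for this illustration.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matvec(A, x):
    """Apply the matrix A to the vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

P = [[0, 0],
     [2, 1]]                      # projection with alpha = 2
I = [[1, 0],
     [0, 1]]
Q = [[I[i][j] - P[i][j] for j in range(2)] for i in range(2)]   # Q = I - P

x = [3, 5]
u = matvec(P, x)                  # component in the image U of P
v = matvec(Q, x)                  # component in the kernel V of P

print(u, v)                                  # the two components of x
print(matvec(P, u) == u)                     # P acts as the identity on U
print(matvec(P, v) == [0, 0])                # v lies in the kernel of P
print(matmul(Q, Q) == Q)                     # Q is itself a projection
```

Here `u + v` reproduces `x`, and the roles of image and kernel are swapped when passing from `P` to `Q`.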

### Spectrum

In infinite-dimensional vector spaces, the spectrum of a projection is contained in ${\displaystyle \{0,1\}}$ as

${\displaystyle (\lambda I-P)^{-1}={\frac {1}{\lambda }}I+{\frac {1}{\lambda (\lambda -1)}}P.}$
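This resolvent formula can be verified by multiplying out, using only ${\displaystyle P^{2}=P}$ and assuming ${\displaystyle \lambda \notin \{0,1\}}$:

${\displaystyle (\lambda I-P)\left({\frac {1}{\lambda }}I+{\frac {1}{\lambda (\lambda -1)}}P\right)=I+{\frac {1}{\lambda -1}}P-{\frac {1}{\lambda }}P-{\frac {1}{\lambda (\lambda -1)}}P=I,}$

since the coefficients of ${\displaystyle P}$ sum to ${\displaystyle {\frac {\lambda -(\lambda -1)-1}{\lambda (\lambda -1)}}=0}$. Both factors are polynomials in ${\displaystyle P}$ and therefore commute, so the same expression is also a left inverse.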

Only 0 or 1 can be an eigenvalue of a projection. This implies that an orthogonal projection ${\displaystyle P}$ is always a positive semi-definite matrix. In general, the corresponding eigenspaces are (respectively) the kernel and range of the projection. Decomposition of a vector space into direct sums is not unique. Therefore, given a subspace ${\displaystyle V}$, there may be many projections whose range (or kernel) is ${\displaystyle V}$.

If a projection is nontrivial it has minimal polynomial ${\displaystyle x^{2}-x=x(x-1)}$, which factors into distinct linear factors, and thus ${\displaystyle P}$ is diagonalizable.

### Product of projections

The product of projections is not in general a projection, even if they are orthogonal. If two projections commute then their product is a projection, but the converse is false: the product of two non-commuting projections may be a projection.

If two orthogonal projections commute then their product is an orthogonal projection. If the product of two orthogonal projections is an orthogonal projection, then the two orthogonal projections commute (more generally: two self-adjoint endomorphisms commute if and only if their product is self-adjoint).
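Both statements can be illustrated with small matrices. The sketch below uses exact rational arithmetic from the standard library; the specific matrices are examples chosen here, not taken from the cited sources.

```python
from fractions import Fraction

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def is_projection(P):
    """True when P is idempotent, i.e. P^2 = P."""
    return matmul(P, P) == P

half = Fraction(1, 2)

# Orthogonal projections onto the lines spanned by (1, 0) and (1, 1).
P1 = [[1, 0], [0, 0]]
P2 = [[half, half], [half, half]]
product = matmul(P1, P2)
print(is_projection(P1), is_projection(P2), is_projection(product))

# Two non-commuting (oblique) projections whose product is still a projection.
P = [[1, 0], [0, 0]]
Q = [[0, 0], [1, 1]]
PQ, QP = matmul(P, Q), matmul(Q, P)
print(PQ != QP, is_projection(PQ))
```

In the first pair the two lines meet at 45 degrees, so the orthogonal projections do not commute and their product fails to be idempotent. In the second pair the product `PQ` is the zero matrix, which is trivially a projection even though `PQ` and `QP` differ.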

### Orthogonal projections

When the vector space ${\displaystyle W}$ has an inner product and is complete (is a Hilbert space) the concept of orthogonality can be used. An orthogonal projection is a projection for which the range ${\displaystyle U}$ and the null space ${\displaystyle V}$ are orthogonal subspaces. Thus, for every ${\displaystyle \mathbf {x} }$ and ${\displaystyle \mathbf {y} }$ in ${\displaystyle W}$, ${\displaystyle \langle P\mathbf {x} ,(\mathbf {y} -P\mathbf {y} )\rangle =\langle (\mathbf {x} -P\mathbf {x} ),P\mathbf {y} \rangle =0}$. Equivalently:

${\displaystyle \langle \mathbf {x} ,P\mathbf {y} \rangle =\langle P\mathbf {x} ,P\mathbf {y} \rangle =\langle P\mathbf {x} ,\mathbf {y} \rangle .}$

A projection is orthogonal if and only if it is self-adjoint. Using the self-adjoint and idempotent properties of ${\displaystyle P}$, for any ${\displaystyle \mathbf {x} }$ and ${\displaystyle \mathbf {y} }$ in ${\displaystyle W}$ we have ${\displaystyle P\mathbf {x} \in U}$, ${\displaystyle \mathbf {y} -P\mathbf {y} \in V}$, and

${\displaystyle \langle P\mathbf {x} ,\mathbf {y} -P\mathbf {y} \rangle =\langle P^{2}\mathbf {x} ,\mathbf {y} -P\mathbf {y} \rangle =\langle P\mathbf {x} ,P\left(I-P\right)\mathbf {y} \rangle =\langle P\mathbf {x} ,\left(P-P^{2}\right)\mathbf {y} \rangle =0}$

where ${\displaystyle \langle \cdot ,\cdot \rangle }$ is the inner product associated with ${\displaystyle W}$. Therefore, ${\displaystyle P}$ and ${\displaystyle I-P}$ are orthogonal projections. [3] The other direction, namely that if ${\displaystyle P}$ is orthogonal then it is self-adjoint, follows from

${\displaystyle \langle \mathbf {x} ,P\mathbf {y} \rangle =\langle P\mathbf {x} ,\mathbf {y} \rangle =\langle \mathbf {x} ,P^{*}\mathbf {y} \rangle }$

for every ${\displaystyle x}$ and ${\displaystyle y}$ in ${\displaystyle W}$; thus ${\displaystyle P=P^{*}}$.

Proof of existence

Let ${\displaystyle H}$ be a complete metric space with an inner product, and let ${\displaystyle U}$ be a closed linear subspace of ${\displaystyle H}$ (and hence complete as well).

For every ${\displaystyle \mathbf {x} }$ the following set of non-negative norm-values ${\displaystyle \{\|\mathbf {x} -\mathbf {u} \|:\mathbf {u} \in U\}}$ has an infimum, and due to the completeness of ${\displaystyle U}$ it is a minimum. We define ${\displaystyle P\mathbf {x} }$ as the point in ${\displaystyle U}$ where this minimum is obtained.

Obviously ${\displaystyle P\mathbf {x} }$ is in ${\displaystyle U}$. It remains to show that ${\displaystyle P\mathbf {x} }$ satisfies ${\displaystyle \langle \mathbf {x} -P\mathbf {x} ,P\mathbf {x} \rangle =0}$ and that it is linear.

Let us define ${\displaystyle \mathbf {a} =\mathbf {x} -P\mathbf {x} }$. For every non-zero ${\displaystyle \mathbf {v} }$ in ${\displaystyle U}$, the following holds:

${\displaystyle \left\|\mathbf {a} -{\frac {\langle \mathbf {a} ,\mathbf {v} \rangle }{\|\mathbf {v} \|^{2}}}\mathbf {v} \right\|^{2}=\|\mathbf {a} \|^{2}-{\frac {{\langle \mathbf {a} ,\mathbf {v} \rangle }^{2}}{\|\mathbf {v} \|^{2}}}}$

By defining ${\displaystyle \mathbf {w} =P\mathbf {x} +{\frac {\langle \mathbf {a} ,\mathbf {v} \rangle }{\|\mathbf {v} \|^{2}}}\mathbf {v} }$ we see that ${\displaystyle \|\mathbf {x} -\mathbf {w} \|<\|\mathbf {x} -P\mathbf {x} \|}$ unless ${\displaystyle \langle \mathbf {a} ,\mathbf {v} \rangle }$ vanishes. Since ${\displaystyle P\mathbf {x} }$ was chosen as the point at which the minimum of the aforementioned set is attained, it follows that ${\displaystyle \langle \mathbf {a} ,\mathbf {v} \rangle }$ indeed vanishes. In particular, for ${\displaystyle \mathbf {v} =P\mathbf {x} }$: ${\displaystyle \langle \mathbf {x} -P\mathbf {x} ,P\mathbf {x} \rangle =0}$.

Linearity follows from the vanishing of ${\displaystyle \langle \mathbf {x} -P\mathbf {x} ,\mathbf {v} \rangle }$ for every ${\displaystyle \mathbf {v} \in U}$:

${\displaystyle \langle \left(\mathbf {x} +\mathbf {y} \right)-P\left(\mathbf {x} +\mathbf {y} \right),\mathbf {v} \rangle =0}$
${\displaystyle \langle \left(\mathbf {x} -P\mathbf {x} \right)+\left(\mathbf {y} -P\mathbf {y} \right),\mathbf {v} \rangle =0}$

By taking the difference between the equations we have

${\displaystyle \langle P\mathbf {x} +P\mathbf {y} -P\left(\mathbf {x} +\mathbf {y} \right),\mathbf {v} \rangle =0}$

But since we may choose ${\displaystyle \mathbf {v} =P\mathbf {x} +P\mathbf {y} -P(\mathbf {x} +\mathbf {y} )}$ (as it is itself in ${\displaystyle U}$) it follows that ${\displaystyle P\mathbf {x} +P\mathbf {y} =P(\mathbf {x} +\mathbf {y} )}$. Similarly we have ${\displaystyle \lambda P\mathbf {x} =P(\lambda \mathbf {x} )}$ for every scalar ${\displaystyle \lambda }$.

#### Properties and special cases

An orthogonal projection is a bounded operator. This is because for every ${\displaystyle \mathbf {v} }$ in the vector space we have, by the Cauchy–Schwarz inequality:

${\displaystyle \left\|P\mathbf {v} \right\|^{2}=\langle P\mathbf {v} ,P\mathbf {v} \rangle =\langle P\mathbf {v} ,\mathbf {v} \rangle \leq \left\|P\mathbf {v} \right\|\cdot \left\|\mathbf {v} \right\|}$

Thus ${\displaystyle \left\|P\mathbf {v} \right\|\leq \left\|\mathbf {v} \right\|}$.

For finite-dimensional complex or real vector spaces, the standard inner product can be substituted for ${\displaystyle \langle \cdot ,\cdot \rangle }$.

##### Formulas

A simple case occurs when the orthogonal projection is onto a line. If ${\displaystyle \mathbf {u} }$ is a unit vector on the line, then the projection is given by the outer product

${\displaystyle P_{\mathbf {u} }=\mathbf {u} \mathbf {u} ^{\mathsf {T}}.}$

(If ${\displaystyle \mathbf {u} }$ is complex-valued, the transpose in the above equation is replaced by a Hermitian transpose). This operator leaves u invariant, and it annihilates all vectors orthogonal to ${\displaystyle \mathbf {u} }$, proving that it is indeed the orthogonal projection onto the line containing u. [4] A simple way to see this is to consider an arbitrary vector ${\displaystyle \mathbf {x} }$ as the sum of a component on the line (i.e. the projected vector we seek) and another perpendicular to it, ${\displaystyle \mathbf {x} =\mathbf {x} _{\parallel }+\mathbf {x} _{\perp }}$. Applying projection, we get

${\displaystyle P_{\mathbf {u} }\mathbf {x} =\mathbf {u} \mathbf {u} ^{\mathsf {T}}\mathbf {x} _{\parallel }+\mathbf {u} \mathbf {u} ^{\mathsf {T}}\mathbf {x} _{\perp }=\mathbf {u} \left(\operatorname {sgn} \left(\mathbf {u} ^{\mathsf {T}}\mathbf {x} _{\parallel }\right)\left\|\mathbf {x} _{\parallel }\right\|\right)+\mathbf {u} \cdot \mathbf {0} =\mathbf {x} _{\parallel }}$

by the properties of the dot product of parallel and perpendicular vectors.
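The rank-one formula is easy to try out on a concrete line. The following sketch uses the unit vector ${\displaystyle \mathbf {u} =(3/5,4/5)}$, chosen here so that all entries stay rational; the helper names are illustrative.

```python
from fractions import Fraction

def outer(u, v):
    """Outer product u v^T as a list of rows."""
    return [[ui * vj for vj in v] for ui in u]

def matvec(A, x):
    """Apply the matrix A to the vector x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

u = [Fraction(3, 5), Fraction(4, 5)]   # unit vector: 9/25 + 16/25 = 1
P_u = outer(u, u)                      # projection onto the line through u

on_line = matvec(P_u, u)               # u is left invariant
perp = matvec(P_u, [-4, 3])            # (-4, 3) is orthogonal to u and is annihilated
mixed = matvec(P_u, [2, 1])            # general vector: result is (u . x) u

print(on_line == u, perp == [0, 0], mixed)
```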

This formula can be generalized to orthogonal projections on a subspace of arbitrary dimension. Let ${\displaystyle \mathbf {u} _{1},\ldots ,\mathbf {u} _{k}}$ be an orthonormal basis of the subspace ${\displaystyle U}$, and let ${\displaystyle A}$ denote the ${\displaystyle n\times k}$ matrix whose columns are ${\displaystyle \mathbf {u} _{1},\ldots ,\mathbf {u} _{k}}$, i.e., ${\displaystyle A={\begin{bmatrix}\mathbf {u} _{1}&\cdots &\mathbf {u} _{k}\end{bmatrix}}}$. Then the projection is given by: [5]

${\displaystyle P_{A}=AA^{\mathsf {T}}}$

which can be rewritten as

${\displaystyle P_{A}=\sum _{i}\langle \mathbf {u} _{i},\cdot \rangle \mathbf {u} _{i}.}$

The matrix ${\displaystyle A^{\mathsf {T}}}$ is the partial isometry that vanishes on the orthogonal complement of ${\displaystyle U}$ and ${\displaystyle A}$ is the isometry that embeds ${\displaystyle U}$ into the underlying vector space. The range of ${\displaystyle P_{A}}$ is therefore the final space of ${\displaystyle A}$. It is also clear that ${\displaystyle AA^{\mathsf {T}}}$ is the identity operator on ${\displaystyle U}$.

The orthonormality condition can also be dropped. If ${\displaystyle \mathbf {u} _{1},\ldots ,\mathbf {u} _{k}}$ is a (not necessarily orthonormal) basis, and ${\displaystyle A}$ is the matrix with these vectors as columns, then the projection is: [6] [7]

${\displaystyle P_{A}=A\left(A^{\mathsf {T}}A\right)^{-1}A^{\mathsf {T}}.}$

The matrix ${\displaystyle A}$ still embeds ${\displaystyle U}$ into the underlying vector space but is no longer an isometry in general. The matrix ${\displaystyle \left(A^{\mathsf {T}}A\right)^{-1}}$ is a "normalizing factor" that recovers the norm. For example, the rank-1 operator ${\displaystyle \mathbf {u} \mathbf {u} ^{\mathsf {T}}}$ is not a projection if ${\displaystyle \left\|\mathbf {u} \right\|\neq 1.}$ After dividing by ${\displaystyle \mathbf {u} ^{\mathsf {T}}\mathbf {u} =\left\|\mathbf {u} \right\|^{2},}$ we obtain the projection ${\displaystyle \mathbf {u} \left(\mathbf {u} ^{\mathsf {T}}\mathbf {u} \right)^{-1}\mathbf {u} ^{\mathsf {T}}}$ onto the subspace spanned by ${\displaystyle u}$.
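The formula ${\displaystyle A\left(A^{\mathsf {T}}A\right)^{-1}A^{\mathsf {T}}}$ can be evaluated exactly for a small non-orthonormal basis. The sketch below is plain Python with rational arithmetic; `inverse` is a hypothetical Gauss-Jordan helper written for this example, and it assumes its input is invertible (which holds for ${\displaystyle A^{\mathsf {T}}A}$ when the columns of ${\displaystyle A}$ are linearly independent).

```python
from fractions import Fraction

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    """Return the transpose of A as a list of rows."""
    return [list(col) for col in zip(*A)]

def inverse(M):
    """Invert a square matrix by Gauss-Jordan elimination (input assumed invertible)."""
    n = len(M)
    # Augment M with the identity matrix, converting entries to exact fractions.
    aug = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        # Swap a row with a non-zero pivot into position, then normalise it.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate the pivot column from every other row.
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def orthogonal_projector(A):
    """Return A (A^T A)^{-1} A^T for a matrix A with independent columns."""
    At = transpose(A)
    return matmul(matmul(A, inverse(matmul(At, A))), At)

# Columns (1, 1, 0) and (0, 1, 2): independent but not orthonormal.
A = [[1, 0],
     [1, 1],
     [0, 2]]
P = orthogonal_projector(A)

print(matmul(P, P) == P)        # idempotent
print(transpose(P) == P)        # symmetric, hence an orthogonal projection
print(matmul(P, A) == A)        # the columns of A are left unchanged
```

The vector ${\displaystyle (2,-2,1)}$ is orthogonal to both columns of this `A`, so `P` maps it to zero.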

In the general case, we can have an arbitrary positive definite matrix ${\displaystyle D}$ defining an inner product ${\displaystyle \langle x,y\rangle _{D}=y^{\dagger }Dx}$, and the projection ${\displaystyle P_{A}}$ is given by ${\textstyle P_{A}x=\operatorname {argmin} _{y\in \operatorname {range} (A)}\left\|x-y\right\|_{D}^{2}}$. Then

${\displaystyle P_{A}=A\left(A^{\mathsf {T}}DA\right)^{-1}A^{\mathsf {T}}D.}$
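As a quick check in the real case (assuming ${\displaystyle D}$ is symmetric positive definite and ${\displaystyle A}$ has linearly independent columns, so that ${\displaystyle A^{\mathsf {T}}DA}$ is invertible), this matrix is idempotent:

${\displaystyle P_{A}^{2}=A\left(A^{\mathsf {T}}DA\right)^{-1}\left(A^{\mathsf {T}}DA\right)\left(A^{\mathsf {T}}DA\right)^{-1}A^{\mathsf {T}}D=A\left(A^{\mathsf {T}}DA\right)^{-1}A^{\mathsf {T}}D=P_{A}.}$

Moreover ${\displaystyle DP_{A}=DA\left(A^{\mathsf {T}}DA\right)^{-1}A^{\mathsf {T}}D}$ is symmetric, which expresses that ${\displaystyle P_{A}}$ is self-adjoint with respect to ${\displaystyle \langle \cdot ,\cdot \rangle _{D}}$. Setting ${\displaystyle D=I}$ recovers ${\displaystyle A\left(A^{\mathsf {T}}A\right)^{-1}A^{\mathsf {T}}}$.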

When the range space of the projection is generated by a frame (i.e. the number of generators is greater than its dimension), the formula for the projection takes the form: ${\displaystyle P_{A}=AA^{+}}$. Here ${\displaystyle A^{+}}$ stands for the Moore–Penrose pseudoinverse. This is just one of many ways to construct the projection operator.

If ${\displaystyle {\begin{bmatrix}A&B\end{bmatrix}}}$ is a non-singular matrix and ${\displaystyle A^{\mathsf {T}}B=0}$ (i.e., ${\displaystyle B}$ is the null space matrix of ${\displaystyle A}$), [8] the following holds:

${\displaystyle {\begin{aligned}I&={\begin{bmatrix}A&B\end{bmatrix}}{\begin{bmatrix}A&B\end{bmatrix}}^{-1}{\begin{bmatrix}A^{\mathsf {T}}\\B^{\mathsf {T}}\end{bmatrix}}^{-1}{\begin{bmatrix}A^{\mathsf {T}}\\B^{\mathsf {T}}\end{bmatrix}}\\&={\begin{bmatrix}A&B\end{bmatrix}}\left({\begin{bmatrix}A^{\mathsf {T}}\\B^{\mathsf {T}}\end{bmatrix}}{\begin{bmatrix}A&B\end{bmatrix}}\right)^{-1}{\begin{bmatrix}A^{\mathsf {T}}\\B^{\mathsf {T}}\end{bmatrix}}\\&={\begin{bmatrix}A&B\end{bmatrix}}{\begin{bmatrix}A^{\mathsf {T}}A&O\\O&B^{\mathsf {T}}B\end{bmatrix}}^{-1}{\begin{bmatrix}A^{\mathsf {T}}\\B^{\mathsf {T}}\end{bmatrix}}\\[4pt]&=A\left(A^{\mathsf {T}}A\right)^{-1}A^{\mathsf {T}}+B\left(B^{\mathsf {T}}B\right)^{-1}B^{\mathsf {T}}\end{aligned}}}$

If the orthogonal condition is enhanced to ${\displaystyle A^{\mathsf {T}}WB=A^{\mathsf {T}}W^{\mathsf {T}}B=0}$ with ${\displaystyle W}$ non-singular, the following holds:

${\displaystyle I={\begin{bmatrix}A&B\end{bmatrix}}{\begin{bmatrix}\left(A^{\mathsf {T}}WA\right)^{-1}A^{\mathsf {T}}\\\left(B^{\mathsf {T}}WB\right)^{-1}B^{\mathsf {T}}\end{bmatrix}}W.}$

All these formulas also hold for complex inner product spaces, provided that the conjugate transpose is used instead of the transpose. Further details on sums of projectors can be found in Banerjee and Roy (2014). [9] Also see Banerjee (2004) [10] for application of sums of projectors in basic spherical trigonometry.

### Oblique projections

The term oblique projections is sometimes used to refer to non-orthogonal projections. These projections are also used to represent spatial figures in two-dimensional drawings (see oblique projection), though not as frequently as orthogonal projections. Whereas calculating the fitted value of an ordinary least squares regression requires an orthogonal projection, calculating the fitted value of an instrumental variables regression requires an oblique projection.

Projections are defined by their null space and the basis vectors used to characterize their range (which is the complement of the null space). When these basis vectors are orthogonal to the null space, then the projection is an orthogonal projection. When these basis vectors are not orthogonal to the null space, the projection is an oblique projection. Let the vectors ${\displaystyle \mathbf {u} _{1},\ldots ,\mathbf {u} _{k}}$ form a basis for the range of the projection, and assemble these vectors in the ${\displaystyle n\times k}$ matrix ${\displaystyle A}$. The range and the null space are complementary spaces, so the null space has dimension ${\displaystyle n-k}$. It follows that the orthogonal complement of the null space has dimension ${\displaystyle k}$. Let ${\displaystyle \mathbf {v} _{1},\ldots ,\mathbf {v} _{k}}$ form a basis for the orthogonal complement of the null space of the projection, and assemble these vectors in the matrix ${\displaystyle B}$. Then the projection is defined by

${\displaystyle P=A\left(B^{\mathsf {T}}A\right)^{-1}B^{\mathsf {T}}.}$

This expression generalizes the formula for orthogonal projections given above. [11] [12]
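For a one-dimensional range the formula reduces to ${\displaystyle P=\mathbf {a} \mathbf {b} ^{\mathsf {T}}/(\mathbf {b} ^{\mathsf {T}}\mathbf {a} )}$, where ${\displaystyle \mathbf {a} }$ spans the range and ${\displaystyle \mathbf {b} }$ spans the orthogonal complement of the null space. The sketch below covers only this rank-one special case; the function name `oblique_projector_rank1` is a hypothetical helper, and it assumes ${\displaystyle \mathbf {b} ^{\mathsf {T}}\mathbf {a} \neq 0}$.

```python
from fractions import Fraction

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]

def oblique_projector_rank1(a, b):
    """Return a b^T / (b^T a): range span(a), null space orthogonal to b."""
    d = sum(ai * bi for ai, bi in zip(a, b))
    if d == 0:
        raise ValueError("b^T a must be non-zero")
    return [[Fraction(ai * bj) / d for bj in b] for ai in a]

alpha = 2
# Range spanned by (0, 1); null space orthogonal to (alpha, 1).
P = oblique_projector_rank1([0, 1], [alpha, 1])
print(P == [[0, 0], [alpha, 1]])     # recovers the earlier 2x2 oblique example
print(matmul(P, P) == P)             # idempotent

# Choosing b = a gives the orthogonal projection onto span(a).
P_orth = oblique_projector_rank1([3, 4], [3, 4])
print(P_orth == [[Fraction(9, 25), Fraction(12, 25)],
                 [Fraction(12, 25), Fraction(16, 25)]])
```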

#### Singular Values

Note that ${\displaystyle I-P}$ is also an oblique projection. The singular values of ${\displaystyle P}$ and ${\displaystyle I-P}$ can be computed using an orthonormal basis of ${\displaystyle A}$. Let ${\displaystyle Q_{A}}$ be an orthonormal basis of ${\displaystyle A}$ and let ${\displaystyle Q_{A}^{\perp }}$ be the orthogonal complement of ${\displaystyle Q_{A}}$. Denote the singular values of the matrix ${\displaystyle Q_{A}^{T}A(B^{T}A)^{-1}B^{T}Q_{A}^{\perp }}$ by the positive values ${\displaystyle \gamma _{1}\geq \gamma _{2}\geq \ldots \geq \gamma _{k}}$. With this, the singular values for ${\displaystyle P}$ are: [13]

${\displaystyle \sigma _{i}={\begin{cases}{\sqrt {1+\gamma _{i}^{2}}}&1\leq i\leq k\\0&{\text{otherwise}}\end{cases}}}$

and the singular values for ${\displaystyle I-P}$ are

${\displaystyle \sigma _{i}={\begin{cases}{\sqrt {1+\gamma _{i}^{2}}}&1\leq i\leq k\\1&k+1\leq i\leq n-k\\0&{\text{otherwise}}\end{cases}}}$

This implies that the largest singular values of ${\displaystyle P}$ and ${\displaystyle I-P}$ are equal, and thus that the two oblique projections have the same matrix norm. However, the condition numbers satisfy the relation ${\displaystyle \kappa (I-P)={\frac {\sigma _{1}}{1}}\geq {\frac {\sigma _{1}}{\sigma _{k}}}=\kappa (P)}$, and are therefore not necessarily equal.

### Finding projection with an inner product

Let ${\displaystyle V}$ be a vector space (in this case a plane) spanned by orthogonal vectors ${\displaystyle \mathbf {u} _{1},\mathbf {u} _{2},\dots ,\mathbf {u} _{p}}$. Let ${\displaystyle y}$ be a vector. One can define a projection of ${\displaystyle \mathbf {y} }$ onto ${\displaystyle V}$ as

${\displaystyle \operatorname {proj} _{V}\mathbf {y} ={\frac {\mathbf {y} \cdot \mathbf {u} ^{i}}{\mathbf {u} ^{i}\cdot \mathbf {u} ^{i}}}\mathbf {u} ^{i}}$

where repeated indices are summed over (Einstein sum notation). The vector ${\displaystyle \mathbf {y} }$ can be written as an orthogonal sum such that ${\displaystyle \mathbf {y} =\operatorname {proj} _{V}\mathbf {y} +\mathbf {z} }$. ${\displaystyle \operatorname {proj} _{V}\mathbf {y} }$ is sometimes denoted as ${\displaystyle {\hat {\mathbf {y} }}}$. There is a theorem in linear algebra that states that the norm ${\displaystyle \|\mathbf {z} \|}$ is the smallest distance (the orthogonal distance) from ${\displaystyle \mathbf {y} }$ to ${\displaystyle V}$; this fact is commonly used in areas such as machine learning.
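The summation formula translates directly into code. The sketch below assumes the spanning vectors are mutually orthogonal (not necessarily of unit length), as in the text; the function name `proj` and the sample vectors are chosen for this illustration.

```python
from fractions import Fraction

def dot(a, b):
    """Standard dot product of two vectors."""
    return sum(x * y for x, y in zip(a, b))

def proj(y, basis):
    """Sum of (y . u_i)/(u_i . u_i) u_i over mutually orthogonal vectors u_i."""
    result = [Fraction(0)] * len(y)
    for u in basis:
        coeff = Fraction(dot(y, u), dot(u, u))
        result = [r + coeff * ui for r, ui in zip(result, u)]
    return result

u1 = [1, 1, 0]
u2 = [1, -1, 2]            # orthogonal to u1: 1 - 1 + 0 = 0
y = [3, 1, 4]

y_hat = proj(y, [u1, u2])
z = [yi - hi for yi, hi in zip(y, y_hat)]   # residual y - proj_V(y)

print(y_hat, z)
print(dot(z, u1) == 0, dot(z, u2) == 0)     # the residual is orthogonal to V
```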

## Canonical forms

Any projection ${\displaystyle P=P^{2}}$ on a vector space of dimension ${\displaystyle d}$ over a field is a diagonalizable matrix, since its minimal polynomial divides ${\displaystyle x^{2}-x}$, which splits into distinct linear factors. Thus there exists a basis in which ${\displaystyle P}$ has the form

${\displaystyle P=I_{r}\oplus 0_{d-r}}$

where ${\displaystyle r}$ is the rank of ${\displaystyle P}$. Here ${\displaystyle I_{r}}$ is the identity matrix of size ${\displaystyle r}$, and ${\displaystyle 0_{d-r}}$ is the zero matrix of size ${\displaystyle d-r}$. If the vector space is complex and equipped with an inner product, then there is an orthonormal basis in which the matrix of P is [14]

${\displaystyle P={\begin{bmatrix}1&\sigma _{1}\\0&0\end{bmatrix}}\oplus \cdots \oplus {\begin{bmatrix}1&\sigma _{k}\\0&0\end{bmatrix}}\oplus I_{m}\oplus 0_{s}.}$

where ${\displaystyle \sigma _{1}\geq \sigma _{2}\geq \dots \geq \sigma _{k}>0}$. The integers ${\displaystyle k,s,m}$ and the real numbers ${\displaystyle \sigma _{i}}$ are uniquely determined. Note that ${\displaystyle 2k+s+m=d}$. The factor ${\displaystyle I_{m}\oplus 0_{s}}$ corresponds to the maximal invariant subspace on which ${\displaystyle P}$ acts as an orthogonal projection (so that P itself is orthogonal if and only if ${\displaystyle k=0}$) and the ${\displaystyle \sigma _{i}}$-blocks correspond to the oblique components.

## Projections on normed vector spaces

When the underlying vector space ${\displaystyle X}$ is a (not necessarily finite-dimensional) normed vector space, analytic questions, irrelevant in the finite-dimensional case, need to be considered. Assume now ${\displaystyle X}$ is a Banach space.

Many of the algebraic results discussed above survive the passage to this context. A given direct sum decomposition of ${\displaystyle X}$ into complementary subspaces still specifies a projection, and vice versa. If ${\displaystyle X}$ is the direct sum ${\displaystyle X=U\oplus V}$, then the operator defined by ${\displaystyle P(u+v)=u}$ is still a projection with range ${\displaystyle U}$ and kernel ${\displaystyle V}$. It is also clear that ${\displaystyle P^{2}=P}$. Conversely, if ${\displaystyle P}$ is a projection on ${\displaystyle X}$, i.e. ${\displaystyle P^{2}=P}$, then it is easily verified that ${\displaystyle (1-P)^{2}=(1-P)}$. In other words, ${\displaystyle 1-P}$ is also a projection. The relation ${\displaystyle P^{2}=P}$ implies ${\displaystyle 1=P+(1-P)}$ and ${\displaystyle X}$ is the direct sum ${\displaystyle \operatorname {rg} (P)\oplus \operatorname {rg} (1-P)}$.

However, in contrast to the finite-dimensional case, projections need not be continuous in general. If a subspace ${\displaystyle U}$ of ${\displaystyle X}$ is not closed in the norm topology, then the projection onto ${\displaystyle U}$ is not continuous. In other words, the range of a continuous projection ${\displaystyle P}$ must be a closed subspace. Furthermore, the kernel of a continuous projection (in fact, a continuous linear operator in general) is closed. Thus a continuous projection ${\displaystyle P}$ gives a decomposition of ${\displaystyle X}$ into two complementary closed subspaces: ${\displaystyle X=\operatorname {rg} (P)\oplus \ker(P)=\ker(1-P)\oplus \ker(P)}$.

The converse holds also, with an additional assumption. Suppose ${\displaystyle U}$ is a closed subspace of ${\displaystyle X}$. If there exists a closed subspace ${\displaystyle V}$ such that ${\displaystyle X=U\oplus V}$, then the projection ${\displaystyle P}$ with range ${\displaystyle U}$ and kernel ${\displaystyle V}$ is continuous. This follows from the closed graph theorem. Suppose ${\displaystyle x_{n}\to x}$ and ${\displaystyle Px_{n}\to y}$. One needs to show that ${\displaystyle Px=y}$. Since ${\displaystyle U}$ is closed and ${\displaystyle \{Px_{n}\}\subset U}$, ${\displaystyle y}$ lies in ${\displaystyle U}$, i.e. ${\displaystyle Py=y}$. Also, ${\displaystyle x_{n}-Px_{n}=(I-P)x_{n}\to x-y}$. Because ${\displaystyle V}$ is closed and ${\displaystyle \{(I-P)x_{n}\}\subset V}$, we have ${\displaystyle x-y\in V}$, i.e. ${\displaystyle P(x-y)=Px-Py=Px-y=0}$, which proves the claim.

The above argument makes use of the assumption that both ${\displaystyle U}$ and ${\displaystyle V}$ are closed. In general, given a closed subspace ${\displaystyle U}$, there need not exist a complementary closed subspace ${\displaystyle V}$, although for Hilbert spaces this can always be done by taking the orthogonal complement. For Banach spaces, a one-dimensional subspace always has a closed complementary subspace. This is an immediate consequence of the Hahn–Banach theorem. Let ${\displaystyle U}$ be the linear span of a non-zero vector ${\displaystyle u}$. By Hahn–Banach, there exists a bounded linear functional ${\displaystyle \varphi }$ such that ${\displaystyle \varphi (u)=1}$. The operator ${\displaystyle P(x)=\varphi (x)u}$ satisfies ${\displaystyle P^{2}=P}$, i.e. it is a projection. Boundedness of ${\displaystyle \varphi }$ implies continuity of ${\displaystyle P}$ and therefore ${\displaystyle \ker(P)=\operatorname {rg} (I-P)}$ is a closed complementary subspace of ${\displaystyle U}$.
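The idempotence of ${\displaystyle P(x)=\varphi (x)u}$ is a one-line computation using the linearity of ${\displaystyle \varphi }$ and the normalization ${\displaystyle \varphi (u)=1}$:

${\displaystyle P(P(x))=\varphi {\bigl (}\varphi (x)u{\bigr )}u=\varphi (x)\varphi (u)u=\varphi (x)u=P(x).}$

The range of ${\displaystyle P}$ is the span of ${\displaystyle u}$, and its kernel is ${\displaystyle \ker \varphi }$, which is closed because ${\displaystyle \varphi }$ is bounded.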

## Applications and further considerations

Projections (orthogonal and otherwise) play a major role in algorithms for certain linear algebra problems.

As stated above, projections are a special case of idempotents. Analytically, orthogonal projections are non-commutative generalizations of characteristic functions. Idempotents are used in classifying, for instance, semisimple algebras, while measure theory begins with considering characteristic functions of measurable sets. Therefore, as one can imagine, projections are very often encountered in the context of operator algebras. In particular, a von Neumann algebra is generated by its complete lattice of projections.

## Generalizations

More generally, given a map between normed vector spaces ${\displaystyle T\colon V\to W,}$ one can analogously ask for this map to be an isometry on the orthogonal complement of the kernel: that ${\displaystyle (\ker T)^{\perp }\to W}$ be an isometry (compare Partial isometry); in particular it must be onto. The case of an orthogonal projection is when W is a subspace of V. In Riemannian geometry, this is used in the definition of a Riemannian submersion.

## Notes

1. Meyer, pp 386+387
2. Horn, Roger A.; Johnson, Charles R. (2013). Matrix Analysis, second edition. Cambridge University Press. ISBN 9780521839402.
3. Meyer, p. 433
4. Meyer, p. 431
5. Meyer, equation (5.13.4)
6. Banerjee, Sudipto; Roy, Anindya (2014), Linear Algebra and Matrix Analysis for Statistics, Texts in Statistical Science (1st ed.), Chapman and Hall/CRC, ISBN 978-1420095388
7. Meyer, equation (5.13.3)
8. Banerjee, Sudipto; Roy, Anindya (2014), Linear Algebra and Matrix Analysis for Statistics, Texts in Statistical Science (1st ed.), Chapman and Hall/CRC, ISBN 978-1420095388
9. Banerjee, Sudipto (2004), "Revisiting Spherical Trigonometry with Orthogonal Projectors", The College Mathematics Journal, 35 (5): 375–381, doi:10.1080/07468342.2004.11922099, S2CID 122277398
10. Banerjee, Sudipto; Roy, Anindya (2014), Linear Algebra and Matrix Analysis for Statistics, Texts in Statistical Science (1st ed.), Chapman and Hall/CRC, ISBN 978-1420095388
11. Meyer, equation (7.10.39)
12. Brust, J. J.; Marcia, R. F.; Petra, C. G. (2020), "Computationally Efficient Decompositions of Oblique Projection Matrices", SIAM Journal on Matrix Analysis and Applications, 41 (2): 852–870, doi:10.1137/19M1288115
13. Doković, D. Ž. (August 1991). "Unitary similarity of projectors". Aequationes Mathematicae. 42 (1): 220–224. doi:10.1007/BF01818492. S2CID 122704926.

## Related Research Articles

In mathematics, an inner product space is a real vector space or a complex vector space with an operation called an inner product. The inner product of two vectors in the space is a scalar, often denoted with angle brackets such as in . Inner products allow formal definitions of intuitive geometric notions, such as lengths, angles, and orthogonality of vectors. Inner product spaces generalize Euclidean vector spaces, in which the inner product is the dot product or scalar product of Cartesian coordinates. Inner product spaces of infinite dimension are widely used in functional analysis. Inner product spaces over the field of complex numbers are sometimes referred to as unitary spaces. The first usage of the concept of a vector space with an inner product is due to Giuseppe Peano, in 1898.

Linear algebra is the branch of mathematics concerning linear equations such as:

In mathematics, and more specifically in linear algebra, a linear subspace, also known as a vector subspace is a vector space that is a subset of some larger vector space. A linear subspace is usually simply called a subspace when the context serves to distinguish it from other types of subspaces.

In mathematics, particularly linear algebra and numerical analysis, the Gram–Schmidt process is a method for orthonormalizing a set of vectors in an inner product space, most commonly the Euclidean space Rn equipped with the standard inner product. The Gram–Schmidt process takes a finite, linearly independent set of vectors S = {v1, ..., vk} for kn and generates an orthogonal set S′ = {u1, ..., uk} that spans the same k-dimensional subspace of Rn as S.

In linear algebra, the column space of a matrix A is the span of its column vectors. The column space of a matrix is the image or range of the corresponding matrix transformation.

In mathematics, the dot product or scalar product is an algebraic operation that takes two equal-length sequences of numbers, and returns a single number. In Euclidean geometry, the dot product of the Cartesian coordinates of two vectors is widely used. It is often called the inner product of Euclidean space, even though it is not the only inner product that can be defined on Euclidean space.

In mathematics, particularly in linear algebra, a skew-symmetric matrix is a square matrix whose transpose equals its negative. That is, it satisfies the condition ${\displaystyle A^{\mathrm {T} }=-A}$.

In mathematics, a Hermitian matrix is a complex square matrix that is equal to its own conjugate transpose; that is, the element in the i-th row and j-th column is equal to the complex conjugate of the element in the j-th row and i-th column, for all indices i and j: ${\displaystyle a_{ij}={\overline {a_{ji}}}}$.

In mathematics, a linear form is a linear map from a vector space to its field of scalars.

In linear algebra, a QR decomposition, also known as a QR factorization or QU factorization, is a decomposition of a matrix A into a product A = QR of an orthogonal matrix Q and an upper triangular matrix R. QR decomposition is often used to solve the linear least squares problem and is the basis for a particular eigenvalue algorithm, the QR algorithm.

In linear algebra, a square matrix with complex entries is said to be skew-Hermitian or anti-Hermitian if its conjugate transpose is the negative of the original matrix. That is, the matrix ${\displaystyle A}$ is skew-Hermitian if it satisfies the relation ${\displaystyle A^{*}=-A}$.

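In linear algebra, linear transformations can be represented by matrices. If ${\displaystyle T}$ is a linear transformation mapping ${\displaystyle \mathbb {R} ^{n}}$ to ${\displaystyle \mathbb {R} ^{m}}$ and ${\displaystyle \mathbf {x} }$ is a column vector with ${\displaystyle n}$ entries, then ${\displaystyle T(\mathbf {x} )=A\mathbf {x} }$ for some ${\displaystyle m\times n}$ matrix ${\displaystyle A}$, called the transformation matrix of ${\displaystyle T}$.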

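In linear algebra, a rotation matrix is a transformation matrix that is used to perform a rotation in Euclidean space. For example, the matrix ${\displaystyle R={\begin{bmatrix}\cos \theta &-\sin \theta \\\sin \theta &\cos \theta \end{bmatrix}}}$ rotates points in the ${\displaystyle xy}$-plane counterclockwise through an angle ${\displaystyle \theta }$ about the origin.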
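As a small illustration of a rotation matrix acting on a vector, the sketch below builds the standard 2×2 counterclockwise rotation and applies it to a point. It assumes column vectors and angles in radians; the function names `rotation_matrix` and `apply` are hypothetical.

```python
import math

def rotation_matrix(theta):
    """2x2 matrix rotating column vectors counterclockwise by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def apply(matrix, vector):
    """Multiply a matrix (list of rows) by a column vector (list)."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

# Rotating the unit vector along the x-axis by a quarter turn
# should give (approximately) the unit vector along the y-axis.
rotated = apply(rotation_matrix(math.pi / 2), [1.0, 0.0])
```

Unlike a projection, a rotation by a nonzero angle smaller than a full turn is not idempotent: applying it twice rotates by twice the angle.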

In the mathematical fields of linear algebra and functional analysis, the orthogonal complement of a subspace ${\displaystyle W}$ of a vector space ${\displaystyle V}$ equipped with a bilinear form ${\displaystyle B}$ is the set ${\displaystyle W^{\perp }}$ of all vectors in ${\displaystyle V}$ that are orthogonal to every vector in ${\displaystyle W}$. Informally, it is called the perp, short for perpendicular complement. It is a subspace of ${\displaystyle V}$.
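The link between orthogonal complements and orthogonal projections can be seen in the simplest case, where the subspace is the line spanned by one nonzero vector. The sketch below assumes real vectors and the standard dot product as the bilinear form; the names `dot` and `split_along` are hypothetical.

```python
def dot(u, v):
    """Standard dot product of two equal-length lists of floats."""
    return sum(a * b for a, b in zip(u, v))

def split_along(x, u):
    """Split x into a part in span{u} and a part orthogonal to u.

    Assumes u is nonzero.
    """
    scale = dot(x, u) / dot(u, u)
    parallel = [scale * ui for ui in u]                        # orthogonal projection of x onto span{u}
    perpendicular = [xi - pi for xi, pi in zip(x, parallel)]   # lies in the orthogonal complement
    return parallel, perpendicular

parallel, perpendicular = split_along([3.0, 4.0], [1.0, 0.0])
```

Here `parallel` is the orthogonal projection of the input onto the line and `perpendicular` is the remainder, so the two parts sum to the original vector and have zero dot product with each other.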

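In mathematics, the kernel of a linear map, also known as the null space or nullspace, is the linear subspace of the domain of the map which is mapped to the zero vector. That is, given a linear map ${\displaystyle L:V\to W}$ between two vector spaces ${\displaystyle V}$ and ${\displaystyle W}$, the kernel of ${\displaystyle L}$ is the vector space of all elements ${\displaystyle \mathbf {v} }$ of ${\displaystyle V}$ such that ${\displaystyle L(\mathbf {v} )=\mathbf {0} }$, where ${\displaystyle \mathbf {0} }$ denotes the zero vector in ${\displaystyle W}$, or more symbolically: ${\displaystyle \ker(L)=\{\mathbf {v} \in V\mid L(\mathbf {v} )=\mathbf {0} \}}$.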

In mathematics, the conjugate gradient method is an algorithm for the numerical solution of particular systems of linear equations, namely those whose matrix is symmetric and positive-definite. The conjugate gradient method is often implemented as an iterative algorithm, applicable to sparse systems that are too large to be handled by direct methods such as the Cholesky decomposition. Large sparse systems often arise when numerically solving partial differential equations or optimization problems.
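A minimal sketch of the iteration, in plain Python, may make the idea concrete. It assumes a small dense symmetric positive-definite matrix stored as a list of rows, a zero starting guess, and a simple residual-norm stopping rule; the function names and the default tolerance are hypothetical choices, and a practical implementation would use sparse storage and usually a preconditioner.

```python
def dot(u, v):
    """Standard dot product of two equal-length lists of floats."""
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [dot(row, x) for row in A]

def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Approximately solve A x = b for symmetric positive-definite A."""
    n = len(b)
    if max_iter is None:
        max_iter = 10 * n
    x = [0.0] * n
    r = list(b)            # residual b - A x for the zero starting guess
    p = list(r)            # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)                       # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        # New direction: residual plus a multiple of the previous direction.
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x

x = conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
```

For this 2×2 system the exact solution is (1/11, 7/11), and in exact arithmetic the method reaches it in at most two iterations.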

In statistics and signal processing, the orthogonality principle is a necessary and sufficient condition for the optimality of a Bayesian estimator. Loosely stated, the orthogonality principle says that the error vector of the optimal estimator is orthogonal to any possible estimator. The orthogonality principle is most commonly stated for linear estimators, but more general formulations are possible. Since the principle is a necessary and sufficient condition for optimality, it can be used to find the minimum mean square error estimator.

In mathematics, Hilbert spaces allow generalizing the methods of linear algebra and calculus from (finite-dimensional) Euclidean vector spaces to spaces that may be infinite-dimensional. A Hilbert space is a vector space equipped with an inner product which defines a distance function for which it is a complete metric space. Hilbert spaces arise naturally and frequently in mathematics and physics, typically as function spaces.

In mathematics, orthogonality is the generalization of the geometric notion of perpendicularity to the linear algebra of bilinear forms.

## References

• Banerjee, Sudipto; Roy, Anindya (2014). Linear Algebra and Matrix Analysis for Statistics. Texts in Statistical Science (1st ed.). Chapman and Hall/CRC. ISBN 978-1420095388.
• Dunford, N.; Schwartz, J. T. (1958). Linear Operators, Part I: General Theory. Interscience.
• Meyer, Carl D. (2000). Matrix Analysis and Applied Linear Algebra. Society for Industrial and Applied Mathematics. ISBN 978-0-89871-454-8.