Principal axis theorem

In geometry and linear algebra, a principal axis is a certain line in a Euclidean space associated with an ellipsoid or hyperboloid, generalizing the major and minor axes of an ellipse or hyperbola. The principal axis theorem states that the principal axes are perpendicular, and gives a constructive procedure for finding them.

Mathematically, the principal axis theorem is a generalization of the method of completing the square from elementary algebra. In linear algebra and functional analysis, the principal axis theorem is a geometrical counterpart of the spectral theorem. It has applications to the statistics of principal components analysis and the singular value decomposition. In physics, the theorem is fundamental to the studies of angular momentum and birefringence.

Motivation

The equations in the Cartesian plane R²:

    x²/9 + y²/25 = 1
    x²/9 − y²/25 = 1

define, respectively, an ellipse and a hyperbola. In each case, the x and y axes are the principal axes. This is easily seen, given that there are no cross-terms involving products xy in either expression. However, the situation is more complicated for equations like

    5x² + 8xy + 5y² = 1.

Here some method is required to determine whether this is an ellipse or a hyperbola. The basic observation is that if, by completing the square, the quadratic expression can be reduced to a sum of two squares then the equation defines an ellipse, whereas if it reduces to a difference of two squares then the equation represents a hyperbola:

    u(x, y)² + v(x, y)² = 1    (ellipse),
    u(x, y)² − v(x, y)² = 1    (hyperbola).

Thus, in our example expression, the problem is how to absorb the coefficient of the cross-term 8xy into the functions u and v. Formally, this problem is similar to the problem of matrix diagonalization, where one tries to find a suitable coordinate system in which the matrix of a linear transformation is diagonal. The first step is to find a matrix to which the technique of diagonalization can be applied.

The trick is to write the quadratic form as

    5x² + 8xy + 5y² = [x y] [[5, 4], [4, 5]] [x y]ᵀ = xᵀAx,

where the cross-term has been split into two equal parts. The matrix A in the above decomposition is a symmetric matrix. In particular, by the spectral theorem, it has real eigenvalues and is diagonalizable by an orthogonal matrix (orthogonally diagonalizable).

To orthogonally diagonalize A, one must first find its eigenvalues, and then find an orthonormal eigenbasis. Calculation reveals that the eigenvalues of A are

    λ1 = 1,   λ2 = 9

with corresponding eigenvectors

    v1 = (1, −1)ᵀ,   v2 = (1, 1)ᵀ.

Dividing these by their respective lengths yields an orthonormal eigenbasis:

    u1 = (1/√2, −1/√2)ᵀ,   u2 = (1/√2, 1/√2)ᵀ.

Now the matrix S = [u1 u2] is an orthogonal matrix, since it has orthonormal columns, and A is diagonalized by:

    D = S⁻¹AS = SᵀAS = [[1, 0], [0, 9]].

This applies to the present problem of "diagonalizing" the quadratic form through the observation that

    5x² + 8xy + 5y² = xᵀAx = xᵀ(SDSᵀ)x = (Sᵀx)ᵀD(Sᵀx) = 1·c1² + 9·c2²,   where c = Sᵀx.

Thus, the equation is that of an ellipse, since the left side can be written as the sum of two squares.
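
As a quick numerical check of the diagonalization above, here is a minimal sketch, assuming NumPy is available (the names A, S, D and c are just the ones used in this section):

```python
import numpy as np

# Symmetric matrix of the quadratic form 5x^2 + 8xy + 5y^2
A = np.array([[5.0, 4.0],
              [4.0, 5.0]])

# eigh returns the eigenvalues in ascending order and an orthonormal
# set of eigenvectors as the columns of S (A is symmetric).
eigvals, S = np.linalg.eigh(A)
print(eigvals)                                  # [1. 9.]

# S is orthogonal (S^T S = I), and S^T A S is the diagonal matrix D.
D = S.T @ A @ S
print(np.allclose(S.T @ S, np.eye(2)))          # True
print(np.round(D, 10))                          # diag(1, 9)

# The quadratic form in the rotated coordinates c = S^T x
x = np.array([0.3, -0.7])
c = S.T @ x
print(np.isclose(x @ A @ x, eigvals @ c**2))    # True
```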

It is tempting to simplify this expression by pulling out factors of 2. However, it is important not to do this. The quantities

    c1 = (x − y)/√2,   c2 = (x + y)/√2

have a geometrical meaning. They determine an orthonormal coordinate system on R². In other words, they are obtained from the original coordinates by the application of a rotation (and possibly a reflection). Consequently, one may use the c1 and c2 coordinates to make statements about length and angles (particularly length), which would otherwise be more difficult in a different choice of coordinates (by rescaling them, for instance). For example, the maximum distance from the origin on the ellipse c1² + 9c2² = 1 occurs when c2 = 0, so at the points c1 = ±1. Similarly, the minimum distance is where c2 = ±1/3.

It is possible now to read off the major and minor axes of this ellipse. These are precisely the individual eigenspaces of the matrix A, since these are where c2 = 0 or c1 = 0. Symbolically, the principal axes are

    E1 = span((1/√2, −1/√2)ᵀ),   E2 = span((1/√2, 1/√2)ᵀ).

To summarize:

- The equation is for an ellipse, since both eigenvalues are positive. (Otherwise, if one were positive and the other negative, it would be a hyperbola.)
- The principal axes are the lines spanned by the eigenvectors.
- The minimum and maximum distances to the origin can be read off the equation in diagonal form.

Using this information, it is possible to attain a clear geometrical picture of the ellipse: to graph it, for instance.
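
For instance, a minimal sketch of that picture, assuming NumPy (the parametrization c1 = cos t, c2 = (1/3) sin t is just one convenient choice): it traces the ellipse in principal-axis coordinates and converts back to the original x, y coordinates.

```python
import numpy as np

# Orthonormal eigenbasis from the worked example above
u1 = np.array([1.0, -1.0]) / np.sqrt(2)   # eigenvalue 1 (major axis)
u2 = np.array([1.0,  1.0]) / np.sqrt(2)   # eigenvalue 9 (minor axis)
S = np.column_stack([u1, u2])

# In principal-axis coordinates the ellipse is c1^2 + 9*c2^2 = 1,
# so it is parametrized by c1 = cos(t), c2 = sin(t)/3.
t = np.linspace(0.0, 2.0 * np.pi, 400)
c = np.vstack([np.cos(t), np.sin(t) / 3.0])
xy = S @ c                                # back to (x, y) coordinates
x, y = xy

# Every point satisfies the original equation 5x^2 + 8xy + 5y^2 = 1 ...
print(np.allclose(5*x**2 + 8*x*y + 5*y**2, 1.0))   # True

# ... and the distances to the origin range from 1/3 to 1, attained on
# the minor and major principal axes respectively.
r = np.hypot(x, y)
print(r.min(), r.max())                   # ≈ 1/3 and 1
```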

Formal statement

The principal axis theorem concerns quadratic forms in Rⁿ, which are homogeneous polynomials of degree 2. Any quadratic form may be represented as

    Q(x) = xᵀAx,

where A is a symmetric matrix.

The first part of the theorem is contained in the following statements guaranteed by the spectral theorem:

- The eigenvalues of A are real.
- A is diagonalizable, and the eigenspaces of A are mutually orthogonal.

In particular, A is orthogonally diagonalizable, since one may take a basis of each eigenspace and apply the Gram-Schmidt process separately within the eigenspace to obtain an orthonormal eigenbasis.
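
A small sketch of that last point, assuming NumPy; the matrix B and its repeated eigenvalue 2 are made up for illustration, and QR factorization is used as a numerically stable stand-in for the Gram-Schmidt process:

```python
import numpy as np

# A symmetric matrix with a repeated eigenvalue: 2 (multiplicity 2) and 5.
B = np.diag([2.0, 2.0, 5.0])

# A non-orthonormal basis of the eigenspace for the eigenvalue 2
# (any two independent vectors in the x1-x2 plane will do).
V = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [0.0, 0.0]])

# Orthonormalize within the eigenspace; QR performs the same job as
# Gram-Schmidt on the columns of V.
Q, _ = np.linalg.qr(V)

# The columns of Q are orthonormal and are still eigenvectors for 2.
print(np.allclose(Q.T @ Q, np.eye(2)))        # True
print(np.allclose(B @ Q, 2.0 * Q))            # True

# Completing with the unit eigenvector for 5 gives an orthonormal
# eigenbasis that diagonalizes B.
S = np.column_stack([Q, [0.0, 0.0, 1.0]])
print(np.round(S.T @ B @ S, 10))              # diag(2, 2, 5)
```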

For the second part, suppose that the eigenvalues of A are λ1, ..., λn (possibly repeated according to their algebraic multiplicities) and the corresponding orthonormal eigenbasis is u1, ..., un. Then,

    c = Sᵀx,   where S = [u1 ... un],

and

    Q(x) = λ1c1² + λ2c2² + ... + λncn²,

where ci is the i-th entry of c. Furthermore:

    The i-th principal axis is the line determined by the equations cj = 0 for all j ≠ i. This axis is the span of the vector ui.
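
The identity above is easy to spot-check numerically. A minimal sketch, assuming NumPy and using a randomly generated symmetric matrix to stand in for A:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Random symmetric matrix A and a random test vector x.
M = rng.standard_normal((n, n))
A = (M + M.T) / 2.0
x = rng.standard_normal(n)

# Orthonormal eigenbasis: eigh returns the eigenvalues lam and a matrix S
# whose columns u_1, ..., u_n are orthonormal eigenvectors of A.
lam, S = np.linalg.eigh(A)

# Coordinates of x with respect to the eigenbasis: c = S^T x.
c = S.T @ x

# The quadratic form equals the weighted sum of squared coordinates.
Q = x @ A @ x
print(np.isclose(Q, np.sum(lam * c**2)))      # True
```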

