Inverse iteration

Last updated April 20, 2023

In numerical analysis, inverse iteration (also known as the inverse power method) is an iterative eigenvalue algorithm. It allows one to find an approximate eigenvector when an approximation to a corresponding eigenvalue is already known. The method is conceptually similar to the power method. It appears to have originally been developed to compute resonance frequencies in the field of structural mechanics. ^[1]

Theory and convergence
Speed of convergence
Complexity
Implementation options
Calculate inverse matrix or solve system of linear equations
Tridiagonalization, Hessenberg form
Choice of the normalization constant Ck
Usage
Methods to find approximate eigenvalues
Norm of matrix as approximation to the dominant eigenvalue
Estimates based on statistics
See also
References

The inverse power iteration algorithm starts with an approximation $\mu$ for the eigenvalue corresponding to the desired eigenvector and a vector $b_{0}$ , either a randomly selected vector or an approximation to the eigenvector. The method is described by the iteration

$$b_{k+1}={\frac {(A-\mu I)^{-1}b_{k}}{C_{k}}},$$

where $C_{k}$ are some constants usually chosen as $C_{k}=\|(A-\mu I)^{-1}b_{k}\|.$ Since eigenvectors are defined only up to multiplication by a nonzero constant, the choice of $C_{k}$ can be arbitrary in theory; practical aspects of the choice of $C_{k}$ are discussed below.

At every iteration, the vector $b_{k}$ is multiplied by the matrix $(A-\mu I)^{-1}$ and normalized. This is exactly the same formula as in the power method, except that the matrix $A$ is replaced by $(A-\mu I)^{-1}.$ The closer the approximation $\mu$ is to the eigenvalue, the faster the algorithm converges; however, an incorrect choice of $\mu$ can lead to slow convergence or to convergence to an eigenvector other than the one desired. In practice, the method is used when a good approximation for the eigenvalue is known, and hence only a few iterations (quite often just one) are needed.
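The iteration above can be sketched in a few lines of dependency-free Python. This is only an illustrative sketch: the helper names `solve` and `inverse_iteration`, the test matrix, the shift and the iteration count are assumptions made for the example, and a dense Gaussian elimination stands in for whatever linear solver would be used in practice.

```python
import math


def solve(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    # Work on an augmented copy so the caller's data is left untouched.
    aug = [list(M[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        # Pick the row with the largest pivot candidate for stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def inverse_iteration(A, mu, b0, iterations=10):
    """Approximate the eigenvector of A whose eigenvalue is nearest to mu."""
    n = len(A)
    shifted = [[A[i][j] - (mu if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    b = list(b0)
    for _ in range(iterations):
        y = solve(shifted, b)                    # y = (A - mu I)^{-1} b_k
        c_k = math.sqrt(sum(v * v for v in y))   # C_k = ||y||
        b = [v / c_k for v in y]                 # b_{k+1} = y / C_k
    return b


# Example matrix (an assumption): eigenvalues are (5 +/- sqrt(5)) / 2.
A = [[2.0, 1.0], [1.0, 3.0]]
b = inverse_iteration(A, mu=1.3, b0=[1.0, 1.0])

# b has unit norm, so b^T A b estimates the eigenvalue nearest to mu.
Ab = [sum(A[i][j] * b[j] for j in range(2)) for i in range(2)]
rayleigh = sum(x * y for x, y in zip(b, Ab))
print(rayleigh)
```

Here the shift 1.3 lies close to the smaller eigenvalue, so the iterates settle on the corresponding eigenvector after a handful of steps.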

Theory and convergence

The basic idea of the power iteration is choosing an initial vector $b$ (either an eigenvector approximation or a random vector) and iteratively calculating $Ab,A^{2}b,A^{3}b,\ldots$. For every initial vector outside a set of measure zero, the result (after normalization) converges to an eigenvector corresponding to the dominant eigenvalue.

The inverse iteration does the same for the matrix $(A-\mu I)^{-1}$, so it converges to the eigenvector corresponding to the dominant eigenvalue of the matrix $(A-\mu I)^{-1}$. The eigenvalues of this matrix are $(\lambda _{1}-\mu )^{-1},\ldots ,(\lambda _{n}-\mu )^{-1},$ where $\lambda _{i}$ are the eigenvalues of $A$. The largest of these numbers in absolute value corresponds to the smallest in absolute value of $(\lambda _{1}-\mu ),\ldots ,(\lambda _{n}-\mu ).$ The eigenvectors of $A$ and of $(A-\mu I)^{-1}$ are the same, since

$$Av=\lambda v\Leftrightarrow (A-\mu I)v=\lambda v-\mu v\Leftrightarrow (\lambda -\mu )^{-1}v=(A-\mu I)^{-1}v.$$

Conclusion: The method converges to the eigenvector of the matrix $A$ corresponding to the eigenvalue closest to $\mu .$
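As a small worked illustration (the matrix and the shift are chosen here for the example and do not come from the references), take

$$A={\begin{pmatrix}1&0\\0&4\end{pmatrix}},\qquad \mu =0.9,\qquad (A-\mu I)^{-1}={\begin{pmatrix}{\frac {1}{0.1}}&0\\0&{\frac {1}{3.1}}\end{pmatrix}}\approx {\begin{pmatrix}10&0\\0&0.32\end{pmatrix}}.$$

The dominant eigenvalue of $(A-\mu I)^{-1}$ is $10$, with eigenvector $(1,0)^{T}$. This is the eigenvector of $A$ for $\lambda =1$, the eigenvalue closest to $\mu$.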

In particular, taking $\mu =0$ we see that the normalized iterates $A^{-1}b_{k}$ converge to the eigenvector corresponding to the eigenvalue of $A^{-1}$ with the largest magnitude, ${\frac {1}{\lambda _{N}}}$, where $\lambda _{N}$ is the eigenvalue of $A$ with the smallest magnitude. The method can thus be used to determine the smallest-magnitude eigenvalue of $A$, since the two are inversely related.

Speed of convergence

Let us analyze the rate of convergence of the method.

The power method is known to converge linearly to the limit; more precisely,

$$\mathrm {Distance} (b^{\mathrm {ideal} },b_{\mathrm {Power~Method} }^{k})=O\left(\left|{\frac {\lambda _{\mathrm {subdominant} }}{\lambda _{\mathrm {dominant} }}}\right|^{k}\right),$$

hence for the inverse iteration method the corresponding result is

$$\mathrm {Distance} (b^{\mathrm {ideal} },b_{\mathrm {Inverse~iteration} }^{k})=O\left(\left|{\frac {\mu -\lambda _{\mathrm {closest~to~} \mu }}{\mu -\lambda _{\mathrm {second~closest~to~} \mu }}}\right|^{k}\right).$$

This is a key formula for understanding the method's convergence. It shows that if $\mu$ is chosen close enough to some eigenvalue $\lambda$, for example $\mu -\lambda =\epsilon$, each iteration reduces the error by a factor of $|\epsilon |/|\lambda +\epsilon -\lambda _{\mathrm {closest~to~} \lambda }|$, where $\lambda _{\mathrm {closest~to~} \lambda }$ denotes the eigenvalue other than $\lambda$ that lies closest to $\lambda$. (We use that for small enough $\epsilon$ the eigenvalue second closest to $\mu$ is the one closest to $\lambda$.) For small enough $|\epsilon |$ this factor is approximately $|\epsilon |/|\lambda -\lambda _{{\text{closest to }}\lambda }|$. Hence if one is able to find $\mu$ such that $\epsilon$ is small enough, then very few iterations may suffice.
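The convergence factor can be checked numerically. The sketch below uses a hypothetical diagonal matrix with eigenvalues 1 and 2 (an assumption made so that the exact eigenvector is known and applying $(A-\mu I)^{-1}$ reduces to component-wise division) and compares the observed error ratio per step with the predicted value $|\mu -\lambda _{1}|/|\mu -\lambda _{2}|$.

```python
import math

# Hypothetical example: a diagonal matrix, so eigenvalues and eigenvectors are
# known exactly and (A - mu I)^{-1} acts by component-wise division.
eigenvalues = [1.0, 2.0]
mu = 1.1                      # epsilon = mu - lambda = 0.1 for lambda = 1
predicted = abs(mu - eigenvalues[0]) / abs(mu - eigenvalues[1])

b = [1.0, 1.0]
errors = []
for _ in range(6):
    y = [b_i / (lam - mu) for b_i, lam in zip(b, eigenvalues)]
    norm = math.sqrt(sum(v * v for v in y))
    b = [v / norm for v in y]
    # The exact eigenvector is (1, 0) up to sign, so the size of the second
    # component measures the distance from the limit.
    errors.append(abs(b[1]))

# Ratio of successive errors; it should approach the predicted factor.
ratios = [errors[k + 1] / errors[k] for k in range(len(errors) - 1)]
print(predicted)
print(ratios)
```

With $\epsilon =0.1$ and a gap of 1 between the eigenvalues, the predicted factor is about 1/9, so each step gains roughly one decimal digit.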

Complexity

The inverse iteration algorithm requires solving a linear system or calculation of the inverse matrix. For non-structured matrices (not sparse, not Toeplitz,...) this requires $O(n^{3})$ operations.

Implementation options

The method is defined by the formula:

$$b_{k+1}={\frac {(A-\mu I)^{-1}b_{k}}{C_{k}}}.$$

There are, however, multiple options for its implementation.

Calculate inverse matrix or solve system of linear equations

We can rewrite the formula in the following way:

$$(A-\mu I)b_{k+1}={\frac {b_{k}}{C_{k}}},$$

emphasizing that to find the next approximation $b_{k+1}$ we may solve a system of linear equations. There are two options: one may choose an algorithm that solves a linear system, or one may calculate the inverse $(A-\mu I)^{-1}$ and then apply it to the vector. Both options have complexity O(n³); the exact operation count depends on the chosen method.

The choice also depends on the number of iterations. Naively, if at each iteration one solves a linear system, the complexity will be k O(n³), where k is the number of iterations; similarly, calculating the inverse matrix and applying it at each iteration is of complexity k O(n³). Note, however, that if the eigenvalue estimate $\mu$ remains constant, then we may reduce the complexity to O(n³) + k O(n²) with either method. Calculating the inverse matrix once and storing it to apply at each iteration is of complexity O(n³) + k O(n²). Storing an LU decomposition of $(A-\mu I)$ and using forward and back substitution to solve the system of equations at each iteration is also of complexity O(n³) + k O(n²).

Inverting the matrix will typically have a greater initial cost, but lower cost at each iteration. Conversely, solving systems of linear equations will typically have a lesser initial cost, but require more operations for each iteration.
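The "factor once, substitute many times" variant can be sketched as follows. The helper names `lu_factor`, `lu_solve` and `inverse_iteration_lu`, as well as the test matrix and shift, are assumptions made for this illustration; the factorization is a plain Doolittle LU with partial pivoting written out in pure Python rather than a call into any particular library.

```python
import math


def lu_factor(M):
    """Factor P M = L U with partial pivoting; return (lu, perm).

    L (unit lower triangular) and U share one matrix; perm[i] is the
    original index of the row that ended up in position i.
    """
    n = len(M)
    lu = [list(row) for row in M]
    perm = list(range(n))
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(lu[r][col]))
        lu[col], lu[pivot] = lu[pivot], lu[col]
        perm[col], perm[pivot] = perm[pivot], perm[col]
        for r in range(col + 1, n):
            lu[r][col] /= lu[col][col]          # multiplier, stored in L
            for c in range(col + 1, n):
                lu[r][c] -= lu[r][col] * lu[col][c]
    return lu, perm


def lu_solve(lu, perm, rhs):
    """Solve M x = rhs in O(n^2) from a stored factorization."""
    n = len(lu)
    y = [rhs[p] for p in perm]                  # apply the row permutation
    for i in range(n):                          # forward substitution
        y[i] -= sum(lu[i][j] * y[j] for j in range(i))
    for i in reversed(range(n)):                # back substitution
        s = sum(lu[i][j] * y[j] for j in range(i + 1, n))
        y[i] = (y[i] - s) / lu[i][i]
    return y


def inverse_iteration_lu(A, mu, b0, iterations=10):
    n = len(A)
    shifted = [[A[i][j] - (mu if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    lu, perm = lu_factor(shifted)               # O(n^3), done once
    b = list(b0)
    for _ in range(iterations):
        y = lu_solve(lu, perm, b)               # O(n^2) per iteration
        norm = math.sqrt(sum(v * v for v in y))
        b = [v / norm for v in y]
    return b


# Example matrix (an assumption): eigenvalues are 3 - sqrt(3), 3, 3 + sqrt(3).
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = inverse_iteration_lu(A, mu=1.0, b0=[1.0, 1.0, 1.0], iterations=30)
print(b)
```

Because the shift stays fixed, only the cheap substitution step is repeated inside the loop.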

Tridiagonalization, Hessenberg form

If it is necessary to perform many iterations (or few iterations, but for many eigenvectors), then it might be wise to first bring the matrix to upper Hessenberg form (for a symmetric matrix this will be tridiagonal form). This is done with a finite sequence of orthogonal similarity transforms, somewhat like a two-sided QR decomposition, and costs ${\textstyle {\frac {10}{3}}n^{3}+O(n^{2})}$ arithmetic operations using a technique based on Householder reduction.^[2]^[3] (For QR decomposition, the Householder reflections are multiplied only on the left, but for the Hessenberg case they are multiplied on both left and right.) For symmetric matrices this procedure costs ${\textstyle {\frac {4}{3}}n^{3}+O(n^{2})}$ arithmetic operations.^[2]^[3]

Solving the system of linear equations for a tridiagonal matrix costs $O(n)$ operations, so the complexity grows like $O(n^{3})+kO(n)$, where $k$ is the number of iterations, which is better than for the direct inversion. However, for few iterations such a transformation may not be practical.
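The $O(n)$ cost per iteration can be illustrated with the Thomas algorithm, a standard tridiagonal solver. The sketch below makes several assumptions: the helper name `solve_tridiagonal` is hypothetical, the matrix is already tridiagonal (the reduction step is not shown), and no pivoting is performed, which is adequate for this example but may be unstable when $\mu$ is extremely close to an eigenvalue.

```python
import math


def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal system in O(n) (Thomas algorithm, no pivoting).

    lower[i] multiplies x[i-1] in row i (lower[0] is unused);
    upper[i] multiplies x[i+1] in row i (upper[-1] is unused).
    """
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):                       # forward elimination sweep
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):              # back substitution sweep
        x[i] = d[i] - c[i] * x[i + 1]
    return x


# Example (an assumption): the 4x4 matrix with 2 on the diagonal and -1 on the
# off-diagonals, whose eigenvalues are 2 - 2*cos(k*pi/5) for k = 1..4.
n = 4
diag = [2.0] * n
off = [-1.0] * n              # sub- and super-diagonal (one slot unused each)
mu = 0.3
shifted_diag = [v - mu for v in diag]

b = [1.0] * n
for _ in range(20):
    y = solve_tridiagonal(off, shifted_diag, off, b)   # O(n) per iteration
    norm = math.sqrt(sum(v * v for v in y))
    b = [v / norm for v in y]

# Rayleigh quotient b^T A b for the unit vector b.
Ab = [diag[i] * b[i]
      + (off[i] * b[i - 1] if i > 0 else 0.0)
      + (off[i] * b[i + 1] if i < n - 1 else 0.0)
      for i in range(n)]
estimate = sum(u * v for u, v in zip(b, Ab))
print(estimate)
```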

Also, the transformation to Hessenberg form involves square roots and division, which are not universally supported by hardware.

Choice of the normalization constant $C_{k}$

On general purpose processors (e.g. produced by Intel) the execution time of addition, multiplication and division is approximately equal. But on embedded and/or low energy consuming hardware (digital signal processors, FPGA, ASIC) division may not be supported by hardware, and so should be avoided. Choosing $C_{k}=2^{n_{k}}$ allows fast division without explicit hardware support, as division by a power of 2 may be implemented as either a bit shift (for fixed-point arithmetic) or subtraction of $n_{k}$ from the exponent (for floating-point arithmetic).

When implementing the algorithm using fixed-point arithmetic, the choice of the constant $C_{k}$ is especially important. Small values will lead to fast growth of the norm of $b_{k}$ and to overflow; large values of $C_{k}$ will cause the vector $b_{k}$ to tend toward zero.
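A floating-point version of the power-of-two normalization might look like the sketch below. The helper name `power_of_two_scale` and the rule for picking $n_{k}$ (the binary exponent of the largest entry) are assumptions made for illustration; on fixed-point hardware the same idea would be realized with a bit shift instead.

```python
import math


def power_of_two_scale(y):
    """Return (scaled vector, n_k) with C_k = 2**n_k taken from the largest entry."""
    largest = max(abs(v) for v in y)
    # frexp returns (m, e) with largest == m * 2**e and 0.5 <= m < 1
    # (or m == 0 and e == 0 when largest is zero).
    _, n_k = math.frexp(largest)
    # ldexp(v, -n_k) computes v * 2**(-n_k): only the exponent changes,
    # so no division is needed and no rounding error is introduced
    # (barring underflow).
    return [math.ldexp(v, -n_k) for v in y], n_k


scaled, n_k = power_of_two_scale([12.0, -3.0, 0.75])
print(scaled, n_k)
```

With this choice the largest entry of the scaled vector always has magnitude in $[0.5,1)$, which keeps the iterates away from both overflow and decay toward zero.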

Usage

The main application of the method is the situation when an approximation to an eigenvalue is found and one needs to find the corresponding approximate eigenvector. In such a situation the inverse iteration is the main and probably the only method to use.

Methods to find approximate eigenvalues

Typically, the method is used in combination with some other method that finds approximate eigenvalues. The standard example is the bisection eigenvalue algorithm. Another example is the Rayleigh quotient iteration, which is the same inverse iteration with the approximate eigenvalue chosen as the Rayleigh quotient of the vector obtained in the previous step of the iteration.
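The relationship to Rayleigh quotient iteration can be made concrete with a short sketch: the loop is the inverse iteration loop, except that the shift is recomputed from the current vector at every step. The helper names, the example matrix, the starting vector and the guard against an exactly singular shifted matrix are all assumptions made for this illustration.

```python
import math


def solve(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(M[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def rayleigh_quotient(A, b):
    """Return b^T A b / b^T b."""
    Ab = [sum(a_ij * b_j for a_ij, b_j in zip(row, b)) for row in A]
    return sum(x * y for x, y in zip(b, Ab)) / sum(x * x for x in b)


def rayleigh_quotient_iteration(A, b0, iterations=5):
    n = len(A)
    b = list(b0)
    mu = rayleigh_quotient(A, b)
    for _ in range(iterations):
        shifted = [[A[i][j] - (mu if i == j else 0.0) for j in range(n)]
                   for i in range(n)]
        try:
            y = solve(shifted, b)            # one inverse iteration step
        except ZeroDivisionError:
            break                            # shift is an eigenvalue to working precision
        norm = math.sqrt(sum(v * v for v in y))
        b = [v / norm for v in y]
        mu = rayleigh_quotient(A, b)         # update the shift
    return mu, b


A = [[2.0, 1.0], [1.0, 3.0]]                 # eigenvalues (5 +/- sqrt(5)) / 2
mu, b = rayleigh_quotient_iteration(A, b0=[1.0, 0.0])
print(mu)
```

Unlike the fixed-shift variants, this loop cannot reuse a stored factorization, since the shifted matrix changes at every step.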

There are some situations where the method can be used by itself; however, they are quite marginal.

Norm of matrix as approximation to the dominant eigenvalue

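The dominant eigenvalue can be easily estimated for any matrix. For any induced norm it is true that $\left\|A\right\|\geq |\lambda |$ for any eigenvalue $\lambda$. So, taking the norm of the matrix as an approximate eigenvalue, one can see that the method will converge to the dominant eigenvector, provided the dominant eigenvalue is real and positive (as it is, for example, for a symmetric positive-definite matrix); otherwise the eigenvalue closest to $\left\|A\right\|$ need not be the dominant one.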
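A minimal sketch of this idea uses the induced infinity norm (the largest absolute row sum) as the shift. The 2-by-2 symmetric positive-definite matrix, the starting vector and the iteration count are assumptions chosen for the example, and the 2-by-2 system is solved by Cramer's rule to keep the code short.

```python
import math

A = [[2.0, 1.0], [1.0, 3.0]]       # symmetric, eigenvalues (5 +/- sqrt(5)) / 2

# Induced infinity norm: the largest absolute row sum.
norm_inf = max(sum(abs(v) for v in row) for row in A)

mu = norm_inf                      # shift that bounds every |lambda|
b = [1.0, 0.0]
for _ in range(40):
    # Solve the 2x2 system (A - mu I) y = b by Cramer's rule.
    a, bb, c, d = A[0][0] - mu, A[0][1], A[1][0], A[1][1] - mu
    det = a * d - bb * c
    y = [(b[0] * d - bb * b[1]) / det, (a * b[1] - c * b[0]) / det]
    length = math.hypot(y[0], y[1])
    b = [y[0] / length, y[1] / length]

# Rayleigh quotient of the unit vector b estimates the dominant eigenvalue.
Ab = [A[0][0] * b[0] + A[0][1] * b[1], A[1][0] * b[0] + A[1][1] * b[1]]
estimate = b[0] * Ab[0] + b[1] * Ab[1]
print(norm_inf, estimate)
```

The norm is only a rough estimate of the eigenvalue, so convergence is slower than with a well-chosen shift; the number of iterations here is deliberately generous.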

Estimates based on statistics

In some real-time applications one needs to find eigenvectors for matrices with a speed of millions of matrices per second. In such applications, typically the statistics of matrices is known in advance and one can take as an approximate eigenvalue the average eigenvalue for some large matrix sample. Better, one may calculate the mean ratio of the eigenvalues to the trace or the norm of the matrix and estimate the average eigenvalue as the trace or norm multiplied by the average value of that ratio. Clearly such a method can be used only with discretion and only when high precision is not critical. This approach of estimating an average eigenvalue can be combined with other methods to avoid excessively large error.
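The trace-ratio estimate can be sketched on a toy case. Everything in the example is hypothetical: the sample matrices are made-up symmetric 2-by-2 matrices, stored as triples `(a, b, d)` standing for `[[a, b], [b, d]]`, and their largest eigenvalue is obtained from the closed-form 2-by-2 formula rather than from measured data.

```python
import math


def largest_eigenvalue_2x2_sym(a, b, d):
    """Closed-form largest eigenvalue of [[a, b], [b, d]]."""
    return (a + d) / 2 + math.sqrt(((a - d) / 2) ** 2 + b * b)


# Hypothetical "training" sample of symmetric 2x2 matrices (a, b, d).
sample = [(2.0, 0.5, 1.0), (3.0, 0.8, 1.6), (4.0, 1.1, 2.1), (2.5, 0.6, 1.2)]

# Offline step: mean ratio of the largest eigenvalue to the trace.
ratios = [largest_eigenvalue_2x2_sym(a, b, d) / (a + d) for a, b, d in sample]
mean_ratio = sum(ratios) / len(ratios)

# Online step: the shift for a new matrix is its trace times the mean ratio,
# so no eigenvalue computation is needed at run time.
a, b, d = 3.2, 0.85, 1.5
mu = (a + d) * mean_ratio

# For comparison only: the exact eigenvalues of the new matrix.
exact_largest = largest_eigenvalue_2x2_sym(a, b, d)
exact_smallest = (a + d) - exact_largest
print(mu, exact_largest, exact_smallest)
```

The estimate only has to land closer to the wanted eigenvalue than to any other one; inverse iteration then does the rest.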

References

[1] Ernst Pohlhausen, Berechnung der Eigenschwingungen statisch-bestimmter Fachwerke, ZAMM - Zeitschrift für Angewandte Mathematik und Mechanik 1, 28-42 (1921).
[2] Demmel, James W. (1997), Applied Numerical Linear Algebra, Philadelphia, PA: Society for Industrial and Applied Mathematics, ISBN 0-89871-389-7, MR 1463942.
[3] Lloyd N. Trefethen and David Bau, Numerical Linear Algebra (SIAM, 1997).

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
