Iterative method

Last updated January 11, 2025

In computational mathematics, an iterative method is a mathematical procedure that uses an initial value to generate a sequence of improving approximate solutions for a class of problems, in which the i-th approximation (called an "iterate") is derived from the previous ones.

A specific implementation with termination criteria for a given iterative method like gradient descent, hill climbing, Newton's method, or quasi-Newton methods like BFGS, is an algorithm of an iterative method or a method of successive approximation. An iterative method is called convergent if the corresponding sequence converges for given initial approximations. A mathematically rigorous convergence analysis of an iterative method is usually performed; however, heuristic-based iterative methods are also common.

In contrast, direct methods attempt to solve the problem by a finite sequence of operations. In the absence of rounding errors, direct methods would deliver an exact solution (for example, solving a linear system of equations $A\mathbf {x} =\mathbf {b}$ by Gaussian elimination). Iterative methods are often the only choice for nonlinear equations. However, iterative methods are often useful even for linear problems involving many variables (sometimes on the order of millions), where direct methods would be prohibitively expensive (and in some cases impossible) even with the best available computing power.^[1]

Attractive fixed points

If an equation can be put into the form f(x) = x, and a solution x is an attractive fixed point of the function f, then one may begin with a point x₁ in the basin of attraction of x, and let x_n+1 = f(x_n) for n ≥ 1, and the sequence {x_n}_n ≥ 1 will converge to the solution x. Here x_n is the nth approximation or iteration of x and x_n+1 is the next or n + 1 iteration of x. Alternately, superscripts in parentheses are often used in numerical methods, so as not to interfere with subscripts with other meanings. (For example, x⁽ⁿ⁺¹⁾ = f(x⁽ⁿ⁾).) If the function f is continuously differentiable, a sufficient condition for convergence is that the spectral radius of the derivative is strictly bounded by one in a neighborhood of the fixed point. If this condition holds at the fixed point, then a sufficiently small neighborhood (basin of attraction) must exist.

Linear systems

In the case of a system of linear equations, the two main classes of iterative methods are the stationary iterative methods, and the more general Krylov subspace methods.

Stationary iterative methods

Introduction

Stationary iterative methods solve a linear system with an operator approximating the original one; and based on a measurement of the error in the result (the residual), form a "correction equation" for which this process is repeated. While these methods are simple to derive, implement, and analyze, convergence is only guaranteed for a limited class of matrices.

Definition

An iterative method is defined by

\mathbf {x} ^{k+1}:=\Psi (\mathbf {x} ^{k}),\quad k\geq 0

and for a given linear system $A\mathbf {x} =\mathbf {b}$ with exact solution $\mathbf {x} ^{*}$ the error by

\mathbf {e} ^{k}:=\mathbf {x} ^{k}-\mathbf {x} ^{*},\quad k\geq 0.

An iterative method is called linear if there exists a matrix $C\in \mathbb {R} ^{n\times n}$ such that

\mathbf {e} ^{k+1}=C\mathbf {e} ^{k}\quad \forall k\geq 0

and this matrix is called the iteration matrix. An iterative method with a given iteration matrix $C$ is called convergent if the following holds

\lim _{k\rightarrow \infty }C^{k}=0.

An important theorem states that for a given iterative method and its iteration matrix $C$ it is convergent if and only if its spectral radius $\rho (C)$ is smaller than unity, that is,

\rho (C)<1.

The basic iterative methods work by splitting the matrix $A$ into

A=M-N

and here the matrix $M$ should be easily invertible. The iterative methods are now defined as

M\mathbf {x} ^{k+1}=N\mathbf {x} ^{k}+b,\quad k\geq 0,

or, equivalently,

\mathbf {x} ^{k+1}=\mathbf {x} ^{k}+M^{-1}(b-A\mathbf {x} ^{k}),\quad k\geq 0.

From this follows that the iteration matrix is given by

C=I-M^{-1}A=M^{-1}N.

Examples

Basic examples of stationary iterative methods use a splitting of the matrix $A$ such as

A=D+L+U\,,\quad D:={\text{diag}}((a_{ii})_{i})

where $D$ is only the diagonal part of $A$ , and $L$ is the strict lower triangular part of $A$ . Respectively, $U$ is the strict upper triangular part of $A$ .

Richardson method: $M:={\frac {1}{\omega }}I\quad (\omega \neq 0)$
Jacobi method: $M:=D$
Damped Jacobi method: $M:={\frac {1}{\omega }}D\quad (\omega \neq 0)$
Gauss–Seidel method: $M:=D+L$
Successive over-relaxation method (SOR): $M:={\frac {1}{\omega }}D+L\quad (\omega \neq 0)$
Symmetric successive over-relaxation (SSOR): $M:={\frac {1}{\omega (2-\omega )}}(D+\omega L)D^{-1}(D+\omega U)\quad (\omega \not \in \{0,2\})$

Linear stationary iterative methods are also called relaxation methods.

Krylov subspace methods

Krylov subspace methods^[2] work by forming a basis of the sequence of successive matrix powers times the initial residual (the Krylov sequence). The approximations to the solution are then formed by minimizing the residual over the subspace formed. The prototypical method in this class is the conjugate gradient method (CG) which assumes that the system matrix $A$ is symmetric positive-definite. For symmetric (and possibly indefinite) $A$ one works with the minimal residual method (MINRES). In the case of non-symmetric matrices, methods such as the generalized minimal residual method (GMRES) and the biconjugate gradient method (BiCG) have been derived.

Convergence of Krylov subspace methods

Since these methods form a basis, it is evident that the method converges in N iterations, where N is the system size. However, in the presence of rounding errors this statement does not hold; moreover, in practice N can be very large, and the iterative process reaches sufficient accuracy already far earlier. The analysis of these methods is hard, depending on a complicated function of the spectrum of the operator.

Preconditioners

The approximating operator that appears in stationary iterative methods can also be incorporated in Krylov subspace methods such as GMRES (alternatively, preconditioned Krylov methods can be considered as accelerations of stationary iterative methods), where they become transformations of the original operator to a presumably better conditioned one. The construction of preconditioners is a large research area.

Methods of successive approximation

Mathematical methods relating to successive approximation include:

Babylonian method, for finding square roots of numbers^[3]
Fixed-point iteration ^[4]
Means of finding zeros of functions:
- Halley's method
- Newton's method
Differential-equation matters:
- Picard–Lindelöf theorem, on existence of solutions of differential equations
- Runge–Kutta methods, for numerical solution of differential equations

History

Jamshīd al-Kāshī used iterative methods to calculate the sine of 1° and $π$ in The Treatise of Chord and Sine to high precision. An early iterative method for solving a linear system appeared in a letter of Gauss to a student of his. He proposed solving a 4-by-4 system of equations by repeatedly solving the component in which the residual was the largest ^{[ citation needed ]}.

The theory of stationary iterative methods was solidly established with the work of D.M. Young starting in the 1950s. The conjugate gradient method was also invented in the 1950s, with independent developments by Cornelius Lanczos, Magnus Hestenes and Eduard Stiefel, but its nature and applicability were misunderstood at the time. Only in the 1970s was it realized that conjugacy based methods work very well for partial differential equations, especially the elliptic type.

Related Research Articles

<span class="mw-page-title-main">Eigenfunction</span> Mathematical function of a linear operator

In mathematics, an eigenfunction of a linear operator D defined on some function space is any non-zero function $in that space that, when acted upon by D, is only multiplied by some scaling factor called an eigenvalue. As an equation, this condition can be written as for some scalar eigenvalue The solutions to this equation may also be subject to boundary conditions that limit the allowable eigenvalues and eigenfunctions.$

In mathematics, the kernel of a linear map, also known as the null space or nullspace, is the part of the domain which is mapped to the zero vector of the co-domain; the kernel is always a linear subspace of the domain. That is, given a linear map $L : V \to W$ between two vector spaces $V$ and $W$ , the kernel of $L$ is the vector space of all elements $v$ of $V$ such that $L (v) = 0$ , where $0$ denotes the zero vector in $W$ , or more symbolically:

The Gauss–Newton algorithm is used to solve non-linear least squares problems, which is equivalent to minimizing a sum of squared function values. It is an extension of Newton's method for finding a minimum of a non-linear function. Since a sum of squares must be nonnegative, the algorithm can be viewed as using Newton's method to iteratively approximate zeroes of the components of the sum, and thus minimizing the sum. In this sense, the algorithm is also an effective method for solving overdetermined systems of equations. It has the advantage that second derivatives, which can be challenging to compute, are not required.

In mathematics, the conjugate gradient method is an algorithm for the numerical solution of particular systems of linear equations, namely those whose matrix is positive-semidefinite. The conjugate gradient method is often implemented as an iterative algorithm, applicable to sparse systems that are too large to be handled by a direct implementation or other direct methods such as the Cholesky decomposition. Large sparse systems often arise when numerically solving partial differential equations or optimization problems.

The Rayleigh–Ritz method is a direct numerical method of approximating eigenvalues, originated in the context of solving physical boundary value problems and named after Lord Rayleigh and Walther Ritz.

In linear algebra, an eigenvector or characteristic vector is a vector that has its direction unchanged by a given linear transformation. More precisely, an eigenvector, $, of a linear transformation,, is scaled by a constant factor,, when the linear transformation is applied to it: . The corresponding eigenvalue, characteristic value, or characteristic root is the multiplying factor .$

In linear algebra, the order-rKrylov subspace generated by an n-by-n matrix A and a vector b of dimension n is the linear subspace spanned by the images of b under the first r powers of A, that is,

In mathematics, in the area of numerical analysis, Galerkin methods are a family of methods for converting a continuous operator problem, such as a differential equation, commonly in a weak formulation, to a discrete problem by applying linear constraints determined by finite sets of basis functions. They are named after the Soviet mathematician Boris Galerkin.

Harmonic balance is a method used to calculate the steady-state response of nonlinear differential equations, and is mostly applied to nonlinear electrical circuits. It is a frequency domain method for calculating the steady state, as opposed to the various time-domain steady-state methods. The name "harmonic balance" is descriptive of the method, which starts with Kirchhoff's Current Law written in the frequency domain and a chosen number of harmonics. A sinusoidal signal applied to a nonlinear component in a system will generate harmonics of the fundamental frequency. Effectively the method assumes a linear combination of sinusoids can represent the solution, then balances current and voltage sinusoids to satisfy Kirchhoff's law. The method is commonly used to simulate circuits which include nonlinear elements, and is most applicable to systems with feedback in which limit cycles occur.

In numerical linear algebra, the Gauss–Seidel method, also known as the Liebmann method or the method of successive displacement, is an iterative method used to solve a system of linear equations. It is named after the German mathematicians Carl Friedrich Gauss and Philipp Ludwig von Seidel. Though it can be applied to any matrix with non-zero elements on the diagonals, convergence is only guaranteed if the matrix is either strictly diagonally dominant, or symmetric and positive definite. It was only mentioned in a private letter from Gauss to his student Gerling in 1823. A publication was not delivered before 1874 by Seidel.

In numerical linear algebra, the Jacobi method is an iterative algorithm for determining the solutions of a strictly diagonally dominant system of linear equations. Each diagonal element is solved for, and an approximate value is plugged in. The process is then iterated until it converges. This algorithm is a stripped-down version of the Jacobi transformation method of matrix diagonalization. The method is named after Carl Gustav Jacob Jacobi.

In numerical linear algebra, the method of successive over-relaxation (SOR) is a variant of the Gauss–Seidel method for solving a linear system of equations, resulting in faster convergence. A similar method can be used for any slowly converging iterative process.

In mathematics, the generalized minimal residual method (GMRES) is an iterative method for the numerical solution of an indefinite nonsymmetric system of linear equations. The method approximates the solution by the vector in a Krylov subspace with minimal residual. The Arnoldi iteration is used to find this vector.

In statistics, generalized least squares (GLS) is a method used to estimate the unknown parameters in a linear regression model. It is used when there is a non-zero amount of correlation between the residuals in the regression model. GLS is employed to improve statistical efficiency and reduce the risk of drawing erroneous inferences, as compared to conventional least squares and weighted least squares methods. It was first described by Alexander Aitken in 1935.

Finite element method (FEM) is a popular method for numerically solving differential equations arising in engineering and mathematical modeling. Typical problem areas of interest include the traditional fields of structural analysis, heat transfer, fluid flow, mass transport, and electromagnetic potential. Computers are usually used to perform the calculations required. With high-speed supercomputers, better solutions can be achieved and are often required to solve the largest and most complex problems.

In the finite element method for the numerical solution of elliptic partial differential equations, the stiffness matrix is a matrix that represents the system of linear equations that must be solved in order to ascertain an approximate solution to the differential equation.

Iterative refinement is an iterative method proposed by James H. Wilkinson to improve the accuracy of numerical solutions to systems of linear equations.

Linear least squares (LLS) is the least squares approximation of linear functions to data. It is a set of formulations for solving statistical problems involved in linear regression, including variants for ordinary (unweighted), weighted, and generalized (correlated) residuals. Numerical methods for linear least squares include inverting the matrix of the normal equations and orthogonal decomposition methods.

The Kansa method is a computer method used to solve partial differential equations. Its main advantage is it is very easy to understand and program on a computer. It is much less complicated than the finite element method. Another advantage is it works well on multi variable problems. The finite element method is complicated when working with more than 3 space variables and time.

In the mathematical discipline of numerical linear algebra, a matrix splitting is an expression which represents a given matrix as a sum or difference of matrices. Many iterative methods depend upon the direct solution of matrix equations involving matrices more general than tridiagonal matrices. These matrix equations can often be solved directly and efficiently when written as a matrix splitting. The technique was devised by Richard S. Varga in 1960.

References

↑ Amritkar, Amit; de Sturler, Eric; Świrydowicz, Katarzyna; Tafti, Danesh; Ahuja, Kapil (2015). "Recycling Krylov subspaces for CFD applications and a new hybrid recycling solver". Journal of Computational Physics. 303: 222. arXiv: 1501.03358 . Bibcode:2015JCoPh.303..222A. doi:10.1016/j.jcp.2015.09.040.
↑ Charles George Broyden and Maria Terasa Vespucci: Krylov Solvers for Linear Algebraic Systems: Krylov Solvers, Elsevier, ISBN 0-444-51474-0, (2004).
↑ "Babylonian mathematics". Babylonian mathematics. December 1, 2000.
↑ day, Mahlon (November 2, 1960). Fixed-point theorems for compact convex sets. Mahlon M day.

External links

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[1] Amritkar, Amit; de Sturler, Eric; Świrydowicz, Katarzyna; Tafti, Danesh; Ahuja, Kapil (2015). "Recycling Krylov subspaces for CFD applications and a new hybrid recycling solver". Journal of Computational Physics. 303: 222. arXiv: 1501.03358 . Bibcode:2015JCoPh.303..222A. doi:10.1016/j.jcp.2015.09.040.

[2] Charles George Broyden and Maria Terasa Vespucci: Krylov Solvers for Linear Algebraic Systems: Krylov Solvers, Elsevier, ISBN 0-444-51474-0, (2004).

[3] "Babylonian mathematics". Babylonian mathematics. December 1, 2000.

[4] y, Mahlon (November 2, 1960). Fixed-point theorems for compact convex sets. Mahlon M day.

[1]

[2]

[3]

[4]