Derivation of the conjugate gradient method

In numerical linear algebra, the conjugate gradient method is an iterative method for numerically solving the linear system

$$\mathbf{A}\mathbf{x} = \mathbf{b},$$

where $\mathbf{A}$ is symmetric positive-definite. The conjugate gradient method can be derived from several different perspectives, including specialization of the conjugate direction method [1] for optimization, and variation of the Arnoldi/Lanczos iteration for eigenvalue problems.

The intent of this article is to document the important steps in these derivations.

Derivation from the conjugate direction method

The conjugate gradient method can be seen as a special case of the conjugate direction method applied to minimization of the quadratic function

$$f(\mathbf{x}) = \tfrac{1}{2}\mathbf{x}^\mathsf{T}\mathbf{A}\mathbf{x} - \mathbf{b}^\mathsf{T}\mathbf{x},$$

whose unique minimizer satisfies $\nabla f(\mathbf{x}) = \mathbf{A}\mathbf{x} - \mathbf{b} = \mathbf{0}$ and therefore solves $\mathbf{A}\mathbf{x} = \mathbf{b}$.

The conjugate direction method

In the conjugate direction method for minimizing

$$f(\mathbf{x}) = \tfrac{1}{2}\mathbf{x}^\mathsf{T}\mathbf{A}\mathbf{x} - \mathbf{b}^\mathsf{T}\mathbf{x},$$

one starts with an initial guess $\mathbf{x}_0$ and the corresponding residual $\mathbf{r}_0 = \mathbf{b} - \mathbf{A}\mathbf{x}_0$, and computes the iterate $\mathbf{x}_{i+1}$ and residual $\mathbf{r}_{i+1}$ by the formulae

$$\mathbf{x}_{i+1} = \mathbf{x}_i + \alpha_i\mathbf{p}_i, \qquad \alpha_i = \frac{\mathbf{p}_i^\mathsf{T}\mathbf{r}_i}{\mathbf{p}_i^\mathsf{T}\mathbf{A}\mathbf{p}_i},$$
$$\mathbf{r}_{i+1} = \mathbf{r}_i - \alpha_i\mathbf{A}\mathbf{p}_i,$$

where $\mathbf{p}_0, \mathbf{p}_1, \mathbf{p}_2, \ldots$ are a series of mutually conjugate directions, i.e.,

$$\mathbf{p}_i^\mathsf{T}\mathbf{A}\mathbf{p}_j = 0$$

for any $i \neq j$.

The conjugate direction method is imprecise in the sense that no formulae are given for selection of the directions $\mathbf{p}_i$. Specific choices lead to various methods including the conjugate gradient method and Gaussian elimination.
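
The update above is short enough to state as code. The following is a minimal NumPy sketch, assuming the caller supplies a sequence of mutually conjugate directions (the function name and that assumption are illustrative; the conjugate direction method itself does not prescribe where the directions come from):

```python
import numpy as np

def conjugate_direction_solve(A, b, x0, directions):
    """Minimize f(x) = 1/2 x^T A x - b^T x along the given mutually
    A-conjugate directions p_0, p_1, ... (illustrative sketch)."""
    x = x0.copy()
    r = b - A @ x                         # initial residual r_0
    for p in directions:
        alpha = (p @ r) / (p @ (A @ p))   # alpha_i = p_i^T r_i / p_i^T A p_i
        x = x + alpha * p                 # x_{i+1} = x_i + alpha_i p_i
        r = r - alpha * (A @ p)           # r_{i+1} = r_i - alpha_i A p_i
    return x
```

In exact arithmetic, supplying $n$ mutually conjugate directions for an $n \times n$ system drives the residual to zero, so the loop returns the exact solution.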

Derivation from the Arnoldi/Lanczos iteration

The conjugate gradient method can also be seen as a variant of the Arnoldi/Lanczos iteration applied to solving linear systems.

The general Arnoldi method

In the Arnoldi iteration, one starts with a vector $\mathbf{r}_0$ and gradually builds an orthonormal basis $\{\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3, \ldots\}$ of the Krylov subspace

$$\mathcal{K}(\mathbf{A}, \mathbf{r}_0) = \operatorname{span}\{\mathbf{r}_0, \mathbf{A}\mathbf{r}_0, \mathbf{A}^2\mathbf{r}_0, \ldots\},$$

by defining $\mathbf{v}_i = \mathbf{w}_i / \lVert\mathbf{w}_i\rVert_2$, where

$$\mathbf{w}_i = \begin{cases} \mathbf{r}_0 & \text{if } i = 1, \\ \mathbf{A}\mathbf{v}_{i-1} - \displaystyle\sum_{j=1}^{i-1}\left(\mathbf{v}_j^\mathsf{T}\mathbf{A}\mathbf{v}_{i-1}\right)\mathbf{v}_j & \text{if } i > 1. \end{cases}$$

In other words, for $i > 1$, $\mathbf{v}_i$ is found by Gram–Schmidt orthogonalizing $\mathbf{A}\mathbf{v}_{i-1}$ against $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_{i-1}\}$ followed by normalization.

Put in matrix form, the iteration is captured by the equation

$$\mathbf{A}\mathbf{V}_i = \mathbf{V}_{i+1}\tilde{\mathbf{H}}_i,$$

where

$$\mathbf{V}_i = \begin{bmatrix} \mathbf{v}_1 & \mathbf{v}_2 & \cdots & \mathbf{v}_i \end{bmatrix},$$
$$\tilde{\mathbf{H}}_i = \begin{bmatrix} h_{11} & h_{12} & h_{13} & \cdots & h_{1,i} \\ h_{21} & h_{22} & h_{23} & \cdots & h_{2,i} \\ & h_{32} & h_{33} & \cdots & h_{3,i} \\ & & \ddots & \ddots & \vdots \\ & & & h_{i,i-1} & h_{i,i} \\ & & & & h_{i+1,i} \end{bmatrix} = \begin{bmatrix} \mathbf{H}_i \\ h_{i+1,i}\mathbf{e}_i^\mathsf{T} \end{bmatrix},$$

with

$$h_{ji} = \begin{cases} \mathbf{v}_j^\mathsf{T}\mathbf{A}\mathbf{v}_i & \text{if } j \le i, \\ \lVert\mathbf{w}_{i+1}\rVert_2 & \text{if } j = i + 1, \\ 0 & \text{if } j > i + 1. \end{cases}$$

When applying the Arnoldi iteration to solving linear systems, one starts with $\mathbf{r}_0 = \mathbf{b} - \mathbf{A}\mathbf{x}_0$, the residual corresponding to an initial guess $\mathbf{x}_0$. After each step of iteration, one computes $\mathbf{y}_i = \mathbf{H}_i^{-1}\left(\lVert\mathbf{r}_0\rVert_2\mathbf{e}_1\right)$ and the new iterate $\mathbf{x}_i = \mathbf{x}_0 + \mathbf{V}_i\mathbf{y}_i$.
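
As a concrete illustration of this procedure, here is a minimal NumPy sketch, assuming a fixed number of steps and dense matrices (the function name, the breakdown tolerance, and the use of modified Gram–Schmidt are illustrative choices):

```python
import numpy as np

def arnoldi_solve(A, b, x0, m):
    """Run m Arnoldi steps from r_0 = b - A x_0, then form the iterate
    x_m = x_0 + V_m y_m with H_m y_m = ||r_0|| e_1 (illustrative sketch)."""
    n = len(b)
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)
    V = np.zeros((n, m + 1))              # columns hold the basis vectors v
    H = np.zeros((m + 1, m))              # extended Hessenberg matrix
    V[:, 0] = r0 / beta
    for i in range(m):
        w = A @ V[:, i]
        for j in range(i + 1):            # orthogonalize against existing basis
            H[j, i] = V[:, j] @ w
            w -= H[j, i] * V[:, j]
        H[i + 1, i] = np.linalg.norm(w)   # h_{i+1,i} = ||w_{i+1}||_2
        if H[i + 1, i] < 1e-14:           # breakdown: Krylov subspace is invariant
            m = i + 1
            break
        V[:, i + 1] = w / H[i + 1, i]
    e1 = np.zeros(m)
    e1[0] = beta                          # ||r_0|| e_1
    y = np.linalg.solve(H[:m, :m], e1)    # y_m = H_m^{-1} (||r_0|| e_1)
    return x0 + V[:, :m] @ y              # x_m = x_0 + V_m y_m
```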

The direct Lanczos method

For the rest of the discussion, we assume that $\mathbf{A}$ is symmetric positive-definite. With symmetry of $\mathbf{A}$, the upper Hessenberg matrix $\mathbf{H}_i = \mathbf{V}_i^\mathsf{T}\mathbf{A}\mathbf{V}_i$ becomes symmetric and thus tridiagonal. It then can be more clearly denoted by

$$\mathbf{H}_i = \begin{bmatrix} a_1 & b_2 \\ b_2 & a_2 & b_3 \\ & \ddots & \ddots & \ddots \\ & & b_{i-1} & a_{i-1} & b_i \\ & & & b_i & a_i \end{bmatrix}.$$

This enables a short three-term recurrence for $\mathbf{v}_i$ in the iteration, and the Arnoldi iteration is reduced to the Lanczos iteration.

Since $\mathbf{A}$ is symmetric positive-definite, so is $\mathbf{H}_i$. Hence, $\mathbf{H}_i$ can be LU factorized without partial pivoting into

$$\mathbf{H}_i = \mathbf{L}_i\mathbf{U}_i = \begin{bmatrix} 1 \\ c_2 & 1 \\ & \ddots & \ddots \\ & & c_{i-1} & 1 \\ & & & c_i & 1 \end{bmatrix} \begin{bmatrix} d_1 & b_2 \\ & d_2 & b_3 \\ & & \ddots & \ddots \\ & & & d_{i-1} & b_i \\ & & & & d_i \end{bmatrix},$$

with convenient recurrences for $c_i$ and $d_i$:

$$c_i = b_i/d_{i-1},$$
$$d_i = \begin{cases} a_1 & \text{if } i = 1, \\ a_i - c_ib_i & \text{if } i > 1. \end{cases}$$
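
The recurrences for $c_i$ and $d_i$ translate directly into a short loop. A minimal NumPy sketch, with array names following the symbols in the text (0-based indexing, so `a[0]` is $a_1$ and `b[0]` is $b_2$):

```python
import numpy as np

def lu_of_tridiagonal(a, b):
    """Pivot-free LU recurrence for the symmetric tridiagonal H_i:
    d_1 = a_1, and for i > 1, c_i = b_i / d_{i-1}, d_i = a_i - c_i b_i.
    `a` holds the diagonal, `b` the off-diagonal (len(b) == len(a) - 1)."""
    d = np.empty(len(a))
    c = np.zeros(len(a))                  # c[0] unused; there is no c_1
    d[0] = a[0]                           # d_1 = a_1
    for k in range(1, len(a)):
        c[k] = b[k - 1] / d[k - 1]        # c_i = b_i / d_{i-1}
        d[k] = a[k] - c[k] * b[k - 1]     # d_i = a_i - c_i b_i
    return c, d
```

Reassembling the unit lower bidiagonal factor from `c` and the upper bidiagonal factor from `d` and `b` reproduces $\mathbf{H}_i$, which is an easy way to check the recurrence on a small example.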

Rewrite $\mathbf{x}_i = \mathbf{x}_0 + \mathbf{V}_i\mathbf{y}_i$ as

$$\mathbf{x}_i = \mathbf{x}_0 + \mathbf{V}_i\mathbf{H}_i^{-1}\left(\lVert\mathbf{r}_0\rVert_2\mathbf{e}_1\right) = \mathbf{x}_0 + \mathbf{V}_i\mathbf{U}_i^{-1}\mathbf{L}_i^{-1}\left(\lVert\mathbf{r}_0\rVert_2\mathbf{e}_1\right) = \mathbf{x}_0 + \mathbf{P}_i\mathbf{z}_i,$$

with

$$\mathbf{P}_i = \mathbf{V}_i\mathbf{U}_i^{-1},$$
$$\mathbf{z}_i = \mathbf{L}_i^{-1}\left(\lVert\mathbf{r}_0\rVert_2\mathbf{e}_1\right).$$

It is now important to observe that

$$\mathbf{P}_i = \begin{bmatrix} \mathbf{P}_{i-1} & \mathbf{p}_i \end{bmatrix},$$
$$\mathbf{z}_i = \begin{bmatrix} \mathbf{z}_{i-1} \\ \zeta_i \end{bmatrix}.$$

In fact, there are short recurrences for $\mathbf{p}_i$ and $\zeta_i$ as well:

$$\mathbf{p}_i = \frac{1}{d_i}\left(\mathbf{v}_i - b_i\mathbf{p}_{i-1}\right),$$
$$\zeta_i = -c_i\zeta_{i-1}.$$

With this formulation, we arrive at a simple recurrence for $\mathbf{x}_i$:

$$\mathbf{x}_i = \mathbf{x}_0 + \mathbf{P}_i\mathbf{z}_i = \mathbf{x}_0 + \mathbf{P}_{i-1}\mathbf{z}_{i-1} + \zeta_i\mathbf{p}_i = \mathbf{x}_{i-1} + \zeta_i\mathbf{p}_i.$$

The relations above straightforwardly lead to the direct Lanczos method, which turns out to be slightly more complex than the conjugate gradient method derived below.
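
Assembled into one loop, the Lanczos three-term recurrence together with the recurrences for $c_i$, $d_i$, $\mathbf{p}_i$, and $\zeta_i$ gives the following NumPy sketch of the direct Lanczos method (variable names mirror the symbols above; the residual-based stopping test and tolerance are illustrative additions):

```python
import numpy as np

def direct_lanczos(A, b, x0, max_iter, tol=1e-10):
    """Direct Lanczos (D-Lanczos) sketch for symmetric positive-definite A,
    built from the recurrences derived above."""
    x = x0.copy()
    r0 = b - A @ x
    beta = np.linalg.norm(r0)
    v_prev = np.zeros_like(b)
    v = r0 / beta                          # v_1 = r_0 / ||r_0||
    p_prev = np.zeros_like(b)
    b_i = 0.0                              # off-diagonal b_i (none before step 1)
    d_prev = 0.0
    zeta = beta                            # zeta_1 = ||r_0||
    for i in range(1, max_iter + 1):
        w = A @ v - b_i * v_prev           # Lanczos three-term recurrence
        a_i = v @ w                        # a_i = v_i^T A v_i
        w = w - a_i * v
        if i == 1:
            d_i = a_i                      # d_1 = a_1
        else:
            c_i = b_i / d_prev             # c_i = b_i / d_{i-1}
            d_i = a_i - c_i * b_i          # d_i = a_i - c_i b_i
            zeta = -c_i * zeta             # zeta_i = -c_i zeta_{i-1}
        p = (v - b_i * p_prev) / d_i       # p_i = (v_i - b_i p_{i-1}) / d_i
        x = x + zeta * p                   # x_i = x_{i-1} + zeta_i p_i
        if np.linalg.norm(b - A @ x) < tol:  # extra matvec, for illustration only
            break
        b_next = np.linalg.norm(w)         # b_{i+1} = ||w_{i+1}||_2
        v_prev, v = v, w / b_next
        b_i, d_prev, p_prev = b_next, d_i, p
    return x
```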

The conjugate gradient method from imposing orthogonality and conjugacy

If we allow $\mathbf{p}_i$ to scale and compensate for the scaling in the constant factor, we potentially can have simpler recurrences of the form:

$$\mathbf{x}_i = \mathbf{x}_{i-1} + \alpha_{i-1}\mathbf{p}_{i-1},$$
$$\mathbf{r}_i = \mathbf{r}_{i-1} - \alpha_{i-1}\mathbf{A}\mathbf{p}_{i-1},$$
$$\mathbf{p}_i = \mathbf{r}_i + \beta_{i-1}\mathbf{p}_{i-1}.$$

As premises for the simplification, we now derive the orthogonality of the residuals $\mathbf{r}_i$ and the conjugacy of the directions $\mathbf{p}_i$, i.e., for $i \neq j$,

$$\mathbf{r}_i^\mathsf{T}\mathbf{r}_j = 0,$$
$$\mathbf{p}_i^\mathsf{T}\mathbf{A}\mathbf{p}_j = 0.$$

The residuals are mutually orthogonal because $\mathbf{r}_i$ is essentially a multiple of $\mathbf{v}_{i+1}$: for $i = 0$, $\mathbf{r}_0 = \lVert\mathbf{r}_0\rVert_2\mathbf{v}_1$; for $i > 0$,

$$\mathbf{r}_i = \mathbf{b} - \mathbf{A}\mathbf{x}_i = \mathbf{b} - \mathbf{A}\left(\mathbf{x}_0 + \mathbf{V}_i\mathbf{y}_i\right) = \mathbf{r}_0 - \mathbf{A}\mathbf{V}_i\mathbf{y}_i = \mathbf{r}_0 - \mathbf{V}_{i+1}\tilde{\mathbf{H}}_i\mathbf{y}_i = \mathbf{r}_0 - \mathbf{V}_i\mathbf{H}_i\mathbf{y}_i - h_{i+1,i}\left(\mathbf{e}_i^\mathsf{T}\mathbf{y}_i\right)\mathbf{v}_{i+1} = -h_{i+1,i}\left(\mathbf{e}_i^\mathsf{T}\mathbf{y}_i\right)\mathbf{v}_{i+1},$$

since $\mathbf{V}_i\mathbf{H}_i\mathbf{y}_i = \mathbf{V}_i\left(\lVert\mathbf{r}_0\rVert_2\mathbf{e}_1\right) = \mathbf{r}_0$.

To see the conjugacy of $\mathbf{p}_i$, it suffices to show that $\mathbf{P}_i^\mathsf{T}\mathbf{A}\mathbf{P}_i$ is diagonal:

$$\mathbf{P}_i^\mathsf{T}\mathbf{A}\mathbf{P}_i = \mathbf{U}_i^{-\mathsf{T}}\mathbf{V}_i^\mathsf{T}\mathbf{A}\mathbf{V}_i\mathbf{U}_i^{-1} = \mathbf{U}_i^{-\mathsf{T}}\mathbf{H}_i\mathbf{U}_i^{-1} = \mathbf{U}_i^{-\mathsf{T}}\mathbf{L}_i\mathbf{U}_i\mathbf{U}_i^{-1} = \mathbf{U}_i^{-\mathsf{T}}\mathbf{L}_i$$

is symmetric and lower triangular simultaneously and thus must be diagonal.

Now we can derive the constant factors $\alpha_i$ and $\beta_i$ with respect to the scaled $\mathbf{p}_i$ by solely imposing the orthogonality of $\mathbf{r}_i$ and conjugacy of $\mathbf{p}_i$.

Due to the orthogonality of $\mathbf{r}_i$, it is necessary that $\mathbf{r}_{i+1}^\mathsf{T}\mathbf{r}_i = 0$. As a result,

$$\alpha_i = \frac{\mathbf{r}_i^\mathsf{T}\mathbf{r}_i}{\mathbf{r}_i^\mathsf{T}\mathbf{A}\mathbf{p}_i} = \frac{\mathbf{r}_i^\mathsf{T}\mathbf{r}_i}{\left(\mathbf{p}_i - \beta_{i-1}\mathbf{p}_{i-1}\right)^\mathsf{T}\mathbf{A}\mathbf{p}_i} = \frac{\mathbf{r}_i^\mathsf{T}\mathbf{r}_i}{\mathbf{p}_i^\mathsf{T}\mathbf{A}\mathbf{p}_i}.$$

Similarly, due to the conjugacy of $\mathbf{p}_i$, it is necessary that $\mathbf{p}_{i+1}^\mathsf{T}\mathbf{A}\mathbf{p}_i = 0$. As a result, using $\mathbf{A}\mathbf{p}_i = (\mathbf{r}_i - \mathbf{r}_{i+1})/\alpha_i$,

$$\beta_i = -\frac{\mathbf{r}_{i+1}^\mathsf{T}\mathbf{A}\mathbf{p}_i}{\mathbf{p}_i^\mathsf{T}\mathbf{A}\mathbf{p}_i} = -\frac{\mathbf{r}_{i+1}^\mathsf{T}\left(\mathbf{r}_i - \mathbf{r}_{i+1}\right)}{\alpha_i\,\mathbf{p}_i^\mathsf{T}\mathbf{A}\mathbf{p}_i} = \frac{\mathbf{r}_{i+1}^\mathsf{T}\mathbf{r}_{i+1}}{\mathbf{r}_i^\mathsf{T}\mathbf{r}_i}.$$

This completes the derivation.
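
Collecting the recurrences $\mathbf{x}_i = \mathbf{x}_{i-1} + \alpha_{i-1}\mathbf{p}_{i-1}$, $\mathbf{r}_i = \mathbf{r}_{i-1} - \alpha_{i-1}\mathbf{A}\mathbf{p}_{i-1}$, and $\mathbf{p}_i = \mathbf{r}_i + \beta_{i-1}\mathbf{p}_{i-1}$ with the formulas for $\alpha_i$ and $\beta_i$ just derived yields the conjugate gradient method. A minimal NumPy sketch (the function name, stopping rule, and tolerance are illustrative choices, not part of the derivation):

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10, max_iter=None):
    """Conjugate gradient sketch for symmetric positive-definite A."""
    x = x0.copy()
    r = b - A @ x                          # r_0
    p = r.copy()                           # p_0 = r_0
    rs_old = r @ r
    if max_iter is None:
        max_iter = len(b)
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs_old / (p @ Ap)          # alpha_i = r_i^T r_i / p_i^T A p_i
        x = x + alpha * p                  # x_{i+1} = x_i + alpha_i p_i
        r = r - alpha * Ap                 # r_{i+1} = r_i - alpha_i A p_i
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        beta = rs_new / rs_old             # beta_i = r_{i+1}^T r_{i+1} / r_i^T r_i
        p = r + beta * p                   # p_{i+1} = r_{i+1} + beta_i p_i
        rs_old = rs_new
    return x
```

Each iteration needs only one matrix–vector product with $\mathbf{A}$ and a few vector operations, which is precisely the simplification over the direct Lanczos method that the scaled directions make possible.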


References

  1. Hestenes, M. R.; Stiefel, E. (December 1952). "Methods of conjugate gradients for solving linear systems". Journal of Research of the National Bureau of Standards. 49 (6).
  2. Saad, Y. (2003). "Chapter 6: Krylov Subspace Methods, Part I". Iterative Methods for Sparse Linear Systems (2nd ed.). SIAM. ISBN 978-0-89871-534-7.