Biconjugate gradient method

Last updated October 31, 2023

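In mathematics, more specifically in numerical linear algebra, the biconjugate gradient method is an algorithm to solve systems of linear equations $Ax=b$. Unlike the conjugate gradient method, it does not require the matrix $A$ to be self-adjoint, but it does require multiplications by the adjoint $A^{*}$.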

The Algorithm

Choose initial guess $x_{0}$, two other vectors $x_{0}^{*}$ and $b^{*}$, and a preconditioner $M$
$r_{0}\leftarrow b-A\,x_{0}\,$
$r_{0}^{*}\leftarrow b^{*}-x_{0}^{*}\,A^{*}$
$p_{0}\leftarrow M^{-1}r_{0}\,$
$p_{0}^{*}\leftarrow r_{0}^{*}M^{-1}\,$
for $k=0,1,\ldots$ do
1. $\alpha _{k}\leftarrow {r_{k}^{*}M^{-1}r_{k} \over p_{k}^{*}Ap_{k}}\,$
2. $x_{k+1}\leftarrow x_{k}+\alpha _{k}\cdot p_{k}\,$
3. $x_{k+1}^{*}\leftarrow x_{k}^{*}+{\overline {\alpha _{k}}}\cdot p_{k}^{*}\,$
4. $r_{k+1}\leftarrow r_{k}-\alpha _{k}\cdot Ap_{k}\,$
5. $r_{k+1}^{*}\leftarrow r_{k}^{*}-{\overline {\alpha _{k}}}\cdot p_{k}^{*}\,A^{*}$
6. $\beta _{k}\leftarrow {r_{k+1}^{*}M^{-1}r_{k+1} \over r_{k}^{*}M^{-1}r_{k}}\,$
7. $p_{k+1}\leftarrow M^{-1}r_{k+1}+\beta _{k}\cdot p_{k}\,$
8. $p_{k+1}^{*}\leftarrow r_{k+1}^{*}M^{-1}+{\overline {\beta _{k}}}\cdot p_{k}^{*}\,$

In the above formulation, the computed $r_{k}$ and $r_{k}^{*}$ satisfy

$r_{k}=b-Ax_{k},$

$r_{k}^{*}=b^{*}-x_{k}^{*}\,A^{*}$

and thus are the respective residuals corresponding to $x_{k}$ and $x_{k}^{*}$, as approximate solutions to the systems

$Ax=b,$

$x^{*}\,A^{*}=b^{*};$

$x^{*}$ is the adjoint, and ${\overline {\alpha }}$ is the complex conjugate.
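
The preconditioned iteration above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not a reference implementation. The starred quantities are stored as ordinary column vectors (`rs`, `ps`, `xs`, `bs`), so the row vector $r_{k}^{*}$ corresponds to `rs.conj()` and the dual system is read as `A.conj().T @ xs = bs`. The preconditioner is passed as an explicit dense matrix `M_inv`. The function name `bicg_preconditioned`, the stopping rule on the primal residual, and the breakdown check are choices made for this sketch and are not part of the algorithm as stated.

```python
import numpy as np

def bicg_preconditioned(A, b, x0, bs, xs0, M_inv=None, tol=1e-10, max_iter=None):
    """Sketch of preconditioned BiCG; starred vectors are kept as columns."""
    n = b.shape[0]
    if M_inv is None:
        M_inv = np.eye(n)              # assumption: no preconditioning by default
    AH = A.conj().T                    # adjoint of A
    MH_inv = M_inv.conj().T            # adjoint of M^{-1}
    x = np.asarray(x0, dtype=complex)
    xs = np.asarray(xs0, dtype=complex)
    r = b - A @ x                      # primal residual
    rs = bs - AH @ xs                  # dual residual (column form)
    z, zs = M_inv @ r, MH_inv @ rs
    p, ps = z.copy(), zs.copy()
    rho = np.vdot(rs, z)               # r_k^* M^{-1} r_k
    for _ in range(max_iter or 10 * n):
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        q = A @ p
        denom = np.vdot(ps, q)         # p_k^* A p_k
        if rho == 0 or denom == 0:
            raise ArithmeticError("BiCG breakdown: zero inner product")
        alpha = rho / denom
        x = x + alpha * p
        xs = xs + np.conj(alpha) * ps
        r = r - alpha * q
        rs = rs - np.conj(alpha) * (AH @ ps)
        z, zs = M_inv @ r, MH_inv @ rs
        rho_new = np.vdot(rs, z)
        beta = rho_new / rho
        p = z + beta * p
        ps = zs + np.conj(beta) * ps
        rho = rho_new
    return x, xs
```

Calling `bicg_preconditioned(A, b, x0, bs, xs0, M_inv)` returns approximations to the solutions of both systems. A diagonal (Jacobi) choice such as `np.diag(1 / np.diag(A))` is one simple candidate for `M_inv`.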

Unpreconditioned version of the algorithm

Choose initial guess $x_{0}$, two other vectors ${\hat {x}}_{0}$ and ${\hat {b}}$
$r_{0}\leftarrow b-A\,x_{0}\,$
${\hat {r}}_{0}\leftarrow {\hat {b}}-{\hat {x}}_{0}A$
$p_{0}\leftarrow r_{0}\,$
${\hat {p}}_{0}\leftarrow {\hat {r}}_{0}\,$
for $k=0,1,\ldots$ do
1. $\alpha _{k}\leftarrow {{\hat {r}}_{k}r_{k} \over {\hat {p}}_{k}Ap_{k}}\,$
2. $x_{k+1}\leftarrow x_{k}+\alpha _{k}\cdot p_{k}\,$
3. ${\hat {x}}_{k+1}\leftarrow {\hat {x}}_{k}+\alpha _{k}\cdot {\hat {p}}_{k}\,$
4. $r_{k+1}\leftarrow r_{k}-\alpha _{k}\cdot Ap_{k}\,$
5. ${\hat {r}}_{k+1}\leftarrow {\hat {r}}_{k}-\alpha _{k}\cdot {\hat {p}}_{k}A$
6. $\beta _{k}\leftarrow {{\hat {r}}_{k+1}r_{k+1} \over {\hat {r}}_{k}r_{k}}\,$
7. $p_{k+1}\leftarrow r_{k+1}+\beta _{k}\cdot p_{k}\,$
8. ${\hat {p}}_{k+1}\leftarrow {\hat {r}}_{k+1}+\beta _{k}\cdot {\hat {p}}_{k}\,$
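
For real matrices, the eight steps above translate almost line by line into NumPy, because a one-dimensional array placed on the left of `@` acts as a row vector. The following is a sketch under that assumption (real data only). The function name `bicg_unpreconditioned`, the stopping rule on the primal residual, and the breakdown check are illustrative additions.

```python
import numpy as np

def bicg_unpreconditioned(A, b, x0, b_hat, x_hat0, tol=1e-10, max_iter=None):
    """Sketch of unpreconditioned BiCG for real A; hatted vectors act as rows."""
    x = np.array(x0, dtype=float)
    x_hat = np.array(x_hat0, dtype=float)
    r = b - A @ x
    r_hat = b_hat - x_hat @ A          # row vector times matrix
    p, p_hat = r.copy(), r_hat.copy()
    for _ in range(max_iter or 10 * len(b)):
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        Ap = A @ p
        rho = r_hat @ r
        denom = p_hat @ Ap
        if rho == 0.0 or denom == 0.0:
            raise ArithmeticError("BiCG breakdown: zero inner product")
        alpha = rho / denom                      # step 1
        x = x + alpha * p                        # step 2
        x_hat = x_hat + alpha * p_hat            # step 3
        r = r - alpha * Ap                       # step 4
        r_hat = r_hat - alpha * (p_hat @ A)      # step 5
        beta = (r_hat @ r) / rho                 # step 6
        p = r + beta * p                         # step 7
        p_hat = r_hat + beta * p_hat             # step 8
    return x, x_hat
```

With a symmetric `A`, `b_hat = b` and `x_hat0 = x0`, the two returned vectors coincide up to rounding, which is the conjugate gradient special case.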

Discussion

The biconjugate gradient method is numerically unstable (compare to the biconjugate gradient stabilized method), but very important from a theoretical point of view. Define the iteration steps by

$x_{k}:=x_{j}+P_{k}A^{-1}\left(b-Ax_{j}\right),$

$x_{k}^{*}:=x_{j}^{*}+\left(b^{*}-x_{j}^{*}A\right)P_{k}A^{-1},$

where $j<k$ using the related projection

$P_{k}:=\mathbf {u} _{k}\left(\mathbf {v} _{k}^{*}A\mathbf {u} _{k}\right)^{-1}\mathbf {v} _{k}^{*}A,$

with

$\mathbf {u} _{k}=\left[u_{0},u_{1},\dots ,u_{k-1}\right],$

$\mathbf {v} _{k}=\left[v_{0},v_{1},\dots ,v_{k-1}\right].$

These related projections may be iterated themselves as

$P_{k+1}=P_{k}+\left(1-P_{k}\right)u_{k}\otimes {v_{k}^{*}A\left(1-P_{k}\right) \over v_{k}^{*}A\left(1-P_{k}\right)u_{k}}.$
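
The definition of $P_{k}$ and its rank-one update can be checked numerically. The sketch below assumes dense NumPy arrays, with the vectors $u_{i}$ and $v_{i}$ stored as the columns of `U` and `V`, and assumes that $\mathbf {v} _{k}^{*}A\mathbf {u} _{k}$ is invertible. The helper names `projection` and `extend_projection` are hypothetical.

```python
import numpy as np

def projection(A, U, V):
    """P = U (V^* A U)^{-1} V^* A for column blocks U and V."""
    VA = V.conj().T @ A
    return U @ np.linalg.solve(VA @ U, VA)

def extend_projection(A, P, u, v):
    """Rank-one update giving P_{k+1} from P_k and the new vectors u_k, v_k."""
    I = np.eye(A.shape[0])
    left = (I - P) @ u                 # column (1 - P_k) u_k
    right = v.conj() @ A @ (I - P)     # row v_k^* A (1 - P_k)
    return P + np.outer(left, right) / (right @ u)
```

For generic data, `projection(A, U, V)` is idempotent, and extending the projection built from the first $k$ columns with column $k$ should agree, up to rounding, with the projection built directly from $k+1$ columns.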

A relation to quasi-Newton methods is given by $P_{k}=A_{k}^{-1}A$ and $x_{k+1}=x_{k}-A_{k+1}^{-1}\left(Ax_{k}-b\right)$, where

$A_{k+1}^{-1}=A_{k}^{-1}+\left(1-A_{k}^{-1}A\right)u_{k}\otimes {v_{k}^{*}\left(1-AA_{k}^{-1}\right) \over v_{k}^{*}A\left(1-A_{k}^{-1}A\right)u_{k}}.$

The new directions

$p_{k}=\left(1-P_{k}\right)u_{k},$

$p_{k}^{*}=v_{k}^{*}A\left(1-P_{k}\right)A^{-1}$

are then orthogonal to the residuals:

$v_{i}^{*}r_{k}=p_{i}^{*}r_{k}=0,$

$r_{k}^{*}u_{j}=r_{k}^{*}p_{j}=0,$

which themselves satisfy

$r_{k}=A\left(1-P_{k}\right)A^{-1}r_{j},$

$r_{k}^{*}=r_{j}^{*}\left(1-P_{k}\right)$

where $i,j<k$.

The biconjugate gradient method now makes a special choice and uses the setting

$u_{k}=M^{-1}r_{k},$

$v_{k}^{*}=r_{k}^{*}\,M^{-1}.$

With this particular choice, explicit evaluations of $P_{k}$ and $A^{-1}$ are avoided, and the algorithm takes the form stated above.

Properties

If $A=A^{*}$ is self-adjoint, $x_{0}^{*}=x_{0}$ and $b^{*}=b$, then $r_{k}=r_{k}^{*}$, $p_{k}=p_{k}^{*}$, and the conjugate gradient method produces the same sequence $x_{k}=x_{k}^{*}$ at half the computational cost.
The sequences produced by the algorithm are biorthogonal, i.e., $p_{i}^{*}Ap_{j}=r_{i}^{*}M^{-1}r_{j}=0$ for $i\neq j$.
If $P_{j'}$ is a polynomial with $\deg \left(P_{j'}\right)+j<k$, then $r_{k}^{*}P_{j'}\left(M^{-1}A\right)u_{j}=0$. The algorithm thus produces projections onto the Krylov subspace.
If $P_{i'}$ is a polynomial with $i+\deg \left(P_{i'}\right)<k$, then $v_{i}^{*}P_{i'}\left(AM^{-1}\right)r_{k}=0$.
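
The biorthogonality property can be observed numerically in the simplest setting. The sketch below assumes a real matrix, no preconditioning ($M=1$) and a zero initial guess, and it omits any breakdown handling. The names `bicg_history` and `off_diagonal_max` are hypothetical helpers introduced for this check.

```python
import numpy as np

def bicg_history(A, b, b_hat, steps):
    """Run `steps` unpreconditioned BiCG steps from x_0 = 0 and keep every vector."""
    r = np.array(b, dtype=float)
    r_hat = np.array(b_hat, dtype=float)
    p, p_hat = r.copy(), r_hat.copy()
    R, R_hat, P, P_hat = [r], [r_hat], [p], [p_hat]
    for _ in range(steps):
        Ap = A @ p
        rho = r_hat @ r
        alpha = rho / (p_hat @ Ap)
        r = r - alpha * Ap
        r_hat = r_hat - alpha * (p_hat @ A)   # uses the current p_hat
        beta = (r_hat @ r) / rho
        p = r + beta * p
        p_hat = r_hat + beta * p_hat
        R.append(r)
        R_hat.append(r_hat)
        P.append(p)
        P_hat.append(p_hat)
    # Rows of each returned array are the vectors for k = 0, 1, ..., steps.
    return np.array(R), np.array(R_hat), np.array(P), np.array(P_hat)

def off_diagonal_max(G):
    """Largest absolute off-diagonal entry of a square matrix."""
    return np.abs(G - np.diag(np.diag(G))).max()
```

Entry $(i,j)$ of `R_hat @ R.T` is ${\hat {r}}_{i}r_{j}$ and entry $(i,j)$ of `P_hat @ A @ P.T` is ${\hat {p}}_{i}Ap_{j}$, so for a well-conditioned `A` both matrices should be diagonal up to rounding error.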
