Kaczmarz method

Last updated September 18, 2024

The Kaczmarz method or Kaczmarz's algorithm is an iterative algorithm for solving linear equation systems $Ax=b$. It was first discovered by the Polish mathematician Stefan Kaczmarz,^[1] and was rediscovered in the field of image reconstruction from projections by Richard Gordon, Robert Bender, and Gabor Herman in 1970, where it is called the Algebraic Reconstruction Technique (ART).^[2] ART includes the positivity constraint, making it nonlinear.^[3]

Algorithm 1: Kaczmarz algorithm
Algorithm 2: Randomized Kaczmarz algorithm
Proof
Algorithm 3: Gower-Richtarik algorithm
Insights about Randomized Kaczmarz
Six Equivalent Formulations
1. Sketch and Project
2. Constrain and Approximate
5. Random Update
6. Random Fixed Point
Convergence
Theorem [Gower & Richtarik 2015]
Theorem [Gower & Richtarik 2015] 2
Convergence of Randomized Kaczmarz
Further Special Cases
Algorithm 4: PLSS-Kaczmarz
Notes
References
External links

The Kaczmarz method is applicable to any linear system of equations, but its computational advantage relative to other methods depends on the system being sparse. It has been demonstrated to be superior, in some biomedical imaging applications, to other methods such as the filtered backprojection method.^[4]

It has many applications ranging from computed tomography (CT) to signal processing. It can also be obtained by applying the method of successive projections onto convex sets (POCS) to the hyperplanes described by the linear system.^[5]^[6]

Algorithm 1: Kaczmarz algorithm

Let $Ax=b$ be a system of linear equations, let $m$ be the number of rows of $A$, let $a_{i}$ be the $i$th row of the complex-valued matrix $A$, and let $x^{0}$ be an arbitrary complex-valued initial approximation to the solution of $Ax=b$. For $k=0,1,\ldots$ compute:

x^{k+1}=x^{k}+{\frac {b_{i}-\langle a_{i},x^{k}\rangle }{\|a_{i}\|^{2}}}{\overline {a_{i}}}\qquad (1)

where $i=(k{\bmod {m}})+1$, so that $i$ cycles through $1,2,\ldots ,m$, and ${\overline {a_{i}}}$ denotes the complex conjugate of $a_{i}$.

If the system is consistent, $x^{k}$ converges to the minimum-norm solution, provided that the iterations start with the zero vector.

A more general algorithm can be defined using a relaxation parameter $\lambda ^{k}$

x^{k+1}=x^{k}+\lambda ^{k}{\frac {b_{i}-\langle a_{i},x^{k}\rangle }{\|a_{i}\|^{2}}}{\overline {a_{i}}}
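The cyclic iteration (1), with an optional constant relaxation parameter, can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions: a real-valued system (so that $\overline{a_i}=a_i$), no zero rows, a constant $\lambda^{k}=\lambda$, and the zero starting vector; the function name `kaczmarz` and its parameters are hypothetical.

```python
def kaczmarz(A, b, sweeps=100, lam=1.0):
    """Cyclic (relaxed) Kaczmarz iteration for a real system A x = b.

    A: list of rows (lists of floats), none of them identically zero.
    b: list of right-hand sides, one per row.
    sweeps: number of full passes over the m rows.
    lam: constant relaxation parameter; lam = 1.0 reproduces iteration (1).
    The iteration starts from the zero vector.
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for k in range(sweeps * m):
        i = k % m                                    # 0-based form of i = (k mod m) + 1
        a = A[i]
        norm2 = sum(v * v for v in a)                # ||a_i||^2
        residual = b[i] - sum(av * xv for av, xv in zip(a, x))  # b_i - <a_i, x^k>
        step = lam * residual / norm2
        # Real case: the conjugate of a_i is a_i itself.
        x = [xv + step * av for xv, av in zip(x, a)]
    return x


# One equation x1 + 2*x2 = 5: starting from zero, a single projection lands
# on the minimum-norm solution (1, 2).
print(kaczmarz([[1.0, 2.0]], [5.0], sweeps=1))  # -> [1.0, 2.0]
```

Values of $\lambda$ strictly between 0 and 2 are the usual choice for relaxed projection methods; the sketch does not enforce this.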

There are versions of the method that converge to a regularized weighted least squares solution when applied to a system of inconsistent equations and, at least as far as initial behavior is concerned, at a lesser cost than other iterative methods, such as the conjugate gradient method.^[7]

Algorithm 2: Randomized Kaczmarz algorithm

In 2009, a randomized version of the Kaczmarz method for overdetermined linear systems was introduced by Thomas Strohmer and Roman Vershynin^[8] in which the i-th equation is selected randomly with probability proportional to $\|a_{i}\|^{2}.$

This method can be seen as a particular case of stochastic gradient descent.^[9]

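When the system is consistent, the iterates $x_{k}$ of this method converge exponentially fast (in expectation) to the solution of $Ax=b,$ and the rate of convergence depends only on the scaled condition number $\kappa (A)=\|A\|_{F}\,\|A^{-1}\|$, where $\|A\|_{F}$ is the Frobenius norm of $A$ and $\|A^{-1}\|=\inf\{M: M\|Az\|\geq \|z\|{\text{ for all }}z\}$, i.e. the reciprocal of the smallest singular value of $A$.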
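A minimal sketch of Algorithm 2 follows, assuming a real-valued consistent system stored as Python lists. It uses the standard-library `random.choices` to draw row $i$ with probability $\|a_{i}\|^{2}/\|A\|_{F}^{2}$ and then applies the same projection as in (1). The function name, the fixed iteration count and the seed are illustrative choices, not part of the method.

```python
import random


def randomized_kaczmarz(A, b, iters=1000, seed=0):
    """Randomized Kaczmarz (Strohmer-Vershynin row selection) for real A x = b.

    Row i is drawn with probability ||a_i||^2 / ||A||_F^2; the update is the
    projection of iteration (1). Starts from the zero vector.
    """
    rng = random.Random(seed)                          # seeded for reproducibility
    n = len(A[0])
    norms2 = [sum(v * v for v in row) for row in A]    # ||a_i||^2, used as weights
    rows = range(len(A))
    x = [0.0] * n
    for _ in range(iters):
        i = rng.choices(rows, weights=norms2)[0]       # sample a row index
        a = A[i]
        residual = b[i] - sum(av * xv for av, xv in zip(a, x))
        step = residual / norms2[i]
        x = [xv + step * av for xv, av in zip(x, a)]
    return x


# Overdetermined but consistent system with solution (2, -1).
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
b = [2.0, -1.0, 1.0, 3.0]
print(randomized_kaczmarz(A, b, iters=2000))
```

Only the row-selection rule differs from the cyclic sketch; the projection step itself is unchanged.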

Theorem. Let $x$ be the solution of $Ax=b$. Then Algorithm 2 converges to $x$ in expectation, with the average error:

\mathbb {E} \|x_{k}-x\|^{2}\leq \left(1-\kappa (A)^{-2}\right)^{k}\cdot \|x_{0}-x\|^{2}.

Proof

We have

\forall z\in \mathbb {C} ^{n}:\quad \sum _{j=1}^{m}|\langle z,a_{j}\rangle |^{2}\geq {\frac {\|z\|^{2}}{\|A^{-1}\|^{2}}}\qquad (2)

Dividing both sides by $\|A\|^{2}=\sum _{j=1}^{m}\|a_{j}\|^{2}$ and using $\kappa (A)=\|A\|\,\|A^{-1}\|$, we can write (2) as

\forall z\in \mathbb {C} ^{n}:\quad \sum _{j=1}^{m}{\frac {\|a_{j}\|^{2}}{\|A\|^{2}}}\left|\left\langle z,{\frac {a_{j}}{\|a_{j}\|}}\right\rangle \right|^{2}\geq \kappa (A)^{-2}{\|z\|^{2}}\qquad (3)

The main point of the proof is to view the left-hand side of (3) as the expectation of some random variable. Namely, recall that the solution space of the $j$th equation of $Ax=b$ is the hyperplane

\{y:\langle y,a_{j}\rangle =b_{j}\},

whose unit normal is ${\tfrac {a_{j}}{\|a_{j}\|}}.$ Define a random vector $Z$ whose values are the unit normals to all the equations of $Ax=b$, with probabilities as in our algorithm:

Z={\frac {a_{j}}{\|a_{j}\|}}\quad {\text{with probability}}\quad {\frac {\|a_{j}\|^{2}}{\|A\|^{2}}},\qquad j=1,\ldots ,m.

Then (3) says that

\forall z\in \mathbb {C} ^{n}:\quad \mathbb {E} |\langle z,Z\rangle |^{2}\geq \kappa (A)^{-2}{\|z\|^{2}}\qquad (4)

The orthogonal projection $P$ onto the solution space of a random equation of $Ax=b$ is given by $Pz=z-\langle z-x,Z\rangle Z.$

Now we are ready to analyze our algorithm. We want to show that the error ${\|x_{k}-x\|^{2}}$ decreases at each step on average (conditioned on the previous steps) by at least the factor $(1-\kappa (A)^{-2}).$ The next approximation $x_{k}$ is computed from $x_{k-1}$ as $x_{k}=P_{k}x_{k-1},$ where $P_{1},P_{2},\ldots$ are independent realizations of the random projection $P.$ The vector $x_{k-1}-x_{k}$ is a multiple of the normal of the hyperplane onto which $P_{k}$ projects, so it is orthogonal to that hyperplane. Both $x_{k}$ and $x$ lie on this hyperplane (recall that $x$ is the solution to all equations), so $x_{k}-x$ is parallel to it. The orthogonality of these two vectors then yields

\|x_{k}-x\|^{2}=\|x_{k-1}-x\|^{2}-\|x_{k-1}-x_{k}\|^{2}.

To complete the proof, we have to bound $\|x_{k-1}-x_{k}\|^{2}$ from below. By the definition of $x_{k}$, we have

\|x_{k-1}-x_{k}\|=\left|\langle x_{k-1}-x,Z_{k}\rangle \right|

where $Z_{1},Z_{2},\ldots$ are independent realizations of the random vector $Z.$

Thus

\|x_{k}-x\|^{2}=\left(1-\left|\left\langle {\frac {x_{k-1}-x}{\|x_{k-1}-x\|}},Z_{k}\right\rangle \right|^{2}\right){\|x_{k-1}-x\|^{2}}.

Now we take the expectation of both sides conditional upon the choice of the random vectors $Z_{1},\ldots ,Z_{k-1}$ (hence we fix the choice of the random projections $P_{1},\ldots ,P_{k-1}$ and thus the random vectors $x_{1},\ldots ,x_{k-1}$, and we average over the random vector $Z_{k}$). Then

\mathbb {E} _{Z_{1},\ldots ,Z_{k-1}}{\|x_{k}-x\|^{2}}=\left(1-\mathbb {E} _{Z_{1},\ldots ,Z_{k-1}}\left|\left\langle {\frac {x_{k-1}-x}{\|x_{k-1}-x\|}},Z_{k}\right\rangle \right|^{2}\right){\|x_{k-1}-x\|^{2}}.

By ( 4 ) and the independence,

\mathbb {E} _{Z_{1},\ldots ,Z_{k-1}}{\|x_{k}-x\|^{2}}\leq (1-\kappa (A)^{-2}){\|x_{k-1}-x\|^{2}}.

Taking the full expectation of both sides, we conclude that

\mathbb {E} \|x_{k}-x\|^{2}\leq (1-\kappa (A)^{-2})\mathbb {E} {\|x_{k-1}-x\|^{2}},

and applying this inequality $k$ times yields the bound stated in the theorem. $\blacksquare$
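The bound can be checked exactly on a tiny example, because after $k$ steps Algorithm 2 has only $m^{k}$ possible row sequences, each with probability equal to the product of its row probabilities. The sketch below enumerates all of them, so it is practical only for very small $m$ and $k$. It assumes a real system, takes $\kappa (A)=\|A\|_{F}\|A^{-1}\|$ with $\|A^{-1}\|$ the reciprocal of the smallest singular value (so that $\kappa (A)^{-2}=\lambda _{\min }(A^{T}A)/\|A\|_{F}^{2}$), and computes this quantity in closed form only for matrices with two columns. All function names are hypothetical.

```python
import math
from itertools import product


def project(x, a, beta):
    """Orthogonal projection of x onto the hyperplane {y : <a, y> = beta} (real case)."""
    norm2 = sum(v * v for v in a)
    r = beta - sum(av * xv for av, xv in zip(a, x))
    return [xv + (r / norm2) * av for xv, av in zip(x, a)]


def exact_expected_sq_error(A, b, x_true, x0, k):
    """E||x_k - x||^2 for Algorithm 2, computed by enumerating all m**k row sequences."""
    fro2 = sum(v * v for row in A for v in row)              # ||A||_F^2
    probs = [sum(v * v for v in row) / fro2 for row in A]    # P(row j) = ||a_j||^2 / ||A||_F^2
    total = 0.0
    for path in product(range(len(A)), repeat=k):
        weight, x = 1.0, list(x0)
        for j in path:
            weight *= probs[j]
            x = project(x, A[j], b[j])
        total += weight * sum((xv - tv) ** 2 for xv, tv in zip(x, x_true))
    return total


def kappa_inv_sq(A):
    """kappa(A)^{-2} = lambda_min(A^T A) / ||A||_F^2, for A with exactly two columns."""
    g11 = sum(r[0] * r[0] for r in A)
    g22 = sum(r[1] * r[1] for r in A)
    g12 = sum(r[0] * r[1] for r in A)
    half_tr, det = (g11 + g22) / 2, g11 * g22 - g12 * g12
    lam_min = half_tr - math.sqrt(half_tr ** 2 - det)        # smaller eigenvalue of A^T A
    return lam_min / (g11 + g22)                             # trace(A^T A) = ||A||_F^2


A = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
x_true = [1.0, 2.0]
b = [sum(a * t for a, t in zip(row, x_true)) for row in A]
e0 = sum(t * t for t in x_true)                              # ||x_0 - x||^2 with x_0 = 0
q = 1.0 - kappa_inv_sq(A)
for k in range(1, 5):
    print(k, exact_expected_sq_error(A, b, x_true, [0.0, 0.0], k), q ** k * e0)
```

Each printed line shows the exact expected squared error next to the theorem's bound for the same $k$.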

The superiority of this selection was illustrated with the reconstruction of a bandlimited function from its nonuniformly spaced sampling values. However, it has been pointed out^[10] that the reported success by Strohmer and Vershynin depends on the specific choices that were made there in translating the underlying problem, whose geometrical nature is to find a common point of a set of hyperplanes, into a system of algebraic equations. There will always be legitimate algebraic representations of the underlying problem for which the selection method in^[8] will perform in an inferior manner.^[8]^[10]^[11]

The Kaczmarz iteration (1) has a purely geometric interpretation: the algorithm successively projects the current iterate onto the hyperplane defined by the next equation. Hence, any scaling of the equations is irrelevant; it can also be seen from (1) that any (nonzero) scaling of the equations cancels out. Thus, in RK, one can use $\|a_{i}\|$ or any other weights that may be relevant. Specifically, in the above-mentioned reconstruction example, the equations were chosen with probability proportional to the average distance of each sample point from its two nearest neighbors, a concept introduced by Feichtinger and Gröchenig. For additional progress on this topic, see^[12]^[13] and the references therein.
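The scale invariance can be seen numerically: multiplying one equation (its row and its right-hand side) by a nonzero constant leaves the Kaczmarz projection unchanged, while the randomized Kaczmarz sampling probabilities $\|a_{i}\|^{2}/\|A\|_{F}^{2}$ do change. A small sketch for the real case, with hypothetical helper names and made-up data:

```python
def kaczmarz_step(x, a, beta):
    """Project x onto the hyperplane {y : <a, y> = beta} (real case), as in (1)."""
    norm2 = sum(v * v for v in a)
    r = beta - sum(av * xv for av, xv in zip(a, x))
    return [xv + (r / norm2) * av for xv, av in zip(x, a)]


def row_probabilities(A):
    """Randomized Kaczmarz sampling probabilities ||a_i||^2 / ||A||_F^2."""
    w = [sum(v * v for v in row) for row in A]
    total = sum(w)
    return [wi / total for wi in w]


A = [[1.0, 2.0], [3.0, -1.0]]
b = [4.0, 1.0]
c = 5.0                                       # nonzero factor applied to equation 1
A_scaled = [[c * v for v in A[0]], A[1]]
b_scaled = [c * b[0], b[1]]
x = [3.0, -1.0]

print(kaczmarz_step(x, A[0], b[0]))                 # projection onto equation 1
print(kaczmarz_step(x, A_scaled[0], b_scaled[0]))   # same hyperplane, same point up to rounding
print(row_probabilities(A), row_probabilities(A_scaled))  # sampling weights differ
```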

Algorithm 3: Gower-Richtarik algorithm

In 2015, Robert M. Gower and Peter Richtarik^[14] developed a versatile randomized iterative method for solving a consistent system of linear equations $Ax=b$, which includes the randomized Kaczmarz algorithm as a special case. Other special cases include randomized coordinate descent, randomized Gaussian descent and the randomized Newton method. Block versions and versions with importance sampling of all these methods also arise as special cases. The method is shown to enjoy an exponential rate of decay (in expectation), also known as linear convergence, under very mild conditions on the way randomness enters the algorithm. The Gower-Richtarik method was the first algorithm to uncover a "sibling" relationship between these methods, some of which had been proposed independently before, while many were new.

Insights about Randomized Kaczmarz

Interesting new insights about the randomized Kaczmarz method that can be gained from the analysis of the method include:

The general rate of the Gower-Richtarik algorithm precisely recovers the rate of the randomized Kaczmarz method in the special case when the former reduces to the latter.
The choice of probabilities for which the randomized Kaczmarz algorithm was originally formulated and analyzed (probabilities proportional to the squares of the row norms) is not optimal. Optimal probabilities are the solution of a certain semidefinite program. The theoretical complexity of randomized Kaczmarz with the optimal probabilities can be arbitrarily better than the complexity for the standard probabilities. However, the amount by which it is better depends on the matrix $A$. There are problems for which the standard probabilities are optimal.
When applied to a system with a positive definite matrix $A$, the randomized Kaczmarz method is equivalent to the stochastic gradient descent (SGD) method (with a very special stepsize) for minimizing the strongly convex quadratic function $f(x)={\tfrac {1}{2}}x^{T}Ax-b^{T}x.$ Since $f$ is convex, its minimizers are exactly the points satisfying $\nabla f(x)=0$, which is equivalent to $Ax=b.$ The "special stepsize" is the one that moves the iterate to the point on the line through it along the stochastic gradient that minimizes the Euclidean distance to the unknown(!) minimizer of $f$, namely $x^{*}=A^{-1}b.$ This insight is gained from a dual view of the iterative process (described below as "Optimization Viewpoint: Constrain and Approximate").

Six Equivalent Formulations

The Gower-Richtarik method enjoys six seemingly different but equivalent formulations, shedding additional light on how to interpret it (and, as a consequence, how to interpret its many variants, including randomized Kaczmarz):

1. Sketching viewpoint: Sketch & Project
2. Optimization viewpoint: Constrain and Approximate
3. Geometric viewpoint: Random Intersect
4. Algebraic viewpoint 1: Random Linear Solve
5. Algebraic viewpoint 2: Random Update
6. Analytic viewpoint: Random Fixed Point

We now describe some of these viewpoints. The method depends on 2 parameters:

a positive definite matrix $B$ giving rise to a weighted Euclidean inner product $\langle x,y\rangle _{B}:=x^{T}By$ and the induced norm

\|x\|_{B}=\left(\langle x,x\rangle _{B}\right)^{\frac {1}{2}},

and a random matrix $S$ with as many rows as $A$ (and possibly a random number of columns).

1. Sketch and Project

Given the previous iterate $x^{k},$ the new point $x^{k+1}$ is computed by drawing a random matrix $S$ (in an i.i.d. fashion from some fixed distribution) and setting

x^{k+1}={\underset {x}{\operatorname {arg\ min} }}\|x-x^{k}\|_{B}{\text{ subject to }}S^{T}Ax=S^{T}b.

That is, $x^{k+1}$ is obtained as the projection of $x^{k}$ onto the randomly sketched system $S^{T}Ax=S^{T}b$. The idea behind this method is to pick $S$ in such a way that a projection onto the sketched system is substantially simpler than the solution of the original system $Ax=b$. The randomized Kaczmarz method is obtained by picking $B$ to be the identity matrix and $S$ to be the $i$th unit coordinate vector with probability $p_{i}=\|a_{i}\|_{2}^{2}/\|A\|_{F}^{2}.$ Different choices of $B$ and $S$ lead to different variants of the method.

2. Constrain and Approximate

A seemingly different but entirely equivalent formulation of the method (obtained via Lagrangian duality) is

x^{k+1}={\underset {x}{\operatorname {arg\ min} }}\left\|x-x^{*}\right\|_{B}{\text{ subject to }}x=x^{k}+B^{-1}A^{T}Sy,

where $y$ is also allowed to vary, and where $x^{*}$ is any solution of the system $Ax=b.$ Hence, $x^{k+1}$ is obtained by first constraining the update to the linear subspace spanned by the columns of the random matrix $B^{-1}A^{T}S$, i.e., to

\left\{h:h=B^{-1}A^{T}Sy,\quad y{\text{ can vary }}\right\},

and then choosing, among the points $x^{k}+h$ with $h$ in this subspace, the one that best approximates $x^{*}$ in the $B$-norm. This formulation may look surprising, as it seems impossible to perform the approximation step when $x^{*}$ is not known (after all, this is what we are trying to compute!). It is nevertheless possible, because $x^{k+1}$ computed this way coincides with $x^{k+1}$ computed via the sketch-and-project formulation, in which $x^{*}$ does not appear.

5. Random Update

The update can also be written explicitly as

x^{k+1}=x^{k}-B^{-1}A^{T}S\left(S^{T}AB^{-1}A^{T}S\right)^{\dagger }S^{T}\left(Ax^{k}-b\right),

where $M^{\dagger }$ denotes the Moore-Penrose pseudo-inverse of a matrix $M$. Hence, the method can be written in the form $x^{k+1}=x^{k}+h^{k}$, where $h^{k}$ is a random update vector.

Letting $M=S^{T}AB^{-1}A^{T}S,$ it can be shown that the system $My=S^{T}(Ax^{k}-b)$ always has a solution $y^{k}$, and that the vector $B^{-1}A^{T}Sy^{k}$ is the same for all such solutions. Hence, it does not matter which of these solutions is chosen, and the method can also be written as $x^{k+1}=x^{k}-B^{-1}A^{T}Sy^{k}$. The pseudo-inverse simply selects one particular solution. Its role is twofold:

It allows the method to be written in the explicit "random update" form as above,
It makes the analysis simple through the final, sixth, formulation.
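For a concrete instance of the random update formula, take $B=I$ and let $S$ consist of the columns of the identity matrix indexed by a chosen set of rows (a block Kaczmarz step). The sketch below assumes the selected rows are linearly independent, so that $S^{T}AA^{T}S$ is invertible and its pseudo-inverse is an ordinary inverse; it solves the small system by Gaussian elimination instead of calling a general pseudo-inverse routine. With a single row it reproduces the Kaczmarz step (1) in the real case. Function names are hypothetical.

```python
def solve(M, r):
    """Solve M y = r by Gaussian elimination with partial pivoting (M assumed nonsingular)."""
    n = len(M)
    aug = [list(M[i]) + [r[i]] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda i: abs(aug[i][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for i in range(c + 1, n):
            f = aug[i][c] / aug[c][c]
            for j in range(c, n + 1):
                aug[i][j] -= f * aug[c][j]
    y = [0.0] * n
    for i in reversed(range(n)):
        y[i] = (aug[i][n] - sum(aug[i][j] * y[j] for j in range(i + 1, n))) / aug[i][i]
    return y


def sketch_and_project_step(A, b, x, rows):
    """x - A^T S (S^T A A^T S)^{-1} S^T (A x - b) with B = I and S = identity columns `rows`."""
    AS = [A[i] for i in rows]                                              # S^T A
    M = [[sum(u * v for u, v in zip(ri, rj)) for rj in AS] for ri in AS]   # S^T A A^T S
    r = [sum(u * v for u, v in zip(ri, x)) - b[i] for ri, i in zip(AS, rows)]  # S^T (A x - b)
    y = solve(M, r)
    return [x[j] - sum(AS[t][j] * y[t] for t in range(len(rows))) for j in range(len(x))]


A = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]]
b = [1.0, 0.0, 7.0]                       # consistent, with solution (1, -1, 2)
x0 = [0.0, 0.0, 0.0]
print(sketch_and_project_step(A, b, x0, [0]))        # single row: a Kaczmarz step
print(sketch_and_project_step(A, b, x0, [0, 1]))     # block of two rows
print(sketch_and_project_step(A, b, x0, [0, 1, 2]))  # all rows: the solution itself
```

Selecting all rows of an invertible $A$ makes the sketched system equivalent to the original one, which is why a single step then returns the solution.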

6. Random Fixed Point

If we subtract $x^{*}$ from both sides of the random update formula, denote

Z:=A^{T}S\left(S^{T}AB^{-1}A^{T}S\right)^{\dagger }S^{T}A,

and use the fact that $Ax^{*}=b,$ we arrive at the last formulation:

x^{k+1}-x^{*}=\left(I-B^{-1}Z\right)\left(x^{k}-x^{*}\right),

where $I$ is the identity matrix. The iteration matrix, $I-B^{-1}Z,$ is random, whence the name of this formulation.

Convergence

By taking conditional expectations in the 6th formulation (conditional on $x^{k}$ ), we obtain

\mathbb {E} \left.\left[x^{k+1}-x^{*}\right|x^{k}\right]=\left(I-B^{-1}\mathbb {E} [Z]\right)\left[x^{k}-x^{*}\right].

By taking expectation again, and using the tower property of expectations, we obtain

\mathbb {E} \left[x^{k+1}-x^{*}\right]=(I-B^{-1}\mathbb {E} [Z])\mathbb {E} \left[x^{k}-x^{*}\right].

Gower and Richtarik^[14] show that

\rho :=\|I-B^{-1}\mathbb {E} [Z]\|_{B}=\lambda _{\max }\left(I-B^{-1/2}\mathbb {E} [Z]B^{-1/2}\right),

where the matrix norm is defined by

\|M\|_{B}:=\max _{x\neq 0}{\frac {\|Mx\|_{B}}{\|x\|_{B}}}.

Moreover, without any assumptions on $S$ one has $0\leq \rho \leq 1.$ By taking norms and unrolling the recurrence, we obtain

Theorem [Gower & Richtarik 2015]

\left\|\mathbb {E} \left[x^{k}-x^{*}\right]\right\|_{B}\leq \rho ^{k}\|x^{0}-x^{*}\|_{B}.

Remark. A sufficient condition for the expected residuals to converge to 0 is $\rho <1.$ This can be achieved if $A$ has a full column rank and under very mild conditions on $S.$ Convergence of the method can be established also without the full column rank assumption in a different way.^[15]

It is also possible to show a stronger result:

Theorem [Gower & Richtarik 2015]

The expected squared norms (rather than norms of expectations) converge at the same rate:

\mathbb {E} \left[\left\|x^{k}-x^{*}\right\|_{B}^{2}\right]\leq \rho ^{k}\left\|x^{0}-x^{*}\right\|_{B}^{2}.

Remark. This second type of convergence is stronger due to the following identity^[14] which holds for any random vector $x$ and any fixed vector $x^{*}$ :

\left\|\mathbb {E} \left[x-x^{*}\right]\right\|^{2}=\mathbb {E} \left[\left\|x-x^{*}\right\|^{2}\right]-\mathbb {E} \left[\|x-\mathbb {E} [x]\|^{2}\right].
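The identity is the usual bias-variance decomposition, and it can be checked for any discrete random vector. The sketch below uses an arbitrary three-point distribution (the values, probabilities and function name are made up for illustration) and the Euclidean norm:

```python
def identity_sides(values, probs, x_star):
    """Return both sides of ||E[x - x*]||^2 = E||x - x*||^2 - E||x - E[x]||^2
    for a discrete random vector equal to values[t] with probability probs[t]."""
    n = len(x_star)
    mean = [sum(p * v[j] for v, p in zip(values, probs)) for j in range(n)]  # E[x]
    lhs = sum((mean[j] - x_star[j]) ** 2 for j in range(n))                 # ||E[x - x*]||^2
    second = sum(p * sum((v[j] - x_star[j]) ** 2 for j in range(n))
                 for v, p in zip(values, probs))                            # E||x - x*||^2
    spread = sum(p * sum((v[j] - mean[j]) ** 2 for j in range(n))
                 for v, p in zip(values, probs))                            # E||x - E[x]||^2
    return lhs, second - spread


values = [(1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]   # made-up outcomes
probs = [0.2, 0.5, 0.3]                        # made-up probabilities (sum to 1)
print(identity_sides(values, probs, (1.0, 1.0)))
```

Since the subtracted variance term is nonnegative, the squared norm of the expectation never exceeds the expected squared norm.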

Convergence of Randomized Kaczmarz

We have seen that the randomized Kaczmarz method appears as a special case of the Gower-Richtarik method for $B=I$ and $S$ being the $i^{th}$ unit coordinate vector with probability $p_{i}=\|a_{i}\|_{2}^{2}/\|A\|_{F}^{2},$ where $a_{i}$ is the $i^{th}$ row of $A.$ It can be checked by direct calculation that

\rho =\|I-B^{-1}\mathbb {E} [Z]\|_{B}=1-{\frac {\lambda _{\min }(A^{T}A)}{\|A\|_{F}^{2}}}.
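The "direct calculation" can be reproduced numerically. With $B=I$ and $S=e_{i}$ one has $Z=a_{i}a_{i}^{T}/\|a_{i}\|^{2}$ (writing $a_{i}$ as a column vector), so $\mathbb {E} [Z]=\sum _{i}p_{i}a_{i}a_{i}^{T}/\|a_{i}\|^{2}=A^{T}A/\|A\|_{F}^{2}$; since this matrix is symmetric with eigenvalues in $[0,1]$, $\rho =1-\lambda _{\min }(\mathbb {E} [Z])$. The sketch assumes a real matrix with full column rank and, to stay dependency-free, computes the eigenvalue in closed form only for two columns. Function names are hypothetical.

```python
import math


def expected_Z(A):
    """E[Z] for B = I and S = e_i drawn with probability ||a_i||^2 / ||A||_F^2.

    For S = e_i the matrix Z equals a_i a_i^T / ||a_i||^2 (a_i = i-th row of A).
    """
    n = len(A[0])
    fro2 = sum(v * v for row in A for v in row)          # ||A||_F^2
    EZ = [[0.0] * n for _ in range(n)]
    for a in A:
        norm2 = sum(v * v for v in a)
        p = norm2 / fro2                                 # p_i
        for r in range(n):
            for c in range(n):
                EZ[r][c] += p * a[r] * a[c] / norm2
    return EZ


def rho_two_columns(A):
    """rho = lambda_max(I - E[Z]) = 1 - lambda_min(E[Z]), closed form for n = 2."""
    EZ = expected_Z(A)
    half_tr = (EZ[0][0] + EZ[1][1]) / 2
    det = EZ[0][0] * EZ[1][1] - EZ[0][1] * EZ[1][0]
    return 1.0 - (half_tr - math.sqrt(half_tr ** 2 - det))


A = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
print(expected_Z(A))        # compare with A^T A / ||A||_F^2 = [[2, 1], [1, 5]] / 7
print(rho_two_columns(A))   # compare with 1 - lambda_min(A^T A) / ||A||_F^2
```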

Further Special Cases

Algorithm 4: PLSS-Kaczmarz

Since the (randomized) Kaczmarz method converges only asymptotically, at a rate that depends on the problem, it may make slow progress on some practical problems.^[10] To ensure finite termination, Johannes Brust and Michael Saunders^[16] developed a process that generalizes the (randomized) Kaczmarz iteration and terminates in at most $m$ iterations at a solution of the consistent system $Ax=b$. The process is based on dimensionality reduction, or projections onto lower-dimensional spaces, which is how it derives its name PLSS (Projected Linear Systems Solver). An iteration of PLSS-Kaczmarz can be regarded as the generalization

x^{k+1}=x^{k}+A_{1:k,:}^{T}(A_{1:k,:}A_{1:k,:}^{T})^{\dagger }(b_{1:k}-A_{1:k,:}x^{k})

where $A_{1:k,:}$ denotes the submatrix of $A$ formed by rows 1 to $k$ and all columns. A randomized version of the method uses $k$ non-repeated row indices at each iteration, $\{i_{1},\ldots ,i_{k-1},i_{k}\}$, where each $i_{j}$ is in $1,2,\ldots ,m$. The iteration reaches a solution when $k=m$. In particular, since $A_{1:m,:}=A$, it holds that

Ax^{m+1}=Ax^{m}+AA^{T}(AA^{T})^{\dagger }(b-Ax^{m})=b

where the last equality holds because, for a consistent system, $b-Ax^{m}$ lies in the range of $A$, which equals the range of $AA^{T}$. Therefore $x^{m+1}$ is a solution of the linear system. The computation of iterates in PLSS-Kaczmarz can be simplified and organized efficiently. The resulting algorithm requires only matrix-vector products and has the direct form

algorithm PLSS-Kaczmarz is
    input: matrix A, right-hand side b
    output: solution x such that Ax = b

    x := 0, P := [0]
    for k in 1, 2, ..., m do
        a := A(i_k,:)'                       // select an index i_k in 1,...,m without resampling
        d := P' * a
        c₁ := norm(a)
        c₂ := norm(d)
        c₃ := (b_{i_k} - x'*a) / ((c₁ - c₂)*(c₁ + c₂))
        p := c₃*(a - P*d)
        P := [ P, p/norm(p) ]                // append a normalized update
        x := x + p
    return x
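A plain-Python transcription of the pseudocode follows, as a sketch for a real system whose rows are linearly independent (so the denominator $(c_{1}-c_{2})(c_{1}+c_{2})=\|a-Pd\|^{2}$ is positive). Two small deviations are assumptions of this sketch: $P$ starts empty instead of holding a zero column, and it stores the normalized vector $a-Pd$ rather than $p/\|p\|$, which spans the same direction but avoids a division by zero when a residual happens to be exactly zero. The `order` parameter is hypothetical and plays the role of the index sequence $i_{1},\ldots ,i_{m}$.

```python
import math


def plss_kaczmarz(A, b, order=None):
    """Sketch of PLSS-Kaczmarz for a consistent real system A x = b.

    Assumes linearly independent rows. `order` lists the row indices i_1, ..., i_m
    (0-based, each used once); by default rows are processed as 0, 1, ..., m-1.
    """
    m, n = len(A), len(A[0])
    order = list(range(m)) if order is None else list(order)
    x = [0.0] * n
    P = []  # orthonormal vectors (the columns of P), stored as a list of lists
    for i in order:
        a = A[i]
        d = [sum(pj * aj for pj, aj in zip(col, a)) for col in P]  # d = P' a
        c1 = math.sqrt(sum(v * v for v in a))                      # norm(a)
        c2 = math.sqrt(sum(v * v for v in d))                      # norm(d)
        residual = b[i] - sum(xj * aj for xj, aj in zip(x, a))     # b_i - x' a
        c3 = residual / ((c1 - c2) * (c1 + c2))                    # divides by ||a - P d||^2
        # Component of a orthogonal to the previously processed rows: a - P d.
        a_perp = [a[j] - sum(dk * col[j] for dk, col in zip(d, P)) for j in range(n)]
        p = [c3 * v for v in a_perp]
        norm_perp = math.sqrt(sum(v * v for v in a_perp))
        P.append([v / norm_perp for v in a_perp])                  # append a normalized direction
        x = [xj + pj for xj, pj in zip(x, p)]
    return x


A = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]]
b = [1.0, 0.0, 7.0]                        # consistent, with solution (1, -1, 2)
print(plss_kaczmarz(A, b))                 # exact up to rounding after m = 3 steps
print(plss_kaczmarz([[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]], [1.0, 2.0]))
```

Each update is orthogonal to all previously processed rows, so equations already satisfied stay satisfied; for an underdetermined system started at zero, the result lies in the row space and is therefore the minimum-norm solution, (0, 1, 1) in the second call.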

Notes


References

Kaczmarz, Stefan (1937), "Angenäherte Auflösung von Systemen linearer Gleichungen", Bulletin International de l'Académie Polonaise des Sciences et des Lettres. Classe des Sciences Mathématiques et Naturelles. Série A, Sciences Mathématiques, vol. 35, pp. 355–357
Chong, Edwin K. P.; Zak, Stanislaw H. (2008), An Introduction to Optimization (3rd ed.), John Wiley & Sons, pp. 226–230
Gordon, Richard; Bender, Robert; Herman, Gabor (1970), "Algebraic reconstruction techniques (ART) for threedimensional electron microscopy and x-ray photography", Journal of Theoretical Biology, 29 (3): 471–481, Bibcode:1970JThBi..29..471G, doi:10.1016/0022-5193(70)90109-8, PMID 5492997
Gordon, Richard (2011), Stop breast cancer now! Imagining imaging pathways towards search, destroy, cure and watchful waiting of premetastasis breast cancer. In: Breast Cancer - A Lobar Disease, editor: Tibor Tot, Springer, pp. 167–203
Herman, Gabor (2009), Fundamentals of computerized tomography: Image reconstruction from projection (2nd ed.), Springer, ISBN 9781846287237
Censor, Yair; Zenios, S.A. (1997), Parallel optimization: theory, algorithms, and applications, New York: Oxford University Press
Aster, Richard; Borchers, Brian; Thurber, Clifford (2004), Parameter Estimation and Inverse Problems, Elsevier
Strohmer, Thomas; Vershynin, Roman (2009), "A randomized Kaczmarz algorithm for linear systems with exponential convergence", Journal of Fourier Analysis and Applications, 15 (2): 262–278, arXiv:math/0702226, doi:10.1007/s00041-008-9030-4, S2CID 1903919
Needell, Deanna; Srebro, Nati; Ward, Rachel (2015), "Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm", Mathematical Programming, 155 (1–2): 549–573, arXiv:1310.5715, doi:10.1007/s10107-015-0864-7, S2CID 2370209
Censor, Yair; Herman, Gabor; Jiang, M. (2009), "A note on the behavior of the randomized Kaczmarz algorithm of Strohmer and Vershynin", Journal of Fourier Analysis and Applications, 15 (4): 431–436, doi:10.1007/s00041-009-9077-x, PMC 2872793, PMID 20495623
Strohmer, Thomas; Vershynin, Roman (2009b), "Comments on the randomized Kaczmarz method", Journal of Fourier Analysis and Applications, 15 (4): 437–440, doi:10.1007/s00041-009-9082-0, S2CID 14806325
Bass, Richard F.; Gröchenig, Karlheinz (2013), "Relevant sampling of band-limited functions", Illinois Journal of Mathematics, 57 (1): 43–58, arXiv:1203.0146, doi:10.1215/ijm/1403534485, S2CID 42705738
Gordon, Dan (2017), "A derandomization approach to recovering bandlimited signals across a wide range of random sampling rates", Numerical Algorithms, 77 (4): 1141–1157, doi:10.1007/s11075-017-0356-3, S2CID 1794974
Vinh Nguyen, Quang; Lumban Gaol, Ford (2011), Proceedings of the 2011 2nd International Congress on Computer Applications and Computational Science, vol. 2, Springer, pp. 465–469
Gower, Robert; Richtarik, Peter (2015a), "Randomized iterative methods for linear systems", SIAM Journal on Matrix Analysis and Applications, 36 (4): 1660–1690, arXiv:1506.03296, doi:10.1137/15M1025487, S2CID 8215294
Gower, Robert; Richtarik, Peter (2015b), "Stochastic dual ascent for solving linear systems", arXiv:1512.06890 [math.NA]
Brust, Johannes J; Saunders, Michael A (2023), "PLSS: A Projected Linear Systems Solver", SIAM Journal on Scientific Computing, 45 (2): A1012–A1037, arXiv:2207.07615, Bibcode:2023SJSC...45A1012B, doi:10.1137/22M1509783

External links

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[1] Kaczmarz (1937)

[2] Gordon, Bender & Herman (1970)

[3] Gordon (2011)

[4] Herman (2009)

[5] Censor & Zenios (1997)

[6] Aster, Borchers & Thurber (2004)

[7] See Herman (2009) and references therein.

[8] Strohmer & Vershynin (2009)

[9] Needell, Srebro & Ward (2015)

[10] Censor, Herman & Jiang (2009)

[11] Strohmer & Vershynin (2009b)

[12] Bass & Gröchenig (2013)

[13] Gordon (2017)

[14] Gower & Richtarik (2015a)

[15] Gower & Richtarik (2015b)

[16] Brust & Saunders (2023)
