Dykstra's projection algorithm

Dykstra's algorithm is a method that computes a point in the intersection of convex sets, and is a variant of the alternating projection method (also called the projections onto convex sets method). In its simplest form, the method finds a point in the intersection of two convex sets by iteratively projecting onto each of the convex sets; it differs from the alternating projection method in that it performs intermediate correction steps using auxiliary variables. A parallel version of the algorithm was developed by Gaffke and Mathar.

The method is named after Richard L. Dykstra, who proposed it in the 1980s.

A key difference between Dykstra's algorithm and the standard alternating projection method occurs when there is more than one point in the intersection of the two sets. In this case, the alternating projection method gives some arbitrary point in this intersection, whereas Dykstra's algorithm gives a specific point: the projection of r onto the intersection, where r is the initial point used in the algorithm.

Algorithm

[Figure: illustration of Dykstra's algorithm]

Dykstra's algorithm finds, for each $r$, the unique point $x^* \in C \cap D$ such that:

$\|x^* - r\| = \min_{x \in C \cap D} \|x - r\|,$

where $C$ and $D$ are convex sets. This problem is equivalent to finding the projection of $r$ onto the set $C \cap D$, which we denote by $\mathcal{P}_{C \cap D}$.

To use Dykstra's algorithm, one must know how to project onto the sets $C$ and $D$ separately.
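
As a purely illustrative sketch (not from the original article), the following Python code defines projection operators for two simple convex sets, a Euclidean ball and a halfspace; the particular sets, function names, and default parameters are assumptions made for the example, since Dykstra's algorithm only requires that each individual projection be computable.

```python
import numpy as np

def project_ball(x, center=np.zeros(2), radius=1.0):
    """Euclidean projection onto the ball {y : ||y - center|| <= radius}."""
    d = x - center
    norm = np.linalg.norm(d)
    return x.copy() if norm <= radius else center + radius * d / norm

def project_halfspace(x, a=np.array([1.0, 1.0]), b=1.0):
    """Euclidean projection onto the halfspace {y : <a, y> <= b}."""
    violation = a @ x - b
    return x.copy() if violation <= 0 else x - (violation / (a @ a)) * a
```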

First, consider the basic alternating projection (aka POCS) method (first studied, in the case when the sets were linear subspaces, by John von Neumann [1]), which initializes $x_0 = r$ and then generates the sequence

$x_{k+1} = \mathcal{P}_C(\mathcal{P}_D(x_k)).$
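
A minimal sketch of this iteration, assuming projection routines such as the illustrative ones above and a fixed iteration count in place of a proper stopping criterion:

```python
import numpy as np

def alternating_projections(r, proj_C, proj_D, iterations=1000):
    """POCS / alternating projections: x_0 = r, x_{k+1} = P_C(P_D(x_k))."""
    x = np.asarray(r, dtype=float)
    for _ in range(iterations):
        x = proj_C(proj_D(x))
    return x

# e.g. alternating_projections(np.array([2.0, 2.0]), project_ball, project_halfspace)
# using the illustrative projections sketched above.
```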

Dykstra's algorithm is of a similar form, but uses additional auxiliary variables. Start with $x_0 = r$, $p_0 = q_0 = 0$ and update by

$y_k = \mathcal{P}_C(x_k + p_k),$
$p_{k+1} = x_k + p_k - y_k,$
$x_{k+1} = \mathcal{P}_D(y_k + q_k),$
$q_{k+1} = y_k + q_k - x_{k+1}.$

Then the sequence $(x_k)$ converges to the solution of the original problem, that is, to the projection of $r$ onto $C \cap D$. For convergence results and a modern perspective on the literature, see [2].
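
The update above translates directly into code. The sketch below is again only an illustration, assuming projection routines are supplied (for instance the ones sketched earlier) and using a fixed iteration count rather than a convergence test; the auxiliary variables p and q are what distinguish it from plain alternating projections.

```python
import numpy as np

def dykstra(r, proj_C, proj_D, iterations=1000):
    """Dykstra's algorithm for two convex sets C and D, given their projections.

    Starts from x_0 = r, p_0 = q_0 = 0 and repeats
        y_k     = P_C(x_k + p_k)
        p_{k+1} = x_k + p_k - y_k
        x_{k+1} = P_D(y_k + q_k)
        q_{k+1} = y_k + q_k - x_{k+1}
    so that x_k converges to the projection of r onto the intersection of C and D.
    """
    x = np.asarray(r, dtype=float)
    p = np.zeros_like(x)
    q = np.zeros_like(x)
    for _ in range(iterations):
        y = proj_C(x + p)
        p = x + p - y
        x = proj_D(y + q)
        q = y + q - x
    return x

# With the illustrative sets above, dykstra(np.array([2.0, 2.0]), project_ball,
# project_halfspace) approximates the projection of (2, 2) onto the intersection
# of the unit ball and the halfspace x + y <= 1, which is (0.5, 0.5).
```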

Related Research Articles

In mathematics, a contraction mapping, or contraction or contractor, on a metric space (M, d) is a function f from M to itself, with the property that there is some real number $0 \leq k < 1$ such that for all x and y in M, $d(f(x), f(y)) \leq k \, d(x, y)$.

Convex set: In geometry, a set whose intersection with every line is a single line segment.

In geometry, a subset of a Euclidean space, or more generally an affine space over the reals, is convex if, given any two points in the subset, the subset contains the whole line segment that joins them. Equivalently, a convex set or a convex region is a subset that intersects every line in a single line segment. For example, a solid cube is a convex set, but anything that is hollow or has an indent, for example, a crescent shape, is not convex.

Affine transformation: Geometric transformation that preserves lines but not necessarily angles or the origin.

In Euclidean geometry, an affine transformation or affinity is a geometric transformation that preserves lines and parallelism, but not necessarily Euclidean distances and angles.

In functional analysis and related areas of mathematics, locally convex topological vector spaces (LCTVS) or locally convex spaces are examples of topological vector spaces (TVS) that generalize normed spaces. They can be defined as topological vector spaces whose topology is generated by translations of balanced, absorbent, convex sets. Alternatively they can be defined as a vector space with a family of seminorms, and a topology can be defined in terms of that family. Although in general such spaces are not necessarily normable, the existence of a convex local base for the zero vector is strong enough for the Hahn–Banach theorem to hold, yielding a sufficiently rich theory of continuous linear functionals.

Multi-task learning (MTL) is a subfield of machine learning in which multiple learning tasks are solved at the same time, while exploiting commonalities and differences across tasks. This can result in improved learning efficiency and prediction accuracy for the task-specific models, when compared to training the models separately. Early versions of MTL were called "hints".

Convex optimization is a subfield of mathematical optimization that studies the problem of minimizing convex functions over convex sets. Many classes of convex optimization problems admit polynomial-time algorithms, whereas mathematical optimization is in general NP-hard.

The Frank–Wolfe algorithm is an iterative first-order optimization algorithm for constrained convex optimization. Also known as the conditional gradient method, reduced gradient algorithm and the convex combination algorithm, the method was originally proposed by Marguerite Frank and Philip Wolfe in 1956. In each iteration, the Frank–Wolfe algorithm considers a linear approximation of the objective function, and moves towards a minimizer of this linear function.

Krein–Milman theorem: On when a space equals the closed convex hull of its extreme points.

In the mathematical theory of functional analysis, the Krein–Milman theorem is a proposition about compact convex sets in locally convex topological vector spaces (TVSs).

In mathematical optimization, the ellipsoid method is an iterative method for minimizing convex functions. When specialized to solving feasible linear optimization problems with rational data, the ellipsoid method is an algorithm which finds an optimal solution in a number of steps that is polynomial in the input size.

Chan's algorithm: Algorithm for finding the convex hull of a set of points in the plane.

In computational geometry, Chan's algorithm, named after Timothy M. Chan, is an optimal output-sensitive algorithm to compute the convex hull of a set of points, in 2- or 3-dimensional space. The algorithm takes $O(n \log h)$ time, where $h$ is the number of vertices of the output (the convex hull). In the planar case, the algorithm combines an $O(n \log n)$ algorithm (such as Graham scan) with Jarvis march ($O(nh)$), in order to obtain an optimal $O(n \log h)$ time. Chan's algorithm is notable because it is much simpler than the Kirkpatrick–Seidel algorithm, and it naturally extends to 3-dimensional space. This paradigm has been independently developed by Frank Nielsen in his Ph.D. thesis.

Subgradient methods are iterative methods for solving convex minimization problems. Originally developed by Naum Z. Shor and others in the 1960s and 1970s, subgradient methods are convergent when applied even to a non-differentiable objective function. When the objective function is differentiable, sub-gradient methods for unconstrained problems use the same search direction as the method of steepest descent.

The difference-map algorithm is a search algorithm for general constraint satisfaction problems. It is a meta-algorithm in the sense that it is built from more basic algorithms that perform projections onto constraint sets. From a mathematical perspective, the difference-map algorithm is a dynamical system based on a mapping of Euclidean space. Solutions are encoded as fixed points of the mapping.

In computer science, locality-sensitive hashing (LSH) is an algorithmic technique that hashes similar input items into the same "buckets" with high probability. Since similar items end up in the same buckets, this technique can be used for data clustering and nearest neighbor search. It differs from conventional hashing techniques in that hash collisions are maximized, not minimized. Alternatively, the technique can be seen as a way to reduce the dimensionality of high-dimensional data; high-dimensional input items can be reduced to low-dimensional versions while preserving relative distances between items.

Augmented Lagrangian methods are a certain class of algorithms for solving constrained optimization problems. They have similarities to penalty methods in that they replace a constrained optimization problem by a series of unconstrained problems and add a penalty term to the objective; the difference is that the augmented Lagrangian method adds yet another term designed to mimic a Lagrange multiplier. The augmented Lagrangian is related to, but not identical with the method of Lagrange multipliers.

The Landweber iteration or Landweber algorithm is an algorithm to solve ill-posed linear inverse problems, and it has been extended to solve non-linear problems that involve constraints. The method was first proposed in the 1950s by Louis Landweber, and it can be now viewed as a special case of many other more general methods.

In mathematics, projections onto convex sets (POCS), sometimes known as the alternating projection method, is a method to find a point in the intersection of two closed convex sets. It is a very simple algorithm and has been rediscovered many times. The simplest case, when the sets are affine spaces, was analyzed by John von Neumann. The case when the sets are affine spaces is special, since the iterates not only converge to a point in the intersection but to the orthogonal projection of the point onto the intersection. For general closed convex sets, the limit point need not be the projection. Classical work on the case of two closed convex sets shows that the rate of convergence of the iterates is linear. There are now extensions that consider cases when there are more than two sets, or when the sets are not convex, or that give faster convergence rates. Analysis of POCS and related methods attempts to show that the algorithm converges, and whether it converges to the projection of the original point. The answers are largely known for simple cases, but remain a topic of active research for the extensions. There are also variants of the algorithm, such as Dykstra's projection algorithm. See the references in the further reading section for an overview of the variants, extensions and applications of the POCS method; a good historical background can be found in section III of.

Proximal gradient method

Proximal gradient methods are a generalized form of projection used to solve non-differentiable convex optimization problems.

Proximal gradient methods for learning is an area of research in optimization and statistical learning theory which studies algorithms for a general class of convex regularization problems where the regularization penalty may not be differentiable. One such example is $\ell_1$ regularization (also known as the lasso) of the form

$\min_{w \in \mathbb{R}^d} \ \frac{1}{n} \sum_{i=1}^{n} (y_i - \langle w, x_i \rangle)^2 + \lambda \|w\|_1,$ where $x_i \in \mathbb{R}^d$ and $y_i \in \mathbb{R}$.

In mathematical optimization, the proximal operator is an operator associated with a proper, lower semi-continuous convex function $f$ from a Hilbert space $\mathcal{X}$ to $(-\infty, +\infty]$, and is defined by:

$\operatorname{prox}_f(x) = \arg\min_{u \in \mathcal{X}} \left( f(u) + \tfrac{1}{2}\|u - x\|^2 \right).$
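
As a small worked illustration added here (not part of the excerpt), the proximal operator of the scaled absolute value $f(u) = \lambda |u|$ has the well-known closed form called soft-thresholding:

```python
import numpy as np

def prox_abs(x, lam=1.0):
    """Proximal operator of f(u) = lam * |u|, applied elementwise:
    argmin_u ( lam*|u| + 0.5*(u - x)**2 ) = sign(x) * max(|x| - lam, 0)
    (soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

# e.g. prox_abs(np.array([-3.0, 0.4, 2.0]), lam=1.0) -> [-2., 0., 1.]
```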

Chambolle-Pock algorithm: A primal-dual algorithm for convex optimization problems.

In mathematics, the Chambolle-Pock algorithm is an algorithm used to solve convex optimization problems. It was introduced by Antonin Chambolle and Thomas Pock in 2011 and has since become a widely used method in various fields, including image processing, computer vision, and signal processing.

References

Citations

  1. J. von Neumann, On rings of operators. Reduction theory, Ann. of Math. 50 (1949) 401–485 (a reprint of lecture notes first distributed in 1933).
  2. P. L. Combettes and J.-C. Pesquet, "Proximal splitting methods in signal processing," in: Fixed-Point Algorithms for Inverse Problems in Science and Engineering (H. H. Bauschke, R. S. Burachik, P. L. Combettes, V. Elser, D. R. Luke, and H. Wolkowicz, eds.), pp. 185–212, Springer, New York, 2011.