Projections onto convex sets

In mathematics, projections onto convex sets (POCS), sometimes known as the alternating projection method, is a method for finding a point in the intersection of two closed convex sets. It is a very simple algorithm and has been rediscovered many times. [1] The simplest case, when the sets are affine spaces, was analyzed by John von Neumann. [2] [3] This case is special, since the iterates not only converge to a point in the intersection (assuming the intersection is non-empty) but to the orthogonal projection of the starting point onto the intersection. For general closed convex sets, the limit point need not be that projection. Classical work on the case of two closed convex sets shows that the rate of convergence of the iterates is linear. [4] [5] There are now extensions that consider cases when there are more than two sets, or when the sets are not convex, [6] or that give faster convergence rates. Analysis of POCS and related methods attempts to show that the algorithm converges (and if so, to find the rate of convergence), and whether it converges to the projection of the original point. These questions are largely settled for simple cases, but remain a topic of active research for the extensions. There are also variants of the algorithm, such as Dykstra's projection algorithm. See the references in the further reading section for an overview of the variants, extensions and applications of the POCS method; a good historical background can be found in Section III of [7].

Algorithm

[Figure: Example on two circles]

The POCS algorithm solves the following problem:

$$\text{find } x \in C \cap D,$$

where C and D are closed convex sets.

To use the POCS algorithm, one must know how to project onto the sets C and D separately, via the projections $\mathcal{P}_C$ and $\mathcal{P}_D$.
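
Such projections often have simple closed forms. As an illustration, here is a minimal sketch in Python/NumPy (the particular sets, a Euclidean ball and a halfspace, are assumptions chosen for the example, not part of the method):

```python
import numpy as np

def proj_ball(x, center, radius):
    """Project x onto the closed ball {y : ||y - center|| <= radius}."""
    d = x - center
    dist = np.linalg.norm(d)
    if dist <= radius:
        return x.copy()                  # already inside the ball
    return center + (radius / dist) * d  # pull back to the boundary

def proj_halfspace(x, a, b):
    """Project x onto the halfspace {y : a @ y <= b}."""
    excess = a @ x - b
    if excess <= 0:
        return x.copy()                  # constraint already satisfied
    return x - (excess / (a @ a)) * a    # orthogonal step onto the hyperplane
```

For more complicated sets, the projection may itself require solving an inner optimization problem.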

The algorithm starts with an arbitrary value for $x_0$ and then generates the sequence

$$x_{k+1} = \mathcal{P}_C\left(\mathcal{P}_D(x_k)\right).$$

The simplicity of the algorithm explains some of its popularity. If the intersection of C and D is non-empty, then the sequence generated by the algorithm will converge to some point in this intersection.

Unlike Dykstra's projection algorithm, the limit point need not be the projection of the starting point $x_0$ onto the intersection of C and D.
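
A minimal end-to-end sketch of the iteration, with C taken to be the unit ball and D a halfspace (illustrative assumptions; any closed convex sets with computable projections would do):

```python
import numpy as np

# Illustrative sets: C = unit ball, D = {x : x[0] + x[1] >= 1}.
# Their intersection is non-empty (it contains, e.g., (0.5, 0.5)).
def proj_C(x):
    n = np.linalg.norm(x)
    return x if n <= 1.0 else x / n

def proj_D(x):
    deficit = 1.0 - (x[0] + x[1])
    return x if deficit <= 0 else x + deficit / 2.0  # move along the normal (1, 1)

x = np.array([3.0, -2.0])     # arbitrary starting point
for _ in range(100):
    x = proj_C(proj_D(x))     # x_{k+1} = P_C(P_D(x_k))

print(x)  # converges to a point of C ∩ D (here, approximately (1, 0))
```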

[Figure: Example of averaged projections variant]

The method of averaged projections is quite similar. For the case of two closed convex sets C and D, it proceeds by

$$x_{k+1} = \tfrac{1}{2}\left(\mathcal{P}_C(x_k) + \mathcal{P}_D(x_k)\right).$$

It has long been known to converge globally. [8] Furthermore, the method is easy to generalize to more than two sets; some convergence results for this case are given in [9].
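
Under the same illustrative set choices as the sketch above, only the update line changes (more iterations are used, since the averaged variant typically progresses more slowly):

```python
import numpy as np

def proj_C(x):                  # unit ball, as before
    n = np.linalg.norm(x)
    return x if n <= 1.0 else x / n

def proj_D(x):                  # halfspace x[0] + x[1] >= 1, as before
    deficit = 1.0 - (x[0] + x[1])
    return x if deficit <= 0 else x + deficit / 2.0

x = np.array([3.0, -2.0])
for _ in range(300):
    x = 0.5 * (proj_C(x) + proj_D(x))   # averaged projections update

print(x)  # again approaches a point of C ∩ D
```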

The averaged projections method can be reformulated as the alternating projections method using a standard trick. Consider the set

$$E = \{ (x, y) : x \in C, \ y \in D \},$$

which is defined in the product space $X \times X$. Then define another set, also in the product space:

$$F = \{ (x, y) : x \in X, \ y \in X, \ x = y \}.$$

Thus finding $C \cap D$ is equivalent to finding $E \cap F$.

To find a point in $E \cap F$, use the alternating projection method. The projection onto the product set E is taken coordinatewise, $\mathcal{P}_E(x, y) = (\mathcal{P}_C(x), \mathcal{P}_D(y))$, and the projection of a vector $(x, y)$ onto the set F is given by $\mathcal{P}_F(x, y) = (\bar{x}, \bar{x})$, where $\bar{x} = \tfrac{1}{2}(x + y)$. Hence

$$(x_{k+1}, y_{k+1}) = \mathcal{P}_E\left(\mathcal{P}_F(x_k, y_k)\right) = \left(\mathcal{P}_C(\bar{x}_k), \mathcal{P}_D(\bar{x}_k)\right).$$

Since $x_{k+1} = \mathcal{P}_C(\bar{x}_k)$ and $y_{k+1} = \mathcal{P}_D(\bar{x}_k)$, and assuming $x_0 = y_0 = \bar{x}_0$, then $\bar{x}_{k+1} = \tfrac{1}{2}\left(\mathcal{P}_C(\bar{x}_k) + \mathcal{P}_D(\bar{x}_k)\right)$ for all $k \geq 0$, and hence we can simplify the iteration to this update on $\bar{x}_k$ alone, which is exactly the averaged projections method.
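
This equivalence can be checked numerically. The sketch below (same illustrative sets as above) runs alternating projections in the product space and verifies that the coordinate averages coincide with the averaged projections iterates:

```python
import numpy as np

def proj_C(x):                    # unit ball
    n = np.linalg.norm(x)
    return x if n <= 1.0 else x / n

def proj_D(x):                    # halfspace x[0] + x[1] >= 1
    deficit = 1.0 - (x[0] + x[1])
    return x if deficit <= 0 else x + deficit / 2.0

def proj_E(x, y):                 # E = C x D: project each coordinate separately
    return proj_C(x), proj_D(y)

def proj_F(x, y):                 # F = diagonal: replace both coordinates by their average
    m = 0.5 * (x + y)
    return m, m

x = y = np.array([3.0, -2.0])     # start on the diagonal, so x_0 = y_0
z = np.array([3.0, -2.0])         # plain averaged-projections iterate, for comparison
for _ in range(50):
    x, y = proj_E(*proj_F(x, y))  # alternating projections in the product space
    z = 0.5 * (proj_C(z) + proj_D(z))
    assert np.allclose(0.5 * (x + y), z)  # averages reproduce the averaged iterates
```

The assertion passes because, started on the diagonal, the average of the product-space iterates obeys the same recursion as the averaged projections iterate.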

References

  1. Bauschke, H.H.; Borwein, J.M. (1996). "On projection algorithms for solving convex feasibility problems". SIAM Review. 38 (3): 367–426. CiteSeerX 10.1.1.49.4940. doi:10.1137/S0036144593251710.
  2. Neumann, John von (1949). "On rings of operators. Reduction theory". Annals of Mathematics. 50 (2): 401–485. doi:10.2307/1969463. JSTOR 1969463. (A reprint of lecture notes first distributed in 1933.)
  3. Neumann, John von (1950). Functional Operators, Volume II. Princeton, NJ: Princeton University Press. Reprint of mimeographed lecture notes first distributed in 1933.
  4. Gubin, L.G.; Polyak, B.T.; Raik, E.V. (1967). "The method of projections for finding the common point of convex sets". U.S.S.R. Computational Mathematics and Mathematical Physics. 7 (6): 1–24. doi:10.1016/0041-5553(67)90113-9.
  5. Bauschke, H.H.; Borwein, J.M. (1993). "On the convergence of von Neumann's alternating projection algorithm for two sets". Set-Valued Analysis. 1 (2): 185–212. doi:10.1007/bf01027691. S2CID 121602545.
  6. Lewis, Adrian S.; Malick, Jérôme (2008). "Alternating Projections on Manifolds". Mathematics of Operations Research. 33: 216–234. CiteSeerX 10.1.1.416.6182. doi:10.1287/moor.1070.0291.
  7. Combettes, P. L. (1993). "The foundations of set theoretic estimation" (PDF). Proceedings of the IEEE. 81 (2): 182–208. doi:10.1109/5.214546. Archived from the original (PDF) on 2015-06-14. Retrieved 2012-10-09.
  8. Auslender, A. (1969). Méthodes Numériques pour la Résolution des Problèmes d'Optimisation avec Contraintes. PhD thesis, Faculté des Sciences, Grenoble.
  9. Lewis, A. S.; Luke, D. R.; Malick, J. (2009). "Local convergence for alternating and averaged nonconvex projections". Foundations of Computational Mathematics. 9 (4): 485–513. arXiv:0709.0109. doi:10.1007/s10208-008-9036-y.

Further reading