Convex optimization

Last updated

Convex optimization is a subfield of mathematical optimization that studies the problem of minimizing convex functions over convex sets (or, equivalently, maximizing concave functions over convex sets). Many classes of convex optimization problems admit polynomial-time algorithms, [1] whereas mathematical optimization is in general NP-hard. [2] [3] [4]

Contents

Definition

Abstract form

A convex optimization problem is defined by two ingredients: [5] [6]

The goal of the problem is to find some attaining

.

In general, there are three options regarding the existence of a solution: [7] :chpt.4

Standard form

A convex optimization problem is in standard form if it is written as

where: [7] :chpt.4

The feasible set of the optimization problem consists of all points satisfying the inequality and the equality constraints. This set is convex because is convex, the sublevel sets of convex functions are convex, affine sets are convex, and the intersection of convex sets is convex. [7] :chpt.2

Many optimization problems can be equivalently formulated in this standard form. For example, the problem of maximizing a concave function can be re-formulated equivalently as the problem of minimizing the convex function . The problem of maximizing a concave function over a convex set is commonly called a convex optimization problem. [8]

Epigraph form (standard form with linear objective)

In the standard form it is possible to assume, without loss of generality, that the objective function f is a linear function. This is because any program with a general objective can be transformed into a program with a linear objective by adding a single variable t and a single constraint, as follows: [9] :1.4

Conic form

Every convex program can be presented in a conic form, which means minimizing a linear objective over the intersection of an affine plane and a convex cone: [9] :5.1

where K is a closed pointed convex cone, L is a linear subspace of Rn, and b is a vector in Rn. A linear program in standard form in the special case in which K is the nonnegative orthant of Rn.

Eliminating linear equality constraints

It is possible to convert a convex program in standard form, to a convex program with no equality constraints. [7] :132 Denote the equality constraints hi(x)=0 as Ax=b, where A has n columns. If Ax=b is infeasible, then of course the original problem is infeasible. Otherwise, it has some solution x0 , and the set of all solutions can be presented as: Fz+x0, where z is in Rk, k=n-rank(A), and F is an n-by-k matrix. Substituting x = Fz+x0 in the original problem gives:

where the variables are z. Note that there are rank(A) fewer variables. This means that, in principle, one can restrict attention to convex optimization problems without equality constraints. In practice, however, it is often preferred to retain the equality constraints, since they might make some algorithms more efficient, and also make the problem easier to understand and analyze.

Special cases

The following problem classes are all convex optimization problems, or can be reduced to convex optimization problems via simple transformations: [7] :chpt.4 [10]

A hierarchy of convex optimization problems. (LP: linear programming, QP: quadratic programming, SOCP second-order cone program, SDP: semidefinite programming, CP: conic optimization.) Hierarchy compact convex.png
A hierarchy of convex optimization problems. (LP: linear programming, QP: quadratic programming, SOCP second-order cone program, SDP: semidefinite programming, CP: conic optimization.)

Other special cases include;

Properties

The following are useful properties of convex optimization problems: [11] [7] :chpt.4

These results are used by the theory of convex minimization along with geometric notions from functional analysis (in Hilbert spaces) such as the Hilbert projection theorem, the separating hyperplane theorem, and Farkas' lemma.[ citation needed ]

Algorithms

Unconstrained and equality-constrained problems

The convex programs easiest to solve are the unconstrained problems, or the problems with only equality constraints. As the equality constraints are all linear, they can be eliminated with linear algebra and integrated into the objective, thus converting an equality-constrained problem into an unconstrained one.

In the class of unconstrained (or equality-constrained) problems, the simplest ones are those in which the objective is quadratic. For these problems, the KKT conditions (which are necessary for optimality) are all linear, so they can be solved analytically. [7] :chpt.11

For unconstrained (or equality-constrained) problems with a general convex objective that is twice-differentiable, Newton's method can be used. It can be seen as reducing a general unconstrained convex problem, to a sequence of quadratic problems. [7] :chpt.11Newton's method can be combined with line search for an appropriate step size, and it can be mathematically proven to converge quickly.

Other efficient algorithms for unconstrained minimization are gradient descent (a special case of steepest descent).

General problems

The more challenging problems are those with inequality constraints. A common way to solve them is to reduce them to unconstrained problems by adding a barrier function, enforcing the inequality constraints, to the objective function. Such methods are called interior point methods. [7] :chpt.11They have to be initialized by finding a feasible interior point using by so-called phase I methods, which either find a feasible point or show that none exist. Phase I methods generally consist of reducing the search in question to a simpler convex optimization problem. [7] :chpt.11

Convex optimization problems can also be solved by the following contemporary methods: [12]

Subgradient methods can be implemented simply and so are widely used. [15] Dual subgradient methods are subgradient methods applied to a dual problem. The drift-plus-penalty method is similar to the dual subgradient method, but takes a time average of the primal variables.[ citation needed ]

Lagrange multipliers

Consider a convex minimization problem given in standard form by a cost function and inequality constraints for . Then the domain is:

The Lagrangian function for the problem is [16]

For each point in that minimizes over , there exist real numbers called Lagrange multipliers, that satisfy these conditions simultaneously:

  1. minimizes over all
  2. with at least one
  3. (complementary slackness).

If there exists a "strictly feasible point", that is, a point satisfying

then the statement above can be strengthened to require that .

Conversely, if some in satisfies (1)–(3) for scalars with then is certain to minimize over .

Software

There is a large software ecosystem for convex optimization. This ecosystem has two main categories: solvers on the one hand and modeling tools (or interfaces) on the other hand.

Solvers implement the algorithms themselves and are usually written in C. They require users to specify optimization problems in very specific formats which may not be natural from a modeling perspective. Modeling tools are separate pieces of software that let the user specify an optimization in higher-level syntax. They manage all transformations to and from the user's high-level model and the solver's input/output format.

The table below shows a mix of modeling tools (such as CVXPY and Convex.jl) and solvers (such as CVXOPT and MOSEK). This table is by no means exhaustive.

ProgramLanguageDescription FOSS?Ref
CVX MATLAB Interfaces with SeDuMi and SDPT3 solvers; designed to only express convex optimization problems.Yes [17]
CVXMOD Python Interfaces with the CVXOPT solver.Yes [17]
CVXPYPython [18]
Convex.jl Julia Disciplined convex programming, supports many solvers.Yes [19]
CVXR R Yes [20]
YALMIPMATLAB, OctaveInterfaces with CPLEX, GUROBI, MOSEK, SDPT3, SEDUMI, CSDP, SDPA, PENNON solvers; also supports integer and nonlinear optimization, and some nonconvex optimization. Can perform robust optimization with uncertainty in LP/SOCP/SDP constraints.Yes [17]
LMI labMATLABExpresses and solves semidefinite programming problems (called "linear matrix inequalities")No [17]
LMIlab translatorTransforms LMI lab problems into SDP problems.Yes [17]
xLMIMATLABSimilar to LMI lab, but uses the SeDuMi solver.Yes [17]
AIMMSCan do robust optimization on linear programming (with MOSEK to solve second-order cone programming) and mixed integer linear programming. Modeling package for LP + SDP and robust versions.No [17]
ROMEModeling system for robust optimization. Supports distributionally robust optimization and uncertainty sets.Yes [17]
GloptiPoly 3MATLAB,

Octave

Modeling system for polynomial optimization.Yes [17]
SOSTOOLSModeling system for polynomial optimization. Uses SDPT3 and SeDuMi. Requires Symbolic Computation Toolbox.Yes [17]
SparsePOPModeling system for polynomial optimization. Uses the SDPA or SeDuMi solvers.Yes [17]
CPLEXSupports primal-dual methods for LP + SOCP. Can solve LP, QP, SOCP, and mixed integer linear programming problems.No [17]
CSDP C Supports primal-dual methods for LP + SDP. Interfaces available for MATLAB, R, and Python. Parallel version available. SDP solver.Yes [17]
CVXOPT PythonSupports primal-dual methods for LP + SOCP + SDP. Uses Nesterov-Todd scaling. Interfaces to MOSEK and DSDP.Yes [17]
MOSEKSupports primal-dual methods for LP + SOCP.No [17]
SeDuMiMATLAB, Octave, MEX Solves LP + SOCP + SDP. Supports primal-dual methods for LP + SOCP + SDP.Yes [17]
SDPA C++ Solves LP + SDP. Supports primal-dual methods for LP + SDP. Parallelized and extended precision versions are available.Yes [17]
SDPT3MATLAB, Octave, MEXSolves LP + SOCP + SDP. Supports primal-dual methods for LP + SOCP + SDP.Yes [17]
ConicBundleSupports general-purpose codes for LP + SOCP + SDP. Uses a bundle method. Special support for SDP and SOCP constraints.Yes [17]
DSDPSupports general-purpose codes for LP + SDP. Uses a dual interior point method.Yes [17]
LOQOSupports general-purpose codes for SOCP, which it treats as a nonlinear programming problem.No [17]
PENNONSupports general-purpose codes. Uses an augmented Lagrangian method, especially for problems with SDP constraints.No [17]
SDPLRSupports general-purpose codes. Uses low-rank factorization with an augmented Lagrangian method.Yes [17]
GAMSModeling system for linear, nonlinear, mixed integer linear/nonlinear, and second-order cone programming problems.No [17]
Optimization ServicesXML standard for encoding optimization problems and solutions. [17]

Applications

Convex optimization can be used to model problems in a wide range of disciplines, such as automatic control systems, estimation and signal processing, communications and networks, electronic circuit design, [7] :17 data analysis and modeling, finance, statistics (optimal experimental design), [21] and structural optimization, where the approximation concept has proven to be efficient. [7] [22] Convex optimization can be used to model problems in the following fields:

Extensions

Extensions of convex optimization include the optimization of biconvex, pseudo-convex, and quasiconvex functions. Extensions of the theory of convex analysis and iterative methods for approximately solving non-convex minimization problems occur in the field of generalized convexity, also known as abstract convex analysis.[ citation needed ]

See also

Notes

  1. 1 2 Nesterov & Nemirovskii 1994
  2. Murty, Katta; Kabadi, Santosh (1987). "Some NP-complete problems in quadratic and nonlinear programming". Mathematical Programming. 39 (2): 117–129. doi:10.1007/BF02592948. hdl: 2027.42/6740 . S2CID   30500771.
  3. Sahni, S. "Computationally related problems," in SIAM Journal on Computing, 3, 262--279, 1974.
  4. Pardalos, Panos M.; Vavasis, Stephen A. (1991). "Quadratic programming with one negative eigenvalue is NP-hard". Journal of Global Optimization. 1: 15–22. doi:10.1007/BF00120662.
  5. Hiriart-Urruty, Jean-Baptiste; Lemaréchal, Claude (1996). Convex analysis and minimization algorithms: Fundamentals. Springer. p. 291. ISBN   9783540568506.
  6. Ben-Tal, Aharon; Nemirovskiĭ, Arkadiĭ Semenovich (2001). Lectures on modern convex optimization: analysis, algorithms, and engineering applications. pp. 335–336. ISBN   9780898714913.
  7. 1 2 3 4 5 6 7 8 9 10 11 12 Boyd, Stephen; Vandenberghe, Lieven (2004). Convex Optimization (PDF). Cambridge University Press. ISBN   978-0-521-83378-3 . Retrieved 12 Apr 2021.
  8. "Optimization Problem Types - Convex Optimization". 9 January 2011.
  9. 1 2 Arkadi Nemirovsky (2004). Interior point polynomial-time methods in convex programming.
  10. Agrawal, Akshay; Verschueren, Robin; Diamond, Steven; Boyd, Stephen (2018). "A rewriting system for convex optimization problems" (PDF). Control and Decision. 5 (1): 42–60. arXiv: 1709.04494 . doi:10.1080/23307706.2017.1397554. S2CID   67856259.
  11. Rockafellar, R. Tyrrell (1993). "Lagrange multipliers and optimality" (PDF). SIAM Review. 35 (2): 183–238. CiteSeerX   10.1.1.161.7209 . doi:10.1137/1035044.
  12. For methods for convex minimization, see the volumes by Hiriart-Urruty and Lemaréchal (bundle) and the textbooks by Ruszczyński, Bertsekas, and Boyd and Vandenberghe (interior point).
  13. Nesterov, Yurii; Arkadii, Nemirovskii (1995). Interior-Point Polynomial Algorithms in Convex Programming. Society for Industrial and Applied Mathematics. ISBN   978-0898715156.
  14. Peng, Jiming; Roos, Cornelis; Terlaky, Tamás (2002). "Self-regular functions and new search directions for linear and semidefinite optimization". Mathematical Programming. 93 (1): 129–171. doi:10.1007/s101070200296. ISSN   0025-5610. S2CID   28882966.
  15. "Numerical Optimization". Springer Series in Operations Research and Financial Engineering. 2006. doi:10.1007/978-0-387-40065-5. ISBN   978-0-387-30303-1.
  16. Beavis, Brian; Dobbs, Ian M. (1990). "Static Optimization". Optimization and Stability Theory for Economic Analysis. New York: Cambridge University Press. p. 40. ISBN   0-521-33605-8.
  17. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 Borchers, Brian. "An Overview Of Software For Convex Optimization" (PDF). Archived from the original (PDF) on 2017-09-18. Retrieved 12 Apr 2021.
  18. "Welcome to CVXPY 1.1 — CVXPY 1.1.11 documentation". www.cvxpy.org. Retrieved 2021-04-12.
  19. Udell, Madeleine; Mohan, Karanveer; Zeng, David; Hong, Jenny; Diamond, Steven; Boyd, Stephen (2014-10-17). "Convex Optimization in Julia". arXiv: 1410.4821 [math.OC].
  20. "Disciplined Convex Optimiation - CVXR". www.cvxgrp.org. Retrieved 2021-06-17.
  21. Chritensen/Klarbring, chpt. 4.
  22. Schmit, L.A.; Fleury, C. 1980: Structural synthesis by combining approximation concepts and dual methods. J. Amer. Inst. Aeronaut. Astronaut 18, 1252-1260
  23. 1 2 3 4 5 Boyd, Stephen; Diamond, Stephen; Zhang, Junzi; Agrawal, Akshay. "Convex Optimization Applications" (PDF). Archived (PDF) from the original on 2015-10-01. Retrieved 12 Apr 2021.
  24. 1 2 3 Malick, Jérôme (2011-09-28). "Convex optimization: applications, formulations, relaxations" (PDF). Archived (PDF) from the original on 2021-04-12. Retrieved 12 Apr 2021.
  25. Ben Haim Y. and Elishakoff I., Convex Models of Uncertainty in Applied Mechanics, Elsevier Science Publishers, Amsterdam, 1990
  26. Ahmad Bazzi, Dirk TM Slock, and Lisa Meilhac. "Online angle of arrival estimation in the presence of mutual coupling." 2016 IEEE Statistical Signal Processing Workshop (SSP). IEEE, 2016.

Related Research Articles

In mathematics, and more specifically in linear algebra, a linear map is a mapping between two vector spaces that preserves the operations of vector addition and scalar multiplication. The same names and the same definition are also used for the more general case of modules over a ring; see Module homomorphism.

Quadratic programming (QP) is the process of solving certain mathematical optimization problems involving quadratic functions. Specifically, one seeks to optimize a multivariate quadratic function subject to linear constraints on the variables. Quadratic programming is a type of nonlinear programming.

In machine learning, support vector machines are supervised max-margin models with associated learning algorithms that analyze data for classification and regression analysis. Developed at AT&T Bell Laboratories, SVMs are one of the most studied models, being based on statistical learning frameworks of VC theory proposed by Vapnik and Chervonenkis (1974).

In mathematical optimization, the method of Lagrange multipliers is a strategy for finding the local maxima and minima of a function subject to equation constraints. It is named after the mathematician Joseph-Louis Lagrange.

In mathematics, the Hessian matrix, Hessian or Hesse matrix is a square matrix of second-order partial derivatives of a scalar-valued function, or scalar field. It describes the local curvature of a function of many variables. The Hessian matrix was developed in the 19th century by the German mathematician Ludwig Otto Hesse and later named after him. Hesse originally used the term "functional determinants". The Hessian is sometimes denoted by H or, ambiguously, by ∇2.

<span class="mw-page-title-main">Gauss–Newton algorithm</span> Mathematical algorithm

The Gauss–Newton algorithm is used to solve non-linear least squares problems, which is equivalent to minimizing a sum of squared function values. It is an extension of Newton's method for finding a minimum of a non-linear function. Since a sum of squares must be nonnegative, the algorithm can be viewed as using Newton's method to iteratively approximate zeroes of the components of the sum, and thus minimizing the sum. In this sense, the algorithm is also an effective method for solving overdetermined systems of equations. It has the advantage that second derivatives, which can be challenging to compute, are not required.

<span class="mw-page-title-main">Interior-point method</span> Algorithms for solving convex optimization problems

Interior-point methods are algorithms for solving linear and non-linear convex optimization problems. IPMs combine two advantages of previously-known algorithms:

<span class="mw-page-title-main">Regularization (mathematics)</span> Technique to make a model more generalizable and transferable

In mathematics, statistics, finance, and computer science, particularly in machine learning and inverse problems, regularization is a process that converts the answer of a problem to a simpler one. It is often used in solving ill-posed problems or to prevent overfitting.

In mathematical optimization, the Karush–Kuhn–Tucker (KKT) conditions, also known as the Kuhn–Tucker conditions, are first derivative tests for a solution in nonlinear programming to be optimal, provided that some regularity conditions are satisfied.

<span class="mw-page-title-main">Differential evolution</span> Method of mathematical optimization

In evolutionary computation, differential evolution (DE) is a method that optimizes a problem by iteratively trying to improve a candidate solution with regard to a given measure of quality. Such methods are commonly known as metaheuristics as they make few or no assumptions about the optimized problem and can search very large spaces of candidate solutions. However, metaheuristics such as DE do not guarantee an optimal solution is ever found.

In mathematical optimization theory, duality or the duality principle is the principle that optimization problems may be viewed from either of two perspectives, the primal problem or the dual problem. If the primal is a minimization problem then the dual is a maximization problem. Any feasible solution to the primal (minimization) problem is at least as large as any feasible solution to the dual (maximization) problem. Therefore, the solution to the primal is an upper bound to the solution of the dual, and the solution of the dual is a lower bound to the solution of the primal. This fact is called weak duality.

In mathematical optimization, constrained optimization is the process of optimizing an objective function with respect to some variables in the presence of constraints on those variables. The objective function is either a cost function or energy function, which is to be minimized, or a reward function or utility function, which is to be maximized. Constraints can be either hard constraints, which set conditions for the variables that are required to be satisfied, or soft constraints, which have some variable values that are penalized in the objective function if, and based on the extent that, the conditions on the variables are not satisfied.

<span class="mw-page-title-main">Quasiconvex function</span> Mathematical function with convex lower level sets

In mathematics, a quasiconvex function is a real-valued function defined on an interval or on a convex subset of a real vector space such that the inverse image of any set of the form is a convex set. For a function of a single variable, along any stretch of the curve the highest point is one of the endpoints. The negative of a quasiconvex function is said to be quasiconcave.

In the field of mathematical optimization, Lagrangian relaxation is a relaxation method which approximates a difficult problem of constrained optimization by a simpler problem. A solution to the relaxed problem is an approximate solution to the original problem, and provides useful information.

Sequential quadratic programming (SQP) is an iterative method for constrained nonlinear optimization which may be considered a quasi-Newton method. SQP methods are used on mathematical problems for which the objective function and the constraints are twice continuously differentiable, but not necessarily convex.

Subgradient methods are convex optimization methods which use subderivatives. Originally developed by Naum Z. Shor and others in the 1960s and 1970s, subgradient methods are convergent when applied even to a non-differentiable objective function. When the objective function is differentiable, sub-gradient methods for unconstrained problems use the same search direction as the method of steepest descent.

In mathematical optimization, linear-fractional programming (LFP) is a generalization of linear programming (LP). Whereas the objective function in a linear program is a linear function, the objective function in a linear-fractional program is a ratio of two linear functions. A linear program can be regarded as a special case of a linear-fractional program in which the denominator is the constant function 1.

Augmented Lagrangian methods are a certain class of algorithms for solving constrained optimization problems. They have similarities to penalty methods in that they replace a constrained optimization problem by a series of unconstrained problems and add a penalty term to the objective, but the augmented Lagrangian method adds yet another term designed to mimic a Lagrange multiplier. The augmented Lagrangian is related to, but not identical with, the method of Lagrange multipliers.

In mathematics, a submodular set function is a set function that, informally, describes the relationship between a set of inputs and an output, where adding more of one input has a decreasing additional benefit. The natural diminishing returns property which makes them suitable for many applications, including approximation algorithms, game theory and electrical networks. Recently, submodular functions have also found utility in several real world problems in machine learning and artificial intelligence, including automatic summarization, multi-document summarization, feature selection, active learning, sensor placement, image collection summarization and many other domains.

Sparse dictionary learning is a representation learning method which aims at finding a sparse representation of the input data in the form of a linear combination of basic elements as well as those basic elements themselves. These elements are called atoms and they compose a dictionary. Atoms in the dictionary are not required to be orthogonal, and they may be an over-complete spanning set. This problem setup also allows the dimensionality of the signals being represented to be higher than the one of the signals being observed. The above two properties lead to having seemingly redundant atoms that allow multiple representations of the same signal but also provide an improvement in sparsity and flexibility of the representation.

References