Duality (optimization)

Last updated

In mathematical optimization theory, duality or the duality principle is the principle that optimization problems may be viewed from either of two perspectives, the primal problem or the dual problem. If the primal is a minimization problem then the dual is a maximization problem (and vice versa). Any feasible solution to the primal (minimization) problem is at least as large as any feasible solution to the dual (maximization) problem. Therefore, the solution to the primal is an upper bound to the solution of the dual, and the solution of the dual is a lower bound to the solution of the primal. [1] This fact is called weak duality .

Contents

In general, the optimal values of the primal and dual problems need not be equal. Their difference is called the duality gap. For convex optimization problems, the duality gap is zero under a constraint qualification condition. This fact is called strong duality .

Dual problem

Usually the term "dual problem" refers to the Lagrangian dual problem but other dual problems are used – for example, the Wolfe dual problem and the Fenchel dual problem. The Lagrangian dual problem is obtained by forming the Lagrangian of a minimization problem by using nonnegative Lagrange multipliers to add the constraints to the objective function, and then solving for the primal variable values that minimize the original objective function. This solution gives the primal variables as functions of the Lagrange multipliers, which are called dual variables, so that the new problem is to maximize the objective function with respect to the dual variables under the derived constraints on the dual variables (including at least the nonnegativity constraints).

In general given two dual pairs of separated locally convex spaces and and the function , we can define the primal problem as finding such that In other words, if exists, is the minimum of the function and the infimum (greatest lower bound) of the function is attained.

If there are constraint conditions, these can be built into the function by letting where is a suitable function on that has a minimum 0 on the constraints, and for which one can prove that . The latter condition is trivially, but not always conveniently, satisfied for the characteristic function (i.e. for satisfying the constraints and otherwise). Then extend to a perturbation function such that . [2]

The duality gap is the difference of the right and left hand sides of the inequality

where is the convex conjugate in both variables and denotes the supremum (least upper bound). [2] [3] [4]

Duality gap

The duality gap is the difference between the values of any primal solutions and any dual solutions. If is the optimal dual value and is the optimal primal value, then the duality gap is equal to . This value is always greater than or equal to 0 (for minimization problems). The duality gap is zero if and only if strong duality holds. Otherwise the gap is strictly positive and weak duality holds. [5]

In computational optimization, another "duality gap" is often reported, which is the difference in value between any dual solution and the value of a feasible but suboptimal iterate for the primal problem. This alternative "duality gap" quantifies the discrepancy between the value of a current feasible but suboptimal iterate for the primal problem and the value of the dual problem; the value of the dual problem is, under regularity conditions, equal to the value of the convex relaxation of the primal problem: The convex relaxation is the problem arising replacing a non-convex feasible set with its closed convex hull and with replacing a non-convex function with its convex closure, that is the function that has the epigraph that is the closed convex hull of the original primal objective function. [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16]

Linear case

Linear programming problems are optimization problems in which the objective function and the constraints are all linear. In the primal problem, the objective function is a linear combination of n variables. There are m constraints, each of which places an upper bound on a linear combination of the n variables. The goal is to maximize the value of the objective function subject to the constraints. A solution is a vector (a list) of n values that achieves the maximum value for the objective function.

In the dual problem, the objective function is a linear combination of the m values that are the limits in the m constraints from the primal problem. There are n dual constraints, each of which places a lower bound on a linear combination of m dual variables.

Relationship between the primal problem and the dual problem

In the linear case, in the primal problem, from each sub-optimal point that satisfies all the constraints, there is a direction or subspace of directions to move that increases the objective function. Moving in any such direction is said to remove slack between the candidate solution and one or more constraints. An infeasible value of the candidate solution is one that exceeds one or more of the constraints.

In the dual problem, the dual vector multiplies the constraints that determine the positions of the constraints in the primal. Varying the dual vector in the dual problem is equivalent to revising the upper bounds in the primal problem. The lowest upper bound is sought. That is, the dual vector is minimized in order to remove slack between the candidate positions of the constraints and the actual optimum. An infeasible value of the dual vector is one that is too low. It sets the candidate positions of one or more of the constraints in a position that excludes the actual optimum.

This intuition is made formal by the equations in Linear programming: Duality.

Nonlinear case

In nonlinear programming, the constraints are not necessarily linear. Nonetheless, many of the same principles apply.

To ensure that the global maximum of a non-linear problem can be identified easily, the problem formulation often requires that the functions be convex and have compact lower level sets. This is the significance of the Karush–Kuhn–Tucker conditions. They provide necessary conditions for identifying local optima of non-linear programming problems. There are additional conditions (constraint qualifications) that are necessary so that it will be possible to define the direction to an optimal solution. An optimal solution is one that is a local optimum, but possibly not a global optimum.

Lagrange duality

Motivation. [17] Suppose we want to solve the following nonlinear programming problem:

The problem has constraints; we would like to convert it to a program without constraints. Theoretically, it is possible to do it by minimizing the function J(x), defined as

where I is an infinite step function: I[u]=0 if u≤0, and I[u]=∞ otherwise. But J(x) is hard to solve as it is not continuous. It is possible to "approximate" I[u] by λu, where λ is a positive constant. This yields a function known as the lagrangian:

Note that, for every x,

.

Proof:

Therefore, the original problem is equivalent to:

.

By reversing the order of min and max, we get:

.

The dual function is the inner problem in the above formula:

.

The Lagrangian dual program is the program of maximizing g:

.

The optimal solution to the dual program is a lower bound for the optimal solution of the original (primal) program; this is the weak duality principle. If the primal problem is convex and bounded from below, and there exists a point in which all nonlinear constraints are strictly satisfied (Slater's condition), then the optimal solution to the dual program equals the optimal solution of the primal program; this is the strong duality principle. In this case, we can solve the primal program by finding an optimal solution λ* to the dual program, and then solving:

.

Note that, to use either the weak or the strong duality principle, we need a way to compute g(λ). In general this may be hard, as we need to solve a different minimization problem for every λ. But for some classes of functions, it is possible to get an explicit formula for g(). Solving the primal and dual programs together is often easier than solving only one of them. Examples are linear programming and quadratic programming. A better and more general approach to duality is provided by Fenchel's duality theorem. [18] :Sub.3.3.1

Another condition in which the min-max and max-min are equal is when the Lagrangian has a saddle point: (x∗, λ∗) is a saddle point of the Lagrange function L if and only if x∗ is an optimal solution to the primal, λ∗ is an optimal solution to the dual, and the optimal values in the indicated problems are equal to each other. [18] :Prop.3.2.2

The strong Lagrange principle

Given a nonlinear programming problem in standard form

with the domain having non-empty interior, the Lagrangian function is defined as

The vectors and are called the dual variables or Lagrange multiplier vectors associated with the problem. The Lagrange dual function is defined as

The dual function g is concave, even when the initial problem is not convex, because it is a point-wise infimum of affine functions. The dual function yields lower bounds on the optimal value of the initial problem; for any and any we have .

If a constraint qualification such as Slater's condition holds and the original problem is convex, then we have strong duality, i.e. .

Convex problems

For a convex minimization problem with inequality constraints,

the Lagrangian dual problem is

where the objective function is the Lagrange dual function. Provided that the functions and are continuously differentiable, the infimum occurs where the gradient is equal to zero. The problem

is called the Wolfe dual problem. This problem may be difficult to deal with computationally, because the objective function is not concave in the joint variables . Also, the equality constraint is nonlinear in general, so the Wolfe dual problem is typically a nonconvex optimization problem. In any case, weak duality holds. [19]

History

According to George Dantzig, the duality theorem for linear optimization was conjectured by John von Neumann immediately after Dantzig presented the linear programming problem. Von Neumann noted that he was using information from his game theory, and conjectured that two person zero sum matrix game was equivalent to linear programming. Rigorous proofs were first published in 1948 by Albert W. Tucker and his group. (Dantzig's foreword to Nering and Tucker, 1993)

Applications

In support vector machines (SVMs), formulating the primal problem of SVMs as the dual problem can be used to implement the Kernel trick, but the latter has higher time complexity in the historical cases.

See also

Notes

  1. Boyd, Stephen P.; Vandenberghe, Lieven (2004). Convex Optimization (pdf). Cambridge University Press. p. 216. ISBN   978-0-521-83378-3 . Retrieved October 15, 2011.
  2. 1 2 Boţ, Radu Ioan; Wanka, Gert; Grad, Sorin-Mihai (2009). Duality in Vector Optimization. Springer. ISBN   978-3-642-02885-4.
  3. Csetnek, Ernö Robert (2010). Overcoming the failure of the classical generalized interior-point regularity conditions in convex optimization. Applications of the duality theory to enlargements of maximal monotone operators. Logos Verlag Berlin GmbH. ISBN   978-3-8325-2503-3.
  4. Zălinescu, Constantin (2002). Convex analysis in general vector spaces . River Edge, NJ: World Scientific Publishing Co., Inc. pp.  106–113. ISBN   981-238-067-1. MR   1921556.
  5. Borwein, Jonathan; Zhu, Qiji (2005). Techniques of Variational Analysis. Springer. ISBN   978-1-4419-2026-3.
  6. Ahuja, Ravindra K.; Magnanti, Thomas L.; Orlin, James B. (1993). Network Flows: Theory, Algorithms and Applications. Prentice Hall. ISBN   0-13-617549-X.
  7. Bertsekas, Dimitri; Nedic, Angelia; Ozdaglar, Asuman (2003). Convex Analysis and Optimization. Athena Scientific. ISBN   1-886529-45-0.
  8. Bertsekas, Dimitri P. (1999). Nonlinear Programming (2nd ed.). Athena Scientific. ISBN   1-886529-00-0.
  9. Bertsekas, Dimitri P. (2009). Convex Optimization Theory. Athena Scientific. ISBN   978-1-886529-31-1.
  10. Bonnans, J. Frédéric; Gilbert, J. Charles; Lemaréchal, Claude; Sagastizábal, Claudia A. (2006). Numerical optimization: Theoretical and practical aspects. Universitext (Second revised ed. of translation of 1997 French ed.). Berlin: Springer-Verlag. pp. xiv+490. doi:10.1007/978-3-540-35447-5. ISBN   3-540-35445-X. MR   2265882.
  11. Hiriart-Urruty, Jean-Baptiste; Lemaréchal, Claude (1993). Convex analysis and minimization algorithms, Volume I: Fundamentals. Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Vol. 305. Berlin: Springer-Verlag. pp. xviii+417. ISBN   3-540-56850-6. MR   1261420.
  12. Hiriart-Urruty, Jean-Baptiste; Lemaréchal, Claude (1993). "14 Duality for Practitioners". Convex analysis and minimization algorithms, Volume II: Advanced theory and bundle methods. Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Vol. 306. Berlin: Springer-Verlag. pp. xviii+346. ISBN   3-540-56852-2. MR   1295240.
  13. Lasdon, Leon S. (2002) [Reprint of the 1970 Macmillan]. Optimization theory for large systems. Mineola, New York: Dover Publications, Inc. pp. xiii+523. ISBN   978-0-486-41999-2. MR   1888251.
  14. Lemaréchal, Claude (2001). "Lagrangian relaxation". In Jünger, Michael; Naddef, Denis (eds.). Computational combinatorial optimization: Papers from the Spring School held in Schloß Dagstuhl, May 15–19, 2000. Lecture Notes in Computer Science (LNCS). Vol. 2241. Berlin: Springer-Verlag. pp. 112–156. doi:10.1007/3-540-45586-8_4. ISBN   3-540-42877-1. MR   1900016. S2CID   9048698.
  15. Minoux, Michel (1986). Mathematical programming: Theory and algorithms. Egon Balas (forward); Steven Vajda (trans) from the (1983 Paris: Dunod) French. Chichester: A Wiley-Interscience Publication. John Wiley & Sons, Ltd. pp. xxviii+489. ISBN   0-471-90170-9. MR   0868279. (2008 Second ed., in French: Programmation mathématique : Théorie et algorithmes, Éditions Tec & Doc, Paris, 2008. xxx+711 pp. ).
  16. Shapiro, Jeremy F. (1979). Mathematical programming: Structures and algorithms. New York: Wiley-Interscience [John Wiley & Sons]. pp.  xvi+388. ISBN   0-471-77886-9. MR   0544669.
  17. David Knowles (2010). "Lagrangian Duality for Dummies" (PDF).
  18. 1 2 Nemirovsky and Ben-Tal (2023). "Optimization III: Convex Optimization" (PDF).
  19. Geoffrion, Arthur M. (1971). "Duality in Nonlinear Programming: A Simplified Applications-Oriented Development". SIAM Review. 13 (1): 1–37. doi:10.1137/1013001. JSTOR   2028848.

Related Research Articles

Quadratic programming (QP) is the process of solving certain mathematical optimization problems involving quadratic functions. Specifically, one seeks to optimize a multivariate quadratic function subject to linear constraints on the variables. Quadratic programming is a type of nonlinear programming.

<span class="mw-page-title-main">Linear programming</span> Method to solve optimization problems

Linear programming (LP), also called linear optimization, is a method to achieve the best outcome in a mathematical model whose requirements and objective are represented by linear relationships. Linear programming is a special case of mathematical programming.

In mathematical optimization, the method of Lagrange multipliers is a strategy for finding the local maxima and minima of a function subject to equation constraints. It is named after the mathematician Joseph-Louis Lagrange.

<span class="mw-page-title-main">Optimal control</span> Mathematical way of attaining a desired output from a dynamic system

Optimal control theory is a branch of control theory that deals with finding a control for a dynamical system over a period of time such that an objective function is optimized. It has numerous applications in science, engineering and operations research. For example, the dynamical system might be a spacecraft with controls corresponding to rocket thrusters, and the objective might be to reach the Moon with minimum fuel expenditure. Or the dynamical system could be a nation's economy, with the objective to minimize unemployment; the controls in this case could be fiscal and monetary policy. A dynamical system may also be introduced to embed operations research problems within the framework of optimal control theory.

In mathematics, nonlinear programming (NLP) is the process of solving an optimization problem where some of the constraints are not linear equalities or the objective function is not a linear function. An optimization problem is one of calculation of the extrema of an objective function over a set of unknown real variables and conditional to the satisfaction of a system of equalities and inequalities, collectively termed constraints. It is the sub-field of mathematical optimization that deals with problems that are not linear.

<span class="mw-page-title-main">Interior-point method</span> Algorithms for solving convex optimization problems

Interior-point methods are algorithms for solving linear and non-linear convex optimization problems. IPMs combine two advantages of previously-known algorithms:

Convex optimization is a subfield of mathematical optimization that studies the problem of minimizing convex functions over convex sets. Many classes of convex optimization problems admit polynomial-time algorithms, whereas mathematical optimization is in general NP-hard.

In mathematical optimization, the Karush–Kuhn–Tucker (KKT) conditions, also known as the Kuhn–Tucker conditions, are first derivative tests for a solution in nonlinear programming to be optimal, provided that some regularity conditions are satisfied.

<span class="mw-page-title-main">Quasiconvex function</span> Mathematical function with convex lower level sets

In mathematics, a quasiconvex function is a real-valued function defined on an interval or on a convex subset of a real vector space such that the inverse image of any set of the form is a convex set. For a function of a single variable, along any stretch of the curve the highest point is one of the endpoints. The negative of a quasiconvex function is said to be quasiconcave.

In the field of mathematical optimization, Lagrangian relaxation is a relaxation method which approximates a difficult problem of constrained optimization by a simpler problem. A solution to the relaxed problem is an approximate solution to the original problem, and provides useful information.

Sequential quadratic programming (SQP) is an iterative method for constrained nonlinear optimization which may be considered a quasi-Newton method. SQP methods are used on mathematical problems for which the objective function and the constraints are twice continuously differentiable, but not necessarily convex.

Linear Programming Boosting (LPBoost) is a supervised classifier from the boosting family of classifiers. LPBoost maximizes a margin between training samples of different classes, and thus also belongs to the class of margin classifier algorithms.

The Bregman method is an iterative algorithm to solve certain convex optimization problems involving regularization. The original version is due to Lev M. Bregman, who published it in 1967.

Augmented Lagrangian methods are a certain class of algorithms for solving constrained optimization problems. They have similarities to penalty methods in that they replace a constrained optimization problem by a series of unconstrained problems and add a penalty term to the objective, but the augmented Lagrangian method adds yet another term designed to mimic a Lagrange multiplier. The augmented Lagrangian is related to, but not identical with, the method of Lagrange multipliers.

In optimization problems in applied mathematics, the duality gap is the difference between the primal and dual solutions. If is the optimal dual value and is the optimal primal value then the duality gap is equal to . This value is always greater than or equal to 0. The duality gap is zero if and only if strong duality holds. Otherwise the gap is strictly positive and weak duality holds.

In mathematical optimization, the perturbation function is any function which relates to primal and dual problems. The name comes from the fact that any such function defines a perturbation of the initial problem. In many cases this takes the form of shifting the constraints.

In applied mathematics, weak duality is a concept in optimization which states that the duality gap is always greater than or equal to 0. This means that for any minimization problem, called the primal problem, the solution to the primal problem is always greater than or equal to the solution to the dual maximization problem. Alternatively, the solution to a primal maximization problem is always less than or equal to the solution to the dual minimization problem.

In mathematical optimization, Wolfe duality, named after Philip Wolfe, is type of dual problem in which the objective function and constraints are all differentiable functions. Using this concept a lower bound for a minimization problem can be found because of the weak duality principle.

The quadratic knapsack problem (QKP), first introduced in 19th century, is an extension of knapsack problem that allows for quadratic terms in the objective function: Given a set of items, each with a weight, a value, and an extra profit that can be earned if two items are selected, determine the number of items to include in a collection without exceeding capacity of the knapsack, so as to maximize the overall profit. Usually, quadratic knapsack problems come with a restriction on the number of copies of each kind of item: either 0, or 1. This special type of QKP forms the 0-1 quadratic knapsack problem, which was first discussed by Gallo et al. The 0-1 quadratic knapsack problem is a variation of knapsack problems, combining the features of unbounded knapsack problem, 0-1 knapsack problem and quadratic knapsack problem.

In dual decomposition a problem is broken into smaller subproblems and a solution to the relaxed problem is found. This method can be employed for MRF optimization. Dual decomposition is applied to markov logic programs as an inference technique.

References

Books

Articles