Ellipsoid method

In mathematical optimization, the ellipsoid method is an iterative method for minimizing convex functions over convex sets. The ellipsoid method generates a sequence of ellipsoids whose volume uniformly decreases at every step, thus enclosing a minimizer of a convex function.

When specialized to solving feasible linear optimization problems with rational data, the ellipsoid method is an algorithm which finds an optimal solution in a number of steps that is polynomial in the input size.

History

The ellipsoid method has a long history. As an iterative method, a preliminary version was introduced by Naum Z. Shor. In 1972, an approximation algorithm for real convex minimization was studied by Arkadi Nemirovski and David B. Yudin (Judin).

As an algorithm for solving linear programming problems with rational data, the ellipsoid algorithm was studied by Leonid Khachiyan; Khachiyan's achievement was to prove the polynomial-time solvability of linear programs. This was a notable step from a theoretical perspective: the standard algorithm for solving linear programs at the time was the simplex algorithm, whose running time is typically linear in the size of the problem, but for which there exist examples on which it is exponential in the size of the problem. As such, having an algorithm that is guaranteed to be polynomial for all cases seemed like a theoretical breakthrough.

Khachiyan's work showed, for the first time, that there can be algorithms for solving linear programs whose runtime can be proven to be polynomial. In practice, however, the algorithm is fairly slow and of little practical interest, though it provided inspiration for later work that turned out to be of much greater practical use. Specifically, Karmarkar's algorithm, an interior-point method, is much faster than the ellipsoid method in practice. Karmarkar's algorithm is also faster in the worst case.

The ellipsoid algorithm allows complexity theorists to achieve (worst-case) bounds that depend on the dimension of the problem and on the size of the data, but not on the number of rows, so it remained important in combinatorial optimization theory for many years. [1] [2] [3] [4] Only in the 21st century have interior-point algorithms with similar complexity properties appeared.[ citation needed ]

Description

A convex minimization problem consists of the following ingredients:

  • a convex function f_0(x): R^n → R to be minimized over the vector x (containing n variables);
  • convex inequality constraints of the form f_i(x) ≤ 0, where the functions f_i are convex; these constraints define a convex feasible set, denoted G below;
  • linear equality constraints of the form h_i(x) = 0.

We are also given an initial ellipsoid E^(0) defined as

E^(0) := { x : (x - x_0)^T P_0^(-1) (x - x_0) ≤ 1 }

containing a minimizer x*, where P_0 is positive definite and x_0 is the center of E^(0).

Finally, we require the existence of a separation oracle for the convex set G. Given a point x in R^n, the oracle should return one of two answers: [5]

  • assert that x is in G, or
  • return a hyperplane separating x from G, i.e., a vector g such that g^T x ≥ g^T z for all z in G.

The output of the ellipsoid method is either:

  • a point x whose objective value is within a prescribed tolerance ε of the minimum over G, or
  • a certificate that no such point was found because the enclosing ellipsoids have shrunk below a prescribed volume (in particular, that G has negligibly small volume).

Inequality-constrained minimization of a function that is zero everywhere corresponds to the problem of simply identifying any feasible point. It turns out that any linear programming problem can be reduced to a linear feasibility problem (e.g. minimize the zero function subject to some linear inequality and equality constraints). One way to do this is by combining the primal and dual linear programs together into one program, and adding the additional (linear) constraint that the value of the primal solution is no worse than the value of the dual solution. Another way is to treat the objective of the linear program as an additional constraint, and use binary search to find the optimum value.[ citation needed ]
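
As a concrete illustration of the oracle required above, the following is a minimal Python/NumPy sketch (not part of the original article) of a separation oracle for a polytope of the form {x : Ax ≤ b}, the kind of set produced by the reduction just described. The function name and the tolerance parameter are illustrative assumptions.

```python
import numpy as np

def polytope_separation_oracle(A, b, x, tol=1e-9):
    """Separation oracle for the polytope K = {z : A z <= b}.

    Returns (True, None) if x satisfies all inequalities up to `tol`;
    otherwise returns (False, g), where g is a row of A defining a
    separating hyperplane: g @ x > b_i >= g @ z for every z in K.
    """
    violations = A @ x - b            # positive entries mark violated constraints
    i = int(np.argmax(violations))    # index of the most violated constraint
    if violations[i] <= tol:
        return True, None
    return False, A[i]
```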

Unconstrained minimization

At the k-th iteration of the algorithm, we have a point x^(k) at the center of an ellipsoid

E^(k) := { x in R^n : (x - x^(k))^T P_k^(-1) (x - x^(k)) ≤ 1 }.
We query the cutting-plane oracle to obtain a vector g^(k+1) in R^n such that

(g^(k+1))^T (x* - x^(k)) ≤ 0.
We therefore conclude that

x* ∈ E^(k) ∩ { z : (g^(k+1))^T (z - x^(k)) ≤ 0 }.
We set E^(k+1) to be the ellipsoid of minimal volume containing the half-ellipsoid described above, and compute its center x^(k+1). The update is given by

x^(k+1) = x^(k) - (1/(n+1)) P_k g̃^(k+1)

P_(k+1) = (n^2/(n^2-1)) ( P_k - (2/(n+1)) P_k g̃^(k+1) (g̃^(k+1))^T P_k )

where

g̃^(k+1) = g^(k+1) / sqrt( (g^(k+1))^T P_k g^(k+1) ).
The stopping criterion is given by the property that

sqrt( (g^(k))^T P_k g^(k) ) ≤ ε   implies   f(x^(k)) - f(x*) ≤ ε.
Sample sequence of iterates
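
The update above translates directly into code. Below is a minimal sketch in Python/NumPy (not from the article): the subgradient oracle `subgrad(x)` is an assumed black box returning a subgradient of f at x, the helper names are illustrative, and the sketch assumes n ≥ 2.

```python
import numpy as np

def ellipsoid_step(x, P, g):
    """One central-cut update: given the center x, the shape matrix P
    (positive definite) and a cut vector g with g^T (x* - x) <= 0, return
    the center and shape matrix of the minimum-volume ellipsoid containing
    the half-ellipsoid {z in E : g^T (z - x) <= 0}.  Assumes n >= 2."""
    n = x.size
    g_tilde = g / np.sqrt(g @ P @ g)               # normalize so g~^T P g~ = 1
    Pg = P @ g_tilde
    x_new = x - Pg / (n + 1)
    P_new = (n**2 / (n**2 - 1.0)) * (P - (2.0 / (n + 1)) * np.outer(Pg, Pg))
    return x_new, P_new

def ellipsoid_minimize(f, subgrad, x0, P0, eps=1e-6, max_iters=10_000):
    """Minimal ellipsoid method for unconstrained convex minimization."""
    x, P = np.asarray(x0, float), np.asarray(P0, float)
    best_x, best_f = x, f(x)
    for _ in range(max_iters):
        g = subgrad(x)
        if np.sqrt(g @ P @ g) <= eps:              # stopping criterion above
            break
        x, P = ellipsoid_step(x, P, g)
        if f(x) < best_f:
            best_x, best_f = x, f(x)
    return best_x, best_f
```

For example, `ellipsoid_minimize(lambda x: x @ x, lambda x: 2 * x, np.ones(2), 4 * np.eye(2))` locates the minimum of the squared norm near the origin, starting from a ball of radius 2 around (1, 1).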

Inequality-constrained minimization

At the k-th iteration of the algorithm for constrained minimization, we have a point x^(k) at the center of an ellipsoid E^(k) as before. We also must maintain a list of values f_best^(k) recording the smallest objective value of feasible iterates so far. Depending on whether or not the point x^(k) is feasible, we perform one of two tasks:

  • if x^(k) is feasible, we perform essentially the same update as in the unconstrained case, choosing a subgradient g of the objective f_0 at x^(k) (an "objective cut");
  • if x^(k) is infeasible and violates, say, the j-th constraint, we update the ellipsoid using a subgradient g of the violated constraint f_j (a "feasibility cut"). Such a cut satisfies

g^T (z - x^(k)) ≤ 0

for all feasible z.
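
A minimal sketch of this choice of cut in Python (the helper name `f0_subgrad` and the `(f_j, f_j_subgrad)` pairs are illustrative assumptions, not from the article):

```python
def choose_cut(x, f0_subgrad, constraints):
    """Select the cut vector g for one iteration of the constrained method.

    `constraints` is a list of (f_j, f_j_subgrad) pairs for the convex
    constraints f_j(x) <= 0.  Returns (g, feasible):
      - a feasibility cut (subgradient of a violated constraint) if x is infeasible,
      - otherwise an objective cut (a subgradient of the objective f_0).
    """
    for f_j, f_j_subgrad in constraints:
        if f_j(x) > 0:                      # the j-th constraint is violated at x
            return f_j_subgrad(x), False    # feasibility cut
    return f0_subgrad(x), True              # objective cut
```

The returned vector g is then used in the same minimum-volume-ellipsoid update as in the unconstrained case; when the iterate is feasible, its objective value is also compared against the best value recorded so far.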

Performance in convex programs

Theoretical run-time complexity guarantee

The run-time complexity guarantee of the ellipsoid method in the real RAM model is given by the following theorem. [6] :Thm.8.3.1

Consider a family of convex optimization problems of the form: minimize f(x) s.t. x is in G, where f is a convex function and G is a convex set (a subset of a Euclidean space R^n). Each problem p in the family is represented by a data-vector Data(p), e.g., the real-valued coefficients in matrices and vectors representing the function f and the feasible region G. The size of a problem p, Size(p), is defined as the number of elements (real numbers) in Data(p). The following assumptions are needed:

  1. G (the feasible region) is:
    • Bounded;
    • Has a non-empty interior (so there is a strictly-feasible point);
  2. Given Data(p), one can compute using poly(Size(p)) arithmetic operations:
    • An ellipsoid that contains G;
    • A lower bound MinVol(p)>0 on the volume of G.
  3. Given Data(p) and a point x in R^n, one can compute using poly(Size(p)) arithmetic operations:
    • A separation oracle for G (that is: either assert that x is in G, or return a hyperplane separating x from G).
    • A first-order oracle for f (that is: compute the value of f(x) and a subgradient f'(x)).

Under these assumptions, the ellipsoid method is "R-polynomial". This means that there exists a polynomial Poly such that, for every problem-instance p and every approximation-ratio ε>0, the method finds a solution x satisfying

f(x) - min_G f ≤ ε · [max_G f - min_G f],

using at most the following number of arithmetic operations on real numbers:

Poly(Size(p)) · ln(V(p)/ε),

where V(p) is a data-dependent quantity. Intuitively, this means that the number of operations required for each additional digit of accuracy is polynomial in Size(p). In the case of the ellipsoid method, V(p) is determined by the variation of f on G and by the ratio between the volume of the initial ellipsoid and the lower bound MinVol(p) on the volume of G.

The ellipsoid method requires at most O(n^2 ln(V(p)/ε)) steps, and each step requires Poly(Size(p)) arithmetic operations.

Practical performance

The ellipsoid method is used on low-dimensional problems, such as planar location problems, where it is numerically stable. Nemirovsky and Ben-Tal [6] :Sec.8.3.3 say that it is efficient if the number of variables is at most 20–30; this is so even if there are thousands of constraints, as the number of iterations does not depend on the number of constraints. However, in problems with many variables, the ellipsoid method is very inefficient, as the number of iterations grows as O(n^2).

Even on "small"-sized problems, it suffers from numerical instability and poor performance in practice [ citation needed ].

Theoretical importance

The ellipsoid method is an important theoretical technique in combinatorial optimization. In computational complexity theory, the ellipsoid algorithm is attractive because its complexity depends on the number of columns and the digital size of the coefficients, but not on the number of rows.

The ellipsoid method can be used to show that many algorithmic problems on convex sets are polynomial-time equivalent.

Performance in linear programs

Leonid Khachiyan applied the ellipsoid method to the special case of linear programming: minimize c^T x s.t. Ax ≤ b, where all coefficients in A, b, c are rational numbers. He showed that linear programs can be solved in polynomial time. Here is a sketch of Khachiyan's theorem. [6] :Sec.8.4.2

Step 1: reducing optimization to search. The theorem of linear programming duality says that we can reduce the above minimization problem to the search problem: find x, y s.t. Ax ≤ b ; A^T y = c ; y ≤ 0 ; c^T x = b^T y. The first problem is solvable iff the second problem is solvable; if it is solvable, the x-components of the solution to the second problem are an optimal solution to the first problem. Therefore, from now on, we can assume that we need to solve the following problem: find z ≥ 0 s.t. Rz ≤ r. Multiplying all rational coefficients by the common denominator, we can assume that all coefficients are integers.
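
As an illustration of this step, the combined primal-dual system can be assembled mechanically. The following Python/NumPy sketch (the function name and the variable ordering are illustrative assumptions) builds the inequality and equality blocks over the stacked variable (x, y):

```python
import numpy as np

def primal_dual_feasibility_system(A, b, c):
    """Combine the primal  min c^T x s.t. A x <= b  and its dual
    max b^T y s.t. A^T y = c, y <= 0  into one feasibility system
        A x <= b,   y <= 0,   A^T y = c,   c^T x = b^T y
    over the stacked variable (x, y).  Returns (G, h, E, d), meaning
    G (x, y) <= h and E (x, y) = d."""
    A = np.asarray(A, float); b = np.asarray(b, float); c = np.asarray(c, float)
    m, n = A.shape
    # Inequalities:  [A 0](x,y) <= b   and   [0 I](x,y) <= 0  (i.e. y <= 0)
    G = np.block([[A, np.zeros((m, m))],
                  [np.zeros((m, n)), np.eye(m)]])
    h = np.concatenate([b, np.zeros(m)])
    # Equalities:  [0 A^T](x,y) = c   and   [c^T -b^T](x,y) = 0
    E = np.vstack([np.hstack([np.zeros((n, n)), A.T]),
                   np.concatenate([c, -b])[None, :]])
    d = np.concatenate([c, [0.0]])
    return G, h, E, d
```

A further change of variables (splitting free variables into differences of nonnegative ones, replacing y by -y, and writing each equality as two inequalities) then brings the system into the form "find z ≥ 0 such that Rz ≤ r" used below.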

Step 2: reducing search to feasibility-check. The problem find z ≥ 0 s.t. Rz ≤ r can be reduced to the binary decision problem: "is there a z ≥ 0 such that Rz ≤ r?". This can be done as follows. If the answer to the decision problem is "no", then the answer to the search problem is "None", and we are done. Otherwise, take the first inequality constraint R_1 z ≤ r_1; replace it with an equality R_1 z = r_1; and apply the decision problem again. If the answer is "yes", we keep the equality; if the answer is "no", it means that the inequality is redundant, and we can remove it. Then we proceed to the next inequality constraint. For each constraint, we either convert it to equality or remove it. Finally, we have only equality constraints, which can be solved by any method for solving a system of linear equations.
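
A minimal sketch of this reduction in Python, assuming a black-box decision oracle `is_feasible(eq_rows, eq_rhs, ineq_rows, ineq_rhs)` that reports whether the given system (together with z ≥ 0) has a solution; the oracle and all names are assumptions for illustration:

```python
def search_via_decision(R, r, is_feasible):
    """Reduce "find z >= 0 with R z <= r" to repeated feasibility checks.

    Following the sketch above, each inequality is either tightened to an
    equality (if the system stays feasible) or discarded as redundant,
    until only equalities remain; those can then be solved directly."""
    eq_rows, eq_rhs = [], []
    ineq_rows, ineq_rhs = list(R), list(r)
    if not is_feasible(eq_rows, eq_rhs, ineq_rows, ineq_rhs):
        return None                          # the original system is infeasible
    while ineq_rows:
        row, rhs = ineq_rows.pop(0), ineq_rhs.pop(0)
        if is_feasible(eq_rows + [row], eq_rhs + [rhs], ineq_rows, ineq_rhs):
            eq_rows.append(row)              # keep the constraint as an equality
            eq_rhs.append(rhs)
        # otherwise the inequality is redundant and is dropped
    return eq_rows, eq_rhs                   # remaining system: R_eq z = r_eq, z >= 0
```

Each constraint costs one call to the decision procedure, so the search problem is solved with a number of oracle calls linear in the number of constraints.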

Step 3: the decision problem can be reduced to a different optimization problem. Define the residual function f(z) := max[(Rz)_1 - r_1, (Rz)_2 - r_2, (Rz)_3 - r_3, ...]. Clearly, f(z) ≤ 0 iff Rz ≤ r. Therefore, to solve the decision problem, it is sufficient to solve the minimization problem: min_z f(z). The function f is convex (it is a maximum of linear functions). Denote the minimum value by f*. Then the answer to the decision problem is "yes" iff f* ≤ 0.
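
The residual function and a subgradient of it are easy to compute, which is all the first-order oracle for this minimization needs. A short Python/NumPy sketch (illustrative only):

```python
import numpy as np

def residual(R, r, z):
    """f(z) = max_i ((R z)_i - r_i);  f(z) <= 0 exactly when R z <= r."""
    return float(np.max(R @ z - r))

def residual_subgrad(R, r, z):
    """A subgradient of f at z: the row of R attaining the maximum."""
    i = int(np.argmax(R @ z - r))
    return R[i]
```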

Step 4: In the optimization problem min_z f(z), we can assume that z is in a box of side-length 2^L, where L is the bit length of the problem data. Thus, we have a bounded convex program that can be solved up to any accuracy ε by the ellipsoid method, in time polynomial in L.

Step 5: It can be proved that, if f* > 0, then f* > 2^(-poly(L)), for some polynomial. Therefore, we can pick the accuracy ε = 2^(-poly(L)). Then, the value of the ε-approximate solution found by the ellipsoid method will be positive iff f* > 0, iff the decision problem is unsolvable.

Variants

The ellipsoid method has several variants, depending on what cuts exactly are used in each step. [1] :Sec. 3

Different cuts

In the central-cut ellipsoid method, [1] :82,87–94 the cuts always pass through the center of the current ellipsoid. The input is a rational number ε>0, a convex body K given by a weak separation oracle, and a number R such that S(0,R) (the ball of radius R around the origin) contains K. The output is one of the following:

  • a vector at distance at most ε from K, or
  • a positive definite matrix A and a point a such that the ellipsoid E(A,a) contains K and has volume at most ε.

The number of steps is N = O(n^2 log(R/ε)); the number of required accuracy digits is p := 8N, and the required accuracy of the separation oracle is d := 2^(-p).

In the deep-cut ellipsoid method, [1] :83 the cuts remove more than half of the ellipsoid in each step. This makes it faster to discover that K is empty. However, when K is nonempty, there are examples in which the central-cut method finds a feasible point faster. The use of deep cuts does not change the order of magnitude of the run-time.

In the shallow-cut ellipsoid method, [1] :83,94–101 the cuts remove less than half of the ellipsoid in each step. This variant is not very useful in practice, but it has theoretical importance: it makes it possible to prove results that cannot be derived from other variants. The input is a rational number ε>0, a convex body K given by a shallow separation oracle, and a number R such that S(0,R) contains K. The output is a positive definite matrix A and a point a such that one of the following holds:

The number of steps is N = O(n^2 log(R/ε)), and the number of required accuracy digits is p := 8N.

Different ellipsoids

There is also a distinction between the circumscribed ellipsoid and the inscribed ellipsoid methods: [7] the method described above is a circumscribed ellipsoid method, which maintains at each iteration an ellipsoid containing the feasible region; an inscribed ellipsoid method instead maintains at each iteration an ellipsoid inscribed in the current localization set and queries its center.

The methods differ in their runtime complexity: the circumscribed method needs more iterations, each of which is relatively cheap, while the inscribed method needs fewer iterations, each of which requires heavier computation.

The relative efficiency of the methods therefore depends on the time required for finding a separating hyperplane, which depends on the application: if separation is cheap, the circumscribed method tends to be more efficient, and if separation is expensive, the inscribed method tends to be more efficient. [7]

Notes

  1. Grötschel, Martin; Lovász, László; Schrijver, Alexander (1993), Geometric algorithms and combinatorial optimization, Algorithms and Combinatorics, vol. 2 (2nd ed.), Springer-Verlag, Berlin, doi:10.1007/978-3-642-78240-4, ISBN 978-3-642-78242-8, MR 1261419.
  2. L. Lovász: An Algorithmic Theory of Numbers, Graphs, and Convexity, CBMS-NSF Regional Conference Series in Applied Mathematics 50, SIAM, Philadelphia, Pennsylvania, 1986.
  3. V. Chandru and M. R. Rao, Linear Programming, Chapter 31 in Algorithms and Theory of Computation Handbook, edited by M. J. Atallah, CRC Press 1999, 31-1 to 31-37.
  4. V. Chandru and M. R. Rao, Integer Programming, Chapter 32 in Algorithms and Theory of Computation Handbook, edited by M. J. Atallah, CRC Press 1999, 32-1 to 32-45.
  5. "MIT 6.854 Spring 2016 Lecture 12: From Separation to Optimization and Back; Ellipsoid Method - YouTube". www.youtube.com. Archived from the original on 2021-12-22. Retrieved 2021-01-03.
  6. Nemirovsky and Ben-Tal (2023). "Optimization III: Convex Optimization" (PDF). [permanent dead link]
  7. Newman, D. J.; Primak, M. E. (1992-12-01). "Complexity of circumscribed and inscribed ellipsoid methods for solving equilibrium economical models". Applied Mathematics and Computation. 52 (2): 223–231. doi:10.1016/0096-3003(92)90079-G. ISSN 0096-3003.
  8. https://elibrary.ru/item.asp?id=38308898
  9. Primak, M. E.; Kheyfets, B. L. (1995-06-01). "A modification of the inscribed ellipsoid method". Mathematical and Computer Modelling. 21 (11): 69–76. doi:10.1016/0895-7177(95)00080-L. ISSN 0895-7177.
