Interior-point method

Last updated

Example search for a solution. Blue lines show constraints, red points show iterated solutions. Karmarkar.svg
Example search for a solution. Blue lines show constraints, red points show iterated solutions.

Interior-point methods (also referred to as barrier methods or IPMs) are algorithms for solving linear and non-linear convex optimization problems. IPMs combine two advantages of previously-known algorithms:

Contents

In contrast to the simplex method which traverses the boundary of the feasible region, and the ellipsoid method which bounds the feasible region from outside, an IPM reaches a best solution by traversing the interior of the feasible region—hence the name.

History

An interior point method was discovered by Soviet mathematician I. I. Dikin in 1967. [1] The method was reinvented in the U.S. in the mid-1980s. In 1984, Narendra Karmarkar developed a method for linear programming called Karmarkar's algorithm, [2] which runs in provably polynomial time ( operations on L-bit numbers, where n is the number of variables and constants), and is also very efficient in practice. Karmarkar's paper created a surge of interest in interior point methods. Two years later, James Renegar invented the first path-following interior-point method, with run-time . The method was later extended from linear to convex optimization problems, based on a self-concordant barrier function used to encode the convex set. [3]

Any convex optimization problem can be transformed into minimizing (or maximizing) a linear function over a convex set by converting to the epigraph form. [4] :143 The idea of encoding the feasible set using a barrier and designing barrier methods was studied by Anthony V. Fiacco, Garth P. McCormick, and others in the early 1960s. These ideas were mainly developed for general nonlinear programming, but they were later abandoned due to the presence of more competitive methods for this class of problems (e.g. sequential quadratic programming).

Yurii Nesterov and Arkadi Nemirovski came up with a special class of such barriers that can be used to encode any convex set. They guarantee that the number of iterations of the algorithm is bounded by a polynomial in the dimension and accuracy of the solution. [5] [3]

The class of primal-dual path-following interior-point methods is considered the most successful. Mehrotra's predictor–corrector algorithm provides the basis for most implementations of this class of methods. [6]

Definitions

We are given a convex program of the form:

where f is a convex function and G is a convex set. Without loss of generality, we can assume that the objective f is a linear function. Usually, the convex set G is represented by a set of convex inequalities and linear equalities; the linear equalities can be eliminated using linear algebra, so for simplicity we assume there are only convex inequalities, and the program can be described as follows, where the gi are convex functions:

We assume that the constraint functions belong to some family (e.g. quadratic functions), so that the program can be represented by a finite vector of coefficients (e.g. the coefficients to the quadratic functions). The dimension of this coefficient vector is called the size of the program. A numerical solver for a given family of programs is an algorithm that, given the coefficient vector, generates a sequence of approximate solutions xt for t=1,2,..., using finitely many arithmetic operations. A numerical solver is called convergent if, for any program from the family and any positive ε>0, there is some T (which may depend on the program and on ε) such that, for any t>T, the approximate solution xt is ε-approximate, that is:

f(x_t) - f*ε

gi(x_t) ≤ ε for i in 1,...,m,

x in G,

where f* is the optimal solution. A solver is called polynomial if the total number of arithmetic operations in the first T steps is at most

poly(problem-size) * log(V/ε),

where V is some data-dependent constant, e.g., the difference between the largest and smallest value in the feasible set. In other words, V/ε is the "relative accuracy" of the solution - the accuracy w.r.t. the largest coefficient. log(V/ε) represents the number of "accuracy digits". Therefore, a solver is 'polynomial' if each additional digit of accuracy requires a number of operations that is polynomial in the problem size.

Types

Types of interior point methods include:

Path-following methods

Idea

Given a convex optimization program (P) with constraints, we can convert it to an unconstrained program by adding a barrier function. Specifically, let b be a smooth convex function, defined in the interior of the feasible region G, such that for any sequence {xj in interior(G)} whose limit is on the boundary of G: . We also assume that b is non-degenerate, that is: is positive definite for all x in interior(G). Now, consider the family of programs:

(Pt) minimize t * f(x) + b(x)

Technically the program is restricted, since b is defined only in the interior of G. But practically, it is possible to solve it as an unconstrained program, since any solver trying to minimize the function will not approach the boundary, where b approaches infinity. Therefore, (Pt) has a unique solution - denote it by x*(t). The function x* is a continuous function of t, which is called the central path. All limit points of x*, as t approaches infinity, are optimal solutions of the original program (P).

A path-following method is a method of tracking the function x* along a certain increasing sequence t1,t2,..., that is: computing a good-enough approximation xi to the point x*(ti), such that the difference xi - x*(ti) approaches 0 as i approaches infinity; then the sequence xi approaches the optimal solution of (P). This requires to specify three things:

The main challenge in proving that the method is polytime is that, as the penalty parameter grows, the solution gets near the boundary, and the function becomes steeper. The run-time of solvers such as Newton's method becomes longer, and it is hard to prove that the total runtime is polynomial.

Renegar [7] and Gonzaga [8] proved that a specific instance of a path-following method is polytime:

They proved that, in this case, the difference xi - x*(ti) remains at most 0.01, and f(xi) - f* is at most 2*m/ti. Thus, the solution accuracy is proportional to 1/ti, so to add a single accuracy-digit, it is suffiicent to multiply ti by 2 (or any other constant factor), which requires O(sqrt(m)) Newton steps. Since each Newton step takes O(m n2) operations, the total complexity is O(m3/2 n2) operations for accuracy digit.

Yuri Nesterov extended the idea from linear to non-linear programs. He noted that the main property of the logarithmic barrier, used in the above proofs, is that it is self-concordant with a finite barrier parameter. Therefore, many other classes of convex programs can be solved in polytime using a path-following method, if we can find a suitable self-concordant barrier function for their feasible region. [3] :Sec.1

Details

We are given a convex optimization problem (P) in "standard form":

minimize cTx s.t. x in G,

where G is convex and closed. We can also assume that G is bounded (we can easily make it bounded by adding a constraint |x|≤R for some sufficiently large R). [3] :Sec.4

To use the interior-point method, we need a self-concordant barrier for G. Let b be an M-self-concordant barrier for G, where M≥1 is the self-concordance parameter. We assume that we can compute efficiently the value of b, its gradient, and its Hessian, for every point x in the interior of G.

For every t>0, we define the penalized objectiveft(x) := cTx + b(x). We define the path of minimizers by: x*(t) := arg min ft(x). We apporimate this path along an increasing sequence ti. The sequence is initialized by a certain non-trivial two-phase initialization procedure. Then, it is updated according to the following rule: .

For each ti, we find an approximate minimum of fti, denoted by xi. The approximate minimum is chosen to satisfy the following "closeness condition" (where L is the path tolerance):

.

To find xi+1, we start with xi and apply the damped Newton method. We apply several steps of this method, until the above "closeness relation" is satisfied. The first point that satisfies this relation is denoted by xi+1. [3] :Sec.4

Convergence and complexity

The convergence rate of the method is given by the following formula, for every i: [3] :Prop.4.4.1

Taking , the number of Newton steps required to go from xi to xi+1 is at most a fixed number, that depends only on r and L. In particular, the total number of Newton steps required to find an ε-approximate solution (i.e., finding x in G such that cTx - c* ≤ ε) is at most: [3] :Thm.4.4.1

where the constant factor O(1) depends only on r and L. The number of Newton steps required for the two-step initialization procedure is at most: [3] :Thm.4.5.1

[ clarification needed ]

where the constant factor O(1) depends only on r and L, and , and is some point in the interior of G. Overall, the overall Newton complexity of finding an ε-approximate solution is at most

, where V is some problem-dependent constant: .

Each Newton step takes O(n3) arithmetic operations.

Initialization: phase-I methods

To initialize the path-following methods, we need a point in the relative interior of the feasible region G. In other words: if G is defined by the inequalities gi(x) ≤ 0, then we need some x for which gi(x) < 0 for all i in 1,...,m. If we do not have such a point, we need to find one using a so-called phase I method. [4] :11.4 A simple phase-I method is to solve the following convex program:

Denote the optimal solution by x*,s*.

For this program it is easy to get an interior point: we can take arbitrarily x=0, and take s to be any number larger than max(f1(0),...,fm(0)). Therefore, it can be solved using interior-point methods. However, the run-time is proportional to log(1/s*). As s* comes near 0, it becomes harder and harder to find an exact solution to the phase-I problem, and thus harder to decide whether the original problem is feasible.

Practical considerations

The theoretic guarantees assume that the penalty parameter is increased at the rate , so the worst-case number of required Newton steps is . In theory, if μ is larger (e.g. 2 or more), then the worst-case number of required Newton steps is in . However, in practice, larger μ leads to a much faster convergence. These methods are called long-step methods. [3] :Sec.4.6 In practice, if μ is between 3 and 100, then the program converges within 20-40 Newton steps, regardless of the number of constraints (though the runtime of each Newton step of course grows with the number of constraints). The exact value of μ within this range has little effect on the performane. [4] :chpt.11

Potential-reduction methods

For potential-reduction methods, the problem is presented in the conic form: [3] :Sec.5

minimize cTx s.t. x in {b+L} ᚢ K,

where b is a vector in Rn, L is a linear subspace in Rn (so b+L is an affine plane), and K is a closed pointed convex cone with a nonempty interior. Every convex program can be converted to the conic form. To use the potential-reduction method (specifically, the extension of Karmarkar's algorithm to convex programming), we need the following assumptions: [3] :Sec.6

Assumptions A, B and D are needed in most interior-point methods. Assumption C is specific to Karmarkar's approach; it can be alleviated by using a "sliding objective value". It is possible to further reduce the program to the Karmarkar format:

minimize sTx s.t. x in M ᚢ K and eTx = 1

where M is a linear subspace of in Rn, and the optimal objective value is 0. The method is based on the following scalar potential function:

v(x) = F(x) + M ln (sTx)

where F is the M-self-concordant barrier for the feasible cone. It is possible to prove that, when x is strictly feasible and v(x) is very small (- very negative), x is approximately-optimal. The idea of the potential-reduction method is to modify x such that the potential at each iteration drops by at least a fixed constant X (specifically, X=1/3-ln(4/3)). This implies that, after i iterations, the difference between objective value and the optimal objective value is at most V * exp(-i X / M), where V is a data-dependent constant. Therefore, the number of Newton steps required for an ε-approximate solution is at most .

Note that in path-following methods the expression is rather than M, which is better in theory. But in practice, Karmarkar's method allows taking much larger steps towards the goal, so it may converge much faster than the theoretical guarantees.

Primal-dual methods

The primal-dual method's idea is easy to demonstrate for constrained nonlinear optimization. [9] [10] For simplicity, consider the following nonlinear optimization problem with inequality constraints:

This inequality-constrained optimization problem is solved by converting it into an unconstrained objective function whose minimum we hope to find efficiently. Specifically, the logarithmic barrier function associated with (1) is

Here is a small positive scalar, sometimes called the "barrier parameter". As converges to zero the minimum of should converge to a solution of (1).

The gradient of a differentiable function is denoted . The gradient of the barrier function is

In addition to the original ("primal") variable we introduce a Lagrange multiplier-inspired dual variable

Equation (4) is sometimes called the "perturbed complementarity" condition, for its resemblance to "complementary slackness" in KKT conditions.

We try to find those for which the gradient of the barrier function is zero.

Substituting from (4) into (3), we get an equation for the gradient:

where the matrix is the Jacobian of the constraints .

The intuition behind (5) is that the gradient of should lie in the subspace spanned by the constraints' gradients. The "perturbed complementarity" with small (4) can be understood as the condition that the solution should either lie near the boundary , or that the projection of the gradient on the constraint component normal should be almost zero.

Let be the search direction for iteratively updating . Applying Newton's method to (4) and (5), we get an equation for :

where is the Hessian matrix of , is a diagonal matrix of , and is the diagonal matrix of .

Because of (1), (4) the condition

should be enforced at each step. This can be done by choosing appropriate :

Trajectory of the iterates of x by using the interior point method.

Types of Convex Programs Solvable via Interior-Point Methods

Here are some special cases of convex programs that can be solved efficiently by interior-point methods. [3] :Sec.10

Linear programs

Consider a linear program of the form:

We can apply path-following methods with the barrier

The function is self-concordant with parameter M=m (the number of constraints). Therefore, the number of required Newton steps for the path-following method is O(mn2), and the total runtime complexity is O(m3/2n2).[ clarification needed ]

Quadratically constrained quadratic programs

Given a quadratically constrained quadratic program of the form:

where all matrices Aj are positive-semidefinite matrices. We can apply path-following methods with the barrier

The function is a self-concordant barrier with parameter M=m. The Newton complexity is O((m+n)n2), and the total runtime complexity is O(m1/2 (m+n) n2).

Lp norm approximation

Consider a problem of the form

where each is a vector, each is a scalar, and is an Lp norm with After converting to the standard form, we can apply path-following methods with a self-concordant barrier with parameter M=4m. The Newton complexity is O((m+n)n2), and the total runtime complexity is O(m1/2 (m+n) n2).

Geometric programs

Consider the problem

There is a self-concordant barrier with parameter 2k+m. The path-following method has Newton complexity O(mk2+k3+n3) and total complexity O((k+m)1/2[mk2+k3+n3]).

Semidefinite programs

Interior point methods can be used to solve semidefinite programs. [3] :Sec.11

See also

Related Research Articles

Quadratic programming (QP) is the process of solving certain mathematical optimization problems involving quadratic functions. Specifically, one seeks to optimize a multivariate quadratic function subject to linear constraints on the variables. Quadratic programming is a type of nonlinear programming.

Variational Bayesian methods are a family of techniques for approximating intractable integrals arising in Bayesian inference and machine learning. They are typically used in complex statistical models consisting of observed variables as well as unknown parameters and latent variables, with various sorts of relationships among the three types of random variables, as might be described by a graphical model. As typical in Bayesian inference, the parameters and latent variables are grouped together as "unobserved variables". Variational Bayesian methods are primarily used for two purposes:

  1. To provide an analytical approximation to the posterior probability of the unobserved variables, in order to do statistical inference over these variables.
  2. To derive a lower bound for the marginal likelihood of the observed data. This is typically used for performing model selection, the general idea being that a higher marginal likelihood for a given model indicates a better fit of the data by that model and hence a greater probability that the model in question was the one that generated the data.

In mathematics, the Gauss–Kuzmin–Wirsing operator is the transfer operator of the Gauss map that takes a positive number to the fractional part of its reciprocal. It is named after Carl Gauss, Rodion Kuzmin, and Eduard Wirsing. It occurs in the study of continued fractions; it is also related to the Riemann zeta function.

Convex optimization is a subfield of mathematical optimization that studies the problem of minimizing convex functions over convex sets. Many classes of convex optimization problems admit polynomial-time algorithms, whereas mathematical optimization is in general NP-hard.

In statistics and information theory, a maximum entropy probability distribution has entropy that is at least as great as that of all other members of a specified class of probability distributions. According to the principle of maximum entropy, if nothing is known about a distribution except that it belongs to a certain class, then the distribution with the largest entropy should be chosen as the least-informative default. The motivation is twofold: first, maximizing entropy minimizes the amount of prior information built into the distribution; second, many physical systems tend to move towards maximal entropy configurations over time.

The birth–death process is a special case of continuous-time Markov process where the state transitions are of only two types: "births", which increase the state variable by one and "deaths", which decrease the state by one. It was introduced by William Feller. The model's name comes from a common application, the use of such models to represent the current size of a population where the transitions are literal births and deaths. Birth–death processes have many applications in demography, queueing theory, performance engineering, epidemiology, biology and other areas. They may be used, for example, to study the evolution of bacteria, the number of people with a disease within a population, or the number of customers in line at the supermarket.

In mathematical optimization, the Karush–Kuhn–Tucker (KKT) conditions, also known as the Kuhn–Tucker conditions, are first derivative tests for a solution in nonlinear programming to be optimal, provided that some regularity conditions are satisfied.

In mathematical optimization theory, duality or the duality principle is the principle that optimization problems may be viewed from either of two perspectives, the primal problem or the dual problem. If the primal is a minimization problem then the dual is a maximization problem. Any feasible solution to the primal (minimization) problem is at least as large as any feasible solution to the dual (maximization) problem. Therefore, the solution to the primal is an upper bound to the solution of the dual, and the solution of the dual is a lower bound to the solution of the primal. This fact is called weak duality.

In mathematical optimization, the ellipsoid method is an iterative method for minimizing convex functions over convex sets. The ellipsoid method generates a sequence of ellipsoids whose volume uniformly decreases at every step, thus enclosing a minimizer of a convex function.

In theoretical physics, scalar electrodynamics is a theory of a U(1) gauge field coupled to a charged spin 0 scalar field that takes the place of the Dirac fermions in "ordinary" quantum electrodynamics. The scalar field is charged, and with an appropriate potential, it has the capacity to break the gauge symmetry via the Abelian Higgs mechanism.

The Newman–Penrose (NP) formalism is a set of notation developed by Ezra T. Newman and Roger Penrose for general relativity (GR). Their notation is an effort to treat general relativity in terms of spinor notation, which introduces complex forms of the usual variables used in GR. The NP formalism is itself a special case of the tetrad formalism, where the tensors of the theory are projected onto a complete vector basis at each point in spacetime. Usually this vector basis is chosen to reflect some symmetry of the spacetime, leading to simplified expressions for physical observables. In the case of the NP formalism, the vector basis chosen is a null tetrad: a set of four null vectors—two real, and a complex-conjugate pair. The two real members often asymptotically point radially inward and radially outward, and the formalism is well adapted to treatment of the propagation of radiation in curved spacetime. The Weyl scalars, derived from the Weyl tensor, are often used. In particular, it can be shown that one of these scalars— in the appropriate frame—encodes the outgoing gravitational radiation of an asymptotically flat system.

In mathematics, the Brunn–Minkowski theorem is an inequality relating the volumes of compact subsets of Euclidean space. The original version of the Brunn–Minkowski theorem applied to convex sets; the generalization to compact nonconvex sets stated here is due to Lazar Lyusternik (1935).

Covariance matrix adaptation evolution strategy (CMA-ES) is a particular kind of strategy for numerical optimization. Evolution strategies (ES) are stochastic, derivative-free methods for numerical optimization of non-linear or non-convex continuous optimization problems. They belong to the class of evolutionary algorithms and evolutionary computation. An evolutionary algorithm is broadly based on the principle of biological evolution, namely the repeated interplay of variation and selection: in each generation (iteration) new individuals are generated by variation, usually in a stochastic way, of the current parental individuals. Then, some individuals are selected to become the parents in the next generation based on their fitness or objective function value . Like this, over the generation sequence, individuals with better and better -values are generated.

Expected shortfall (ES) is a risk measure—a concept used in the field of financial risk measurement to evaluate the market risk or credit risk of a portfolio. The "expected shortfall at q% level" is the expected return on the portfolio in the worst of cases. ES is an alternative to value at risk that is more sensitive to the shape of the tail of the loss distribution.

A self-concordant function is a function satisfying a certain differential inequality, which makes it particularly easy for optimization using Newton's method A self-concordant barrier is a particular self-concordant function, that is also a barrier function for a particular convex set. Self-concordant barriers are important ingredients in interior point methods for optimization.

In mathematics, the spectral theory of ordinary differential equations is the part of spectral theory concerned with the determination of the spectrum and eigenfunction expansion associated with a linear ordinary differential equation. In his dissertation, Hermann Weyl generalized the classical Sturm–Liouville theory on a finite closed interval to second order differential operators with singularities at the endpoints of the interval, possibly semi-infinite or infinite. Unlike the classical case, the spectrum may no longer consist of just a countable set of eigenvalues, but may also contain a continuous part. In this case the eigenfunction expansion involves an integral over the continuous part with respect to a spectral measure, given by the Titchmarsh–Kodaira formula. The theory was put in its final simplified form for singular differential equations of even degree by Kodaira and others, using von Neumann's spectral theorem. It has had important applications in quantum mechanics, operator theory and harmonic analysis on semisimple Lie groups.

The near-infrared (NIR) window defines the range of wavelengths from 650 to 1350 nanometre (nm) where light has its maximum depth of penetration in tissue. Within the NIR window, scattering is the most dominant light-tissue interaction, and therefore the propagating light becomes diffused rapidly. Since scattering increases the distance travelled by photons within tissue, the probability of photon absorption also increases. Because scattering has weak dependence on wavelength, the NIR window is primarily limited by the light absorption of blood at short wavelengths and water at long wavelengths. The technique using this window is called NIRS. Medical imaging techniques such as fluorescence image-guided surgery often make use of the NIR window to detect deep structures.

Least-squares support-vector machines (LS-SVM) for statistics and in statistical modeling, are least-squares versions of support-vector machines (SVM), which are a set of related supervised learning methods that analyze data and recognize patterns, and which are used for classification and regression analysis. In this version one finds the solution by solving a set of linear equations instead of a convex quadratic programming (QP) problem for classical SVMs. Least-squares SVM classifiers were proposed by Johan Suykens and Joos Vandewalle. LS-SVMs are a class of kernel-based learning methods.

Augmented Lagrangian methods are a certain class of algorithms for solving constrained optimization problems. They have similarities to penalty methods in that they replace a constrained optimization problem by a series of unconstrained problems and add a penalty term to the objective, but the augmented Lagrangian method adds yet another term designed to mimic a Lagrange multiplier. The augmented Lagrangian is related to, but not identical with, the method of Lagrange multipliers.

(Stochastic) variance reduction is an algorithmic approach to minimizing functions that can be decomposed into finite sums. By exploiting the finite sum structure, variance reduction techniques are able to achieve convergence rates that are impossible to achieve with methods that treat the objective as an infinite sum, as in the classical Stochastic approximation setting.

References

  1. Dikin, I.I. (1967). "Iterative solution of problems of linear and quadratic programming". Dokl. Akad. Nauk SSSR. 174 (1): 747–748.
  2. Karmarkar, N. (1984). "A new polynomial-time algorithm for linear programming" (PDF). Proceedings of the sixteenth annual ACM symposium on Theory of computing – STOC '84. p. 302. doi: 10.1145/800057.808695 . ISBN   0-89791-133-4. Archived from the original (PDF) on 28 December 2013.
  3. 1 2 3 4 5 6 7 8 9 10 11 12 13 Arkadi Nemirovsky (2004). Interior point polynomial-time methods in convex programming.
  4. 1 2 3 Boyd, Stephen; Vandenberghe, Lieven (2004). Convex Optimization. Cambridge: Cambridge University Press. ISBN   978-0-521-83378-3. MR   2061575.
  5. Wright, Margaret H. (2004). "The interior-point revolution in optimization: History, recent developments, and lasting consequences". Bulletin of the American Mathematical Society. 42: 39–57. doi: 10.1090/S0273-0979-04-01040-7 . MR   2115066.
  6. Potra, Florian A.; Stephen J. Wright (2000). "Interior-point methods". Journal of Computational and Applied Mathematics. 124 (1–2): 281–302. doi: 10.1016/S0377-0427(00)00433-7 .
  7. 1 2 Renegar, James (1 January 1988). "A polynomial-time algorithm, based on Newton's method, for linear programming". Mathematical Programming. 40 (1): 59–93. doi:10.1007/BF01580724. ISSN   1436-4646.
  8. 1 2 Gonzaga, Clovis C. (1989), Megiddo, Nimrod (ed.), "An Algorithm for Solving Linear Programming Problems in O(n3L) Operations", Progress in Mathematical Programming: Interior-Point and Related Methods, New York, NY: Springer, pp. 1–28, doi:10.1007/978-1-4613-9617-8_1, ISBN   978-1-4613-9617-8 , retrieved 22 November 2023
  9. Mehrotra, Sanjay (1992). "On the Implementation of a Primal-Dual Interior Point Method". SIAM Journal on Optimization. 2 (4): 575–601. doi:10.1137/0802028.
  10. Wright, Stephen (1997). Primal-Dual Interior-Point Methods. Philadelphia, PA: SIAM. ISBN   978-0-89871-382-4.