Adaptive coordinate descent

Last updated

Adaptive coordinate descent [1] is an improvement of the coordinate descent algorithm to non-separable optimization by the use of adaptive encoding. [2] The adaptive coordinate descent approach gradually builds a transformation of the coordinate system such that the new coordinates are as decorrelated as possible with respect to the objective function. The adaptive coordinate descent was shown to be competitive to the state-of-the-art evolutionary algorithms and has the following invariance properties:

Contents

  1. Invariance with respect to monotonous transformations of the function (scaling)
  2. Invariance with respect to orthogonal transformations of the search space (rotation).

CMA-like Adaptive Encoding Update (b) mostly based on principal component analysis (a) is used to extend the coordinate descent method (c) to the optimization of non-separable problems (d).

Adaptive Coordinate Descent illustration.png

The adaptation of an appropriate coordinate system allows adaptive coordinate descent to outperform coordinate descent on non-separable functions. The following figure illustrates the convergence of both algorithms on 2-dimensional Rosenbrock function up to a target function value , starting from the initial point .

Rosenbrock2D.png

The adaptive coordinate descent method reaches the target value after only 325 function evaluations (about 70 times faster than coordinate descent), that is comparable to gradient-based methods. The algorithm has linear time complexity if update coordinate system every D iterations, it is also suitable for large-scale (D>>100) non-linear optimization.

Relevant approaches

First approaches to optimization using adaptive coordinate system were proposed already in the 1960s (see, e.g., Rosenbrock's method). PRincipal Axis (PRAXIS) algorithm, also referred to as Brent's algorithm, is a derivative-free algorithm which assumes quadratic form of the optimized function and repeatedly updates a set of conjugate search directions. [3] The algorithm, however, is not invariant to scaling of the objective function and may fail under its certain rank-preserving transformations (e.g., will lead to a non-quadratic shape of the objective function). A recent analysis of PRAXIS can be found in . [4] For practical applications see, [5] where an adaptive coordinate descent approach with step-size adaptation and local coordinate system rotation was proposed for robot-manipulator path planning in 3D space with static polygonal obstacles.

See also

Related Research Articles

<span class="mw-page-title-main">Mathematical optimization</span> Study of mathematical algorithms for optimization problems

Mathematical optimization or mathematical programming is the selection of a best element, with regard to some criterion, from some set of available alternatives. It is generally divided into two subfields: discrete optimization and continuous optimization. Optimization problems arise in all quantitative disciplines from computer science and engineering to operations research and economics, and the development of solution methods has been of interest in mathematics for centuries.

In computational intelligence (CI), an evolutionary algorithm (EA) is a subset of evolutionary computation, a generic population-based metaheuristic optimization algorithm. An EA uses mechanisms inspired by biological evolution, such as reproduction, mutation, recombination, and selection. Candidate solutions to the optimization problem play the role of individuals in a population, and the fitness function determines the quality of the solutions. Evolution of the population then takes place after the repeated application of the above operators.

In computer science, an evolution strategy (ES) is an optimization technique based on ideas of evolution. It belongs to the general class of evolutionary computation or artificial evolution methodologies.

<span class="mw-page-title-main">Invariant (mathematics)</span> Property that is not changed by mathematical transformations

In mathematics, an invariant is a property of a mathematical object which remains unchanged after operations or transformations of a certain type are applied to the objects. The particular class of objects and type of transformations are usually indicated by the context in which the term is used. For example, the area of a triangle is an invariant with respect to isometries of the Euclidean plane. The phrases "invariant under" and "invariant to" a transformation are both used. More generally, an invariant with respect to an equivalence relation is a property that is constant on each equivalence class.

Convex optimization is a subfield of mathematical optimization that studies the problem of minimizing convex functions over convex sets. Many classes of convex optimization problems admit polynomial-time algorithms, whereas mathematical optimization is in general NP-hard.

<span class="mw-page-title-main">Rosenbrock function</span> Function used as a performance test problem for optimization algorithms

In mathematical optimization, the Rosenbrock function is a non-convex function, introduced by Howard H. Rosenbrock in 1960, which is used as a performance test problem for optimization algorithms. It is also known as Rosenbrock's valley or Rosenbrock's banana function.

A surrogate model is an engineering method used when an outcome of interest cannot be easily measured or computed, so an approximate mathematical model of the outcome is used instead. Most engineering design problems require experiments and/or simulations to evaluate design objective and constraint functions as a function of design variables. For example, in order to find the optimal airfoil shape for an aircraft wing, an engineer simulates the airflow around the wing for different shape variables. For many real-world problems, however, a single simulation can take many minutes, hours, or even days to complete. As a result, routine tasks such as design optimization, design space exploration, sensitivity analysis and "what-if" analysis become impossible since they require thousands or even millions of simulation evaluations.

Pattern recognition is a very active field of research intimately bound to machine learning. Also known as classification or statistical classification, pattern recognition aims at building a classifier that can determine the class of an input pattern. This procedure, known as training, corresponds to learning an unknown decision function based only on a set of input-output pairs that form the training data. Nonetheless, in real world applications such as character recognition, a certain amount of information on the problem is usually known beforehand. The incorporation of this prior knowledge into the training is the key element that will allow an increase of performance in many applications.

Covariance matrix adaptation evolution strategy (CMA-ES) is a particular kind of strategy for numerical optimization. Evolution strategies (ES) are stochastic, derivative-free methods for numerical optimization of non-linear or non-convex continuous optimization problems. They belong to the class of evolutionary algorithms and evolutionary computation. An evolutionary algorithm is broadly based on the principle of biological evolution, namely the repeated interplay of variation and selection: in each generation (iteration) new individuals are generated by variation, usually in a stochastic way, of the current parental individuals. Then, some individuals are selected to become the parents in the next generation based on their fitness or objective function value . Like this, over the generation sequence, individuals with better and better -values are generated.

Rosenbrock methods refers to either of two distinct ideas in numerical computation, both named for Howard H. Rosenbrock.

In applied mathematics, multimodal optimization deals with optimization tasks that involve finding all or most of the multiple solutions of a problem, as opposed to a single best solution. Evolutionary multimodal optimization is a branch of evolutionary computation, which is closely related to machine learning. Wong provides a short survey, wherein the chapter of Shir and the book of Preuss cover the topic in more detail.

Augmented Lagrangian methods are a certain class of algorithms for solving constrained optimization problems. They have similarities to penalty methods in that they replace a constrained optimization problem by a series of unconstrained problems and add a penalty term to the objective, but the augmented Lagrangian method adds yet another term designed to mimic a Lagrange multiplier. The augmented Lagrangian is related to, but not identical with, the method of Lagrange multipliers.

Coordinate descent is an optimization algorithm that successively minimizes along coordinate directions to find the minimum of a function. At each iteration, the algorithm determines a coordinate or coordinate block via a coordinate selection rule, then exactly or inexactly minimizes over the corresponding coordinate hyperplane while fixing all other coordinates or coordinate blocks. A line search along the coordinate direction can be performed at the current iterate to determine the appropriate step size. Coordinate descent is applicable in both differentiable and derivative-free contexts.

Randomized (Block) Coordinate Descent Method is an optimization algorithm popularized by Nesterov (2010) and Richtárik and Takáč (2011). The first analysis of this method, when applied to the problem of minimizing a smooth convex function, was performed by Nesterov (2010). In Nesterov's analysis the method needs to be applied to a quadratic perturbation of the original function with an unknown scaling factor. Richtárik and Takáč (2011) give iteration complexity bounds which do not require this, i.e., the method is applied to the objective function directly. Furthermore, they generalize the setting to the problem of minimizing a composite function, i.e., sum of a smooth convex and a convex block-separable function:

Optimus is a Process Integration and Design Optimization (PIDO) platform developed by Noesis Solutions. Noesis Solutions takes part in key research projects, such as PHAROS and MATRIX.

Derivative-free optimization is a discipline in mathematical optimization that does not use derivative information in the classical sense to find optimal solutions: Sometimes information about the derivative of the objective function f is unavailable, unreliable or impractical to obtain. For example, f might be non-smooth, or time-consuming to evaluate, or in some way noisy, so that methods that rely on derivatives or approximate them via finite differences are of little use. The problem to find optimal points in such situations is referred to as derivative-free optimization, algorithms that do not use derivatives or finite differences are called derivative-free algorithms.

Peter Richtarik is a Slovak mathematician and computer scientist working in the area of big data optimization and machine learning, known for his work on randomized coordinate descent algorithms, stochastic gradient descent and federated learning. He is currently a Professor of Computer Science at the King Abdullah University of Science and Technology.

The following outline is provided as an overview of and topical guide to machine learning:

References

  1. Loshchilov, I.; M. Schoenauer; M. Sebag (2011). "Adaptive Coordinate Descent" (PDF). Genetic and Evolutionary Computation Conference (GECCO). ACM Press. pp. 885–892.
  2. Nikolaus Hansen. "Adaptive Encoding: How to Render Search Coordinate System Invariant". Parallel Problem Solving from Nature - PPSN X, Sep 2008, Dortmund, Germany. pp.205-214, 2008.
  3. Brent, R.P. (1972). Algorithms for minimization without derivatives. Prentice-Hall.
  4. Ali, U.; Kickmeier-Rust, M.D. (2008). "Implementation and Applications of a Three-Round User Strategy for Improved Principal Axis Minimization". Journal of Applied Quantitative Methods. pp. 505–513.
  5. Pavlov, D. (2006). "Manipulator path planning in 3-dimensional space". Computer Science--Theory and Applications. Springer. pp. 505–513.