Graduated optimization

Graduated optimization is a global optimization technique that attempts to solve a difficult optimization problem by initially solving a greatly simplified problem, and progressively transforming that problem (while optimizing) until it is equivalent to the difficult optimization problem. [1] [2] [3]

Technique description

[Figure: An illustration of graduated optimization (Graduated optimization.svg).]

Graduated optimization is an improvement to hill climbing that enables a hill climber to avoid settling into local optima. [4] It breaks a difficult optimization problem into a sequence of optimization problems, such that the first problem in the sequence is convex (or nearly convex), the solution to each problem gives a good starting point to the next problem in the sequence, and the last problem in the sequence is the difficult optimization problem that it ultimately seeks to solve. Often, graduated optimization gives better results than simple hill climbing. Further, when certain conditions hold, it can be shown to find an optimal solution to the final problem in the sequence. These conditions are:

- The first optimization problem in the sequence can be solved reliably from an arbitrary starting point (for example, because it is convex).
- The globally optimal solution of each problem in the sequence lies within the region of attraction of the global optimum of the next problem in the sequence, so that a hill climber started from that solution converges to the next problem's global optimum.

It can be shown inductively that if these conditions are met, then a hill climber will arrive at the global optimum for the difficult problem. Unfortunately, it can be difficult to find a sequence of optimization problems that meets these conditions. Often, graduated optimization yields good results even when the sequence of problems cannot be proven to strictly meet all of these conditions.
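
For illustration, the sketch below applies the idea to a one-dimensional non-convex function: each problem in the sequence is a Gaussian-smoothed version of the objective (approximated here by Monte Carlo averaging), the amount of smoothing decreases from stage to stage, and each stage is started from the previous stage's solution. The test function, smoothing schedule, and simple hill climber are illustrative assumptions, not anything prescribed by the sources cited above.

    import numpy as np

    def objective(x):
        # Hypothetical non-convex test function: many local minima, global minimum at x = 0.
        return 0.05 * x**2 - np.cos(3.0 * x)

    def smoothed(f, x, sigma, n_samples=2000):
        # Approximate the Gaussian-smoothed objective E[f(x + sigma*z)], z ~ N(0, 1),
        # by averaging over a fixed sample so each stage is a deterministic function of x.
        z = np.random.default_rng(0).standard_normal(n_samples)
        return np.mean(f(x + sigma * z))

    def hill_climb(f, x0, step=0.1, iters=2000):
        # A simple local optimizer: accept a random neighbour only if it improves the value.
        rng = np.random.default_rng(1)
        x, fx = x0, f(x0)
        for _ in range(iters):
            cand = x + step * rng.standard_normal()
            fcand = f(cand)
            if fcand < fx:
                x, fx = cand, fcand
        return x

    def graduated_optimization(f, x0, sigmas=(4.0, 2.0, 1.0, 0.5, 0.0)):
        # Solve a sequence of progressively less-smoothed problems; the solution of each
        # stage is the starting point for the next.  sigma = 0 is the original problem.
        x = x0
        for sigma in sigmas:
            stage = (lambda x_, s=sigma: smoothed(f, x_, s)) if sigma > 0 else f
            x = hill_climb(stage, x)
        return x

    print(graduated_optimization(objective, x0=7.0))  # typically lands near the global minimum at 0
    print(hill_climb(objective, x0=7.0))              # plain hill climbing usually stalls in a local minimum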

Some examples

Graduated optimization is commonly used in image processing for locating objects within a larger image. This problem can be made more convex by blurring the images. Thus, objects can be found by first searching the most-blurred image, then starting at that point and searching within a less-blurred image, and continuing in this manner until the object is located with precision in the original sharp image. The proper choice of the blurring operator depends on the geometric transformation relating the object in one image to the other. [5]
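
As a minimal sketch of this coarse-to-fine idea (the blur schedule, the sum-of-squared-differences score, and all function names below are illustrative assumptions, not the method of [5]), the template can first be matched exhaustively against a heavily blurred image, and the estimate then refined by small local searches as the blur is reduced:

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def ssd(patch, template):
        # Sum of squared differences between an image patch and the template (lower is better).
        return float(np.sum((patch - template) ** 2))

    def best_offset_near(image, template, center, radius):
        # Score every placement of the template whose top-left corner lies within
        # `radius` pixels of `center`, and return the best-scoring position.
        th, tw = template.shape
        best, best_score = center, np.inf
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                y, x = center[0] + dy, center[1] + dx
                if 0 <= y <= image.shape[0] - th and 0 <= x <= image.shape[1] - tw:
                    score = ssd(image[y:y + th, x:x + tw], template)
                    if score < best_score:
                        best, best_score = (y, x), score
        return best

    def locate_coarse_to_fine(image, template, sigmas=(8.0, 4.0, 2.0, 0.0)):
        # Graduated optimization: search the most-blurred pair over the whole image,
        # then refine with shrinking local searches as detail is restored.
        image = image.astype(float)
        template = template.astype(float)
        estimate, radius = (0, 0), max(image.shape)     # first stage: brute-force global search (small images only)
        for sigma in sigmas:
            img_s = gaussian_filter(image, sigma) if sigma > 0 else image
            tpl_s = gaussian_filter(template, sigma) if sigma > 0 else template
            estimate = best_offset_near(img_s, tpl_s, estimate, radius)
            radius = max(2, int(2 * sigma))             # shrink the search window for the next stage
        return estimate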

Graduated optimization can be used in manifold learning. The Manifold Sculpting algorithm, for example, uses graduated optimization to seek a manifold embedding for non-linear dimensionality reduction. [6] It gradually scales variance out of the extra dimensions within a data set while optimizing in the remaining dimensions. Graduated optimization has also been used to calculate conditions for fractionation in tumor radiotherapy, [7] for object tracking in computer vision, [8] and for other purposes.
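
A rough, hypothetical sketch of that idea, greatly simplified from the published Manifold Sculpting algorithm, is to rotate the data onto its principal axes and then repeatedly shrink the variance of the dimensions to be discarded while nudging the retained dimensions to preserve each point's original distances to its neighbours:

    import numpy as np

    def sculpt_embedding(X, target_dim=2, n_neighbors=10, n_iters=300, shrink=0.99, step=0.05):
        # Hypothetical, greatly simplified illustration (NOT the published algorithm):
        # variance in the extra dimensions is scaled away gradually, while the retained
        # dimensions are adjusted to keep each point's neighbour distances close to
        # their values in the original space.
        n, d = X.shape
        Xc = X - X.mean(axis=0)
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        Y = Xc @ Vt.T                                        # rotate onto principal axes

        D = np.linalg.norm(Xc[:, None, :] - Xc[None, :, :], axis=2)
        nbrs = np.argsort(D, axis=1)[:, 1:n_neighbors + 1]   # each point's nearest neighbours
        d0 = np.take_along_axis(D, nbrs, axis=1)             # original neighbour distances

        for _ in range(n_iters):
            Y[:, target_dim:] *= shrink                      # graduated step: shrink the extra dimensions
            diffs = Y[:, None, :] - Y[nbrs, :]               # (n, k, d) offsets to neighbours
            dist = np.linalg.norm(diffs, axis=2) + 1e-12     # (n, k) current distances
            # Gradient (up to a constant) of sum((dist - d0)^2) w.r.t. the retained coordinates.
            grad = ((dist - d0) / dist)[:, :, None] * diffs[:, :, :target_dim]
            Y[:, :target_dim] -= step * grad.sum(axis=1)
        return Y[:, :target_dim]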

A thorough review of the method and its applications can be found in [3].

Simulated annealing is closely related to graduated optimization. Instead of smoothing the function over which it is optimizing, simulated annealing randomly perturbs the current solution by a decaying amount, which may have a similar effect.[citation needed] Because simulated annealing relies on random sampling to find improvements, however, its computational complexity is exponential in the number of dimensions being optimized.[citation needed] By contrast, graduated optimization smooths the function being optimized, so local optimization techniques that are efficient in high-dimensional space (such as gradient-based techniques, hill climbers, etc.) may still be used.
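
A minimal sketch of that contrast (illustrative test function and cooling schedule, not taken from any cited source): a simulated-annealing-style search perturbs the solution by a decaying amount and occasionally accepts uphill moves, rather than smoothing the objective as in the earlier sketch.

    import numpy as np

    def objective(x):
        # Same style of hypothetical non-convex test function as in the earlier sketch.
        return 0.05 * x**2 - np.cos(3.0 * x)

    def simulated_annealing(f, x0, iters=20000, t0=2.0):
        # Rather than smoothing f, randomly perturb the current solution by a decaying
        # amount and occasionally accept uphill moves (Metropolis criterion).
        rng = np.random.default_rng(0)
        x, fx = x0, f(x0)
        for k in range(iters):
            t = t0 * (1.0 - k / iters) + 1e-9        # decaying temperature / perturbation size
            cand = x + t * rng.standard_normal()
            fcand = f(cand)
            if fcand < fx or rng.random() < np.exp(-(fcand - fx) / t):
                x, fx = cand, fcand
        return x

    print(simulated_annealing(objective, x0=7.0))    # often ends near the global minimum at 0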

See also

Differential topology
Genetic algorithm
Mathematical optimization
Simulated annealing
Gradient descent
Nonlinear dimensionality reduction
Hill climbing
Global optimization
Nonlinear programming
Local optimum
Metaheuristic
Adaptive simulated annealing
Stochastic optimization
Graph cut optimization
Lateral computing
Luus–Jaakola
Derivative-free optimization

References

  1. Thacker, Neil; Cootes, Tim (1996). "Graduated Non-Convexity and Multi-Resolution Optimization Methods". Vision Through Optimization.
  2. Blake, Andrew; Zisserman, Andrew (1987). Visual Reconstruction. MIT Press. ISBN 0-262-02271-0.[page needed]
  3. Mobahi, Hossein; Fisher III, John W. (2015). "On the Link Between Gaussian Homotopy Continuation and Convex Envelopes". Lecture Notes in Computer Science (EMMCVPR 2015). Springer.
  4. Hulse, Daniel; Tumer, Kagan; Hoyle, Christopher; Tumer, Irem (February 2019). "Modeling multidisciplinary design with multiagent learning". AI EDAM. 33: 85–99. doi:10.1017/S0890060418000161.
  5. Mobahi, Hossein; Zitnick, C. Lawrence; Ma, Yi (June 2012). "Seeing through the Blur". IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
  6. Gashler, M.; Ventura, D.; Martinez, T. (2008). "Iterative Non-linear Dimensionality Reduction with Manifold Sculpting" (PDF). In Platt, J. C.; Koller, D.; Singer, Y.; et al. (eds.). Advances in Neural Information Processing Systems 20. Cambridge, MA: MIT Press. pp. 513–20.
  7. Afanasjev, BP; Akimov, AA; Kozlov, AP; Berkovic, AN (1989). "Graduated optimization of fractionation using a 2-component model". Radiobiologia, Radiotherapia. 30 (2): 131–5. PMID 2748803.
  8. Ye, Ming; Haralick, R. M.; Shapiro, L. G. (2003). "Estimating piecewise-smooth optical flow with global matching and graduated optimization". IEEE Transactions on Pattern Analysis and Machine Intelligence. 25 (12): 1625–30. doi:10.1109/TPAMI.2003.1251156.