Random search

Random search (RS) is a family of numerical optimization methods that do not require the gradient of the problem to be optimized, and RS can hence be used on functions that are not continuous or differentiable. Such optimization methods are also known as direct-search, derivative-free, or black-box methods.

Anderson in 1953 reviewed the progress of methods for finding the maximum or minimum of a problem using a series of guesses distributed with a certain order or pattern in the parameter search space, e.g. a confounded design with exponentially distributed spacings/steps. [1] The search proceeds sequentially over each parameter and iteratively refines the best guesses from the previous sequence. The pattern can be a grid (factorial) search of all parameters, a sequential search on each parameter, or a combination of both. The method was developed to screen experimental conditions in chemical reactions by a number of scientists listed in Anderson's paper. A MATLAB code reproducing the sequential procedure for the general non-linear regression of an example mathematical model can be found here (JCFit @ GitHub). [2]

The name "random search" is attributed to Rastrigin, [3] who made an early presentation of RS along with basic mathematical analysis. RS works by iteratively moving to better positions in the search space, which are sampled from a hypersphere surrounding the current position.

The algorithm described herein is a type of local random search, where every iteration is dependent on the prior iteration's candidate solution. There are alternative random search methods that sample from the entirety of the search space (for example pure random search or uniform global random search), but these are not described in this article.

Random search has been used in artificial neural networks for hyper-parameter optimization. [4]

If good parts of the search space occupy 5% of the volume, the chance that a single random configuration lands in a good part is 5%. The probability of finding at least one good configuration is above 95% after trying out 60 configurations, since 1 − (1 − 0.05)^60 ≈ 0.954 > 0.95 (making use of the counterprobability).
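This arithmetic can be checked directly; a minimal Python snippet (the 5% volume figure is the example's assumption):

```python
# Probability that all 60 independent uniform samples miss a region
# covering 5% of the search-space volume, and its counterprobability.
p_good = 0.05
n_trials = 60
p_all_miss = (1 - p_good) ** n_trials   # ~0.046
p_at_least_one = 1 - p_all_miss         # ~0.954 > 0.95
print(p_at_least_one)
```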

Algorithm

Let f : ℝⁿ → ℝ be the fitness or cost function which must be minimized. Let x ∈ ℝⁿ designate a position or candidate solution in the search-space. The basic RS algorithm can then be described as follows (a runnable sketch is given after the steps):

  1. Initialize x with a random position in the search-space.
  2. Until a termination criterion is met (e.g. number of iterations performed, or adequate fitness reached), repeat the following:
    1. Sample a new position y from the hypersphere of a given radius surrounding the current position x (see e.g. Marsaglia's technique for sampling a hypersphere).
    2. If f(y) < f(x) then move to the new position by setting x = y
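A minimal Python sketch of this loop, assuming Marsaglia's normalized-Gaussian method for sampling the hypersphere; the function and parameter names are illustrative, not from any specific library:

```python
import numpy as np

def random_search(f, x0, radius=0.5, max_iters=1000, rng=None):
    """Basic local random search: sample candidates on a hypersphere
    around the current position and move only on improvement."""
    rng = rng if rng is not None else np.random.default_rng()
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    for _ in range(max_iters):
        # Marsaglia's technique: a normalized standard-normal vector is
        # uniformly distributed on the surface of the unit hypersphere.
        d = rng.standard_normal(x.shape)
        y = x + radius * d / np.linalg.norm(d)
        fy = f(y)
        if fy < fx:  # step 2.2: accept only improving moves
            x, fx = y, fy
    return x, fx

# Example: minimize the 5-dimensional sphere function f(x) = x·x.
sphere = lambda v: float(v @ v)
x_best, f_best = random_search(sphere, x0=np.full(5, 3.0), max_iters=5000)
print(x_best, f_best)
```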

Variants

Scheme of random search using a non-linear regression problem as an example. The goal is to minimize the value of the penalty function. The bottom right shows a few example methods: 1. non-structured random search, 2. structured random search, 3. Gauss-Newton algorithm, and 4. Levenberg-Marquardt algorithm. Methods 1 and 2 do not need to know the gradient, while 3 and 4 must calculate the gradient and usually minimize over both the A and k parameters at the same time (the scheme shows only the k dimension).

Truly random search succeeds purely by luck, and its performance varies from very costly to very lucky, but structured random search is strategic. A number of RS variants with structured sampling of the search space have been introduced in the literature:

  - The Friedman-Savage procedure sequentially searches each parameter with a structured set of guesses, e.g. evenly spaced points. [5]
  - Fixed Step Size Random Search (FSSRS) is Rastrigin's basic algorithm, which samples from a hypersphere of fixed radius. [3]
  - Optimum Step Size Random Search (OSSRS) by Schumer and Steiglitz [6] is mainly a theoretical study of how to optimally adjust the radius of the hypersphere for speedy convergence; a practical implementation must approximate this optimal radius by repeated sampling and is therefore expensive to execute.
  - Adaptive Step Size Random Search (ASSRS) by Schumer and Steiglitz [6] attempts to heuristically adapt the hypersphere's radius during the search (see the sketch after this list).
  - Optimized Relative Step Size Random Search (ORSSRS) by Schrack and Choit [7] approximates the optimal step size by a simple exponential decrease.
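As an illustration of step-size adaptation, the following is a hedged Python sketch in the spirit of ASSRS; the expansion and shrink constants and the acceptance rule are illustrative assumptions, not Schumer and Steiglitz's exact procedure.

```python
import numpy as np

def assrs(f, x0, radius=1.0, expand=2.0, shrink=0.95, max_iters=2000, rng=None):
    """Adaptive-step-size random search (illustrative sketch): each
    iteration tries a nominal step and an enlarged step, growing the
    radius when the larger step wins and shrinking it on failure."""
    rng = rng if rng is not None else np.random.default_rng()
    x = np.asarray(x0, dtype=float)
    fx = f(x)

    def sample(r):
        # Uniform point on the hypersphere of radius r around x.
        d = rng.standard_normal(x.shape)
        return x + r * d / np.linalg.norm(d)

    for _ in range(max_iters):
        y1, y2 = sample(radius), sample(expand * radius)
        f1, f2 = f(y1), f(y2)
        if f2 < fx and f2 <= f1:
            x, fx, radius = y2, f2, radius * expand  # larger step paid off
        elif f1 < fx:
            x, fx = y1, f1                           # nominal step improved
        else:
            radius *= shrink                         # no progress: shrink
    return x, fx
```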

Related Research Articles

Mathematical optimization – Study of mathematical algorithms for optimization problems

Mathematical optimization or mathematical programming is the selection of a best element, with regard to some criterion, from some set of available alternatives. It is generally divided into two subfields: discrete optimization and continuous optimization. Optimization problems arise in all quantitative disciplines from computer science and engineering to operations research and economics, and the development of solution methods has been of interest in mathematics for centuries.

Random optimization (RO) is a family of numerical optimization methods that do not require the gradient of the problem to be optimized, and RO can hence be used on functions that are not continuous or differentiable. Such optimization methods are also known as direct-search, derivative-free, or black-box methods.

Simulated annealing – Probabilistic optimization technique and metaheuristic

Simulated annealing (SA) is a probabilistic technique for approximating the global optimum of a given function. Specifically, it is a metaheuristic to approximate global optimization in a large search space for an optimization problem. For problems with many local optima, SA can find the global optimum. It is often used when the search space is discrete. For problems where finding an approximate global optimum is more important than finding a precise local optimum in a fixed amount of time, simulated annealing may be preferable to exact algorithms such as gradient descent or branch and bound.
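For contrast with the strictly greedy acceptance rule used by local random search above, here is a minimal simulated-annealing sketch; the geometric cooling schedule and the Gaussian neighbor move are illustrative assumptions.

```python
import math
import random

def simulated_annealing(f, x0, neighbor, temp=1.0, cooling=0.995, min_temp=1e-4):
    """Accept every improving move; accept a worsening move with
    probability exp(-delta / temperature); then cool down."""
    x, fx = x0, f(x0)
    while temp > min_temp:
        y = neighbor(x)
        fy = f(y)
        delta = fy - fx
        if delta < 0 or random.random() < math.exp(-delta / temp):
            x, fx = y, fy   # occasionally accepts uphill moves
        temp *= cooling     # geometric cooling (an assumption)
    return x, fx

# Example: minimize x^2 with Gaussian neighbor moves.
best, value = simulated_annealing(lambda v: v * v, x0=5.0,
                                  neighbor=lambda v: v + random.gauss(0.0, 0.5))
print(best, value)
```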

In computer science, local search is a heuristic method for solving computationally hard optimization problems. Local search can be used on problems that can be formulated as finding a solution maximizing a criterion among a number of candidate solutions. Local search algorithms move from solution to solution in the space of candidate solutions by applying local changes, until a solution deemed optimal is found or a time bound has elapsed.

Hill climbing – Optimization algorithm

In numerical analysis, hill climbing is a mathematical optimization technique which belongs to the family of local search. It is an iterative algorithm that starts with an arbitrary solution to a problem, then attempts to find a better solution by making an incremental change to the solution. If the change produces a better solution, another incremental change is made to the new solution, and so on until no further improvements can be found.

In computer science and mathematical optimization, a metaheuristic is a higher-level procedure or heuristic designed to find, generate, tune, or select a heuristic that may provide a sufficiently good solution to an optimization problem or a machine learning problem, especially with incomplete or imperfect information or limited computation capacity. Metaheuristics sample a subset of solutions which is otherwise too large to be completely enumerated or otherwise explored. Metaheuristics may make relatively few assumptions about the optimization problem being solved and so may be usable for a variety of problems.

Stochastic gradient descent – Optimization algorithm

Stochastic gradient descent is an iterative method for optimizing an objective function with suitable smoothness properties. It can be regarded as a stochastic approximation of gradient descent optimization, since it replaces the actual gradient by an estimate thereof. Especially in high-dimensional optimization problems this reduces the very high computational burden, achieving faster iterations in exchange for a lower convergence rate.

k-means clustering – Vector quantization algorithm minimizing the sum of squared deviations

k-means clustering is a method of vector quantization, originally from signal processing, that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells. k-means clustering minimizes within-cluster variances, but not regular Euclidean distances, which would be the more difficult Weber problem: the mean optimizes squared errors, whereas only the geometric median minimizes Euclidean distances. For instance, better Euclidean solutions can be found using k-medians and k-medoids.

In mathematical optimization, a trust region is the subset of the region of the objective function that is approximated using a model function. If an adequate model of the objective function is found within the trust region, then the region is expanded; conversely, if the approximation is poor, then the region is contracted.

Differential evolution

In evolutionary computation, differential evolution (DE) is a method that optimizes a problem by iteratively trying to improve a candidate solution with regard to a given measure of quality. Such methods are commonly known as metaheuristics as they make few or no assumptions about the optimized problem and can search very large spaces of candidate solutions. However, metaheuristics such as DE do not guarantee an optimal solution is ever found.

The cross-entropy (CE) method is a Monte Carlo method for importance sampling and optimization. It is applicable to both combinatorial and continuous problems, with either a static or noisy objective.

Histograms are most commonly used as visual representations of data. However, database systems use histograms to summarize data internally and provide size estimates for queries. These histograms are not presented to users or displayed visually, so a wider range of options is available for their construction. Simple or exotic histograms are defined by four parameters: Sort Value, Source Value, Partition Class and Partition Rule. The most basic histogram is the equi-width histogram, where each bucket represents the same range of values. That histogram would be defined as having a Sort Value of Value, a Source Value of Frequency, being in the Serial Partition Class and having a Partition Rule stating that all buckets have the same range.

Pattern search (optimization)

Pattern search is a family of numerical optimization methods that does not require a gradient. As a result, it can be used on functions that are not continuous or differentiable. One such pattern search method is "convergence", which is based on the theory of positive bases. Optimization attempts to find the best match in a multidimensional analysis space of possibilities.

Augmented Lagrangian methods are a certain class of algorithms for solving constrained optimization problems. They have similarities to penalty methods in that they replace a constrained optimization problem by a series of unconstrained problems and add a penalty term to the objective, but the augmented Lagrangian method adds yet another term designed to mimic a Lagrange multiplier. The augmented Lagrangian is related to, but not identical with, the method of Lagrange multipliers.

In computational engineering, Luus–Jaakola (LJ) denotes a heuristic for global optimization of a real-valued function. In engineering use, LJ is not an algorithm that terminates with an optimal solution; nor is it an iterative method that generates a sequence of points that converges to an optimal solution. However, when applied to a twice continuously differentiable function, the LJ heuristic is a proper iterative method that generates a sequence with a convergent subsequence; for this class of problems, Newton's method is recommended and enjoys a quadratic rate of convergence, while no convergence rate analysis has been given for the LJ heuristic. In practice, the LJ heuristic has been recommended for functions that need be neither convex nor differentiable nor locally Lipschitz: the LJ heuristic does not use a gradient or subgradient even when one is available, which allows its application to non-differentiable and non-convex problems.

In the study of algorithms, an LP-type problem is an optimization problem that shares certain properties with low-dimensional linear programs and that may be solved by similar algorithms. LP-type problems include many important optimization problems that are not themselves linear programs, such as the problem of finding the smallest circle containing a given set of planar points. They may be solved by a combination of randomized algorithms in an amount of time that is linear in the number of elements defining the problem, and subexponential in the dimension of the problem.

In machine learning, hyperparameter optimization or tuning is the problem of choosing a set of optimal hyperparameters for a learning algorithm. A hyperparameter is a parameter whose value is used to control the learning process. By contrast, the values of other parameters are learned.

In evolutionary computation, Minimum Population Search (MPS) is a computational method that optimizes a problem by iteratively trying to improve a set of candidate solutions with regard to a given measure of quality. It solves a problem by evolving a small population of candidate solutions by means of relatively simple arithmetical operations.

References

  1. Anderson, R.L. (1953). "Recent Advances in Finding Best Operating Conditions". Journal of the American Statistical Association. 48 (264): 789–798. doi:10.2307/2281072.
  2. "GitHub - Jixin Chen/jcfit: A Random Search Algorithm for general mathematical model(s) fittings".
  3. Rastrigin, L.A. (1963). "The convergence of the random search method in the extremal control of a many parameter system". Automation and Remote Control. 24 (11): 1337–1342. Retrieved 30 November 2021. (1964 translation of the Russian original in Avtomat. i Telemekh. 24 (11): 1467–1473.)
  4. Bergstra, J.; Bengio, Y. (2012). "Random search for hyper-parameter optimization". Journal of Machine Learning Research. 13: 281–305.
  5. Friedman, M.; Savage, L.J. (1947). "Planning experiments seeking maxima". In Eisenhart; Hastay; Wallis (eds.). Techniques of Statistical Analysis. New York: McGraw-Hill. pp. 363–372. Retrieved 30 November 2021 via the Milton Friedman collection at the Hoover Institution, Stanford University.
  6. Schumer, M.A.; Steiglitz, K. (1968). "Adaptive step size random search". IEEE Transactions on Automatic Control. 13 (3): 270–276. CiteSeerX 10.1.1.118.9779. doi:10.1109/tac.1968.1098903.
  7. Schrack, G.; Choit, M. (1976). "Optimized relative step size random searches". Mathematical Programming. 10 (1): 230–244. doi:10.1007/bf01580669.