No free lunch in search and optimization

Last updated
The problem is to rapidly find a solution among candidates a, b, and c that is as good as any other, where goodness is either 0 or 1. There are eight instances ("lunch plates") fxyz of the problem, where x, y, and z indicate the goodness of a, b, and c, respectively. Procedure ("restaurant") A evaluates candidates in the order a, b, c, and B evaluates candidates in reverse that order, but each "charges" 1 evaluation in 5 cases, 2 evaluations in 2 cases, and 3 evaluations in 1 case. No free lunch theorems figure.png
The problem is to rapidly find a solution among candidates a, b, and c that is as good as any other, where goodness is either 0 or 1. There are eight instances ("lunch plates") fxyz of the problem, where x,y, and z indicate the goodness of a, b, and c, respectively. Procedure ("restaurant") A evaluates candidates in the order a, b, c, and B evaluates candidates in reverse that order, but each "charges" 1 evaluation in 5 cases, 2 evaluations in 2 cases, and 3 evaluations in 1 case.

In computational complexity and optimization the no free lunch theorem is a result that states that for certain types of mathematical problems, the computational cost of finding a solution, averaged over all problems in the class, is the same for any solution method. The name alludes to the saying "no such thing as a free lunch", that is, no method offers a "short cut". This is under the assumption that the search space is a probability density function. It does not apply to the case where the search space has underlying structure (e.g., is a differentiable function) that can be exploited more efficiently (e.g., Newton's method in optimization) than random search or even has closed-form solutions (e.g., the extrema of a quadratic polynomial) that can be determined without search at all. For such probabilistic assumptions, the outputs of all procedures solving a particular type of problem are statistically identical. A colourful way of describing such a circumstance, introduced by David Wolpert and William G. Macready in connection with the problems of search [1] and optimization, [2] is to say that there is no free lunch. Wolpert had previously derived no free lunch theorems for machine learning (statistical inference). [3] Before Wolpert's article was published, Cullen Schaffer independently proved a restricted version of one of Wolpert's theorems and used it to critique the current state of machine learning research on the problem of induction. [4]

Contents

In the "no free lunch" metaphor, each "restaurant" (problem-solving procedure) has a "menu" associating each "lunch plate" (problem) with a "price" (the performance of the procedure in solving the problem). The menus of restaurants are identical except in one regard – the prices are shuffled from one restaurant to the next. For an omnivore who is as likely to order each plate as any other, the average cost of lunch does not depend on the choice of restaurant. But a vegan who goes to lunch regularly with a carnivore who seeks economy might pay a high average cost for lunch. To methodically reduce the average cost, one must use advance knowledge of a) what one will order and b) what the order will cost at various restaurants. That is, improvement of performance in problem-solving hinges on using prior information to match procedures to problems. [2] [4]

In formal terms, there is no free lunch when the probability distribution on problem instances is such that all problem solvers have identically distributed results. In the case of search, a problem instance in this context is a particular objective function, and a result is a sequence of values obtained in evaluation of candidate solutions in the domain of the function. For typical interpretations of results, search is an optimization process. There is no free lunch in search if and only if the distribution on objective functions is invariant under permutation of the space of candidate solutions. [5] [6] [7] This condition does not hold precisely in practice, [6] but an "(almost) no free lunch" theorem suggests that it holds approximately. [8]

Overview

Some computational problems are solved by searching for good solutions in a space of candidate solutions. A description of how to repeatedly select candidate solutions for evaluation is called a search algorithm. On a particular problem, different search algorithms may obtain different results, but over all problems, they are indistinguishable. It follows that if an algorithm achieves superior results on some problems, it must pay with inferiority on other problems. In this sense there is no free lunch in search. [1] Alternatively, following Schaffer, [4] search performance is conserved. Usually search is interpreted as optimization, and this leads to the observation that there is no free lunch in optimization. [2]

"The 'no free lunch' theorem of Wolpert and Macready," as stated in plain language by Wolpert and Macready themselves, is that "any two algorithms are equivalent when their performance is averaged across all possible problems." [9] The "no free lunch" results indicate that matching algorithms to problems gives higher average performance than does applying a fixed algorithm to all.[ citation needed ] Igel and Toussaint [6] and English [7] have established a general condition under which there is no free lunch. While it is physically possible, it does not hold precisely. [6] Droste, Jansen, and Wegener have proved a theorem they interpret as indicating that there is "(almost) no free lunch" in practice. [8]

To make matters more concrete, consider an optimization practitioner confronted with a problem. Given some knowledge of how the problem arose, the practitioner may be able to exploit the knowledge in selection of an algorithm that will perform well in solving the problem. If the practitioner does not understand how to exploit the knowledge, or simply has no knowledge, then he or she faces the question of whether some algorithm generally outperforms others on real-world problems. The authors of the "(almost) no free lunch" theorem say that the answer is essentially no, but admit some reservations as to whether the theorem addresses practice. [8]

Theorems

A "problem" is, more formally, an objective function that associates candidate solutions with goodness values. A search algorithm takes an objective function as input and evaluates candidate solutions one-by-one. The output of the algorithm is the sequence of observed goodness values. [10] [11]

Wolpert and Macready stipulate that an algorithm never reevaluates a candidate solution, and that algorithm performance is measured on outputs. [2] For simplicity, we disallow randomness in algorithms. Under these conditions, when a search algorithm is run on every possible input, it generates each possible output exactly once. [7] Because performance is measured on the outputs, the algorithms are indistinguishable in how often they achieve particular levels of performance.

Some measures of performance indicate how well search algorithms do at optimization of the objective function. Indeed, there seems to be no interesting application of search algorithms in the class under consideration but to optimization problems. A common performance measure is the least index of the least value in the output sequence. This is the number of evaluations required to minimize the objective function. For some algorithms, the time required to find the minimum is proportional to the number of evaluations. [7]

The original no free lunch (NFL) theorems assume that all objective functions are equally likely to be input to search algorithms. [2] It has since been established that there is NFL if and only if, loosely speaking, "shuffling" objective functions has no impact on their probabilities. [6] [7] Although this condition for NFL is physically possible, it has been argued that it certainly does not hold precisely. [6]

The obvious interpretation of "not NFL" is "free lunch," but this is misleading. NFL is a matter of degree, not an all-or-nothing proposition. If the condition for NFL holds approximately, then all algorithms yield approximately the same results over all objective functions. [7] "Not NFL" implies only that algorithms are inequivalent overall by some measure of performance. For a performance measure of interest, algorithms may remain equivalent, or nearly so. [7]

Kolmogorov randomness

Almost all elements of the set of all possible functions (in the set-theoretic sense of "function") are Kolmogorov random, and hence the NFL theorems apply to a set of functions almost all of which cannot be expressed more compactly than as a lookup table that contains a distinct (and random) entry for each point in the search space. Functions that can be expressed more compactly (for example, by a mathematical expression of reasonable size) are by definition not Kolmogorov random.

Further, within the set of all possible objective functions, levels of goodness are equally represented among candidate solutions, hence good solutions are scattered throughout the space of candidates. Accordingly, a search algorithm will rarely evaluate more than a small fraction of the candidates before locating a very good solution. [11]

Almost all objective functions are of such high Kolmogorov complexity that they cannot be stored in a particular computer. [5] [7] [11] More precisely, if we model a given physical computer as a register machine with a given size memory on the order of the memories of modern computers, then most objective functions cannot be stored in their memories. There is more information in the typical objective function or algorithm than Seth Lloyd estimates the observable universe is capable of registering. [12] For instance, if each candidate solution is encoded as a sequence of 300 0's and 1's, and the goodness values are 0 and 1, then most objective functions have Kolmogorov complexity of at least 2300 bits, [13] and this is greater than Lloyd's bound of 1090 ≈ 2299 bits. It follows that the original "no free lunch" theorem does not apply to what can be stored in a physical computer; instead the so-called "tightened" no free lunch theorems need to be applied. It has also been shown that NFL results apply to incomputable functions. [14]

Formal synopsis

is the set of all objective functions f:XY, where is a finite solution space and is a finite poset. The set of all permutations of X is J. A random variable F is distributed on . For all j in J, F o j is a random variable distributed on , with P(F o j = f) = P(F = f o j−1) for all f in .

Let a(f) denote the output of search algorithm a on input f. If a(F) and b(F) are identically distributed for all search algorithms a and b, then F has an NFL distribution. This condition holds if and only if F and F o j are identically distributed for all j in J. [6] [7] In other words, there is no free lunch for search algorithms if and only if the distribution of objective functions is invariant under permutation of the solution space. [15] Set-theoretic NFL theorems have recently been generalized to arbitrary cardinality and . [16]

Origin

Wolpert and Macready give two principal NFL theorems, the first regarding objective functions that do not change while search is in progress, and the second regarding objective functions that may change. [2]

Theorem 1: For any pair of algorithms a1 and a2
where denotes the ordered set of size of the cost values associated to input values , is the function being optimized and is the conditional probability of obtaining a given sequence of cost values from algorithm run times on function .

In essence, this says that when all functions f are equally likely, the probability of observing an arbitrary sequence of m values in the course of search does not depend upon the search algorithm.

The second theorem establishes a "more subtle" NFL result for time-varying objective functions. [2]

Interpretations of results

A conventional, but not entirely accurate, interpretation of the NFL results is that "a general-purpose universal optimization strategy is theoretically impossible, and the only way one strategy can outperform another is if it is specialized to the specific problem under consideration". [17] Several comments are in order:

In practice, only highly compressible (far from random) objective functions fit in the storage of computers, and it is not the case that each algorithm performs well on almost all compressible functions. There is generally a performance advantage in incorporating prior knowledge of the problem into the algorithm. While the NFL results constitute, in a strict sense, full employment theorems for optimization professionals, it is important to bear the larger context in mind. For one thing, humans often have little prior knowledge to work with. For another, incorporating prior knowledge does not give much of a performance gain on some problems. Finally, human time is very expensive relative to computer time. There are many cases in which a company would choose to optimize a function slowly with an unmodified computer program rather than rapidly with a human-modified program.

The NFL results do not indicate that it is futile to take "pot shots" at problems with unspecialized algorithms. No one has determined the fraction of practical problems for which an algorithm yields good results rapidly. And there is a practical free lunch, not at all in conflict with theory. Running an implementation of an algorithm on a computer costs very little relative to the cost of human time and the benefit of a good solution. If an algorithm succeeds in finding a satisfactory solution in an acceptable amount of time, a small investment has yielded a big payoff. If the algorithm fails, then little is lost.

Recently some philosophers of science have argued that there are ways to circumvent the no free lunch theorems, by using "meta-induction". Wolpert addresses these arguments in [18] .

Coevolution

Wolpert and Macready have proved that there are free lunches in coevolutionary optimization. [9] Their analysis "covers 'self-play' problems. In these problems, the set of players work together to produce a champion, who then engages one or more antagonists in a subsequent multiplayer game." [9] That is, the objective is to obtain a good player, but without an objective function. The goodness of each player (candidate solution) is assessed by observing how well it plays against others. An algorithm attempts to use players and their quality of play to obtain better players. The player deemed best of all by the algorithm is the champion. Wolpert and Macready have demonstrated that some coevolutionary algorithms are generally superior to other algorithms in quality of champions obtained. Generating a champion through self-play is of interest in evolutionary computation and game theory. The results are inapplicable to coevolution of biological species, which does not yield champions. [9]

See also

Notes

  1. 1 2 Wolpert, D. H.; Macready, W. G. (1995). "No Free Lunch Theorems for Search". Technical Report SFI-TR-95-02-010. Santa Fe Institute. S2CID   12890367.
  2. 1 2 3 4 5 6 7 8 Wolpert, D. H.; Macready, W. G. (1997). "No Free Lunch Theorems for Optimization". IEEE Transactions on Evolutionary Computation. 1: 67–82. CiteSeerX   10.1.1.138.6606 . doi:10.1109/4235.585893. S2CID   5553697.
  3. Wolpert, David (1996). "The Lack of A Priori Distinctions between Learning Algorithms". Neural Computation. Vol. 8. pp. 1341–1390. doi:10.1162/neco.1996.8.7.1341. S2CID   207609360.
  4. 1 2 3 Schaffer, Cullen (1994). "A conservation law for generalization performance" (PDF). In Willian, H.; Cohen, W. (eds.). International Conference on Machine Learning. San Francisco: Morgan Kaufmann. pp. 259–265.
  5. 1 2 Streeter, M. (2003) "Two Broad Classes of Functions for Which a No Free Lunch Result Does Not Hold," Genetic and Evolutionary Computation – GECCO 2003, pp. 1418–1430.
  6. 1 2 3 4 5 6 7 Igel, C., and Toussaint, M. (2004) "A No-Free-Lunch Theorem for Non-Uniform Distributions of Target Functions," Journal of Mathematical Modelling and Algorithms3, pp. 313–322.
  7. 1 2 3 4 5 6 7 8 9 English, T. (2004) No More Lunch: Analysis of Sequential Search, Proceedings of the 2004 IEEE Congress on Evolutionary Computation, pp. 227–234.
  8. 1 2 3 S. Droste, T. Jansen, and I. Wegener. 2002. "Optimization with randomized search heuristics: the (A)NFL theorem, realistic scenarios, and difficult functions," Theoretical Computer Science, vol. 287, no. 1, pp. 131–144.
  9. 1 2 3 4 Wolpert, D.H., and Macready, W.G. (2005) "Coevolutionary free lunches," IEEE Transactions on Evolutionary Computation, 9(6): 721–735
  10. A search algorithm also outputs the sequence of candidate solutions evaluated, but that output is unused in this article.
  11. 1 2 3 4 5 English, T. M. (2000). "Optimization is easy and learning is hard in the typical function". Proceedings of the 2000 Congress on Evolutionary Computation. CEC00 (Cat. No.00TH8512). Vol. 2. pp. 924–931. doi:10.1109/CEC.2000.870741. ISBN   0-7803-6375-2. S2CID   11295575.
  12. Lloyd, S. (2002). "Computational capacity of the universe". Physical Review Letters. 88 (23): 237901–237904. arXiv: quant-ph/0110141 . Bibcode:2002PhRvL..88w7901L. doi:10.1103/PhysRevLett.88.237901. PMID   12059399. S2CID   6341263.
  13. Li, M.; Vitányi, P. (1997). An Introduction to Kolmogorov Complexity and Its Applications (2nd ed.). New York: Springer. ISBN   0-387-94868-6.
  14. Woodward, John R. (2009). "Computable and incomputable functions and search algorithms". IEEE International Conference on Intelligent Computing and Intelligent Systems, 2009. Vol. 1. IEEE. pp. 871–875. CiteSeerX   10.1.1.158.7782 .
  15. The "only if" part was first published by Schumacher, C. W. (2000). Black Box Search : Framework and Methods (PhD dissertation). The University of Tennessee, Knoxville. ProQuest   304620040.
  16. Rowe; Vose; Wright (2009). "Reinterpreting No Free Lunch". Evolutionary Computation. 17 (1): 117–129. doi:10.1162/evco.2009.17.1.117. PMID   19207090. S2CID   6251842.
  17. Ho, Y. C.; Pepyne, D. L. (2002). "Simple Explanation of the No-Free-Lunch Theorem and Its Implications". Journal of Optimization Theory and Applications. 115 (3): 549–570. doi:10.1023/A:1021251113462. S2CID   123041865.
  18. Wolpert, D. H. (2023). "The Implications of the No-Free-Lunch Theorems for Meta-induction". Journal for General Philosophy of Science. 1: 67–82. doi:10.1109/4235.585893. S2CID   5553697.

Related Research Articles

<span class="mw-page-title-main">Kolmogorov complexity</span> Measure of algorithmic complexity

In algorithmic information theory, the Kolmogorov complexity of an object, such as a piece of text, is the length of a shortest computer program that produces the object as output. It is a measure of the computational resources needed to specify the object, and is also known as algorithmic complexity, Solomonoff–Kolmogorov–Chaitin complexity, program-size complexity, descriptive complexity, or algorithmic entropy. It is named after Andrey Kolmogorov, who first published on the subject in 1963 and is a generalization of classical information theory.

<span class="mw-page-title-main">Linear programming</span> Method to solve optimization problems

Linear programming (LP), also called linear optimization, is a method to achieve the best outcome in a mathematical model whose requirements and objective are represented by linear relationships. Linear programming is a special case of mathematical programming.

<span class="mw-page-title-main">Mathematical optimization</span> Study of mathematical algorithms for optimization problems

Mathematical optimization or mathematical programming is the selection of a best element, with regard to some criterion, from some set of available alternatives. It is generally divided into two subfields: discrete optimization and continuous optimization. Optimization problems arise in all quantitative disciplines from computer science and engineering to operations research and economics, and the development of solution methods has been of interest in mathematics for centuries.

<span class="mw-page-title-main">Particle swarm optimization</span> Iterative simulation method

In computational science, particle swarm optimization (PSO) is a computational method that optimizes a problem by iteratively trying to improve a candidate solution with regard to a given measure of quality. It solves a problem by having a population of candidate solutions, here dubbed particles, and moving these particles around in the search-space according to simple mathematical formula over the particle's position and velocity. Each particle's movement is influenced by its local best known position, but is also guided toward the best known positions in the search-space, which are updated as better positions are found by other particles. This is expected to move the swarm toward the best solutions.

Branch and bound is a method for solving optimization problems by breaking them down into smaller sub-problems and using a bounding function to eliminate sub-problems that cannot contain the optimal solution. It is an algorithm design paradigm for discrete and combinatorial optimization problems, as well as mathematical optimization. A branch-and-bound algorithm consists of a systematic enumeration of candidate solutions by means of state space search: the set of candidate solutions is thought of as forming a rooted tree with the full set at the root. The algorithm explores branches of this tree, which represent subsets of the solution set. Before enumerating the candidate solutions of a branch, the branch is checked against upper and lower estimated bounds on the optimal solution, and is discarded if it cannot produce a better solution than the best one found so far by the algorithm.

Global optimization is a branch of applied mathematics and numerical analysis that attempts to find the global minima or maxima of a function or a set of functions on a given set. It is usually described as a minimization problem because the maximization of the real-valued function is equivalent to the minimization of the function .

In computer science and mathematical optimization, a metaheuristic is a higher-level procedure or heuristic designed to find, generate, tune, or select a heuristic that may provide a sufficiently good solution to an optimization problem or a machine learning problem, especially with incomplete or imperfect information or limited computation capacity. Metaheuristics sample a subset of solutions which is otherwise too large to be completely enumerated or otherwise explored. Metaheuristics may make relatively few assumptions about the optimization problem being solved and so may be usable for a variety of problems.

In mathematical folklore, the "no free lunch" (NFL) theorem of David Wolpert and William Macready, alludes to the saying "no such thing as a free lunch", that is, there are no easy shortcuts to success. It appeared in the 1997 "No Free Lunch Theorems for Optimization". Wolpert had previously derived no free lunch theorems for machine learning.

<span class="mw-page-title-main">Differential evolution</span> Method of mathematical optimization

In evolutionary computation, differential evolution (DE) is a method that optimizes a problem by iteratively trying to improve a candidate solution with regard to a given measure of quality. Such methods are commonly known as metaheuristics as they make few or no assumptions about the optimized problem and can search very large spaces of candidate solutions. However, metaheuristics such as DE do not guarantee an optimal solution is ever found.

In mathematical optimization theory, duality or the duality principle is the principle that optimization problems may be viewed from either of two perspectives, the primal problem or the dual problem. If the primal is a minimization problem then the dual is a maximization problem. Any feasible solution to the primal (minimization) problem is at least as large as any feasible solution to the dual (maximization) problem. Therefore, the solution to the primal is an upper bound to the solution of the dual, and the solution of the dual is a lower bound to the solution of the primal. This fact is called weak duality.

A memetic algorithm (MA) in computer science and operations research, is an extension of the traditional genetic algorithm (GA) or more general evolutionary algorithm (EA). It may provide a sufficiently good solution to an optimization problem. It uses a suitable heuristic or local search technique to improve the quality of solutions generated by the EA and to reduce the likelihood of premature convergence.

In mathematical optimization, constrained optimization is the process of optimizing an objective function with respect to some variables in the presence of constraints on those variables. The objective function is either a cost function or energy function, which is to be minimized, or a reward function or utility function, which is to be maximized. Constraints can be either hard constraints, which set conditions for the variables that are required to be satisfied, or soft constraints, which have some variable values that are penalized in the objective function if, and based on the extent that, the conditions on the variables are not satisfied.

Stochastic optimization (SO) methods are optimization methods that generate and use random variables. For stochastic problems, the random variables appear in the formulation of the optimization problem itself, which involves random objective functions or random constraints. Stochastic optimization methods also include methods with random iterates. Some stochastic optimization methods use random iterates to solve stochastic problems, combining both meanings of stochastic optimization. Stochastic optimization methods generalize deterministic methods for deterministic problems.

Multi-objective optimization or Pareto optimization is an area of multiple-criteria decision making that is concerned with mathematical optimization problems involving more than one objective function to be optimized simultaneously. Multi-objective is a type of vector optimization that has been applied in many fields of science, including engineering, economics and logistics where optimal decisions need to be taken in the presence of trade-offs between two or more conflicting objectives. Minimizing cost while maximizing comfort while buying a car, and maximizing performance whilst minimizing fuel consumption and emission of pollutants of a vehicle are examples of multi-objective optimization problems involving two and three objectives, respectively. In practical problems, there can be more than three objectives.

Kimeme is an open platform for multi-objective optimization and multidisciplinary design optimization. It is intended to be coupled with external numerical software such as computer-aided design (CAD), finite element analysis (FEM), structural analysis and computational fluid dynamics tools. It was developed by Cyber Dyne Srl and provides both a design environment for problem definition and analysis and a software network infrastructure to distribute the computational load.

Biogeography-based optimization (BBO) is an evolutionary algorithm (EA) that optimizes a function by stochastically and iteratively improving candidate solutions with regard to a given measure of quality, or fitness function. BBO belongs to the class of metaheuristics since it includes many variations, and since it does not make any assumptions about the problem and can therefore be applied to a wide class of problems.

David Hilton Wolpert is an American physicist and computer scientist. He is a professor at Santa Fe Institute. He is the author of three books, three patents, over one hundred refereed papers, and has received two awards. His name is particularly associated with a theorem in computer science known as "no free lunch".

<span class="mw-page-title-main">Simulation-based optimization</span>

Simulation-based optimization integrates optimization techniques into simulation modeling and analysis. Because of the complexity of the simulation, the objective function may become difficult and expensive to evaluate. Usually, the underlying simulation model is stochastic, so that the objective function must be estimated using statistical estimation techniques.

Quantum optimization algorithms are quantum algorithms that are used to solve optimization problems. Mathematical optimization deals with finding the best solution to a problem from a set of possible solutions. Mostly, the optimization problem is formulated as a minimization problem, where one tries to minimize an error which depends on the solution: the optimal solution has the minimal error. Different optimization techniques are applied in various fields such as mechanics, economics and engineering, and as the complexity and amount of data involved rise, more efficient ways of solving optimization problems are needed. Quantum computing may allow problems which are not practically feasible on classical computers to be solved, or suggest a considerable speed up with respect to the best known classical algorithm.