David Wolpert

Last updated
David H. Wolpert
Nationality American
Alma mater Princeton University
University of California, Santa Barbara
Scientific career
Fields Mathematics
Computer science
Institutions Santa Fe Institute
Doctoral advisor Anthony Zee

David Hilton Wolpert is an American physicist and computer scientist. He is a professor at Santa Fe Institute. He is the author of three books, three patents, over one hundred refereed papers, and has received two awards. His name is particularly associated with a theorem in computer science known as "no free lunch".

Contents

Career

David Wolpert took a B.A. in Physics at Princeton University (1984), then attended the University of California, Santa Barbara, where he took the degrees of M.A. (1987) and Ph.D. (1989).

Between 1989 and 1997 he pursued a research career at Los Alamos National Laboratory, IBM, TXN Inc. and Santa Fe Institute.

From 1997 to 2011 he worked as senior computer scientist at NASA Ames Research Center, and became visiting scholar at the Max Planck Institute. He spent the year 2010-11 as Ulam Scholar at the Center for Nonlinear Studies at Los Alamos. [1]

He joined the faculty of Santa Fe Institute in 2011 and became a professor there in September 2013. [2] His research interests have included statistics, game theory, machine learning applications, information theory, optimization methods and complex systems theory.

"No free lunch"

One of Wolpert’s most discussed achievements is known as No free lunch in search and optimization. [3] [4] [5] [6] By this theorem, all algorithms for search and optimization perform equally well averaged over all problems in the class with which they are designed to deal. However, in a machine learning context, the theorem makes an implicit artificial assumption regarding the lack of overlap between training and test data that is rarely true in practice [7] . More generally, the theorem holds only under certain conditions that are not often encountered precisely in real life, [8] [9] [10] although it has been claimed that the conditions can be met approximately. [11] The theorem lies within the domain of computer science, but a weaker version known as the “folkloric no free lunch theorem” has been drawn upon by William A. Dembski in support of intelligent design. [12] This use of the theorem has been rejected by Wolpert himself [13] and others. [14] [15]

Limitation on knowledge

Wolpert has formalized an argument to show that it is in principle impossible for any intellect to know everything about the universe of which it forms a part, in other words disproving "Laplace's demon". [16] This has been seen as an extension of the limitative theorems of the twentieth century such as those of Heisenberg and Gödel. [17] In 2018 Wolpert published a proof revealing the fundamental limits of scientific knowledge. [18]

Machine learning

Wolpert made many contributions to the early work on machine learning. These include a Bayesian estimator of the entropy of a distribution based on samples of the distribution, [19] [20] disproving formal claims that the "evidence procedure" is equivalent to hierarchical Bayes, [21] a Bayesian alternative to the chi-squared test, [22] a proof that there is no prior for which the bootstrap procedure is Bayes-optimal, [23] and Bayesian extensions of the bias-plus-variance decomposition. [24] Most prominently, he introduced "stacked generalization", [25] a more sophisticated version of cross-validation that uses held-in / held-out partitions of a data set to combine learning algorithms rather than just choose one of them. This work was developed further by Breiman, Smyth, Clarke and many others, and in particular the top two winners of 2009 Netflix competition made use of stacked generalization (rebranded as "blending"). [26]

Academic memberships

Awards

Publications (books only)

Related Research Articles

Information theory is the mathematical study of the quantification, storage, and communication of information. The field was originally established by the works of Harry Nyquist and Ralph Hartley, in the 1920s, and Claude Shannon in the 1940s. The field, in applied mathematics, is at the intersection of probability theory, statistics, computer science, statistical mechanics, information engineering, and electrical engineering.

<span class="mw-page-title-main">Quantum information</span> Information held in the state of a quantum system

Quantum information is the information of the state of a quantum system. It is the basic entity of study in quantum information theory, and can be manipulated using quantum information processing techniques. Quantum information refers to both the technical definition in terms of Von Neumann entropy and the general computational term.

<span class="mw-page-title-main">Genetic algorithm</span> Competitive algorithm for searching a problem space

In computer science and operations research, a genetic algorithm (GA) is a metaheuristic inspired by the process of natural selection that belongs to the larger class of evolutionary algorithms (EA). Genetic algorithms are commonly used to generate high-quality solutions to optimization and search problems by relying on biologically inspired operators such as mutation, crossover and selection. Some examples of GA applications include optimizing decision trees for better performance, solving sudoku puzzles, hyperparameter optimization, causal inference, etc.

<span class="mw-page-title-main">Evolutionary algorithm</span> Subset of evolutionary computation

In computational intelligence (CI), an evolutionary algorithm (EA) is a subset of evolutionary computation, a generic population-based metaheuristic optimization algorithm. An EA uses mechanisms inspired by biological evolution, such as reproduction, mutation, recombination, and selection. Candidate solutions to the optimization problem play the role of individuals in a population, and the fitness function determines the quality of the solutions. Evolution of the population then takes place after the repeated application of the above operators.

In computer science, evolutionary computation is a family of algorithms for global optimization inspired by biological evolution, and the subfield of artificial intelligence and soft computing studying these algorithms. In technical terms, they are a family of population-based trial and error problem solvers with a metaheuristic or stochastic optimization character.

<span class="mw-page-title-main">Santa Fe Institute</span> Nonprofit theoretical research institute in Santa Fe, New Mexico, USA

The Santa Fe Institute (SFI) is an independent, nonprofit theoretical research institute located in Santa Fe, New Mexico, United States and dedicated to the multidisciplinary study of the fundamental principles of complex adaptive systems, including physical, computational, biological, and social systems. The institute is ranked 24th among the world's "Top Science and Technology Think Tanks" and 24th among the world's "Best Transdisciplinary Research Think Tanks" according to the 2020 edition of the Global Go To Think Tank Index Reports, published annually by the University of Pennsylvania.

In computer science and mathematical optimization, a metaheuristic is a higher-level procedure or heuristic designed to find, generate, tune, or select a heuristic that may provide a sufficiently good solution to an optimization problem or a machine learning problem, especially with incomplete or imperfect information or limited computation capacity. Metaheuristics sample a subset of solutions which is otherwise too large to be completely enumerated or otherwise explored. Metaheuristics may make relatively few assumptions about the optimization problem being solved and so may be usable for a variety of problems.

Specified complexity is a creationist argument introduced by William Dembski, used by advocates to promote the pseudoscience of intelligent design. According to Dembski, the concept can formalize a property that singles out patterns that are both specified and complex, where in Dembski's terminology, a specified pattern is one that admits short descriptions, whereas a complex pattern is one that is unlikely to occur by chance. Proponents of intelligent design use specified complexity as one of their two main arguments, alongside irreducible complexity.

In mathematical folklore, the "no free lunch" (NFL) theorem of David Wolpert and William Macready, alludes to the saying "no such thing as a free lunch", that is, there are no easy shortcuts to success. It appeared in the 1997 "No Free Lunch Theorems for Optimization". Wolpert had previously derived no free lunch theorems for machine learning.

<span class="mw-page-title-main">No free lunch in search and optimization</span> Average solution cost is the same with any method

In computational complexity and optimization the no free lunch theorem is a result that states that for certain types of mathematical problems, the computational cost of finding a solution, averaged over all problems in the class, is the same for any solution method. The name alludes to the saying "no such thing as a free lunch", that is, no method offers a "short cut". This is under the assumption that the search space is a probability density function. It does not apply to the case where the search space has underlying structure that can be exploited more efficiently than random search or even has closed-form solutions that can be determined without search at all. For such probabilistic assumptions, the outputs of all procedures solving a particular type of problem are statistically identical. A colourful way of describing such a circumstance, introduced by David Wolpert and William G. Macready in connection with the problems of search and optimization, is to say that there is no free lunch. Wolpert had previously derived no free lunch theorems for machine learning. Before Wolpert's article was published, Cullen Schaffer independently proved a restricted version of one of Wolpert's theorems and used it to critique the current state of machine learning research on the problem of induction.

<span class="mw-page-title-main">Estimation of distribution algorithm</span>

Estimation of distribution algorithms (EDAs), sometimes called probabilistic model-building genetic algorithms (PMBGAs), are stochastic optimization methods that guide the search for the optimum by building and sampling explicit probabilistic models of promising candidate solutions. Optimization is viewed as a series of incremental updates of a probabilistic model, starting with the model encoding an uninformative prior over admissible solutions and ending with the model that generates only the global optima.

Formal epistemology uses formal methods from decision theory, logic, probability theory and computability theory to model and reason about issues of epistemological interest. Work in this area spans several academic fields, including philosophy, computer science, economics, and statistics. The focus of formal epistemology has tended to differ somewhat from that of traditional epistemology, with topics like uncertainty, induction, and belief revision garnering more attention than the analysis of knowledge, skepticism, and issues with justification.

A memetic algorithm (MA) in computer science and operations research, is an extension of the traditional genetic algorithm (GA) or more general evolutionary algorithm (EA). It may provide a sufficiently good solution to an optimization problem. It uses a suitable heuristic or local search technique to improve the quality of solutions generated by the EA and to reduce the likelihood of premature convergence.

Stochastic optimization (SO) methods are optimization methods that generate and use random variables. For stochastic problems, the random variables appear in the formulation of the optimization problem itself, which involves random objective functions or random constraints. Stochastic optimization methods also include methods with random iterates. Some stochastic optimization methods use random iterates to solve stochastic problems, combining both meanings of stochastic optimization. Stochastic optimization methods generalize deterministic methods for deterministic problems.

<span class="mw-page-title-main">Robert J. Marks II</span> American engineer and intelligent design advocate (born 1950)

Robert Jackson Marks II is an American electrical engineer, computer scientist and Distinguished Professor at Baylor University. His contributions include the Zhao-Atlas-Marks (ZAM) time-frequency distribution in the field of signal processing, the Cheung–Marks theorem in Shannon sampling theory and the Papoulis-Marks-Cheung (PMC) approach in multidimensional sampling. He was instrumental in the defining of the field of computational intelligence and co-edited the first book using computational intelligence in the title. A Christian and an old earth creationist, he is a subject of the 2008 pro-intelligent design motion picture, Expelled: No Intelligence Allowed.

<span class="mw-page-title-main">Ensemble learning</span> Statistics and machine learning technique

In statistics and machine learning, ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. Unlike a statistical ensemble in statistical mechanics, which is usually infinite, a machine learning ensemble consists of only a concrete finite set of alternative models, but typically allows for much more flexible structure to exist among those alternatives.

Kimeme is an open platform for multi-objective optimization and multidisciplinary design optimization. It is intended to be coupled with external numerical software such as computer-aided design (CAD), finite element analysis (FEM), structural analysis and computational fluid dynamics tools. It was developed by Cyber Dyne Srl and provides both a design environment for problem definition and analysis and a software network infrastructure to distribute the computational load.

<span class="mw-page-title-main">Outline of machine learning</span> Overview of and topical guide to machine learning

The following outline is provided as an overview of and topical guide to machine learning:

In machine learning, hyperparameter optimization or tuning is the problem of choosing a set of optimal hyperparameters for a learning algorithm. A hyperparameter is a parameter whose value is used to control the learning process. By contrast, the values of other parameters are learned.

Multi-task optimization is a paradigm in the optimization literature that focuses on solving multiple self-contained tasks simultaneously. The paradigm has been inspired by the well-established concepts of transfer learning and multi-task learning in predictive analytics.

References

  1. "CNLS Ulam Scholar". Archived from the original on 2014-10-26. Retrieved 2014-09-22.
  2. David Wolpert, Santa Fe Institute
  3. Wolpert, D.H., Macready, W.G. (1995), No Free Lunch Theorems for Search, Technical Report SFI-TR-95-02-010 (Santa Fe Institute).
  4. Wolpert D.H., Macready W.G. (1997). "No Free Lunch Theorems for Optimization" (PDF). IEEE Transactions on Evolutionary Computation. 1: 67. CiteSeerX   10.1.1.138.6606 . doi:10.1109/4235.585893. S2CID   5553697.
  5. Wolpert, David (1996), The Lack of A Priori Distinctions between Learning Algorithms, Neural Computation, pp. 1341–1390.
  6. David H. Wolpert, What the No Free Lunch Theorems Really Mean; How to Improve Search Algorithms, SFI Working Paper 2012-10-017, Santa Fe Institute 2012
  7. Baxter, Jonathan (1999). "Some Observations Concerning Off-Training-Set (OTS) Error". arXiv: 1912.05915 .
  8. Streeter, M. (2003) Two Broad Classes of Functions for Which a No Free Lunch Result Does Not Hold, Genetic and Evolutionary Computation – GECCO 2003, pp. 1418–1430.
  9. Igel C., Toussaint M. (2004). "A No-Free-Lunch Theorem for Non-Uniform Distributions of Target Functions". Journal of Mathematical Modelling and Algorithms. 3 (4): 313–322. CiteSeerX   10.1.1.71.9744 . doi:10.1023/b:jmma.0000049381.24625.f7. S2CID   195292166.
  10. English, T. (2004), No More Lunch: Analysis of Sequential Search, Proceedings of the 2004 IEEE Congress on Evolutionary Computation, pp. 227–234.
  11. Droste S., Jansen T., Wegener I. (2002). "Optimization with randomized search heuristics: the (A)NFL theorem, realistic scenarios, and difficult functions". Theoretical Computer Science. 287 (1): 131–144. doi:10.1016/s0304-3975(02)00094-4. hdl: 2003/5394 .{{cite journal}}: CS1 maint: multiple names: authors list (link)
  12. Dembski, W. A. (2002) No Free Lunch, Rowman & Littlefield, ISBN   0-7425-1297-5
  13. Wolpert, D. (2003), William Dembski's treatment of the No Free Lunch theorems is written in jello, Talk Reason
  14. Perakh, M. (2003), The No Free Lunch Theorems and Their Application to Evolutionary Algorithms, Talk Reason.
  15. Richard Wein (2002), Not a Free Lunch But a Box of Chocolates (Sect. 5.3), The TalkOrigins Archive
  16. David H. Wolpert (2008). "Physical limits of inference". Physica D. 237 (9): 1257–1281. arXiv: 0708.1362 . Bibcode:2008PhyD..237.1257W. doi:10.1016/j.physd.2008.03.040. S2CID   2033616. full text
  17. Graham P. Collins, Within Any Possible Universe, No Intellect Can Ever Know It All, Scientific American, 16 February 2009
  18. "New proof reveals fundamental limits of scientific knowledge" . Retrieved 2018-10-04.
  19. David H. Wolpert and David Wolf (1995). "Estimating Functions of Probability Distributions from a Finite Set of Samples". Physical Review E. 52 (6): 6841–6854. Bibcode:1995PhRvE..52.6841W. CiteSeerX   10.1.1.55.7122 . doi:10.1103/physreve.52.6841. PMID   9964199. S2CID   9795679.
  20. David H. Wolpert and Simon DeDeo (2013). "Estimating Functions of Distributions Defined over Spaces of Unknown Size". Entropy. 15 (12): 4668–4699. arXiv: 1311.4548 . Bibcode:2013Entrp..15.4668W. doi: 10.3390/e15114668 . S2CID   2737117.
  21. David H. Wolpert and Charles E. Strauss (1996). "What Bayes has to say about the evidence procedure". Maximum Entropy and Bayesian Methods 1993.
  22. David H. Wolpert (1996). "Determining Whether Two Data Sets are from the Same Distribution". Maximum Entropy and Bayesian Methods 1995.
  23. David H. Wolpert (1996). "The Bootstrap is Inconsistent with Probability Theory". Maximum Entropy and Bayesian Methods 1995.
  24. David H. Wolpert (1997). "On Bias plus Variance". Neural Computation. 9 (6): 1211–1243. doi:10.1162/neco.1997.9.6.1211. S2CID   15418441.
  25. David H. Wolpert (1992). "Stacked Generalization". Neural Networks. 5 (2): 241–259. CiteSeerX   10.1.1.133.8090 . doi:10.1016/s0893-6080(05)80023-1.
  26. Joseph Sill; et al. (2008). "Feature-Weighted Linear Stacking". Physica D: Nonlinear Phenomena. 237 (9): 1257–1281. arXiv: 0708.1362 . Bibcode:2008PhyD..237.1257W. doi:10.1016/j.physd.2008.03.040. S2CID   2033616.