Bayesian probability

From Wikipedia, The Free Encyclopedia

Bayesian probability (/ˈbeɪziən/ BAY-zee-ən or /ˈbeɪʒən/ BAY-zhən) [1] is an interpretation of the concept of probability in which, instead of the frequency or propensity of some phenomenon, probability is interpreted as reasonable expectation [2] representing a state of knowledge [3] or as the quantification of a personal belief. [4]


The Bayesian interpretation of probability can be seen as an extension of propositional logic that enables reasoning with hypotheses; [5] [6] that is, with propositions whose truth or falsity is unknown. In the Bayesian view, a probability is assigned to a hypothesis, whereas under frequentist inference, a hypothesis is typically tested without being assigned a probability.

Bayesian probability belongs to the category of evidential probabilities; to evaluate the probability of a hypothesis, the Bayesian probabilist specifies a prior probability, which is then updated to a posterior probability in the light of new, relevant data (evidence). [7] The Bayesian interpretation provides a standard set of procedures and formulae to perform this calculation.
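The prior-to-posterior step can be sketched for a finite set of hypotheses using Bayes' theorem, P(H | E) = P(E | H) P(H) / P(E). The sketch below is illustrative only: the function name `update` and the fair-versus-biased coin example are hypothetical, and exact fractions are used so the arithmetic can be checked by hand.

```python
from fractions import Fraction

def update(prior, likelihood):
    """Return the posterior P(H | E) for each hypothesis H.

    prior:      dict mapping hypothesis -> P(H)
    likelihood: dict mapping hypothesis -> P(E | H) for the observed evidence E
    """
    # Unnormalised posterior: P(E | H) * P(H)
    joint = {h: likelihood[h] * p for h, p in prior.items()}
    # Probability of the evidence: P(E) = sum over H of P(E | H) * P(H)
    evidence = sum(joint.values())
    if evidence == 0:
        raise ValueError("the evidence has zero probability under the prior")
    return {h: j / evidence for h, j in joint.items()}

# Hypothetical example: is a coin fair, or biased with P(heads) = 3/4?
prior = {"fair": Fraction(1, 2), "biased": Fraction(1, 2)}
heads = {"fair": Fraction(1, 2), "biased": Fraction(3, 4)}

posterior = update(prior, heads)      # after one head: fair -> 2/5, biased -> 3/5
posterior = update(posterior, heads)  # the posterior serves as the next prior
print(posterior)                      # fair -> 4/13, biased -> 9/13
```

Feeding each posterior back in as the next prior is how the update is applied to a sequence of observations.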

The term Bayesian derives from the 18th-century mathematician and theologian Thomas Bayes, who provided the first mathematical treatment of a non-trivial problem of statistical data analysis using what is now known as Bayesian inference. [8] :131 Mathematician Pierre-Simon Laplace pioneered and popularized what is now called Bayesian probability. [8] :97–98

Bayesian methodology

Bayesian methods are characterized by concepts and procedures as follows:

Objective and subjective Bayesian probabilities

Broadly speaking, there are two interpretations of Bayesian probability. For objectivists, who interpret probability as an extension of logic, probability quantifies the reasonable expectation that everyone (even a "robot") who shares the same knowledge should share in accordance with the rules of Bayesian statistics, which can be justified by Cox's theorem. [3] [10] For subjectivists, probability corresponds to a personal belief. [4] Rationality and coherence allow for substantial variation within the constraints they pose; the constraints are justified by the Dutch book argument or by decision theory and de Finetti's theorem. [4] The objective and subjective variants of Bayesian probability differ mainly in their interpretation and construction of the prior probability.

History

The term Bayesian derives from Thomas Bayes (1702–1761), who proved a special case of what is now called Bayes' theorem in a paper titled "An Essay Towards Solving a Problem in the Doctrine of Chances". [11] In that special case, the prior and posterior distributions were beta distributions and the data came from Bernoulli trials. It was Pierre-Simon Laplace (1749–1827) who introduced a general version of the theorem and used it to approach problems in celestial mechanics, medical statistics, reliability, and jurisprudence. [12] Early Bayesian inference, which used uniform priors following Laplace's principle of insufficient reason, was called "inverse probability" (because it infers backwards from observations to parameters, or from effects to causes). [13] After the 1920s, "inverse probability" was largely supplanted by a collection of methods that came to be called frequentist statistics. [13]
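Bayes's special case, a beta prior updated by Bernoulli data, has a closed form: a Beta(a, b) prior combined with s successes and f failures gives a Beta(a + s, b + f) posterior. The following is a minimal sketch under that assumption; the helper names and the observation sequence are hypothetical.

```python
from fractions import Fraction

def beta_bernoulli_update(a, b, observations):
    """Update a Beta(a, b) prior on a success probability with Bernoulli data.

    Each observation is 1 (success) or 0 (failure). Because the beta family is
    conjugate to the Bernoulli likelihood, the posterior is again a beta
    distribution: Beta(a + successes, b + failures).
    """
    successes = sum(observations)
    failures = len(observations) - successes
    return a + successes, b + failures

def beta_mean(a, b):
    # Mean of a Beta(a, b) distribution, kept exact as a fraction
    return Fraction(a, a + b)

# Uniform prior Beta(1, 1), in the spirit of Laplace's principle of insufficient reason
a, b = beta_bernoulli_update(1, 1, [1, 0, 1, 1, 0, 1, 1])
print(a, b)             # 6 3, i.e. a Beta(6, 3) posterior
print(beta_mean(a, b))  # 2/3
```

With the uniform prior, the posterior mean (s + 1)/(n + 2) is Laplace's rule of succession.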

In the 20th century, the ideas of Laplace developed in two directions, giving rise to objective and subjective currents in Bayesian practice. Harold Jeffreys' Theory of Probability (first published in 1939) played an important role in the revival of the Bayesian view of probability, followed by works by Abraham Wald (1950) and Leonard J. Savage (1954). The adjective Bayesian itself dates to the 1950s; the derived terms Bayesianism and neo-Bayesianism are 1960s coinages. [14] [15] [16] In the objectivist stream, the statistical analysis depends only on the assumed model and the analysed data. [17] No subjective decisions need to be involved. In contrast, "subjectivist" statisticians deny the possibility of fully objective analysis for the general case.

In the 1980s, there was a dramatic growth in research and applications of Bayesian methods, mostly attributed to the discovery of Markov chain Monte Carlo methods and the consequent removal of many of the computational problems, and to an increasing interest in nonstandard, complex applications. [18] While frequentist statistics remains strong (as demonstrated by the fact that much of undergraduate teaching is based on it [19] ), Bayesian methods are widely accepted and used, e.g., in the field of machine learning. [20]
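Markov chain Monte Carlo methods helped because they draw samples from a posterior that is known only up to its normalising constant, which is often the hard part to compute. The sketch below is a random-walk Metropolis sampler, one simple MCMC method, applied to a hypothetical posterior (uniform prior, 5 successes and 2 failures) whose exact mean, 2/3, is known; the step size, seed, and burn-in length are arbitrary illustrative choices.

```python
import math
import random

def log_posterior(p, successes=5, failures=2):
    """Unnormalised log posterior for a success probability p under a uniform
    prior, after observing the given (hypothetical) Bernoulli counts."""
    if not 0.0 < p < 1.0:
        return -math.inf  # zero density outside (0, 1)
    return successes * math.log(p) + failures * math.log(1.0 - p)

def metropolis(log_density, start, n_samples, step=0.2, seed=0):
    """Random-walk Metropolis sampler with a Gaussian proposal of width `step`."""
    rng = random.Random(seed)
    x, log_x = start, log_density(start)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        log_prop = log_density(proposal)
        # Accept with probability min(1, density(proposal) / density(x)).
        # Only this ratio is used, so the normalising constant cancels out.
        if math.log(1.0 - rng.random()) < log_prop - log_x:
            x, log_x = proposal, log_prop
        samples.append(x)
    return samples

samples = metropolis(log_posterior, start=0.5, n_samples=50_000)
kept = samples[5_000:]              # discard the first draws as burn-in
estimate = sum(kept) / len(kept)
print(estimate)  # should be close to the exact posterior mean 2/3
```

Plain Metropolis is chosen here for brevity; practical samplers add tuning and convergence diagnostics.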

Justification

The use of Bayesian probabilities as the basis of Bayesian inference has been supported by several arguments, such as Cox's axioms, the Dutch book argument, arguments based on decision theory, and de Finetti's theorem.

Axiomatic approach

Richard T. Cox showed that Bayesian updating follows from several axioms, including two functional equations and a hypothesis of differentiability. [10] [21] The assumption of differentiability or even continuity is controversial; Halpern found a counterexample based on his observation that the Boolean algebra of statements may be finite. [22] Other axiomatizations have been suggested by various authors with the purpose of making the theory more rigorous. [9]

Dutch book approach

Bruno de Finetti proposed the Dutch book argument, which is based on betting. A clever bookmaker makes a Dutch book by setting the odds and bets so that the bookmaker profits, at the expense of the gamblers, regardless of the outcome of the event (a horse race, for example) on which the gamblers bet. A Dutch book is possible when the probabilities implied by the odds are not coherent.
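The simplest case involves one event A and its complement. Suppose a gambler will buy or sell, at quoted prices, a ticket paying 1 if A occurs and a ticket paying 1 if A does not occur. If the two prices do not sum to 1, a bookmaker can guarantee a gain. This is a hypothetical two-bet sketch (the function name and prices are illustrative); the full argument extends the same idea to every probability axiom.

```python
from fractions import Fraction

def bookmaker_gains(price_a, price_not_a):
    """Bookmaker's net gain in each outcome, or None if the two prices sum to 1.

    The gambler agrees to buy or sell, at the quoted price, a ticket that pays 1
    if its event occurs: one ticket on A, one on not-A.
    """
    total = price_a + price_not_a
    if total == 1:
        return None  # coherent on this pair: no sure gain from these two bets
    # If the tickets are overpriced, sell both to the gambler; otherwise buy both.
    sign = 1 if total > 1 else -1
    gains = {}
    for a_occurs in (True, False):
        # The A ticket pays if A occurs; the not-A ticket pays otherwise.
        payout = int(a_occurs) + int(not a_occurs)
        gains["A" if a_occurs else "not A"] = sign * (total - payout)
    return gains

print(bookmaker_gains(Fraction(3, 5), Fraction(1, 2)))  # prices sum to 11/10: gain 1/10 either way
print(bookmaker_gains(Fraction(2, 5), Fraction(1, 2)))  # prices sum to 9/10: gain 1/10 either way
print(bookmaker_gains(Fraction(3, 5), Fraction(2, 5)))  # None
```

Exactly one ticket pays out whatever happens, so the bookmaker's gain depends only on how far the prices are from summing to 1.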

However, Ian Hacking noted that traditional Dutch book arguments did not specify Bayesian updating: they left open the possibility that non-Bayesian updating rules could avoid Dutch books. For example, Hacking writes [23] [24] "And neither the Dutch book argument, nor any other in the personalist arsenal of proofs of the probability axioms, entails the dynamic assumption. Not one entails Bayesianism. So the personalist requires the dynamic assumption to be Bayesian. It is true that in consistency a personalist could abandon the Bayesian model of learning from experience. Salt could lose its savour."

In fact, there are non-Bayesian updating rules that also avoid Dutch books (as discussed in the literature on "probability kinematics" [25] following the publication of Richard C. Jeffrey's rule, which is itself regarded as Bayesian [26] ). The additional hypotheses sufficient to (uniquely) specify Bayesian updating are substantial [27] and not universally seen as satisfactory. [28]
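Jeffrey's rule covers uncertain evidence that shifts the probabilities q(E_i) of the cells of a partition without making any single cell certain: the new probability of A is the sum over i of P(A | E_i) q(E_i). Ordinary conditioning is the special case in which one cell receives probability 1. A minimal sketch, assuming a finite joint distribution with no zero-probability cells; the rain-and-sky numbers are hypothetical.

```python
from fractions import Fraction

def jeffrey_update(joint, new_partition_probs):
    """Jeffrey conditioning (probability kinematics).

    joint: dict mapping (a, e) -> P(A = a, E = e) over a quantity A and a partition E
    new_partition_probs: dict mapping e -> q(e), the revised cell probabilities
    Returns the new joint distribution P_new(a, e) = P(a | e) * q(e).
    """
    # Old marginal probability of each partition cell (assumed non-zero)
    old_e = {}
    for (a, e), p in joint.items():
        old_e[e] = old_e.get(e, 0) + p
    return {(a, e): p / old_e[e] * new_partition_probs[e]
            for (a, e), p in joint.items()}

def marginal_a(joint):
    # Sum the joint distribution over the partition to get P(A = a)
    out = {}
    for (a, _), p in joint.items():
        out[a] = out.get(a, 0) + p
    return out

joint = {("rain", "dark"): Fraction(3, 10), ("rain", "clear"): Fraction(1, 10),
         ("dry", "dark"): Fraction(2, 10), ("dry", "clear"): Fraction(4, 10)}

# A glimpse of the sky makes "dark" more likely without making it certain.
glimpse = jeffrey_update(joint, {"dark": Fraction(3, 4), "clear": Fraction(1, 4)})
print(marginal_a(glimpse)["rain"])    # 1/2 (it was 2/5 before the glimpse)

# Certainty about one cell reduces to ordinary conditioning: P(rain | dark) = 3/5.
certain = jeffrey_update(joint, {"dark": Fraction(1), "clear": Fraction(0)})
print(marginal_a(certain)["rain"])    # 3/5
```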

Decision theory approach

A decision-theoretic justification of the use of Bayesian inference (and hence of Bayesian probabilities) was given by Abraham Wald, who proved that every admissible statistical procedure is either a Bayesian procedure or a limit of Bayesian procedures. [29] Conversely, every Bayesian procedure is admissible. [30]
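A "Bayesian procedure" in this sense chooses the action that minimises posterior expected loss. The sketch below only illustrates that definition for point estimation under squared-error loss, where the minimiser is the posterior mean; it does not demonstrate admissibility or Wald's theorem itself. The discrete posterior, the action grid, and the function names are hypothetical.

```python
from fractions import Fraction

def bayes_action(posterior, actions, loss):
    """Pick the action with the smallest posterior expected loss.

    posterior: dict mapping parameter value -> posterior probability
    actions:   iterable of candidate actions (here, point estimates)
    loss:      function loss(theta, action) -> non-negative number
    """
    def expected_loss(action):
        return sum(p * loss(theta, action) for theta, p in posterior.items())
    return min(actions, key=expected_loss)

def squared_error(theta, action):
    return (theta - action) ** 2

posterior = {Fraction(1, 5): Fraction(1, 2),
             Fraction(1, 2): Fraction(1, 4),
             Fraction(4, 5): Fraction(1, 4)}
grid = [Fraction(k, 40) for k in range(41)]  # candidate estimates 0, 1/40, ..., 1

best = bayes_action(posterior, grid, squared_error)
print(best)  # 17/40, the posterior mean
```

Under squared-error loss the expected loss equals the posterior variance plus the squared distance from the posterior mean, so the grid search lands exactly on the mean when it is one of the candidates.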

Personal probabilities and objective methods for constructing priors

Following the work on expected utility theory of Ramsey and von Neumann, decision-theorists have accounted for rational behavior using a probability distribution for the agent. Johann Pfanzagl completed the Theory of Games and Economic Behavior by providing an axiomatization of subjective probability and utility, a task left uncompleted by von Neumann and Oskar Morgenstern: their original theory supposed that all the agents had the same probability distribution, as a convenience. [31] Pfanzagl's axiomatization was endorsed by Oskar Morgenstern: "Von Neumann and I have anticipated ... [the question whether probabilities] might, perhaps more typically, be subjective and have stated specifically that in the latter case axioms could be found from which could derive the desired numerical utility together with a number for the probabilities (cf. p. 19 of The Theory of Games and Economic Behavior). We did not carry this out; it was demonstrated by Pfanzagl ... with all the necessary rigor". [32]

Ramsey and Savage noted that the individual agent's probability distribution could be objectively studied in experiments. Procedures for testing hypotheses about probabilities (using finite samples) are due to Ramsey (1931) and de Finetti (1931, 1937, 1964, 1970). Both Bruno de Finetti [33] [34] and Frank P. Ramsey [34] [35] acknowledge their debts to pragmatic philosophy, particularly (for Ramsey) to Charles S. Peirce. [34] [35]

The "Ramsey test" for evaluating probability distributions is implementable in theory, and has kept experimental psychologists occupied for half a century. [36] This work demonstrates that Bayesian-probability propositions can be falsified, and so meet an empirical criterion of Charles S. Peirce, whose work inspired Ramsey. (This falsifiability criterion was popularized by Karl Popper. [37] [38] )

Modern work on the experimental evaluation of personal probabilities uses the randomization, blinding, and Boolean-decision procedures of the Peirce-Jastrow experiment. [39] Since individuals act according to different probability judgments, these agents' probabilities are "personal" (but amenable to objective study).

Personal probabilities are problematic for science and for some applications where decision-makers lack the knowledge or time to specify an informed probability distribution (on which they are prepared to act). To meet the needs of science and of human limitations, Bayesian statisticians have developed "objective" methods for specifying prior probabilities.

Indeed, some Bayesians have argued that the prior state of knowledge defines the (unique) prior probability distribution for "regular" statistical problems; cf. well-posed problems. Finding the right method for constructing such "objective" priors (for appropriate classes of regular problems) has been the quest of statistical theorists from Laplace to John Maynard Keynes, Harold Jeffreys, and Edwin Thompson Jaynes. These theorists and their successors have suggested several methods for constructing "objective" priors (unfortunately, it is not always clear how to assess the relative "objectivity" of the priors proposed under these methods):

Each of these methods contributes useful priors for "regular" one-parameter problems, and each prior can handle some challenging statistical models (with "irregularity" or several parameters). Each of these methods has been useful in Bayesian practice. Indeed, methods for constructing "objective" (alternatively, "default" or "ignorance") priors have been developed by avowed subjective (or "personal") Bayesians like James Berger (Duke University) and José-Miguel Bernardo (Universitat de València), simply because such priors are needed for Bayesian practice, particularly in science. [40] The quest for "the universal method for constructing priors" continues to attract statistical theorists. [40]

Thus, the Bayesian statistician needs either to use informed priors (using relevant expertise or previous data) or to choose among the competing methods for constructing "objective" priors.
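The choice matters most when data are scarce. As an illustration, the sketch compares posterior means for a Bernoulli success probability under three beta priors: the uniform prior, the Jeffreys prior (which for a Bernoulli parameter is Beta(1/2, 1/2)), and a hypothetical informed prior Beta(10, 10) standing in for expert knowledge. The data (no successes in three trials) are also hypothetical.

```python
from fractions import Fraction

# Beta(a, b) priors for a Bernoulli success probability p
PRIORS = {
    "uniform (Laplace)": (Fraction(1), Fraction(1)),
    "Jeffreys": (Fraction(1, 2), Fraction(1, 2)),
    "informed (hypothetical)": (Fraction(10), Fraction(10)),
}

def posterior_mean(a, b, successes, trials):
    # Beta(a, b) prior + binomial data -> Beta(a + s, b + n - s); return its mean
    return (a + successes) / (a + b + trials)

for name, (a, b) in PRIORS.items():
    print(name, posterior_mean(a, b, successes=0, trials=3))
# uniform (Laplace) 1/5
# Jeffreys 1/8
# informed (hypothetical) 10/23
```

As the number of trials grows, the data dominate and the three posterior means converge; with only three observations they still differ noticeably.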

See also

Related Research Articles

Frequentist probability: Interpretation of probability

Frequentist probability or frequentism is an interpretation of probability; it defines an event's probability as the limit of its relative frequency in infinitely many trials. Probabilities can be found by a repeatable objective process. The continued use of frequentist methods in scientific inference, however, has been called into question.

The word probability has been used in a variety of ways since it was first applied to the mathematical study of games of chance. Does probability measure the real, physical, tendency of something to occur, or is it a measure of how strongly one believes it will occur, or does it draw on both these elements? In answering such questions, mathematicians interpret the probability values of probability theory.

Statistical inference: Process of using data analysis

Statistical inference is the process of using data analysis to infer properties of an underlying distribution of probability. Inferential statistical analysis infers properties of a population, for example by testing hypotheses and deriving estimates. It is assumed that the observed data set is sampled from a larger population.

Statistical hypothesis test: Method of statistical inference

A statistical hypothesis test is a method of statistical inference used to decide whether the data sufficiently supports a particular hypothesis. A statistical hypothesis test typically involves a calculation of a test statistic. Then a decision is made, either by comparing the test statistic to a critical value or equivalently by evaluating a p-value computed from the test statistic. Roughly 100 specialized statistical tests have been defined.

Bayesian inference is a method of statistical inference in which Bayes' theorem is used to calculate a probability of a hypothesis, given prior evidence, and update it as more information becomes available. Fundamentally, Bayesian inference uses a prior distribution to estimate posterior probabilities. Bayesian inference is an important technique in statistics, and especially in mathematical statistics. Bayesian updating is particularly important in the dynamic analysis of a sequence of data. Bayesian inference has found application in a wide range of activities, including science, engineering, philosophy, medicine, sport, and law. In the philosophy of decision theory, Bayesian inference is closely related to subjective probability, often called "Bayesian probability".

In statistics, interval estimation is the use of sample data to estimate an interval of possible values of a parameter of interest. This is in contrast to point estimation, which gives a single value.

Bruno de Finetti: Italian mathematician (1906–1985)

Bruno de Finetti was an Italian probabilist, statistician and actuary, noted for the "operational subjective" conception of probability. The classic exposition of his distinctive theory is the 1937 "La prévision: ses lois logiques, ses sources subjectives", which discussed probability founded on the coherence of betting odds and the consequences of exchangeability.

Bayesian statistics is a theory in the field of statistics based on the Bayesian interpretation of probability, where probability expresses a degree of belief in an event. The degree of belief may be based on prior knowledge about the event, such as the results of previous experiments, or on personal beliefs about the event. This differs from a number of other interpretations of probability, such as the frequentist interpretation, which views probability as the limit of the relative frequency of an event after many trials. More concretely, analysis in Bayesian methods codifies prior knowledge in the form of a prior distribution.

Decision theory: Branch of applied probability theory

Decision theory or the theory of rational choice is a branch of probability, economics, and analytic philosophy that uses the tools of expected utility and probability to model how individuals should behave rationally under uncertainty. It differs from the cognitive and behavioral sciences in that it is prescriptive and concerned with identifying optimal decisions for a rational agent, rather than describing how people really do make decisions. Despite this, the field is important to the study of real human behavior by social scientists, as it lays the foundations for the rational agent models used to mathematically model and analyze individuals in fields such as sociology, economics, criminology, cognitive science, and political science.

Theory of Games and Economic Behavior: Book by John von Neumann

Theory of Games and Economic Behavior, published in 1944 by Princeton University Press, is a book by mathematician John von Neumann and economist Oskar Morgenstern which is considered the groundbreaking text that created the interdisciplinary research field of game theory. In the introduction of its 60th anniversary commemorative edition from the Princeton University Press, the book is described as "the classic work upon which modern-day game theory is based."

The expected utility hypothesis is a foundational assumption in mathematical economics concerning decision making under uncertainty. It postulates that rational agents maximize utility, meaning the subjective desirability of their actions. Rational choice theory, a cornerstone of microeconomics, builds on this postulate to model aggregate social behaviour.

In decision theory, subjective expected utility is the attractiveness of an economic opportunity as perceived by a decision-maker in the presence of risk. Characterizing the behavior of decision-makers as using subjective expected utility was promoted and axiomatized by L. J. Savage in 1954 following previous work by Ramsey and von Neumann. The theory of subjective expected utility combines two subjective concepts: first, a personal utility function, and second, a personal probability distribution.

Imprecise probability generalizes probability theory to allow for partial probability specifications, and is applicable when information is scarce, vague, or conflicting, in which case a unique probability distribution may be hard to identify. Thereby, the theory aims to represent the available knowledge more accurately. Imprecision is useful for dealing with expert elicitation, because:

Fiducial inference is one of a number of different types of statistical inference. These are rules, intended for general application, by which conclusions can be drawn from samples of data. In modern statistical practice, attempts to work with fiducial inference have fallen out of fashion in favour of frequentist inference, Bayesian inference and decision theory. However, fiducial inference is important in the history of statistics since its development led to the parallel development of concepts and tools in theoretical statistics that are widely used. Some current research in statistical methodology is either explicitly linked to fiducial inference or is closely connected to it.

Statistics, in the modern sense of the word, began evolving in the 18th century in response to the novel needs of industrializing sovereign states.

The Foundations of Statistics are the mathematical and philosophical bases for statistical methods. These bases are the theoretical frameworks that ground and justify methods of statistical inference, estimation, hypothesis testing, uncertainty quantification, and the interpretation of statistical conclusions. Further, a foundation can be used to explain statistical paradoxes, provide descriptions of statistical laws, and guide the application of statistics to real-world problems.

Frequentist inference is a type of statistical inference based in frequentist probability, which treats “probability” in equivalent terms to “frequency” and draws conclusions from sample data by emphasizing the frequency or proportion of findings in the data. Frequentist inference underlies frequentist statistics, in which the well-established methodologies of statistical hypothesis testing and confidence intervals are founded.

Bayesian econometrics is a branch of econometrics which applies Bayesian principles to economic modelling. Bayesianism is based on a degree-of-belief interpretation of probability, as opposed to a relative-frequency interpretation.

Bayesian inference in marketing

In marketing, Bayesian inference allows for decision making and market research evaluation under uncertainty and with limited data. The communication between marketer and market can be seen as a form of Bayesian persuasion.

References

  1. "Bayesian". Merriam-Webster.com Dictionary. Merriam-Webster.
  2. Cox, R.T. (1946). "Probability, Frequency, and Reasonable Expectation". American Journal of Physics. 14 (1): 1–10. Bibcode:1946AmJPh..14....1C. doi:10.1119/1.1990764.
  3. Jaynes, E.T. (1986). "Bayesian Methods: General Background". In Justice, J. H. (ed.). Maximum-Entropy and Bayesian Methods in Applied Statistics. Cambridge: Cambridge University Press. CiteSeerX 10.1.1.41.1055.
  4. de Finetti, Bruno (2017). Theory of Probability: A critical introductory treatment. Chichester: John Wiley & Sons Ltd. ISBN 9781119286370.
  5. Hailperin, Theodore (1996). Sentential Probability Logic: Origins, Development, Current Status, and Technical Applications. London: Associated University Presses. ISBN 0934223459.
  6. Howson, Colin (2001). "The Logic of Bayesian Probability". In Corfield, D.; Williamson, J. (eds.). Foundations of Bayesianism. Dordrecht: Kluwer. pp. 137–159. ISBN 1-4020-0223-8.
  7. Paulos, John Allen (5 August 2011). "The Mathematics of Changing Your Mind [by Sharon Bertsch McGrayne]". Book Review. New York Times. Archived from the original on 2022-01-01. Retrieved 2011-08-06.
  8. Stigler, Stephen M. (March 1990). The history of statistics. Harvard University Press. ISBN 9780674403413.
  9. Dupré, Maurice J.; Tipler, Frank J. (2009). "New axioms for rigorous Bayesian probability". Bayesian Analysis. 4 (3): 599–606. CiteSeerX 10.1.1.612.3036. doi:10.1214/09-BA422.
  10. Cox, Richard T. (1961). The algebra of probable inference (Reprint ed.). Baltimore, MD; London, UK: Johns Hopkins Press; Oxford University Press [distributor]. ISBN 9780801869822.
  11. McGrayne, Sharon Bertsch (2011). The Theory that Would not Die. p. 10 (https://archive.org/details/theorythatwouldn0000mcgr/page/10).
  12. Stigler, Stephen M. (1986). "Chapter 3". The History of Statistics. Harvard University Press. ISBN 9780674403406.
  13. Fienberg, Stephen E. (2006). "When did Bayesian Inference become "Bayesian"?" (PDF). Bayesian Analysis. 1 (1): 5, 1–40. doi:10.1214/06-BA101. Archived from the original (PDF) on 10 September 2014.
  14. Harris, Marshall Dees (1959). "Recent developments of the so-called Bayesian approach to statistics". Agricultural Law Center. Legal-Economic Research. University of Iowa: 125 (fn. #52), 126. The works of Wald, Statistical Decision Functions (1950) and Savage, The Foundation of Statistics (1954) are commonly regarded starting points for current Bayesian approaches
  15. Annals of the Computation Laboratory of Harvard University. Vol. 31. 1962. p. 180. This revolution, which may or may not succeed, is neo-Bayesianism. Jeffreys tried to introduce this approach, but did not succeed at the time in giving it general appeal.
  16. Kempthorne, Oscar (1967). The Classical Problem of Inference—Goodness of Fit. Fifth Berkeley Symposium on Mathematical Statistics and Probability. p. 235. It is curious that even in its activities unrelated to ethics, humanity searches for a religion. At the present time, the religion being 'pushed' the hardest is Bayesianism.
  17. Bernardo, J.M. (2005). "Reference analysis". Bayesian Thinking - Modeling and Computation. Handbook of Statistics. Vol. 25. pp. 17–90. doi:10.1016/S0169-7161(05)25002-2. ISBN 9780444515391.
  18. Wolpert, R.L. (2004). "A conversation with James O. Berger". Statistical Science. 19: 205–218. doi:10.1214/088342304000000053.
  19. Bernardo, José M. (2006). A Bayesian mathematical statistics primer (PDF). ICOTS-7. Bern. Archived (PDF) from the original on 2022-10-09.
  20. Bishop, C.M. (2007). Pattern Recognition and Machine Learning. Springer.
  21. Smith, C. Ray; Erickson, Gary (1989). "From Rationality and Consistency to Bayesian Probability". In Skilling, John (ed.). Maximum Entropy and Bayesian Methods. Dordrecht: Kluwer. pp. 29–44. doi:10.1007/978-94-015-7860-8_2. ISBN 0-7923-0224-9.
  22. Halpern, J. (1999). "A counterexample to theorems of Cox and Fine" (PDF). Journal of Artificial Intelligence Research. 10: 67–85. doi:10.1613/jair.536. S2CID 1538503. Archived (PDF) from the original on 2022-10-09.
  23. Hacking (1967), Section 3, page 316
  24. Hacking (1988, page 124)
  25. Skyrms, Brian (1 January 1987). "Dynamic Coherence and Probability Kinematics". Philosophy of Science. 54 (1): 1–20. CiteSeerX 10.1.1.395.5723. doi:10.1086/289350. JSTOR 187470. S2CID 120881078.
  26. Joyce, James (30 September 2003). "Bayes' Theorem". The Stanford Encyclopedia of Philosophy. stanford.edu.
  27. Fuchs, Christopher A.; Schack, Rüdiger (1 January 2012). "Bayesian Conditioning, the Reflection Principle, and Quantum Decoherence". In Ben-Menahem, Yemima; Hemmo, Meir (eds.). Probability in Physics. The Frontiers Collection. Springer Berlin Heidelberg. pp. 233–247. arXiv:1103.5950. doi:10.1007/978-3-642-21329-8_15. ISBN 9783642213281. S2CID 119215115.
  28. van Fraassen, Bas (1989). Laws and Symmetry. Oxford University Press. ISBN 0-19-824860-1.
  29. Wald, Abraham (1950). Statistical Decision Functions. Wiley.
  30. Bernardo, José M.; Smith, Adrian F.M. (1994). Bayesian Theory. John Wiley. ISBN 0-471-92416-4.
  31. Pfanzagl (1967, 1968)
  32. Morgenstern (1976, page 65)
  33. Galavotti, Maria Carla (1 January 1989). "Anti-Realism in the Philosophy of Probability: Bruno de Finetti's Subjectivism". Erkenntnis. 31 (2/3): 239–261. doi:10.1007/bf01236565. JSTOR 20012239. S2CID 170802937.
  34. Galavotti, Maria Carla (1 December 1991). "The notion of subjective probability in the work of Ramsey and de Finetti". Theoria. 57 (3): 239–259. doi:10.1111/j.1755-2567.1991.tb00839.x. ISSN 1755-2567.
  35. Dokic, Jérôme; Engel, Pascal (2003). Frank Ramsey: Truth and Success. Routledge. ISBN 9781134445936.
  36. Davidson et al. (1957)
  37. Thornton, Stephen (7 August 2018). "Karl Popper". Stanford Encyclopedia of Philosophy. Metaphysics Research Lab, Stanford University.
  38. Popper, Karl (2002) [1959]. The Logic of Scientific Discovery (2nd ed.). Routledge. p. 57. ISBN 0-415-27843-0 via Google Books. (translation of 1935 original, in German).
  39. Peirce & Jastrow (1885)
  40. Bernardo, J. M. (2005). "Reference Analysis". In Dey, D.K.; Rao, C. R. (eds.). Handbook of Statistics (PDF). Vol. 25. Amsterdam: Elsevier. pp. 17–90. Archived (PDF) from the original on 2022-10-09.

Bibliography

(Partly reprinted in Gärdenfors, Peter; Sahlin, Nils-Eric (1988). Decision, Probability, and Utility: Selected Readings. Cambridge University Press. ISBN   0-521-33658-9.)