Cox's theorem

Cox's theorem, named after the physicist Richard Threlkeld Cox, is a derivation of the laws of probability theory from a certain set of postulates. [1] [2] This derivation justifies the so-called "logical" interpretation of probability, as the laws of probability derived by Cox's theorem are applicable to any proposition. Logical (also known as objective Bayesian) probability is a type of Bayesian probability. Other forms of Bayesianism, such as the subjective interpretation, are given other justifications.

Cox's assumptions

Cox wanted his system to satisfy the following conditions:

  1. Divisibility and comparability – The plausibility of a proposition is a real number and is dependent on information we have related to the proposition.
  2. Common sense – Plausibilities should vary sensibly with the assessment of plausibilities in the model.
  3. Consistency – If the plausibility of a proposition can be derived in many ways, all the results must be equal.

The postulates as stated here are taken from Arnborg and Sjödin. [3] [4] [5] "Common sense" includes consistency with Aristotelian logic in the sense that logically equivalent propositions shall have the same plausibility.

The postulates as originally stated by Cox were not mathematically rigorous (although more so than the informal description above), as noted by Halpern. [6] [7] However, it appears to be possible to augment them with various mathematical assumptions made either implicitly or explicitly by Cox to produce a valid proof.

Cox's notation:

The plausibility of a proposition $A$ given some related information $X$ is denoted by $A \mid X$.

Cox's postulates and functional equations are:

The plausibility of the conjunction $AB$ of two propositions $A$, $B$ depends only on the plausibility of $B$ and that of $A$ given that $B$ is true. In the form of a functional equation:

$$AB \mid X = g(A \mid BX,\; B \mid X)$$

Because of the associative nature of the conjunction in propositional logic, the consistency with logic gives a functional equation saying that the function $g$ is an associative binary operation.

Additionally, Cox postulates the function $g$ to be strictly increasing. All strictly increasing associative binary operations on the real numbers are isomorphic to multiplication of numbers in a subinterval of $[0, +\infty]$, which means that there is a monotonic function $f$ mapping plausibilities to $[0, +\infty]$ such that

$$f(AB \mid X) = f(A \mid BX)\, f(B \mid X)$$

In case $A$ given $X$ is certain, we have $AB \mid X = B \mid X$ and $A \mid BX = A \mid X$ due to the requirement of consistency. The general equation then leads to

$$f(B \mid X) = f(A \mid X)\, f(B \mid X)$$

This shall hold for any proposition $B$, which leads to

$$f(A \mid X) = 1$$

In case $A$ given $X$ is impossible, we have $AB \mid X = A \mid X$ and $A \mid BX = A \mid X$ due to the requirement of consistency. The general equation then leads to

$$f(A \mid X) = f(A \mid X)\, f(B \mid X)$$

This shall hold for any proposition $B$, which, without loss of generality, leads to a solution

$$f(A \mid X) = 0$$

Due to the requirement of monotonicity, this means that $f$ maps plausibilities to the interval $[0, 1]$.

The plausibility of a proposition determines the plausibility of its negation. This postulates the existence of a function $h$ such that

$$f(\neg A \mid X) = h(f(A \mid X))$$

Because "a double negative is an affirmative", consistency with logic gives a functional equation

$$h(h(x)) = x,$$

saying that the function $h$ is an involution, i.e., it is its own inverse.

The above functional equations and consistency with logic imply that

$$f(AB \mid X) = f(A \mid X)\, h\!\left(\frac{f(A \neg B \mid X)}{f(A \mid X)}\right),$$

since $f(AB \mid X) = f(A \mid X)\, f(B \mid AX)$, while $f(B \mid AX) = h(f(\neg B \mid AX)) = h(f(A \neg B \mid X)/f(A \mid X))$. Since $AB$ is logically equivalent to $BA$, we also get

$$f(AB \mid X) = f(B \mid X)\, h\!\left(\frac{f(B \neg A \mid X)}{f(B \mid X)}\right)$$

If, in particular, $B = \neg(AD)$ for some proposition $D$, then $A \neg B = \neg B$ and $B \neg A = \neg A$, and we get

$$f(AB \mid X) = f(A \mid X)\, h\!\left(\frac{f(\neg B \mid X)}{f(A \mid X)}\right)$$

and

$$f(AB \mid X) = f(B \mid X)\, h\!\left(\frac{f(\neg A \mid X)}{f(B \mid X)}\right)$$

Abbreviating $x = f(A \mid X)$ and $y = f(B \mid X)$, so that $f(\neg A \mid X) = h(x)$ and $f(\neg B \mid X) = h(y)$, we get the functional equation

$$x\, h\!\left(\frac{h(y)}{x}\right) = y\, h\!\left(\frac{h(x)}{y}\right)$$
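
Under Cox's regularity assumptions, the solutions of this last functional equation are $h(x) = (1 - x^m)^{1/m}$ for some $m > 0$, which is what produces the laws stated in the next section. As a quick numerical sanity check (an illustration, not part of the article; the function names, sampled values of $m$, and tolerances are ad hoc choices), the following Python sketch verifies that this family satisfies both the involution equation and the functional equation:

```python
import random

def h(x, m):
    # candidate negation function h(x) = (1 - x^m)^(1/m)
    return (1.0 - x**m) ** (1.0 / m)

def check(m, trials=10_000):
    random.seed(0)
    for _ in range(trials):
        x = random.uniform(0.01, 0.99)
        y = random.uniform(0.01, 0.99)
        # involution: h(h(x)) = x
        assert abs(h(h(x, m), m) - x) < 1e-9
        # stay away from x^m + y^m = 1, where both sides vanish and
        # floating-point cancellation dominates
        if x**m + y**m <= 1.001:
            continue
        lhs = x * h(h(y, m) / x, m)   # x h(h(y)/x)
        rhs = y * h(h(x, m) / y, m)   # y h(h(x)/y)
        assert abs(lhs - rhs) < 1e-9

for m in (0.5, 1.0, 2.0, 3.7):
    check(m)
print("h(x) = (1 - x^m)^(1/m) passes all sampled checks")
```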

Implications of Cox's postulates

The laws of probability derivable from these postulates are the following. [8] Let $A \mid B$ be the plausibility of the proposition $A$ given $B$ satisfying Cox's postulates. Then there is a function $w$ mapping plausibilities to the interval $[0,1]$ and a positive number $m$ such that

  1. Certainty is represented by $w(A \mid B) = 1$.
  2. $w^m(A \mid B) + w^m(\neg A \mid B) = 1$.
  3. $w(A \wedge B \mid C) = w(A \mid C)\, w(B \mid A \wedge C)$.
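
For instance (an illustration, not a result quoted from the sources), if $\Pr$ is any ordinary probability function, then $w = \Pr^{1/2}$ satisfies all three laws with $m = 2$: law 1 is immediate, law 2 reduces to $\Pr(A \mid B) + \Pr(\neg A \mid B) = 1$, and law 3 holds because taking square roots preserves products. The exponent $m$ thus reflects a genuine freedom left by the postulates; it is removed by the rescaling described next.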

It is important to note that the postulates imply only these general properties. We may recover the usual laws of probability by setting a new function, conventionally denoted $P$ or $\Pr$, equal to $w^m$. Then we obtain the laws of probability in a more familiar form:

  1. Certain truth is represented by $\Pr(A \mid B) = 1$, and certain falsehood by $\Pr(A \mid B) = 0$.
  2. $\Pr(A \mid B) + \Pr(\neg A \mid B) = 1$.
  3. $\Pr(A \wedge B \mid C) = \Pr(A \mid C)\, \Pr(B \mid A \wedge C)$.
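
These are precisely the rules obeyed by ordinary probability assignments, so they can be checked mechanically on any concrete example. A minimal Python illustration (the joint distribution over two binary propositions $A$ and $B$ below is invented, and the background proposition $C$ is taken to be trivially true):

```python
# invented joint distribution P(A = a, B = b) over two binary propositions
joint = {(True, True): 0.30, (True, False): 0.20,
         (False, True): 0.15, (False, False): 0.35}

def pr(event):
    # probability of the outcomes (a, b) satisfying `event`
    return sum(p for (a, b), p in joint.items() if event(a, b))

def pr_given(event, given):
    # conditional probability Pr(event | given)
    return pr(lambda a, b: event(a, b) and given(a, b)) / pr(given)

A       = lambda a, b: a
not_A   = lambda a, b: not a
B       = lambda a, b: b
A_and_B = lambda a, b: a and b

# rule 2 (negation): Pr(A | B) + Pr(not A | B) = 1
assert abs(pr_given(A, B) + pr_given(not_A, B) - 1.0) < 1e-12

# rule 3 (conjunction): Pr(A and B) = Pr(A) Pr(B | A)
assert abs(pr(A_and_B) - pr(A) * pr_given(B, A)) < 1e-12
print("rules 2 and 3 hold for the toy distribution")
```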

Rule 2 is a rule for negation, and rule 3 is a rule for conjunction. Given that any proposition containing conjunction, disjunction, and negation can be equivalently rephrased using conjunction and negation alone (by De Morgan's laws), we can now handle any compound proposition.
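
For example, the sum rule for disjunction need not be postulated separately. Writing $A \vee B = \neg(\neg A \wedge \neg B)$ and applying rules 2 and 3 repeatedly (a standard derivation, along the lines of Jaynes [8]) gives

$$\begin{aligned}
\Pr(A \vee B \mid C) &= 1 - \Pr(\neg A \wedge \neg B \mid C) \\
&= 1 - \Pr(\neg A \mid C)\,\Pr(\neg B \mid \neg A \wedge C) \\
&= \Pr(A \mid C) + \Pr(\neg A \mid C)\,\Pr(B \mid \neg A \wedge C) \\
&= \Pr(A \mid C) + \Pr(\neg A \wedge B \mid C) \\
&= \Pr(A \mid C) + \Pr(B \mid C)\,\bigl(1 - \Pr(A \mid B \wedge C)\bigr) \\
&= \Pr(A \mid C) + \Pr(B \mid C) - \Pr(A \wedge B \mid C).
\end{aligned}$$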

The laws thus derived yield finite additivity of probability, but not countable additivity. The measure-theoretic formulation of Kolmogorov assumes that a probability measure is countably additive. This slightly stronger condition is necessary for the proof of certain theorems.[citation needed]

Interpretation and further discussion

Cox's theorem has come to be used as one of the justifications for the use of Bayesian probability theory. For example, in Jaynes it is discussed in detail in chapters 1 and 2 and is a cornerstone for the rest of the book. [8] Probability is interpreted as a formal system of logic, the natural extension of Aristotelian logic (in which every statement is either true or false) into the realm of reasoning in the presence of uncertainty.

It has been debated to what degree the theorem excludes alternative models for reasoning about uncertainty. For example, if certain "unintuitive" mathematical assumptions were dropped, alternatives could be devised, such as the example provided by Halpern. [6] However, Arnborg and Sjödin [3] [4] [5] suggest additional "common sense" postulates, which would allow the assumptions to be relaxed in some cases while still ruling out the Halpern example. Other approaches were devised by Hardy [9] and by Dupré and Tipler. [10]

The original formulation of Cox's theorem is in Cox (1946), which is extended with additional results and more discussion in Cox (1961). Jaynes [8] cites Abel [11] for the first known use of the associativity functional equation. János Aczél [12] provides a long proof of the "associativity equation" (pages 256–267). Jaynes [8] (p. 27) reproduces the shorter proof by Cox, in which differentiability is assumed. A guide to Cox's theorem by Van Horn aims at comprehensively introducing the reader to all these references. [13]

Baoding Liu, the founder of uncertainty theory, criticizes Cox's theorem for presuming that the truth value of the conjunction $A \wedge B$ is a twice differentiable function $f$ of the truth values of the two propositions $A$ and $B$, i.e., $T(A \wedge B) = f(T(A), T(B))$, which excludes uncertainty theory's "uncertain measure" from the start, because the function $T(A \wedge B) = \min(T(A), T(B))$, used in uncertainty theory, is not differentiable with respect to $T(A)$ and $T(B)$. [14] According to Liu, "there does not exist any evidence that the truth value of conjunction is completely determined by the truth values of individual propositions, let alone a twice differentiable function." [14]
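
The non-differentiability Liu points to is easy to exhibit numerically. A minimal sketch (the probing point and step size are arbitrary choices, not taken from Liu's book):

```python
# min(x, y), the conjunction operator of uncertainty theory, has no
# well-defined partial derivative on the diagonal x == y: the two
# one-sided difference quotients disagree there.
def conj(x, y):
    return min(x, y)

x = y = 0.5
eps = 1e-6
forward  = (conj(x + eps, y) - conj(x, y)) / eps   # -> 0.0
backward = (conj(x, y) - conj(x - eps, y)) / eps   # -> ~1.0
print(forward, backward)   # unequal one-sided slopes at x == y
```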

References

  1. Cox, R. T. (1946). "Probability, Frequency and Reasonable Expectation". American Journal of Physics. 14 (1): 1–10. Bibcode:1946AmJPh..14....1C. doi:10.1119/1.1990764.
  2. Cox, R. T. (1961). The Algebra of Probable Inference. Baltimore, MD: Johns Hopkins University Press.
  3. Stefan Arnborg and Gunnar Sjödin, On the foundations of Bayesianism, Preprint: Nada, KTH (1999) ftp://ftp.nada.kth.se/pub/documents/Theory/Stefan-Arnborg/06arnborg.ps ftp://ftp.nada.kth.se/pub/documents/Theory/Stefan-Arnborg/06arnborg.pdf
  4. Stefan Arnborg and Gunnar Sjödin, A note on the foundations of Bayesianism, Preprint: Nada, KTH (2000a) ftp://ftp.nada.kth.se/pub/documents/Theory/Stefan-Arnborg/fobshle.ps ftp://ftp.nada.kth.se/pub/documents/Theory/Stefan-Arnborg/fobshle.pdf
  5. Stefan Arnborg and Gunnar Sjödin, "Bayes rules in finite models", in European Conference on Artificial Intelligence, Berlin (2000b) ftp://ftp.nada.kth.se/pub/documents/Theory/Stefan-Arnborg/fobc1.ps ftp://ftp.nada.kth.se/pub/documents/Theory/Stefan-Arnborg/fobc1.pdf
  6. Joseph Y. Halpern, "A counterexample to theorems of Cox and Fine", Journal of AI Research, 10, 67–85 (1999). http://www.jair.org/media/536/live-536-2054-jair.ps.Z Archived 2015-11-25 at the Wayback Machine
  7. Joseph Y. Halpern, "Technical Addendum, Cox's theorem Revisited", Journal of AI Research, 11, 429–435 (1999). http://www.jair.org/media/644/live-644-1840-jair.ps.Z Archived 2015-11-25 at the Wayback Machine
  8. Edwin Thompson Jaynes, Probability Theory: The Logic of Science, Cambridge University Press (2003); a preprint version (1996) was archived online. Chapters 1 to 3 of the published version are available at http://bayes.wustl.edu/etj/prob/book.pdf
  9. Michael Hardy, "Scaled Boolean algebras", Advances in Applied Mathematics, August 2002, pages 243–292 (or preprint); Hardy has said, "I assert there that I think Cox's assumptions are too strong, although I don't really say why. I do say what I would replace them with." (The quote is from a Wikipedia discussion page, not from the article.)
  10. Dupré, Maurice J. & Tipler, Frank J. (2009). "New Axioms for Rigorous Bayesian Probability". Bayesian Analysis. 4 (3): 599–606.
  11. Niels Henrik Abel, "Untersuchung der Functionen zweier unabhängig veränderlichen Gröszen x und y, wie f(x, y), welche die Eigenschaft haben, dasz f[z, f(x,y)] eine symmetrische Function von z, x und y ist.", Jour. Reine u. angew. Math. (Crelle's Journal), 1, 11–15 (1826).
  12. János Aczél, Lectures on Functional Equations and their Applications, Academic Press, New York (1966).
  13. Van Horn, K. S. (2003). "Constructing a logic of plausible inference: A guide to Cox's theorem". International Journal of Approximate Reasoning. 34: 3–24. doi:10.1016/S0888-613X(03)00051-3.
  14. Liu, Baoding (2015). Uncertainty Theory. Springer Uncertainty Research (4th ed.). Berlin, Heidelberg: Springer. pp. 459–460. ISBN 978-3-662-44354-5.
