De Finetti's theorem


In probability theory, de Finetti's theorem states that exchangeable observations are conditionally independent relative to some latent variable. An epistemic probability distribution can then be assigned to this variable. It is named in honor of Bruno de Finetti.


For the special case of an exchangeable sequence of Bernoulli random variables it states that such a sequence is a "mixture" of sequences of independent and identically distributed (i.i.d.) Bernoulli random variables.

A sequence of random variables is called exchangeable if the joint distribution of the sequence is unchanged by any permutation of the indices. In general, while the variables of the exchangeable sequence are not themselves independent, only exchangeable, there is an underlying family of i.i.d. random variables. That is, there are underlying, generally unobservable, quantities that are i.i.d. – exchangeable sequences are mixtures of i.i.d. sequences.

Background

A Bayesian statistician often seeks the conditional probability distribution of a random quantity given the data. The concept of exchangeability was introduced by de Finetti. De Finetti's theorem explains a mathematical relationship between independence and exchangeability. [1]

An infinite sequence

X1, X2, X3, ...

of random variables is said to be exchangeable if for any natural number n, any finite sequence of indices i1, ..., in, and any permutation π : {i1, ..., in} → {i1, ..., in} of those indices, the sequences

Xi1, ..., Xin   and   Xπ(i1), ..., Xπ(in)

both have the same joint probability distribution.

If an identically distributed sequence is independent, then the sequence is exchangeable; however, the converse is false—there exist exchangeable random variables that are not statistically independent, for example the Pólya urn model.
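This can be made concrete with exact arithmetic: for a Pólya urn started with one ball of each colour (a minimal sketch; the starting composition is an illustrative choice), permuting a draw sequence leaves its probability unchanged, yet successive draws are dependent:

```python
from fractions import Fraction

def polya_prob(seq, red=1, blue=1):
    """Exact probability of a draw sequence (1 = red, 0 = blue) in a
    Pólya urn: after each draw the ball is returned together with one
    extra ball of the same colour."""
    p = Fraction(1)
    r, b = red, blue
    for x in seq:
        if x == 1:
            p *= Fraction(r, r + b)
            r += 1
        else:
            p *= Fraction(b, r + b)
            b += 1
    return p

# Exchangeable: permuting the draws leaves the probability unchanged.
assert polya_prob([1, 0]) == polya_prob([0, 1]) == Fraction(1, 6)

# Not independent: P(X2 = 1 | X1 = 1) = 2/3, while P(X2 = 1) = 1/2.
assert polya_prob([1, 1]) / (polya_prob([1, 1]) + polya_prob([1, 0])) == Fraction(2, 3)
assert polya_prob([1, 1]) + polya_prob([0, 1]) == Fraction(1, 2)
```

The probability depends only on how many red draws occur, not on their order, which is exactly the exchangeability property.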

Statement of the theorem

A random variable X has a Bernoulli distribution if Pr(X = 1) = p and Pr(X = 0) = 1 − p for some p ∈ (0, 1).

De Finetti's theorem states that the probability distribution of any infinite exchangeable sequence of Bernoulli random variables is a "mixture" of the probability distributions of independent and identically distributed sequences of Bernoulli random variables. "Mixture", in this sense, means a weighted average, but this need not mean a finite or countably infinite (i.e., discrete) weighted average: it can be an integral over a measure rather than a sum.

More precisely, suppose X1, X2, X3, ... is an infinite exchangeable sequence of Bernoulli-distributed random variables. Then there is a probability measure m on the interval [0, 1] and a random variable Y with distribution m such that, conditional on Y = y, the variables X1, X2, X3, ... are i.i.d. Bernoulli(y). Equivalently, for every n and every x1, ..., xn ∈ {0, 1},

Pr(X1 = x1, ..., Xn = xn) = ∫₀¹ y^k (1 − y)^(n−k) m(dy),   where k = x1 + ⋯ + xn.
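As an illustration of the representation, suppose the mixing measure m is the uniform (Lebesgue) measure on [0, 1] (an assumed choice for this sketch). The Beta integral then gives the sequence probabilities in closed form:

```python
from fractions import Fraction
from math import comb

def mixture_prob(xs):
    """Pr(X1=x1,...,Xn=xn) under a uniform mixing measure m on [0,1].
    The integral of y^k (1-y)^(n-k) dy over [0,1] is the Beta integral
    k! (n-k)! / (n+1)!, which equals 1 / ((n+1) * C(n, k))."""
    n, k = len(xs), sum(xs)
    return Fraction(1, (n + 1) * comb(n, k))

# Exchangeability: the probability depends only on the number of ones.
assert mixture_prob([1, 0, 0]) == mixture_prob([0, 0, 1]) == Fraction(1, 12)

# All-heads probability: Pr(X1 = ... = Xn = 1) = 1/(n+1).
assert mixture_prob([1] * 4) == Fraction(1, 5)
```

This is the setting of Laplace's rule of succession, where the mixing measure is a uniform prior on the unknown bias.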

Another way of stating the theorem

Suppose X1, X2, X3, ... is an infinite exchangeable sequence of Bernoulli random variables. Then X1, X2, X3, ... are conditionally independent and identically distributed given the exchangeable sigma-algebra (i.e., the sigma-algebra consisting of events that are measurable with respect to X1, X2, X3, ... and invariant under finite permutations of the indices).

Example

Here is a concrete example. We construct a sequence

X1, X2, X3, ...

of random variables by "mixing" two i.i.d. sequences as follows.

We assume p = 2/3 with probability 1/2 and p = 9/10 with probability 1/2. Given the event p = 2/3, the conditional distribution of the sequence is that the Xi are independent and identically distributed and X1 = 1 with probability 2/3 and X1 = 0 with probability 1 − 2/3. Given the event p = 9/10, the conditional distribution of the sequence is that the Xi are independent and identically distributed and X1 = 1 with probability 9/10 and X1 = 0 with probability 1 − 9/10.

This can be interpreted as follows: Make two biased coins, one showing "heads" with 2/3 probability and one showing "heads" with 9/10 probability. Flip a fair coin once to decide which biased coin to use for all flips that are recorded. Here "heads" at flip i means Xi=1.

The independence asserted here is conditional independence, i.e. the Bernoulli random variables in the sequence are conditionally independent given the event that p = 2/3, and are conditionally independent given the event that p = 9/10. But they are not unconditionally independent; they are positively correlated.
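The positive correlation can be verified exactly: conditional independence gives E[X1X2] = E[Y²], where Y denotes the randomly chosen bias, so Cov(X1, X2) = Var(Y). A minimal check with exact rational arithmetic:

```python
from fractions import Fraction

# The two-coin mixture of the example: p = 2/3 or p = 9/10, each with prob 1/2.
weights = [(Fraction(2, 3), Fraction(1, 2)), (Fraction(9, 10), Fraction(1, 2))]

E_Y  = sum(p * w for p, w in weights)       # E[X1] = E[Y] by the tower rule
E_Y2 = sum(p * p * w for p, w in weights)   # E[X1 X2] = E[Y^2] by conditional independence

cov = E_Y2 - E_Y * E_Y                      # Cov(X1, X2) = Var(Y)
assert cov == Fraction(49, 3600)            # strictly positive
assert cov > 0
```

The covariance is positive precisely because the mixing distribution is not concentrated at a single point: seeing a head raises the posterior weight on the more head-biased coin.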

In view of the strong law of large numbers, we can say that

(X1 + ⋯ + Xn)/n → p   almost surely as n → ∞,

so the long-run frequency of heads identifies which coin was chosen.

Rather than concentrating probability 1/2 at each of two points between 0 and 1, the "mixing distribution" can be any probability distribution supported on the interval from 0 to 1; which one it is depends on the joint distribution of the infinite sequence of Bernoulli random variables.
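The convergence of the empirical frequency can be illustrated with a short simulation (a sketch using only Python's standard library; the sample size and tolerance are illustrative choices):

```python
import random

random.seed(0)

def sample_frequency(n):
    """Draw one sequence from the two-coin mixture and return the
    empirical frequency of ones: choose p uniformly from {2/3, 9/10},
    then flip a p-biased coin n times."""
    p = random.choice([2/3, 9/10])
    return sum(random.random() < p for _ in range(n)) / n

# The long-run frequency converges almost surely to the realised bias,
# so it lands near 2/3 or near 9/10 -- never near the average 47/60.
for _ in range(5):
    f = sample_frequency(200_000)
    assert min(abs(f - 2/3), abs(f - 9/10)) < 0.01
```

Repeating this experiment many times and histogramming the limiting frequencies recovers the mixing distribution itself: two spikes of weight 1/2 at 2/3 and 9/10.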

The definition of exchangeability, and the statement of the theorem, also make sense for sequences of finite length

X1, ..., Xn,

but the theorem is not generally true in that case. It is true if the sequence can be extended to an exchangeable sequence that is infinitely long. The simplest example of an exchangeable sequence of Bernoulli random variables that cannot be so extended is the one in which X1 = 1  X2 and X1 is either 0 or 1, each with probability 1/2. This sequence is exchangeable, but cannot be extended to an exchangeable sequence of length 3, let alone an infinitely long one.
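Why no such extension can exist also follows directly from the mixture representation: an infinite exchangeable extension would, by the theorem, give some mixing measure m with

```latex
\Pr(X_1 = 1,\, X_2 = 1) = \int_0^1 y^2 \, m(\mathrm{d}y) = 0
\qquad\text{and}\qquad
\Pr(X_1 = 1) = \int_0^1 y \, m(\mathrm{d}y) = \tfrac{1}{2},
```

but the first equation forces m to be the point mass at 0, which would make the second integral 0 rather than 1/2, a contradiction.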

As a categorical limit

De Finetti's theorem can be expressed as a categorical limit in the category of Markov kernels. [2] [3] [4]

Let X be a standard Borel space, and consider the space of sequences on X, the countable product X^ℕ (equipped with the product sigma-algebra).

Given a finite permutation σ of ℕ, denote again by σ the permutation action on X^ℕ, as well as the Markov kernel X^ℕ → X^ℕ induced by it. In terms of category theory, we have a diagram with a single object, X^ℕ, and a countable number of arrows, one for each permutation.

Recall now that a probability measure is equivalently a Markov kernel from the one-point measurable space. A probability measure p on X^ℕ is exchangeable if and only if, as Markov kernels, σ ∘ p = p for every permutation σ. More generally, given any standard Borel space Y, one can call a Markov kernel f : Y → X^ℕ exchangeable if σ ∘ f = f for every σ, i.e. if the following diagram commutes,

[Diagram: the exchangeable kernel f : Y → X^ℕ commuting with every permutation σ of X^ℕ]

giving a cone.

De Finetti's theorem can now be stated as the fact that the space PX of probability measures over X (the Giry monad) forms a universal (or limit) cone. [3] In more detail, consider the Markov kernel iid : PX → X^ℕ constructed as follows, using the Kolmogorov extension theorem:

iid(A1 × ⋯ × An × X × X × ⋯ | p) = p(A1) ⋯ p(An)

for all measurable subsets A1, ..., An of X. Note that we can interpret this kernel as taking a probability measure p on X as input and returning the distribution of an i.i.d. sequence on X^ℕ with marginal p. Since iid sequences are exchangeable, iid is an exchangeable kernel in the sense defined above. The kernel iid doesn't just form a cone, but a limit cone: given any exchangeable kernel f : Y → X^ℕ, there exists a unique kernel f̃ : Y → PX such that iid ∘ f̃ = f, i.e. making the following diagram commute:

[Diagram: every exchangeable kernel f : Y → X^ℕ factors uniquely through iid : PX → X^ℕ]

In particular, for any exchangeable probability measure p on X^ℕ, there exists a unique probability measure μ on PX (i.e. a probability measure over probability measures) such that p = iid ∘ μ, i.e. such that for all measurable subsets A1, ..., An of X,

p(A1 × ⋯ × An × X × X × ⋯) = ∫_PX q(A1) ⋯ q(An) μ(dq).

In other words, p is a mixture of i.i.d. measures on X (the ones formed by the q in the integral above).

Extensions

Versions of de Finetti's theorem for finite exchangeable sequences, [5] [6] and for Markov exchangeable sequences [7] have been proved by Diaconis and Freedman and by Kerns and Szekely. Two notions of partial exchangeability of arrays, known as separate and joint exchangeability lead to extensions of de Finetti's theorem for arrays by Aldous and Hoover. [8]

The computable de Finetti theorem shows that if an exchangeable sequence of real random variables is given by a computer program, then a program which samples from the mixing measure can be automatically recovered. [9]

In the setting of free probability, there is a noncommutative extension of de Finetti's theorem which characterizes noncommutative sequences invariant under quantum permutations. [10]

Extensions of de Finetti's theorem to quantum states have been found to be useful in quantum information, [11] [12] [13] in topics like quantum key distribution [14] and entanglement detection. [15] A multivariate extension of de Finetti’s theorem can be used to derive Bose–Einstein statistics from the statistics of classical (i.e. independent) particles. [16]


References

  1. See the Oxford lecture notes of Steffen Lauritzen http://www.stats.ox.ac.uk/~steffen/teaching/grad/definetti.pdf
  2. Jacobs, Bart; Staton, Sam (2020). "De Finetti's theorem as a categorical limit". CMCS '20: Proceedings of the 15th IFIP WG 1.3 International Workshop of Coalgebraic Methods in Computer Science. arXiv: 2003.01964 .
  3. Fritz, Tobias; Gonda, Tomáš; Perrone, Paolo (2021). "De Finetti's theorem in categorical probability". Journal of Stochastic Analysis. 2 (4). arXiv: 2105.02639 . doi:10.31390/josa.2.4.06.
  4. Moss, Sean; Perrone, Paolo (2022). "Probability monads with submonads of deterministic states". LICS '22: Proceedings of the 37th Annual ACM/IEEE Symposium on Logic in Computer Science. arXiv: 2204.07003 . doi:10.1145/3531130.3533355.
  5. Diaconis, P.; Freedman, D. (1980). "Finite exchangeable sequences". Annals of Probability. 8 (4): 745–764. doi: 10.1214/aop/1176994663 . MR   0577313. Zbl   0434.60034.
  6. Szekely, G. J.; Kerns, J. G. (2006). "De Finetti's theorem for abstract finite exchangeable sequences". Journal of Theoretical Probability. 19 (3): 589–608. doi:10.1007/s10959-006-0028-z. S2CID   119981020.
  7. Diaconis, P.; Freedman, D. (1980). "De Finetti's theorem for Markov chains". Annals of Probability. 8 (1): 115–130. doi: 10.1214/aop/1176994828 . MR   0556418. Zbl   0426.60064.
  8. Persi Diaconis and Svante Janson (2008). "Graph Limits and Exchangeable Random Graphs", Rendiconti di Matematica, Ser. VII 28(1), 33–61.
  9. Cameron Freer and Daniel Roy (2009) "Computable exchangeable sequences have computable de Finetti measures", Proceedings of the 5th Conference on Computability in Europe: Mathematical Theory and Computational Practice, Lecture Notes in Computer Science, Vol. 5635, pp. 218–231.
  10. Koestler, Claus; Speicher, Roland (2009). "A noncommutative de Finetti theorem: Invariance under quantum permutations is equivalent to freeness with amalgamation". Commun. Math. Phys. 291 (2): 473–490. arXiv: 0807.0677 . Bibcode:2009CMaPh.291..473K. doi:10.1007/s00220-009-0802-8. S2CID   115155584.
  11. Caves, Carlton M.; Fuchs, Christopher A.; Schack, Ruediger (2002-08-20). "Unknown quantum states: The quantum de Finetti representation". Journal of Mathematical Physics. 43 (9): 4537–4559. arXiv: quant-ph/0104088 . Bibcode:2002JMP....43.4537C. doi:10.1063/1.1494475. ISSN   0022-2488. S2CID   17416262.
  12. J. Baez (2007). "This Week's Finds in Mathematical Physics (Week 251)" . Retrieved 29 April 2012.
  13. Brandao, Fernando G.S.L.; Harrow, Aram W. (2013-01-01). "Quantum de finetti theorems under local measurements with applications". Proceedings of the forty-fifth annual ACM symposium on Theory of Computing. STOC '13. New York, NY, USA: ACM. pp. 861–870. arXiv: 1210.6367 . doi:10.1145/2488608.2488718. ISBN   9781450320290. S2CID   1772280.
  14. Renner, Renato (2005-12-30). "Security of Quantum Key Distribution". arXiv: quant-ph/0512258 .
  15. Doherty, Andrew C.; Parrilo, Pablo A.; Spedalieri, Federico M. (2005-01-01). "Detecting multipartite entanglement". Physical Review A. 71 (3): 032333. arXiv: quant-ph/0407143 . Bibcode:2005PhRvA..71c2333D. doi:10.1103/PhysRevA.71.032333. S2CID   44241800.
  16. Bach, A.; Blank, H.; Francke, H. (1985). "Bose-Einstein statistics derived from the statistics of classical particles". Lettere al Nuovo Cimento. 43 (4): 195–198. doi:10.1007/BF02746978. S2CID   121413539.