Probability axioms

Last updated April 15, 2024

The standard probability axioms are the foundations of probability theory introduced by Russian mathematician Andrey Kolmogorov in 1933.^[1] These axioms remain central and contribute directly to mathematics, the physical sciences, and real-world applications of probability.^[2]

Kolmogorov axioms

The setup for the axioms can be summarised as follows: let $(\Omega ,F,P)$ be a measure space, with $P(E)$ being the probability of some event $E$, and $P(\Omega )=1$. Then $(\Omega ,F,P)$ is a probability space, with sample space $\Omega$, event space $F$ and probability measure $P$.^[1]

First axiom

The probability of an event is a non-negative real number:

P(E)\in \mathbb {R} ,P(E)\geq 0\qquad \forall E\in F

where $F$ is the event space. It follows (when combined with the second axiom) that $P(E)$ is always finite, in contrast with more general measure theory. Theories which assign negative probability relax the first axiom.

Second axiom

This is the assumption of unit measure: that the probability that at least one of the elementary events in the entire sample space will occur is 1.

P(\Omega )=1

Third axiom

This is the assumption of σ-additivity:

Any countable sequence of disjoint sets (synonymous with mutually exclusive events)

E_{1},E_{2},\ldots

satisfies

P\left(\bigcup _{i=1}^{\infty }E_{i}\right)=\sum _{i=1}^{\infty }P(E_{i}).

Some authors consider merely finitely additive probability spaces, in which case one just needs an algebra of sets, rather than a σ-algebra.^[5] Quasiprobability distributions in general relax the third axiom.
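On a finite sample space the three axioms can be checked mechanically, because σ-additivity reduces to additivity over pairs of disjoint events. The following is a minimal sketch under that assumption; the helper names `power_set`, `satisfies_axioms` and `fair_die` are illustrative and do not come from the sources cited here.

```python
from fractions import Fraction
from itertools import chain, combinations


def power_set(omega):
    """Return every subset of omega as a frozenset (the largest event space on a finite set)."""
    items = list(omega)
    subsets = chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))
    return [frozenset(s) for s in subsets]


def satisfies_axioms(omega, prob):
    """Check the three axioms for a set function prob defined on all subsets of a finite omega."""
    events = power_set(omega)
    non_negative = all(prob(e) >= 0 for e in events)      # first axiom
    unit_measure = prob(frozenset(omega)) == 1            # second axiom
    additive = all(                                       # third axiom, finite case
        prob(a | b) == prob(a) + prob(b)
        for a in events
        for b in events
        if not (a & b)                                    # only disjoint pairs
    )
    return non_negative and unit_measure and additive


# Hypothetical example: a fair six-sided die with exact rational weights.
die = frozenset(range(1, 7))


def fair_die(event):
    return Fraction(len(event), 6)


print(satisfies_axioms(die, fair_die))  # True
```

Exact fractions are used instead of floating-point numbers so that the equality tests in the second and third axioms are not affected by rounding.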

Consequences

From the Kolmogorov axioms, one can deduce other useful rules for studying probabilities. The proofs^[6]^[7]^[8] of these rules illustrate the power of the third axiom and its interaction with the first two. Four immediate corollaries and their proofs are shown below:

Monotonicity

\quad {\text{if}}\quad A\subseteq B\quad {\text{then}}\quad P(A)\leq P(B).

If $A$ is a subset of, or equal to, $B$, then the probability of $A$ is less than or equal to the probability of $B$.

Proof of monotonicity^[6]

In order to verify the monotonicity property, we set $E_{1}=A$ and $E_{2}=B\setminus A$ , where $A\subseteq B$ and $E_{i}=\varnothing$ for $i\geq 3$ . From the properties of the empty set ( $\varnothing$ ), it is easy to see that the sets $E_{i}$ are pairwise disjoint and $E_{1}\cup E_{2}\cup \cdots =B$ . Hence, we obtain from the third axiom that

P(A)+P(B\setminus A)+\sum _{i=3}^{\infty }P(E_{i})=P(B).

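Since, by the first axiom, the left-hand side of this equation is a series of non-negative numbers, and since it converges to $P(B)$, which is finite, we obtain both $P(A)\leq P(B)$ and $P(\varnothing )=0$: every term with $i\geq 3$ equals $P(\varnothing )$, and infinitely many copies of a positive number could not have a finite sum.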
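The decomposition used in this proof, $P(B)=P(A)+P(B\setminus A)$ for $A\subseteq B$, can be checked exhaustively on a small finite space. This is only a sketch: the three-outcome space and its weights are hypothetical, chosen so that they sum to 1.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical biased three-outcome space; the weights are illustrative and sum to 1.
weights = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}


def prob(event):
    """Probability of an event, computed as the sum of its outcome weights."""
    return sum((weights[x] for x in event), Fraction(0))


outcomes = list(weights)
events = [frozenset(c) for r in range(len(outcomes) + 1) for c in combinations(outcomes, r)]

# Collect nested pairs A ⊆ B that violate the decomposition or the inequality.
violations = [
    (a, b)
    for a in events
    for b in events
    if a <= b and not (prob(b) == prob(a) + prob(b - a) and prob(a) <= prob(b))
]
print(len(violations))  # 0
```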

The probability of the empty set

P(\varnothing )=0.

In many cases, $\varnothing$ is not the only event with probability 0.

Proof of the probability of the empty set

$P(\varnothing \cup \varnothing )=P(\varnothing )$ since $\varnothing \cup \varnothing =\varnothing$ ,

$P(\varnothing )+P(\varnothing )=P(\varnothing )$ by applying the third axiom to the left-hand side (note $\varnothing$ is disjoint with itself), and so

$P(\varnothing )=0$ by subtracting $P(\varnothing )$ from each side of the equation.
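The step above applies additivity to a union of two sets. The same conclusion can also be sketched directly from the countable form of the third axiom, using only the fact (first axiom) that $P(\varnothing )$ is a non-negative real number. Taking $E_{i}=\varnothing$ for every $i$:

```latex
\begin{align*}
P(\varnothing) = P\Bigl(\bigcup_{i=1}^{\infty}\varnothing\Bigr)
  = \sum_{i=1}^{\infty} P(\varnothing)
  = \lim_{n\to\infty} n\,P(\varnothing).
\end{align*}
```

The limit on the right is finite only if $P(\varnothing )=0$, since $n\,P(\varnothing )$ would otherwise grow without bound.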

The complement rule

$P\left(A^{c}\right)=P(\Omega \setminus A)=1-P(A)$

Proof of the complement rule

Given $A$ and $A^{c}$ are mutually exclusive and that $A\cup A^{c}=\Omega$ :

$P(A\cup A^{c})=P(A)+P(A^{c})$ ... (by axiom 3)

and, $P(A\cup A^{c})=P(\Omega )=1$ ... (by axiom 2)

$\Rightarrow P(A)+P(A^{c})=1$

$\therefore P(A^{c})=1-P(A)$
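The complement rule is often the easiest way to compute the probability of an "at least one" event. As a hedged illustration, assume four rolls of a fair die with all $6^{4}$ outcomes equally likely (an assumption of this sketch, not something the axioms require) and compare direct enumeration with $1-P(A^{c})$.

```python
from fractions import Fraction
from itertools import product

# Assumed model: four rolls of a fair die, all 6**4 = 1296 outcomes equally likely.
rolls = list(product(range(1, 7), repeat=4))


def prob(event):
    """Uniform probability of a set of outcomes."""
    return Fraction(len(event), len(rolls))


at_least_one_six = {r for r in rolls if 6 in r}
no_six = set(rolls) - at_least_one_six          # the complementary event

print(prob(at_least_one_six), 1 - prob(no_six))  # 671/1296 671/1296
```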

The numeric bound

It immediately follows from the monotonicity property that

0\leq P(E)\leq 1\qquad \forall E\in F.

Proof of the numeric bound

Given the complement rule $P(E^{c})=1-P(E)$ and axiom 1 $P(E^{c})\geq 0$ :

$1-P(E)\geq 0$

$\Rightarrow 1\geq P(E)$

$\therefore 0\leq P(E)\leq 1$

Further consequences

Another important property is:

P(A\cup B)=P(A)+P(B)-P(A\cap B).

This is called the addition law of probability, or the sum rule. That is, the probability that an event in $A$ or $B$ will happen is the sum of the probability of an event in $A$ and the probability of an event in $B$, minus the probability of an event that is in both $A$ and $B$. The proof of this is as follows:

Firstly,

P(A\cup B)=P(A)+P(B\setminus A)

by axiom 3, since $A$ and $B\setminus A$ are disjoint and their union is $A\cup B$. Because $B\setminus A=B\setminus (A\cap B)$, this gives

P(A\cup B)=P(A)+P(B\setminus (A\cap B)).

Also,

P(B)=P(B\setminus (A\cap B))+P(A\cap B)

and eliminating $P(B\setminus (A\cap B))$ from both equations gives us the desired result.

An extension of the addition law to any number of sets is the inclusion–exclusion principle.

Setting $B$ to the complement $A^{c}$ of $A$ in the addition law gives

P\left(A^{c}\right)=P(\Omega \setminus A)=1-P(A)

That is, the probability that any event will not happen (or the event's complement) is 1 minus the probability that it will.
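The addition law and its complement special case can be illustrated numerically. The sketch below assumes a single roll of a fair six-sided die; the events `A` and `B` are hypothetical choices that overlap, so the correction term $P(A\cap B)$ is non-zero.

```python
from fractions import Fraction

# Assumed model: one roll of a fair six-sided die, every outcome equally likely.
omega = frozenset(range(1, 7))


def prob(event):
    """Uniform probability of an event (a subset of omega)."""
    return Fraction(len(event & omega), len(omega))


A = frozenset({2, 4, 6})   # the roll is even
B = frozenset({4, 5, 6})   # the roll is greater than 3

lhs = prob(A | B)
rhs = prob(A) + prob(B) - prob(A & B)
print(lhs, rhs)  # 2/3 2/3

# Special case B = complement of A recovers the complement rule.
A_c = omega - A
print(prob(A_c) == 1 - prob(A))  # True
```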

Simple example: coin toss

Consider a single coin-toss, and assume that the coin will either land heads (H) or tails (T) (but not both). No assumption is made as to whether the coin is fair or as to whether or not any bias depends on how the coin is tossed.^[9]

We may define:

\Omega =\{H,T\}

F=\{\varnothing ,\{H\},\{T\},\{H,T\}\}

Kolmogorov's axioms imply that:

P(\varnothing )=0

The probability of neither heads nor tails is 0.

P(\{H,T\})=1

The probability of either heads or tails is 1; equivalently, $P(\{H,T\}^{c})=0$.

P(\{H\})+P(\{T\})=1

The sum of the probability of heads and the probability of tails is 1.
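The coin-toss space is small enough to write out in full. In the sketch below, the function name `coin_measure` and the bias value 3/10 are hypothetical; the point is that once $P(\{H\})$ is chosen, the axioms fix the probability of every other event in $F$.

```python
from fractions import Fraction


def coin_measure(p_heads):
    """Return P on F = {∅, {H}, {T}, {H,T}} for a coin with P({H}) = p_heads."""
    if not 0 <= p_heads <= 1:
        raise ValueError("p_heads must lie in [0, 1]")
    return {
        frozenset(): Fraction(0),             # P(∅) = 0
        frozenset({"H"}): p_heads,
        frozenset({"T"}): 1 - p_heads,        # forced by additivity and unit measure
        frozenset({"H", "T"}): Fraction(1),   # second axiom
    }


P = coin_measure(Fraction(3, 10))  # hypothetical bias towards tails
print(P[frozenset({"H"})] + P[frozenset({"T"})])  # 1
```

No fairness assumption is needed: any value of `p_heads` between 0 and 1 yields a valid probability measure on this event space.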


References

1. Kolmogorov, Andrey (1950) [1933]. Foundations of the theory of probability. New York, US: Chelsea Publishing Company.
2. Aldous, David. "What is the significance of the Kolmogorov axioms?". David Aldous. Retrieved November 19, 2019.
3. Cox, R. T. (1946). "Probability, Frequency and Reasonable Expectation". American Journal of Physics. 14 (1): 1–10. Bibcode:1946AmJPh..14....1C. doi:10.1119/1.1990764.
4. Cox, R. T. (1961). The Algebra of Probable Inference. Baltimore, MD: Johns Hopkins University Press.
5. Hájek, Alan (August 28, 2019). "Interpretations of Probability". Stanford Encyclopedia of Philosophy. Retrieved November 17, 2019.
6. Ross, Sheldon M. (2014). A first course in probability (Ninth ed.). Upper Saddle River, New Jersey. pp. 27, 28. ISBN 978-0-321-79477-2. OCLC 827003384.
7. Gerard, David (December 9, 2017). "Proofs from axioms" (PDF). Retrieved November 20, 2019.
8. Jackson, Bill (2010). "Probability (Lecture Notes - Week 3)" (PDF). School of Mathematics, Queen Mary University of London. Retrieved November 20, 2019.
9. Diaconis, Persi; Holmes, Susan; Montgomery, Richard (2007). "Dynamical Bias in the Coin Toss" (PDF). SIAM Review. 49 (2): 211–235. Bibcode:2007SIAMR..49..211D. doi:10.1137/S0036144504446436. Retrieved 5 January 2024.
