Probability axioms

The standard probability axioms are the foundations of probability theory introduced by Russian mathematician Andrey Kolmogorov in 1933. [1] These axioms remain central to the field and underpin applications in mathematics, the physical sciences, and real-world probability problems. [2]

There are several other (equivalent) approaches to formalising probability. Bayesians will often motivate the Kolmogorov axioms by invoking Cox's theorem or the Dutch book arguments instead. [3] [4]

Kolmogorov axioms

The setup for the axioms can be summarised as follows: Let $(\Omega, F, P)$ be a measure space with $P(E)$ being the probability of some event $E$, and $P(\Omega) = 1$. Then $(\Omega, F, P)$ is a probability space, with sample space $\Omega$, event space $F$ and probability measure $P$. [1]

First axiom

The probability of an event is a non-negative real number:

$$P(E) \in \mathbb{R}, \quad P(E) \geq 0 \qquad \forall E \in F$$

where $F$ is the event space. It follows (when combined with the second axiom) that $P(E)$ is always finite, in contrast with more general measure theory. Theories which assign negative probability relax the first axiom.

Second axiom

This is the assumption of unit measure: that the probability that at least one of the elementary events in the entire sample space will occur is 1.

$$P(\Omega) = 1$$

Third axiom

This is the assumption of σ-additivity:

Any countable sequence of disjoint sets (synonymous with mutually exclusive events) $E_1, E_2, \ldots$ satisfies

$$P\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty} P(E_i).$$

Some authors consider merely finitely additive probability spaces, in which case one just needs an algebra of sets, rather than a σ-algebra. [5] Quasiprobability distributions in general relax the third axiom.
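On a finite sample space the three axioms can be checked mechanically. The following is a minimal sketch, not part of the original formulation: the helper names `power_set`, `make_measure` and `satisfies_axioms` are hypothetical, the fair die is an illustrative choice, and pairwise additivity on disjoint events stands in for σ-additivity (the two coincide when the space is finite).

```python
from fractions import Fraction
from itertools import chain, combinations

def power_set(omega):
    """Return every subset of omega as a frozenset (the largest possible event space)."""
    items = list(omega)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

def make_measure(weights):
    """Build P(E) = sum of the outcome weights in E, for a finite sample space."""
    def P(event):
        return sum((weights[w] for w in event), Fraction(0))
    return P

def satisfies_axioms(omega, events, P):
    """Check the three axioms on a finite space."""
    non_negative = all(P(E) >= 0 for E in events)        # first axiom
    unit_measure = P(frozenset(omega)) == 1              # second axiom
    additive = all(P(A | B) == P(A) + P(B)               # third axiom, disjoint pairs
                   for A in events for B in events if not (A & B))
    return non_negative and unit_measure and additive

# Hypothetical example: a fair six-sided die.
omega = frozenset(range(1, 7))
events = power_set(omega)
P = make_measure({w: Fraction(1, 6) for w in omega})
print(satisfies_axioms(omega, events, P))  # True
```

Exact rational arithmetic (`Fraction`) is used so that the equalities in the second and third axioms are not disturbed by floating-point rounding.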

Consequences

From the Kolmogorov axioms, one can deduce other useful rules for studying probabilities. The proofs [6] [7] [8] of these rules illustrate the power of the third axiom and its interaction with the first two axioms. Four of the immediate corollaries and their proofs are shown below:

Monotonicity

$$\text{if } A \subseteq B \text{ then } P(A) \leq P(B).$$

If $A$ is a subset of, or equal to, $B$, then the probability of $A$ is less than, or equal to, the probability of $B$.

Proof of monotonicity [6]

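In order to verify the monotonicity property, we set $E_1 = A$ and $E_2 = B \setminus A$, where $A \subseteq B$ and $E_i = \varnothing$ for $i \geq 3$. From the properties of the empty set ($\varnothing$), it is easy to see that the sets $E_i$ are pairwise disjoint and $E_1 \cup E_2 \cup \cdots = B$. Hence, we obtain from the third axiom that

$$P(A) + P(B \setminus A) + \sum_{i=3}^{\infty} P(E_i) = P(B).$$

Since, by the first axiom, the left-hand side of this equation is a series of non-negative numbers, and since it converges to $P(B)$, which is finite, we obtain both $P(A) \leq P(B)$ and $P(\varnothing) = 0$.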
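The decomposition used in the proof can be replayed on a small finite space. This is only an illustrative sketch: the three-outcome space and its weights are assumptions made for the example, and `subsets` is a hypothetical helper.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical biased three-outcome space; the weights are illustrative only.
weights = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}

def P(event):
    """Probability of an event as the sum of its outcome weights."""
    return sum((weights[w] for w in event), Fraction(0))

def subsets(items):
    """All subsets of items, as frozensets."""
    items = list(items)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

events = subsets(weights)

# For every pair A ⊆ B, split B into the disjoint pieces A and B \ A, as in the proof.
monotone = True
for A in events:
    for B in events:
        if A <= B:
            assert P(B) == P(A) + P(B - A)   # additivity on the two disjoint pieces
            monotone = monotone and P(A) <= P(B)

print(monotone)  # True
```

Because $P(B \setminus A)$ is non-negative, the identity checked inside the loop is exactly what forces $P(A) \leq P(B)$.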

The probability of the empty set

$$P(\varnothing) = 0.$$

In many cases, $\varnothing$ is not the only event with probability 0.

Proof of the probability of the empty set

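$$P(\varnothing \cup \varnothing) = P(\varnothing)$$

since $\varnothing \cup \varnothing = \varnothing$,

$$P(\varnothing) + P(\varnothing) = P(\varnothing)$$

by applying the third axiom to the left-hand side (note $\varnothing$ is disjoint with itself), and so

$$P(\varnothing) = 0$$

by subtracting $P(\varnothing)$ from each side of the equation.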

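The complement rule

$$P(A^c) = P(\Omega \setminus A) = 1 - P(A)$$

Proof of the complement rule

Given $A$ and $A^c$ are mutually exclusive and that $A \cup A^c = \Omega$:

$$P(A \cup A^c) = P(A) + P(A^c)$$ ... (by axiom 3)

and, $$P(A \cup A^c) = P(\Omega) = 1$$ ... (by axiom 2)

$$\Rightarrow P(A) + P(A^c) = 1$$

$$\therefore P(A^c) = 1 - P(A)$$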

The numeric bound

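It immediately follows from the monotonicity property that

$$0 \leq P(E) \leq 1 \qquad \forall E \in F.$$

Proof of the numeric bound

Given the complement rule $P(E^c) = 1 - P(E)$ and axiom 1, $P(E^c) \geq 0$:

$$1 - P(E) \geq 0$$

$$\Rightarrow 1 \geq P(E)$$

$$\therefore 0 \leq P(E) \leq 1$$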

Further consequences

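Another important property is:

$$P(A \cup B) = P(A) + P(B) - P(A \cap B).$$

This is called the addition law of probability, or the sum rule. That is, the probability that an event in A or B will happen is the sum of the probability of an event in A and the probability of an event in B, minus the probability of an event that is in both A and B. The proof of this is as follows:

Firstly,

$$P(A \cup B) = P(A) + P(B \setminus A)$$ ... (by Axiom 3)

So,

$$P(A \cup B) = P(A) + P(B \setminus (A \cap B))$$ (by $B \setminus A = B \setminus (A \cap B)$).

Also,

$$P(B) = P(B \setminus (A \cap B)) + P(A \cap B)$$

and eliminating $P(B \setminus (A \cap B))$ from both equations gives us the desired result.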

An extension of the addition law to any number of sets is the inclusion–exclusion principle.
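As a numerical illustration of the addition law and its three-set extension, the sketch below evaluates both sides on a small uniform space. The sample space (the integers 1 to 12) and the events `A`, `B` and `C` are assumptions chosen for the example.

```python
from fractions import Fraction

# Hypothetical uniform space: the integers 1..12, each with probability 1/12.
omega = frozenset(range(1, 13))

def P(event):
    """Uniform probability: number of outcomes in the event over the size of omega."""
    return Fraction(len(event), len(omega))

A = frozenset(n for n in omega if n % 2 == 0)   # multiples of 2
B = frozenset(n for n in omega if n % 3 == 0)   # multiples of 3
C = frozenset(n for n in omega if n > 8)        # 9, 10, 11, 12

# Addition law for two events.
lhs_two = P(A | B)
rhs_two = P(A) + P(B) - P(A & B)

# Inclusion-exclusion for three events.
lhs_three = P(A | B | C)
rhs_three = (P(A) + P(B) + P(C)
             - P(A & B) - P(A & C) - P(B & C)
             + P(A & B & C))

print(lhs_two, rhs_two)      # 2/3 2/3
print(lhs_three, rhs_three)  # 3/4 3/4
```

The subtraction of $P(A \cap B)$ corrects for outcomes such as 6 and 12 that would otherwise be counted once in $P(A)$ and again in $P(B)$.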

Setting $B$ to the complement $A^c$ of $A$ in the addition law gives

$$P(A^c) = P(\Omega \setminus A) = 1 - P(A).$$

That is, the probability that any event will not happen (or the event's complement) is 1 minus the probability that it will.
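The complement rule and the numeric bound can both be checked exhaustively on a finite space. The following sketch assumes a hypothetical loaded die in which face 6 is three times as likely as each other face; the weights are illustrative only.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical loaded die: face 6 has weight 3/8, every other face 1/8.
weights = {face: Fraction(1, 8) for face in range(1, 6)}
weights[6] = Fraction(3, 8)
omega = frozenset(weights)

def P(event):
    """Probability of an event as the sum of its outcome weights."""
    return sum((weights[w] for w in event), Fraction(0))

# Every subset of omega is treated as an event.
events = [frozenset(c) for c in chain.from_iterable(
    combinations(sorted(omega), r) for r in range(len(omega) + 1))]

complement_rule_holds = all(P(omega - E) == 1 - P(E) for E in events)
numeric_bound_holds = all(0 <= P(E) <= 1 for E in events)

print(complement_rule_holds, numeric_bound_holds)  # True True
print(P(frozenset({6})), P(omega - {6}))           # 3/8 5/8
```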

Simple example: coin toss

Consider a single coin-toss, and assume that the coin will either land heads (H) or tails (T) (but not both). No assumption is made as to whether the coin is fair or as to whether or not any bias depends on how the coin is tossed. [9]

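We may define:

$$\Omega = \{H, T\}$$

$$F = \{\varnothing, \{H\}, \{T\}, \{H, T\}\}$$

Kolmogorov's axioms imply that:

$$P(\varnothing) = 0$$

The probability of neither heads nor tails is 0.

$$P(\{H, T\}) = 1$$

The probability of either heads or tails is 1.

$$P(\{H\}) + P(\{T\}) = 1$$

The sum of the probability of heads and the probability of tails is 1.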
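The coin-toss space is small enough to write out in full. In the sketch below, `coin_space` is a hypothetical helper and the bias values are arbitrary, reflecting the fact that no assumption is made about fairness; the three statements above hold for each of them.

```python
from fractions import Fraction

def coin_space(p_heads):
    """Return (omega, events, P) for a single toss with P({H}) = p_heads."""
    omega = frozenset({"H", "T"})
    events = [frozenset(), frozenset({"H"}), frozenset({"T"}), omega]
    weights = {"H": p_heads, "T": 1 - p_heads}

    def P(event):
        return sum((weights[w] for w in event), Fraction(0))

    return omega, events, P

# Arbitrary illustrative biases: fair, slightly biased, heavily biased.
for p in (Fraction(1, 2), Fraction(51, 100), Fraction(9, 10)):
    omega, events, P = coin_space(p)
    assert P(frozenset()) == 0                              # neither heads nor tails
    assert P(omega) == 1                                    # either heads or tails
    assert P(frozenset({"H"})) + P(frozenset({"T"})) == 1   # singletons sum to 1

print("all three statements hold for every tested bias")
```

Only the value of $P(\{H\})$ is a free choice; the axioms then fix the probability of every other event in $F$.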

References

  1. Kolmogorov, Andrey (1950) [1933]. Foundations of the theory of probability. New York, US: Chelsea Publishing Company.
  2. Aldous, David. "What is the significance of the Kolmogorov axioms?". David Aldous. Retrieved November 19, 2019.
  3. Cox, R. T. (1946). "Probability, Frequency and Reasonable Expectation". American Journal of Physics. 14 (1): 1–10. Bibcode:1946AmJPh..14....1C. doi:10.1119/1.1990764.
  4. Cox, R. T. (1961). The Algebra of Probable Inference. Baltimore, MD: Johns Hopkins University Press.
  5. Hájek, Alan (August 28, 2019). "Interpretations of Probability". Stanford Encyclopedia of Philosophy. Retrieved November 17, 2019.
  6. Ross, Sheldon M. (2014). A first course in probability (Ninth ed.). Upper Saddle River, New Jersey. pp. 27, 28. ISBN 978-0-321-79477-2. OCLC 827003384.
  7. Gerard, David (December 9, 2017). "Proofs from axioms" (PDF). Retrieved November 20, 2019.
  8. Jackson, Bill (2010). "Probability (Lecture Notes - Week 3)" (PDF). School of Mathematics, Queen Mary University of London. Retrieved November 20, 2019.
  9. Diaconis, Persi; Holmes, Susan; Montgomery, Richard (2007). "Dynamical Bias in the Coin Toss" (PDF). SIAM Review. 49 (2): 211–235. Bibcode:2007SIAMR..49..211D. doi:10.1137/S0036144504446436. Retrieved 5 January 2024.

Further reading