The standard probability axioms are the foundations of probability theory introduced by the Russian mathematician Andrey Kolmogorov in 1933. [1] These axioms remain central to the subject and underpin its applications in mathematics, the physical sciences, and real-world reasoning about uncertainty. [2]
There are several other (equivalent) approaches to formalising probability. Bayesians will often motivate the Kolmogorov axioms by invoking Cox's theorem or the Dutch book arguments instead. [3] [4]
The assumptions behind the axioms can be summarised as follows: let $(\Omega, F, P)$ be a measure space such that $P(E)$ is the probability of some event $E$, and $P(\Omega) = 1$. Then $(\Omega, F, P)$ is a probability space, with sample space $\Omega$, event space $F$ and probability measure $P$. [1]
First axiom: the probability of an event is a non-negative real number,
$$P(E) \in \mathbb{R},\quad P(E) \geq 0 \qquad \text{for all } E \in F,$$
where $F$ is the event space. It follows (when combined with the second axiom) that $P(E)$ is always finite, in contrast with more general measure theory. Theories which assign negative probability relax the first axiom.
Second axiom: the assumption of unit measure, namely that the probability that at least one of the elementary events in the entire sample space will occur is 1:
$$P(\Omega) = 1.$$
Third axiom: the assumption of σ-additivity. Any countable sequence of disjoint sets (synonymous with mutually exclusive events) $E_1, E_2, \ldots$ satisfies
$$P\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty} P(E_i).$$
Some authors consider merely finitely additive probability spaces, in which case one just needs an algebra of sets, rather than a σ-algebra. [5] Quasiprobability distributions in general relax the third axiom.
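To make the axioms concrete, here is a minimal Python sketch (the function names and the die example are my own illustration, not from the source). It represents a finite probability space by a probability mass function on its outcomes; on a finite space, σ-additivity reduces to finite additivity, which sums over disjoint events satisfy automatically.

```python
from itertools import chain, combinations

def powerset(omega):
    """All subsets of a finite sample space: the discrete event space F."""
    s = list(omega)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

def prob(event, pmf):
    """P(A) as the sum of the point masses of the outcomes in A."""
    return sum(pmf[w] for w in event)

def satisfies_axioms(omega, pmf, tol=1e-12):
    """Check axioms 1 and 2; axiom 3 (sigma-additivity) reduces to finite
    additivity here, which holds automatically for sums of point masses."""
    nonneg = all(prob(a, pmf) >= 0 for a in powerset(omega))  # axiom 1
    unit = abs(prob(omega, pmf) - 1) < tol                    # axiom 2
    return nonneg and unit

# Example: a fair six-sided die.
omega = frozenset(range(1, 7))
pmf = {w: 1 / 6 for w in omega}
assert satisfies_axioms(omega, pmf)
```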
From the Kolmogorov axioms, one can deduce other useful rules for studying probabilities. The proofs [6] [7] [8] of these rules illustrate the power of the third axiom and its interaction with the first two. Four of the immediate corollaries and their proofs are shown below:
Monotonicity: if $A \subseteq B$ then $P(A) \leq P(B)$. That is, if $A$ is a subset of, or equal to, $B$, then the probability of $A$ is less than or equal to the probability of $B$.
In order to verify the monotonicity property, we set $E_1 = A$ and $E_2 = B \setminus A$, where $A \subseteq B$ and $E_i = \varnothing$ for $i \geq 3$. From the properties of the empty set ($\varnothing$), it is easy to see that the sets $E_i$ are pairwise disjoint and $E_1 \cup E_2 \cup E_3 \cup \cdots = B$. Hence, we obtain from the third axiom that
$$P(A) + P(B \setminus A) + \sum_{i=3}^{\infty} P(E_i) = P(B).$$
Since, by the first axiom, the left-hand side of this equation is a series of non-negative numbers, and since it converges to $P(B)$, which is finite, we obtain both $P(A) \leq P(B)$ and $P(\varnothing) = 0$.
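As a quick numerical illustration of monotonicity and of the disjoint decomposition used in the proof, here is a self-contained sketch on a fair die (an example of my own, not from the source):

```python
pmf = {w: 1 / 6 for w in range(1, 7)}        # fair six-sided die

def prob(event):
    return sum(pmf[w] for w in event)

A, B = {2, 4}, {2, 4, 6}                      # A is a subset of B
assert A <= B and prob(A) <= prob(B)          # monotonicity: 1/3 <= 1/2
# The proof's decomposition: B is the disjoint union of A and B \ A.
assert abs(prob(B) - (prob(A) + prob(B - A))) < 1e-12
```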
The probability of the empty set: $$P(\varnothing) = 0.$$ In many cases, $\varnothing$ is not the only event with probability 0.
$P(\varnothing \cup \varnothing) = P(\varnothing)$ since $\varnothing \cup \varnothing = \varnothing$,
$P(\varnothing) + P(\varnothing) = P(\varnothing)$ by applying the third axiom to the left-hand side (note $\varnothing$ is disjoint with itself), and so
$P(\varnothing) = 0$ by subtracting $P(\varnothing)$ from each side of the equation.
The complement rule: $$P\left(A^{c}\right) = P(\Omega \setminus A) = 1 - P(A).$$ Proof: given that $A$ and $A^{c}$ are mutually exclusive and that $A \cup A^{c} = \Omega$:
$$P(A \cup A^{c}) = P(A) + P(A^{c}) \qquad \text{(by axiom 3)}$$
and
$$P(A \cup A^{c}) = P(\Omega) = 1 \qquad \text{(by axiom 2)},$$
so $P(A^{c}) = 1 - P(A)$.
The numeric bound: it immediately follows from the monotonicity property that
$$0 \leq P(E) \leq 1 \qquad \text{for all } E \in F.$$
Proof: given the complement rule $P(E^{c}) = 1 - P(E)$ and the first axiom $P(E^{c}) \geq 0$, we have $1 - P(E) \geq 0$ and hence $P(E) \leq 1$; therefore $0 \leq P(E) \leq 1$.
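For a concrete check of the bound, one can enumerate every event of a small finite space (the three-outcome space below is my own illustration):

```python
from itertools import chain, combinations

pmf = {"a": 0.5, "b": 0.3, "c": 0.2}          # an arbitrary three-outcome space
events = chain.from_iterable(
    combinations(pmf, r) for r in range(len(pmf) + 1))
# Every event's probability lies in [0, 1], as the numeric bound asserts.
assert all(0 <= sum(pmf[w] for w in e) <= 1 for e in events)
```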
Another important property is the addition law of probability, also called the sum rule:
$$P(A \cup B) = P(A) + P(B) - P(A \cap B).$$
That is, the probability that an event in $A$ or $B$ will happen is the sum of the probability of an event in $A$ and the probability of an event in $B$, minus the probability of an event that is in both $A$ and $B$. The proof of this is as follows:
Firstly,
$$P(A \cup B) = P(A) + P(B \setminus A) \qquad \text{(by axiom 3)}.$$
So,
$$P(A \cup B) = P(A) + P(B \setminus (A \cap B)) \qquad \text{(since } B \setminus A = B \setminus (A \cap B)\text{)}.$$
Also,
$$P(B) = P(B \setminus (A \cap B)) + P(A \cap B),$$
and eliminating $P(B \setminus (A \cap B))$ from both equations gives us the desired result.
An extension of the addition law to any number of sets is the inclusion–exclusion principle.
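A short numerical check of the sum rule on a fair die (the events "even" and "at least four" are my own illustration, not from the source):

```python
pmf = {w: 1 / 6 for w in range(1, 7)}          # fair six-sided die

def prob(event):
    return sum(pmf[w] for w in event)

A = {2, 4, 6}                                   # "even"
B = {4, 5, 6}                                   # "at least four"
lhs = prob(A | B)                               # P(A ∪ B) = 4/6
rhs = prob(A) + prob(B) - prob(A & B)           # 3/6 + 3/6 - 2/6
assert abs(lhs - rhs) < 1e-12
```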
Setting $B$ to the complement $A^{c}$ of $A$ in the addition law gives
$$P\left(A^{c}\right) = P(\Omega \setminus A) = 1 - P(A).$$
That is, the probability that any event will not happen (or the event's complement) is 1 minus the probability that it will.
Consider a single coin-toss, and assume that the coin will either land heads (H) or tails (T) (but not both). No assumption is made as to whether the coin is fair or as to whether or not any bias depends on how the coin is tossed. [9]
We may define:
$$\Omega = \{H, T\}, \qquad F = \{\varnothing, \{H\}, \{T\}, \{H, T\}\}.$$
Kolmogorov's axioms imply that:
The probability of neither heads nor tails is 0: $P(\varnothing) = 0$.
The probability of either heads or tails is 1: $P(\{H\} \cup \{T\}) = 1$.
The sum of the probability of heads and the probability of tails is 1: $P(\{H\}) + P(\{T\}) = 1$.
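These three consequences can be verified numerically. The following sketch uses an arbitrary bias $p$ (the value 0.37 is my own choice), since the example makes no assumption of fairness:

```python
p = 0.37                                       # any bias in [0, 1] works
P = {
    frozenset(): 0.0,                          # neither heads nor tails
    frozenset({"H"}): p,
    frozenset({"T"}): 1 - p,
    frozenset({"H", "T"}): 1.0,                # the whole sample space
}
assert P[frozenset()] == 0                                         # P(∅) = 0
assert P[frozenset({"H", "T"})] == 1                               # P({H} ∪ {T}) = 1
assert abs(P[frozenset({"H"})] + P[frozenset({"T"})] - 1) < 1e-12  # P({H}) + P({T}) = 1
```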