Almost surely

In probability theory, an event is said to happen almost surely (sometimes abbreviated as a.s.) if it happens with probability 1 (with respect to the probability measure). [1] In other words, the set of outcomes on which the event does not occur has probability 0, even though the set might not be empty. The concept is analogous to the concept of "almost everywhere" in measure theory. In probability experiments on a finite sample space with a non-zero probability for each outcome, there is no difference between almost surely and surely (since having a probability of 1 entails including all the sample points); however, this distinction becomes important when the sample space is an infinite set, [2] because an infinite set can have non-empty subsets of probability 0.

Some examples of the use of this concept include the strong and uniform versions of the law of large numbers, the continuity of the paths of Brownian motion, and the infinite monkey theorem. The terms almost certainly (a.c.) and almost always (a.a.) are also used. Almost never describes the opposite of almost surely: an event that happens with probability zero happens almost never. [3]

Formal definition

Let $(\Omega, \mathcal{F}, P)$ be a probability space. An event $E \in \mathcal{F}$ happens almost surely if $P(E) = 1$. Equivalently, $E$ happens almost surely if the probability of $E$ not occurring is zero: $P(E^{C}) = 0$. More generally, any set $E \subseteq \Omega$ (not necessarily in $\mathcal{F}$) happens almost surely if $E^{C}$ is contained in a null set: a subset $N \in \mathcal{F}$ such that $P(N) = 0$. [4] The notion of almost sureness depends on the probability measure $P$. If it is necessary to emphasize this dependence, it is customary to say that the event $E$ occurs $P$-almost surely, or almost surely $(P)$.
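As a concrete instance of the definition (an illustration added here, not drawn from the cited references), take $\Omega = [0,1]$ with $P$ the uniform (Lebesgue) probability measure; every single point is then a null set, so its complement occurs almost surely:

```latex
% Uniform (Lebesgue) measure on [0,1]: any singleton is a null set,
% so its complement occurs almost surely.
P(\{x\}) = 0 \quad \text{for every } x \in [0,1],
\qquad \text{hence} \qquad
P\bigl([0,1] \setminus \{x\}\bigr) = 1 .

% By countable additivity the rationals in [0,1] also form a null set,
% so the event "the outcome is irrational" occurs almost surely:
P\bigl(\mathbb{Q} \cap [0,1]\bigr)
  = \sum_{q \in \mathbb{Q} \cap [0,1]} P(\{q\}) = 0,
\qquad
P\bigl([0,1] \setminus \mathbb{Q}\bigr) = 1 .
```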

Illustrative examples

In general, an event can happen "almost surely", even if the probability space in question includes outcomes which do not belong to the event—as the following examples illustrate.

Throwing a dart

Imagine throwing a dart at a unit square (a square with an area of 1) so that the dart always hits an exact point in the square, in such a way that each point in the square is equally likely to be hit. Since the square has area 1, the probability that the dart will hit any particular subregion of the square is equal to the area of that subregion. For example, the probability that the dart will hit the right half of the square is 0.5, since the right half has area 0.5.

Next, consider the event that the dart hits exactly a point in the diagonals of the unit square. Since the area of the diagonals of the square is 0, the probability that the dart will land exactly on a diagonal is 0. That is, the dart will almost never land on a diagonal (equivalently, it will almost surely not land on a diagonal), even though the set of points on the diagonals is not empty, and a point on a diagonal is no less possible than any other point.
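A minimal Monte Carlo sketch (an illustration added here, not part of the article's sources) makes this concrete: modelling the dart position by two independent uniform random numbers, the empirical frequency of hitting the right half approaches its area of 0.5, while an exact hit on a diagonal essentially never occurs in simulation.

```python
import random

# Minimal Monte Carlo sketch: throw "darts" uniformly at the unit square.
# The fraction landing in the right half estimates that region's area (0.5);
# the fraction landing exactly on a diagonal is (essentially) always 0.
N = 1_000_000
right_half = 0
on_diagonal = 0
for _ in range(N):
    x, y = random.random(), random.random()
    if x >= 0.5:
        right_half += 1
    if x == y or x == 1.0 - y:  # exact hit on either diagonal
        on_diagonal += 1

print(f"fraction in right half:       {right_half / N:.4f}")
print(f"fraction exactly on diagonal: {on_diagonal / N:.7f}")
```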

Tossing a coin repeatedly

Consider the case where a (possibly biased) coin is tossed, corresponding to the probability space $\{H, T\}$, where the event $\{\omega = H\}$ occurs if a head is flipped, and $\{\omega = T\}$ if a tail is flipped. For this particular coin, it is assumed that the probability of flipping a head is $p \in (0, 1)$, from which it follows that the complement event, that of flipping a tail, has probability $1 - p$.

Now, suppose an experiment were conducted where the coin is tossed repeatedly, with outcomes $\omega_1, \omega_2, \ldots$ and the assumption that each flip's outcome is independent of all the others (i.e., they are independent and identically distributed; i.i.d.). Define the sequence of random variables on the coin toss space, $(X_i)_{i \in \mathbb{N}}$, where $X_i(\omega) = \omega_i$; i.e. each $X_i$ records the outcome of the $i$th flip.

In this case, any infinite sequence of heads and tails is a possible outcome of the experiment. However, any particular infinite sequence of heads and tails has probability 0 of being the exact outcome of the (infinite) experiment. This is because the i.i.d. assumption implies that the probability of flipping all heads over $n$ flips is simply $P(X_1 = H, X_2 = H, \ldots, X_n = H) = p^{n}$. Letting $n \to \infty$ yields 0, since $p < 1$ by assumption. The result is the same no matter how much we bias the coin towards heads, so long as we constrain $p$ to be strictly between 0 and 1. In fact, the same result even holds in non-standard analysis, where infinitesimal probabilities are allowed. [5]

Moreover, the event "the sequence of tosses contains at least one $T$" will also happen almost surely (i.e., with probability 1). But if instead of an infinite number of flips, flipping stops after some finite time, say 1,000,000 flips, then the probability of getting an all-heads sequence, $p^{1{,}000{,}000}$, would no longer be 0, while the probability of getting at least one tails, $1 - p^{1{,}000{,}000}$, would no longer be 1 (i.e., the event is no longer almost sure).
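A short numerical sketch (illustrative only; the value of $p$ below is an assumption) shows how $p^{n}$ collapses to 0 as $n$ grows, even for a coin heavily biased towards heads, while the probability of at least one tail approaches 1:

```python
# Illustrative sketch: P(all heads in n flips) = p**n tends to 0 as n grows,
# so P(at least one tail) = 1 - p**n tends to 1, for any fixed 0 < p < 1.
p = 0.99  # assumed bias towards heads; any value strictly below 1 behaves the same

for n in (10, 1_000, 100_000, 1_000_000):
    all_heads = p ** n
    print(f"n = {n:>9}: P(all heads) = {all_heads:.3e}, "
          f"P(at least one tail) = {1 - all_heads:.12f}")
```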

Asymptotically almost surely

In asymptotic analysis, a property is said to hold asymptotically almost surely (a.a.s.) if, over a sequence of sets, the probability converges to 1. This is equivalent to convergence in probability. For instance, in number theory, a large number is asymptotically almost surely composite, by the prime number theorem; and in random graph theory, the statement "$G(n, p_n)$ is connected" (where $G(n, p)$ denotes the graphs on $n$ vertices with edge probability $p$) is true a.a.s. when, for some $\varepsilon > 0$,

$$ p_n > \frac{(1 + \varepsilon)\ln n}{n}. $$ [6]
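A small simulation sketch (illustrative; the edge probability below follows the displayed threshold with an assumed $\varepsilon = 0.2$) estimates how the probability that $G(n, p_n)$ is connected behaves as $n$ grows:

```python
import math
import random

# Illustrative sketch: estimate P(G(n, p_n) is connected) by sampling random
# graphs and checking connectivity with a small union-find structure.
def gnp_is_connected(n: int, p: float) -> bool:
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = n
    for i in range(n):
        for j in range(i + 1, n):
            if random.random() < p and (ri := find(i)) != (rj := find(j)):
                parent[ri] = rj
                components -= 1
    return components == 1

eps = 0.2       # assumed epsilon in the threshold p_n > (1 + eps) * ln(n) / n
trials = 100
for n in (50, 200, 400):
    p_n = (1 + eps) * math.log(n) / n
    hits = sum(gnp_is_connected(n, p_n) for _ in range(trials))
    print(f"n = {n:>4}: estimated P(connected) ~ {hits / trials:.2f}")
```

For a small $\varepsilon$ the convergence is slow, so the estimates rise towards 1 only gradually; larger $n$ or a larger $\varepsilon$ pushes them closer to 1.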

In number theory, this is referred to as "almost all", as in "almost all numbers are composite". Similarly, in graph theory, this is sometimes referred to as "almost surely". [7]

Notes

  1. Weisstein, Eric W. "Almost Surely". mathworld.wolfram.com. Retrieved 2019-11-16.
  2. "Almost surely - Math Central". mathcentral.uregina.ca. Retrieved 2019-11-16.
  3. Grädel, Erich; Kolaitis, Phokion G.; Libkin, Leonid; Marx, Maarten; Spencer, Joel; Vardi, Moshe Y.; Venema, Yde; Weinstein, Scott (2007). Finite Model Theory and Its Applications. Springer. p. 232. ISBN 978-3-540-00428-8.
  4. Jacod, Jean; Protter, Philip (2004). Probability Essentials. Springer. p. 37. ISBN 978-3-540-43871-7.
  5. Williamson, Timothy (2007-07-01). "How probable is an infinite sequence of heads?". Analysis. 67 (3): 173–180. doi:10.1093/analys/67.3.173. ISSN 0003-2638.
  6. Friedgut, Ehud; Rödl, Vojtěch; Ruciński, Andrzej; Tetali, Prasad (January 2006). "A Sharp Threshold for Random Graphs with a Monochromatic Triangle in Every Edge Coloring". Memoirs of the American Mathematical Society. 179 (845). AMS Bookstore: 3–4. doi:10.1090/memo/0845. ISSN 0065-9266. S2CID 9143933.
  7. Spencer, Joel H. (2001). "0. Two Starting Examples". The Strange Logic of Random Graphs. Algorithms and Combinatorics. Vol. 22. Springer. p. 4. ISBN 978-3-540-41654-8.
