Probability bounds analysis (PBA) is a collection of methods of uncertainty propagation for making qualitative and quantitative calculations in the face of uncertainties of various kinds. It is used to project partial information about random variables and other quantities through mathematical expressions. For instance, it computes sure bounds on the distribution of a sum, product, or more complex function, given only sure bounds on the distributions of the inputs. Such bounds are called probability boxes, and constrain cumulative probability distributions (rather than densities or mass functions).
This bounding approach permits analysts to make calculations without requiring overly precise assumptions about parameter values, dependence among variables, or even distribution shape. Probability bounds analysis is essentially a combination of the methods of standard interval analysis and classical probability theory. Probability bounds analysis gives the same answer as interval analysis does when only range information is available. It also gives the same answers as Monte Carlo simulation does when information is abundant enough to precisely specify input distributions and their dependencies. Thus, it is a generalization of both interval analysis and probability theory.
The diverse methods comprising probability bounds analysis provide algorithms to evaluate mathematical expressions when there is uncertainty about the input values, their dependencies, or even the form of mathematical expression itself. The calculations yield results that are guaranteed to enclose all possible distributions of the output variable if the input p-boxes were also sure to enclose their respective distributions. In some cases, a calculated p-box will also be best-possible in the sense that the bounds could be no tighter without excluding some of the possible distributions.
P-boxes are usually merely bounds on possible distributions. The bounds often also enclose distributions that are not themselves possible. For instance, the set of probability distributions that could result from adding random values without the independence assumption from two (precise) distributions is generally a proper subset of all the distributions enclosed by the p-box computed for the sum. That is, there are distributions within the output p-box that could not arise under any dependence between the two input distributions. The output p-box will, however, always contain all distributions that are possible, so long as the input p-boxes were sure to enclose their respective underlying distributions. This property often suffices for use in risk analysis and other fields requiring calculations under uncertainty.
The idea of bounding probability has a very long tradition throughout the history of probability theory. Indeed, in 1854 George Boole used the notion of interval bounds on probability in his The Laws of Thought . [1] [2] Also dating from the latter half of the 19th century, the inequality attributed to Chebyshev described bounds on a distribution when only the mean and variance of the variable are known, and the related inequality attributed to Markov found bounds on a positive variable when only the mean is known. Kyburg [3] reviewed the history of interval probabilities and traced the development of the critical ideas through the 20th century, including the important notion of incomparable probabilities favored by Keynes.
Of particular note is Fréchet's derivation in the 1930s of bounds on calculations involving total probabilities without dependence assumptions. Bounding probabilities has continued to the present day (e.g., Walley's theory of imprecise probability. [4] )
The methods of probability bounds analysis that could be routinely used in risk assessments were developed in the 1980s. Hailperin [2] described a computational scheme for bounding logical calculations extending the ideas of Boole. Yager [5] described the elementary procedures by which bounds on convolutions can be computed under an assumption of independence. At about the same time, Makarov, [6] and independently, Rüschendorf [7] solved the problem, originally posed by Kolmogorov, of how to find the upper and lower bounds for the probability distribution of a sum of random variables whose marginal distributions, but not their joint distribution, are known. Frank et al. [8] generalized the result of Makarov and expressed it in terms of copulas. Since that time, formulas and algorithms for sums have been generalized and extended to differences, products, quotients and other binary and unary functions under various dependence assumptions. [9] [10] [11] [12] [13] [14]
Arithmetic expressions involving operations such as additions, subtractions, multiplications, divisions, minima, maxima, powers, exponentials, logarithms, square roots, absolute values, etc., are commonly used in risk analyses and uncertainty modeling. Convolution is the operation of finding the probability distribution of a sum of independent random variables specified by probability distributions. We can extend the term to finding distributions of other mathematical functions (products, differences, quotients, and more complex functions) and other assumptions about the intervariable dependencies. There are convenient algorithms for computing these generalized convolutions under a variety of assumptions about the dependencies among the inputs. [5] [9] [10] [14]
Let denote the space of distribution functions on the real numbers i.e.,
A p-box is a quintuple
where are real intervals, and This quintuple denotes the set of distribution functions such that:
If a function satisfies all the conditions above it is said to be inside the p-box. In some cases, there may be no information about the moments or distribution family other than what is encoded in the two distribution functions that constitute the edges of the p-box. Then the quintuple representing the p-box can be denoted more compactly as [B1, B2]. This notation harkens to that of intervals on the real line, except that the endpoints are distributions rather than points.
The notation denotes the fact that is a random variable governed by the distribution function F, that is,
Let us generalize the tilde notation for use with p-boxes. We will write X ~ B to mean that X is a random variable whose distribution function is unknown except that it is inside B. Thus, X ~ F ∈ B can be contracted to X ~ B without mentioning the distribution function explicitly.
If X and Y are independent random variables with distributions F and G respectively, then X + Y = Z ~ H given by
This operation is called a convolution on F and G. The analogous operation on p-boxes is straightforward for sums. Suppose
If X and Y are stochastically independent, then the distribution of Z = X + Y is inside the p-box
Finding bounds on the distribution of sums Z = X + Ywithout making any assumption about the dependence between X and Y is actually easier than the problem assuming independence. Makarov [6] [8] [9] showed that
These bounds are implied by the Fréchet–Hoeffding copula bounds. The problem can also be solved using the methods of mathematical programming. [13]
The convolution under the intermediate assumption that X and Y have positive dependence is likewise easy to compute, as is the convolution under the extreme assumptions of perfect positive or perfect negative dependency between X and Y. [14]
Generalized convolutions for other operations such as subtraction, multiplication, division, etc., can be derived using transformations. For instance, p-box subtraction A − B can be defined as A + (−B), where the negative of a p-box B = [B1, B2] is [B2(−x), B1(−x)].
Logical or Boolean expressions involving conjunctions (AND operations), disjunctions (OR operations), exclusive disjunctions, equivalences, conditionals, etc. arise in the analysis of fault trees and event trees common in risk assessments. If the probabilities of events are characterized by intervals, as suggested by Boole [1] and Keynes [3] among others, these binary operations are straightforward to evaluate. For example, if the probability of an event A is in the interval P(A) = a = [0.2, 0.25], and the probability of the event B is in P(B) = b = [0.1, 0.3], then the probability of the conjunction is surely in the interval
so long as A and B can be assumed to be independent events. If they are not independent, we can still bound the conjunction using the classical Fréchet inequality. In this case, we can infer at least that the probability of the joint event A & B is surely within the interval
where env([x1,x2], [y1,y2]) is [min(x1,y1), max(x2,y2)]. Likewise, the probability of the disjunction is surely in the interval
if A and B are independent events. If they are not independent, the Fréchet inequality bounds the disjunction
It is also possible to compute interval bounds on the conjunction or disjunction under other assumptions about the dependence between A and B. For instance, one might assume they are positively dependent, in which case the resulting interval is not as tight as the answer assuming independence but tighter than the answer given by the Fréchet inequality. Comparable calculations are used for other logical functions such as negation, exclusive disjunction, etc. When the Boolean expression to be evaluated becomes complex, it may be necessary to evaluate it using the methods of mathematical programming [2] to get best-possible bounds on the expression. A similar problem one presents in the case of probabilistic logic (see for example Gerla 1994). If the probabilities of the events are characterized by probability distributions or p-boxes rather than intervals, then analogous calculations can be done to obtain distributional or p-box results characterizing the probability of the top event.
The probability that an uncertain number represented by a p-box D is less than zero is the interval Pr(D < 0) = [F(0), F̅(0)], where F̅(0) is the left bound of the probability box D and F(0) is its right bound, both evaluated at zero. Two uncertain numbers represented by probability boxes may then be compared for numerical magnitude with the following encodings:
Thus the probability that A is less than B is the same as the probability that their difference is less than zero, and this probability can be said to be the value of the expression A < B.
Like arithmetic and logical operations, these magnitude comparisons generally depend on the stochastic dependence between A and B, and the subtraction in the encoding should reflect that dependence. If their dependence is unknown, the difference can be computed without making any assumption using the Fréchet operation.
Some analysts [15] [16] [17] [18] [19] [20] use sampling-based approaches to computing probability bounds, including Monte Carlo simulation, Latin hypercube methods or importance sampling. These approaches cannot assure mathematical rigor in the result because such simulation methods are approximations, although their performance can generally be improved simply by increasing the number of replications in the simulation. Thus, unlike the analytical theorems or methods based on mathematical programming, sampling-based calculations usually cannot produce verified computations. However, sampling-based methods can be very useful in addressing a variety of problems which are computationally difficult to solve analytically or even to rigorously bound. One important example is the use of Cauchy-deviate sampling to avoid the curse of dimensionality in propagating interval uncertainty through high-dimensional problems. [21]
PBA belongs to a class of methods that use imprecise probabilities to simultaneously represent aleatoric and epistemic uncertainties. PBA is a generalization of both interval analysis and probabilistic convolution such as is commonly implemented with Monte Carlo simulation. PBA is also closely related to robust Bayes analysis, which is sometimes called Bayesian sensitivity analysis. PBA is an alternative to second-order Monte Carlo simulation.
P-boxes and probability bounds analysis have been used in many applications spanning many disciplines in engineering and environmental science, including:
In mathematics, convolution is a mathematical operation on two functions that produces a third function that expresses how the shape of one is modified by the other. The term convolution refers to both the result function and to the process of computing it. It is defined as the integral of the product of the two functions after one is reflected about the y-axis and shifted. The choice of which function is reflected and shifted before the integral does not change the integral result. The integral is evaluated for all values of shift, producing the convolution function.
In information theory, the entropy of a random variable is the average level of "information", "surprise", or "uncertainty" inherent to the variable's possible outcomes. Given a discrete random variable , which takes values in the alphabet and is distributed according to :
A random variable is a mathematical formalization of a quantity or object which depends on random events. The term 'random variable' can be misleading as it is not actually random or a variable, but rather it is a function from possible outcomes in a sample space to a measurable space, often to the real numbers.
In probability theory, a probability density function (PDF), or density of an absolutely continuous random variable, is a function whose value at any given sample in the sample space can be interpreted as providing a relative likelihood that the value of the random variable would be equal to that sample. Probability density is the probability per unit length, in other words, while the absolute likelihood for a continuous random variable to take on any particular value is 0, the value of the PDF at two different samples can be used to infer, in any particular draw of the random variable, how much more likely it is that the random variable would be close to one sample compared to the other sample.
In probability theory, Chebyshev's inequality guarantees that, for a wide class of probability distributions, no more than a certain fraction of values can be more than a certain distance from the mean. Specifically, no more than 1/k2 of the distribution's values can be k or more standard deviations away from the mean. The rule is often called Chebyshev's theorem, about the range of standard deviations around the mean, in statistics. The inequality has great utility because it can be applied to any probability distribution in which the mean and variance are defined. For example, it can be used to prove the weak law of large numbers.
In probability theory, a pairwise independent collection of random variables is a set of random variables any two of which are independent. Any collection of mutually independent random variables is pairwise independent, but some pairwise independent collections are not mutually independent. Pairwise independent random variables with finite variance are uncorrelated.
In quantum mechanics, information theory, and Fourier analysis, the entropic uncertainty or Hirschman uncertainty is defined as the sum of the temporal and spectral Shannon entropies. It turns out that Heisenberg's uncertainty principle can be expressed as a lower bound on the sum of these entropies. This is stronger than the usual statement of the uncertainty principle in terms of the product of standard deviations.
In probability theory and mathematical physics, a random matrix is a matrix-valued random variable—that is, a matrix in which some or all elements are random variables. Many important properties of physical systems can be represented mathematically as matrix problems. For example, the thermal conductivity of a lattice can be computed from the dynamical matrix of the particle-particle interactions within the lattice.
In statistics and information theory, a maximum entropy probability distribution has entropy that is at least as great as that of all other members of a specified class of probability distributions. According to the principle of maximum entropy, if nothing is known about a distribution except that it belongs to a certain class, then the distribution with the largest entropy should be chosen as the least-informative default. The motivation is twofold: first, maximizing entropy minimizes the amount of prior information built into the distribution; second, many physical systems tend to move towards maximal entropy configurations over time.
Interval arithmetic is a mathematical technique used to mitigate rounding and measurement errors in mathematical computation by computing function bounds. Numerical methods involving interval arithmetic can guarantee reliable and mathematically correct results. Instead of representing a value as a single number, interval arithmetic represents each value as a range of possibilities.
In probability theory, especially in mathematical statistics, a location–scale family is a family of probability distributions parametrized by a location parameter and a non-negative scale parameter. For any random variable whose probability distribution function belongs to such a family, the distribution function of also belongs to the family.
In metrology, measurement uncertainty is the expression of the statistical dispersion of the values attributed to a measured quantity. All measurements are subject to uncertainty and a measurement result is complete only when it is accompanied by a statement of the associated uncertainty, such as the standard deviation. By international agreement, this uncertainty has a probabilistic basis and reflects incomplete knowledge of the quantity value. It is a non-negative parameter.
Imprecise probability generalizes probability theory to allow for partial probability specifications, and is applicable when information is scarce, vague, or conflicting, in which case a unique probability distribution may be hard to identify. Thereby, the theory aims to represent the available knowledge more accurately. Imprecision is useful for dealing with expert elicitation, because:
In probability theory, heavy-tailed distributions are probability distributions whose tails are not exponentially bounded: that is, they have heavier tails than the exponential distribution. In many applications it is the right tail of the distribution that is of interest, but a distribution may have a heavy left tail, or both tails may be heavy.
A probability box is a characterization of uncertain numbers consisting of both aleatoric and epistemic uncertainties that is often used in risk analysis or quantitative uncertainty modeling where numerical calculations must be performed. Probability bounds analysis is used to make arithmetic and logical calculations with p-boxes.
In mathematics, a credal set is a set of probability distributions or, more generally, a set of probability measures. A credal set is often assumed or constructed to be a closed convex set. It is intended to express uncertainty or doubt about the probability model that should be used, or to convey the beliefs of a Bayesian agent about the possible states of the world.
P-boxes and probability bounds analysis have been used in many applications spanning many disciplines in engineering and environmental science, including:
In probabilistic logic, the Fréchet inequalities, also known as the Boole–Fréchet inequalities, are rules implicit in the work of George Boole and explicitly derived by Maurice Fréchet that govern the combination of probabilities about logical propositions or events logically linked together in conjunctions or disjunctions as in Boolean expressions or fault or event trees common in risk assessments, engineering design and artificial intelligence. These inequalities can be considered rules about how to bound calculations involving probabilities without assuming independence or, indeed, without making any dependence assumptions whatsoever. The Fréchet inequalities are closely related to the Boole–Bonferroni–Fréchet inequalities, and to Fréchet bounds.
In probability theory and statistics, the Dirichlet process (DP) is one of the most popular Bayesian nonparametric models. It was introduced by Thomas Ferguson as a prior over probability distributions.
In regression analysis, an interval predictor model (IPM) is an approach to regression where bounds on the function to be approximated are obtained. This differs from other techniques in machine learning, where usually one wishes to estimate point values or an entire probability distribution. Interval Predictor Models are sometimes referred to as a nonparametric regression technique, because a potentially infinite set of functions are contained by the IPM, and no specific distribution is implied for the regressed variables.