Pairwise independence

In probability theory, a pairwise independent collection of random variables is a set of random variables any two of which are independent. [1] Any collection of mutually independent random variables is pairwise independent, but some pairwise independent collections are not mutually independent. Pairwise independent random variables with finite variance are uncorrelated.

Definition

A pair of random variables X and Y are independent if and only if the random vector (X, Y) with joint cumulative distribution function (CDF) $F_{X,Y}(x,y)$ satisfies

$F_{X,Y}(x,y) = F_X(x)\,F_Y(y),$

or equivalently, their joint density $f_{X,Y}(x,y)$ satisfies

$f_{X,Y}(x,y) = f_X(x)\,f_Y(y).$

That is, the joint distribution is equal to the product of the marginal distributions. [2]
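
For discrete variables, this factorization condition can be checked directly by comparing the joint probability mass function with the product of its marginals. The following Python sketch illustrates this for a hypothetical two-by-two joint distribution; the numerical values are chosen only for illustration:

```python
from itertools import product

# Hypothetical joint pmf of (X, Y) on {0, 1} x {0, 1}; values are illustrative only.
joint = {(0, 0): 0.12, (0, 1): 0.28, (1, 0): 0.18, (1, 1): 0.42}

# Marginal pmfs obtained by summing the joint pmf over the other variable.
px = {x: sum(p for (a, _), p in joint.items() if a == x) for x in (0, 1)}
py = {y: sum(p for (_, b), p in joint.items() if b == y) for y in (0, 1)}

# X and Y are independent iff the joint pmf factors into the product of the marginals.
independent = all(
    abs(joint[(x, y)] - px[x] * py[y]) < 1e-12 for x, y in product((0, 1), repeat=2)
)
print(independent)  # True: 0.12 = 0.4*0.3, 0.28 = 0.4*0.7, 0.18 = 0.6*0.3, 0.42 = 0.6*0.7
```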

Unless the context makes it ambiguous, in practice the modifier "mutual" is usually dropped, so that independence means mutual independence. A statement such as "X, Y, Z are independent random variables" means that X, Y, Z are mutually independent.

Example

Pairwise independence does not imply mutual independence, as shown by the following example attributed to S. Bernstein. [3]

Suppose X and Y are two independent tosses of a fair coin, where we designate 1 for heads and 0 for tails. Let the third random variable Z be equal to 1 if exactly one of those coin tosses resulted in "heads", and 0 otherwise (i.e., $Z = X \oplus Y$, the sum of X and Y modulo 2). Then jointly the triple (X, Y, Z) has the following probability distribution: each of the four triples (0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0) occurs with probability 1/4, and every other triple has probability 0.

Here the marginal probability distributions are identical: $f_X(0) = f_Y(0) = f_Z(0) = 1/2$ and $f_X(1) = f_Y(1) = f_Z(1) = 1/2$. The bivariate distributions also agree: $f_{X,Y} = f_{X,Z} = f_{Y,Z}$, where $f(0,0) = f(0,1) = f(1,0) = f(1,1) = 1/4$.

Since each of the pairwise joint distributions equals the product of their respective marginal distributions, the variables are pairwise independent: X and Y are independent, X and Z are independent, and Y and Z are independent.

However, X, Y, and Z are not mutually independent, since $f_{X,Y,Z}(x,y,z) \ne f_X(x)\,f_Y(y)\,f_Z(z)$: for example, the left side equals 1/4 for (x, y, z) = (0, 0, 0), while the right side equals 1/8. In fact, each of X, Y, Z is completely determined by the other two (any one of them is the sum, modulo 2, of the other two). That is as far from independence as random variables can get.
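
Bernstein's example can be verified mechanically by enumerating the four equally likely outcomes, comparing each bivariate distribution with the product of its marginals, and then comparing the trivariate distribution with the product of all three marginals. The Python sketch below does this; it introduces no assumptions beyond the construction described above:

```python
from itertools import product

# Bernstein's example: X, Y are independent fair coin tosses, Z = X XOR Y.
# The four equally likely outcomes of (X, Y) induce the joint law of (X, Y, Z).
outcomes = [(x, y, x ^ y) for x, y in product((0, 1), repeat=2)]
joint = {o: 1 / 4 for o in outcomes}  # every other triple has probability 0

def marginal(idx):
    """Marginal pmf of the idx-th coordinate of (X, Y, Z)."""
    m = {0: 0.0, 1: 0.0}
    for o, p in joint.items():
        m[o[idx]] += p
    return m

def pair_marginal(i, j):
    """Bivariate pmf of coordinates i and j."""
    m = {}
    for o, p in joint.items():
        m[(o[i], o[j])] = m.get((o[i], o[j]), 0.0) + p
    return m

marg = [marginal(k) for k in range(3)]

# Pairwise independence: every bivariate pmf factors into the product of its marginals.
for i, j in [(0, 1), (0, 2), (1, 2)]:
    pm = pair_marginal(i, j)
    assert all(abs(pm.get((a, b), 0.0) - marg[i][a] * marg[j][b]) < 1e-12
               for a, b in product((0, 1), repeat=2))

# Mutual independence fails: at (0, 0, 0) the joint is 1/4 but the product is 1/8.
prod_000 = marg[0][0] * marg[1][0] * marg[2][0]
print(joint[(0, 0, 0)], prod_000)  # 0.25 0.125
```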

Probability of the union of pairwise independent events

Bounds on the probability that the sum of Bernoulli random variables is at least one, commonly known as the union bound, are provided by the Boole–Fréchet [4] [5] inequalities. While these bounds assume only univariate information, several bounds that use knowledge of general bivariate probabilities have been proposed as well. Denote by $\{A_i,\ i \in \{1, \ldots, n\}\}$ a set of Bernoulli events with probability of occurrence $\mathbb{P}(A_i) = p_i$ for each $i$. Suppose the bivariate probabilities are given by $\mathbb{P}(A_i \cap A_j) = p_{ij}$ for every pair of indices $(i, j)$. Kounias [6] derived the following upper bound:

$\mathbb{P}\left(\bigcup_i A_i\right) \le \sum_{i=1}^{n} p_i - \max_{j \in \{1, \ldots, n\}} \sum_{i \ne j} p_{ij},$

which subtracts the maximum weight of a star spanning tree on a complete graph with $n$ nodes (where the edge weights are given by $p_{ij}$) from the sum of the marginal probabilities $\sum_{i=1}^{n} p_i$.
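
As an illustration, the Kounias bound can be computed directly from the marginals and the bivariate probabilities by scanning over all possible star centers. The following is a minimal Python sketch; the function name and the numerical inputs are hypothetical, chosen only to show the calculation:

```python
def kounias_bound(p, p2):
    """Kounias upper bound on P(union of A_1..A_n).

    p  : list of marginal probabilities p_i
    p2 : dict mapping index pairs (i, j) with i < j to P(A_i and A_j)
    """
    n = len(p)
    def pij(i, j):
        return p2[(min(i, j), max(i, j))]
    # For each candidate star center j, compute the total weight of the star,
    # then subtract the heaviest star from the sum of the marginals.
    best_star = max(sum(pij(i, j) for i in range(n) if i != j) for j in range(n))
    return sum(p) - best_star

# Hypothetical example with three events (numbers chosen only for illustration).
p = [0.3, 0.4, 0.5]
p2 = {(0, 1): 0.10, (0, 2): 0.12, (1, 2): 0.18}
print(kounias_bound(p, p2))  # 1.2 - max(0.22, 0.28, 0.30) = 0.9
```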
Hunter-Worsley [7] [8] tightened this upper bound by optimizing over the spanning tree $\tau$ as follows:

$\mathbb{P}\left(\bigcup_i A_i\right) \le \sum_{i=1}^{n} p_i - \max_{\tau \in T} \sum_{(i,j) \in \tau} p_{ij},$

where $T$ is the set of all spanning trees on the graph. These bounds are not the tightest possible with general bivariates, even when feasibility is guaranteed, as shown in Boros et al. [9] However, when the variables are pairwise independent ($p_{ij} = p_i p_j$), Ramachandra-Natarajan [10] showed that the Kounias-Hunter-Worsley [6] [7] [8] bound is tight, by proving that the maximum probability of the union of events admits the closed-form expression

$\max_{p_{ij} = p_i p_j} \mathbb{P}\left(\bigcup_i A_i\right) = \min\left\{ \sum_{i=1}^{n-1} p_i + p_n - p_n \sum_{i=1}^{n-1} p_i,\ 1 \right\} \qquad (1)$

where the probabilities are sorted in increasing order as $0 \le p_1 \le p_2 \le \cdots \le p_n \le 1$. Notably, the tight bound in Eq. 1 depends only on the sum of the $n-1$ smallest probabilities, $\sum_{i=1}^{n-1} p_i$, and the largest probability $p_n$. Thus, while the ordering of the probabilities plays a role in the derivation of the bound, the ordering among the $n-1$ smallest probabilities is inconsequential, since only their sum is used.
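
To make Eq. 1 concrete, the sketch below evaluates the closed form and checks it against the Hunter-Worsley bound computed with a maximum-weight spanning tree when the bivariate probabilities are set to $p_i p_j$. The function names and input values are illustrative assumptions, not taken from the cited papers:

```python
def pairwise_indep_tight_bound(p):
    """Closed-form tight bound of Eq. 1; expects p sorted in increasing order."""
    s = sum(p[:-1])                       # sum of the n-1 smallest probabilities
    return min(s + p[-1] - p[-1] * s, 1.0)

def hunter_worsley_bound(p):
    """Hunter-Worsley bound with p_ij = p_i * p_j, via a maximum spanning tree (Prim)."""
    n = len(p)
    in_tree = {0}
    tree_weight = 0.0
    while len(in_tree) < n:
        # Greedily add the heaviest edge connecting the tree to a new node.
        w, v = max((p[i] * p[j], j) for i in in_tree for j in range(n) if j not in in_tree)
        tree_weight += w
        in_tree.add(v)
    return sum(p) - tree_weight

# Hypothetical marginals, sorted in increasing order (illustrative values only).
p = [0.1, 0.2, 0.3, 0.4]
print(pairwise_indep_tight_bound(p), hunter_worsley_bound(p))  # both approximately 0.76
```

With weights $p_i p_j$, the maximum spanning tree is the star centered at the largest probability, so the two computations agree, which is exactly the tightness statement above.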

Comparison with the Boole–Fréchet union bound

It is useful to compare the smallest bounds on the probability of the union with arbitrary dependence and with pairwise independence, respectively. The tightest Boole–Fréchet upper union bound (assuming only univariate information) is given as:

$\mathbb{P}\left(\bigcup_i A_i\right) \le \min\left\{ \sum_{i=1}^{n} p_i,\ 1 \right\} \qquad (2)$

As shown in Ramachandra-Natarajan, [10] it can be verified that the ratio of the two tight bounds in Eq. 2 and Eq. 1 is upper bounded by $4/3$, and that the maximum value of $4/3$ is attained when

$\sum_{i=1}^{n-1} p_i = 1/2, \quad p_n = 1/2,$

where the probabilities are sorted in increasing order as $0 \le p_1 \le p_2 \le \cdots \le p_n \le 1$. In other words, in the best-case scenario, the pairwise independence bound in Eq. 1 provides an improvement of 25% over the univariate bound in Eq. 2.
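
A quick numerical check of this worst-case ratio (an illustrative sketch, using the extreme configuration described above) compares the two bounds when the $n-1$ smallest probabilities sum to 1/2 and the largest probability equals 1/2:

```python
def boole_frechet_bound(p):
    """Tightest univariate (Boole-Frechet) upper bound of Eq. 2."""
    return min(sum(p), 1.0)

def pairwise_indep_tight_bound(p):
    """Closed-form pairwise-independence bound of Eq. 1; p sorted in increasing order."""
    s = sum(p[:-1])
    return min(s + p[-1] - p[-1] * s, 1.0)

# Extreme configuration: the n-1 smallest probabilities sum to 1/2 and p_n = 1/2.
p = [0.125, 0.125, 0.125, 0.125, 0.5]
eq2 = boole_frechet_bound(p)            # 1.0
eq1 = pairwise_indep_tight_bound(p)     # 0.5 + 0.5 - 0.25 = 0.75
print(eq2 / eq1)                        # 4/3: a 25% reduction of the bound
```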

Generalization

More generally, we can talk about k-wise independence, for any k ≥ 2. The idea is similar: a set of random variables is k-wise independent if every subset of size k of those variables is independent. k-wise independence has been used in theoretical computer science, for example to prove a theorem about the problem MAXEkSAT.
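
The definition lends itself to a brute-force check over a finite joint distribution: the following Python sketch (illustrative, and practical only for small discrete families) tests whether every size-k subset of coordinates has a joint pmf that factors into the product of the corresponding marginals. The Bernstein triple from the example above is 2-wise but not 3-wise independent:

```python
from itertools import combinations, product

def is_k_wise_independent(joint, k):
    """Check k-wise independence of the coordinates of a finite joint pmf.

    joint : dict mapping outcome tuples to probabilities
    """
    n = len(next(iter(joint)))
    def marginal(coords):
        # Marginal pmf of the given subset of coordinates.
        m = {}
        for o, pr in joint.items():
            key = tuple(o[c] for c in coords)
            m[key] = m.get(key, 0.0) + pr
        return m
    singles = [marginal((c,)) for c in range(n)]
    for subset in combinations(range(n), k):
        m = marginal(subset)
        supports = [sorted({o[c] for o in joint}) for c in subset]
        for combo in product(*supports):
            prod = 1.0
            for c, v in zip(subset, combo):
                prod *= singles[c].get((v,), 0.0)
            if abs(m.get(combo, 0.0) - prod) > 1e-12:
                return False
    return True

# Bernstein's triple is 2-wise (pairwise) independent but not 3-wise independent.
joint = {(0, 0, 0): 0.25, (0, 1, 1): 0.25, (1, 0, 1): 0.25, (1, 1, 0): 0.25}
print(is_k_wise_independent(joint, 2), is_k_wise_independent(joint, 3))  # True False
```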

k-wise independence is used in the proof that k-independent hashing functions are secure unforgeable message authentication codes.

See also


References

  1. Gut, A. (2005). Probability: A Graduate Course. Springer-Verlag. ISBN 0-387-27332-8. pp. 71–72.
  2. Hogg, R. V., McKean, J. W., Craig, A. T. (2005). Introduction to Mathematical Statistics (6th ed.). Upper Saddle River, NJ: Pearson Prentice Hall. ISBN 0-13-008507-3. Definition 2.5.1, page 109.
  3. Hogg, R. V., McKean, J. W., Craig, A. T. (2005). Introduction to Mathematical Statistics (6th ed.). Upper Saddle River, NJ: Pearson Prentice Hall. ISBN 0-13-008507-3. Remark 2.6.1, p. 120.
  4. Boole, G. (1854). An Investigation of the Laws of Thought, On Which Are Founded the Mathematical Theories of Logic and Probability. Walton and Maberly, London. See Boole's "major" and "minor" limits of a conjunction on page 299.
  5. Fréchet, M. (1935). Généralisations du théorème des probabilités totales. Fundamenta Mathematicae 25: 379–387.
  6. E. G. Kounias (1968). "Bounds for the probability of a union, with applications". The Annals of Mathematical Statistics. 39 (6): 2154–2158. doi:10.1214/aoms/1177698049.
  7. D. Hunter (1976). "An upper bound for the probability of a union". Journal of Applied Probability. 13 (3): 597–603. doi:10.2307/3212481. JSTOR 3212481.
  8. K. J. Worsley (1982). "An improved Bonferroni inequality and applications". Biometrika. 69 (2): 297–302. doi:10.1093/biomet/69.2.297.
  9. E. Boros, A. Scozzari, F. Tardella and P. Veneziani (2014). "Polynomially computable bounds for the probability of the union of events". Mathematics of Operations Research. 39 (4): 1311–1329. doi:10.1287/moor.2014.0657.
  10. A. Ramachandra, K. Natarajan (2020). "Tight Probability Bounds with Pairwise Independence". arXiv:2006.00516.