Conditional independence

In probability theory, conditional independence describes situations wherein an observation is irrelevant or redundant when evaluating the certainty of a hypothesis. Conditional independence is usually formulated in terms of conditional probability, as a special case where the probability of the hypothesis given the uninformative observation is equal to the probability without. If $A$ is the hypothesis, and $B$ and $C$ are observations, conditional independence can be stated as an equality:

$$\Pr(A \mid B, C) = \Pr(A \mid C)$$

where $\Pr(A \mid B, C)$ is the probability of $A$ given both $B$ and $C$. Since the probability of $A$ given $C$ is the same as the probability of $A$ given both $B$ and $C$, this equality expresses that $B$ contributes nothing to the certainty of $A$. In this case, $A$ and $B$ are said to be conditionally independent given $C$, written symbolically as $(A \perp\!\!\!\perp B \mid C)$. In the language of causal equality notation, two functions $f$ and $g$ which both depend on a common variable $C$ are described as conditionally independent given $C$, which is equivalent to the notation $(f \perp\!\!\!\perp g \mid C)$.
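
The defining equality can be checked numerically on any finite joint distribution. The following minimal sketch (Python; the joint table is an illustrative construction, not taken from the article) builds a distribution in which $A$ and $B$ are conditionally independent given $C$ and confirms that $\Pr(A \mid B, C)$ and $\Pr(A \mid C)$ coincide:

    # Illustrative check of Pr(A | B, C) == Pr(A | C) for a joint distribution
    # constructed so that A and B are conditionally independent given C.
    # The numbers below are arbitrary illustrative choices, not from the article.

    p_c = {0: 0.3, 1: 0.7}
    p_a_given_c = {0: 0.2, 1: 0.6}   # Pr(A = 1 | C = c)
    p_b_given_c = {0: 0.5, 1: 0.9}   # Pr(B = 1 | C = c)

    def bern(p, v):                  # Pr(V = v) for a Bernoulli(p) variable
        return p if v == 1 else 1 - p

    # Joint distribution p(a, b, c) = p(c) p(a|c) p(b|c)
    joint = {(a, b, c): p_c[c] * bern(p_a_given_c[c], a) * bern(p_b_given_c[c], b)
             for a in (0, 1) for b in (0, 1) for c in (0, 1)}

    def pr(pred):                    # probability of an event given as a predicate
        return sum(p for (a, b, c), p in joint.items() if pred(a, b, c))

    for c in (0, 1):
        p_A_given_BC = pr(lambda a, b, cc: a == 1 and b == 1 and cc == c) / \
                       pr(lambda a, b, cc: b == 1 and cc == c)
        p_A_given_C = pr(lambda a, b, cc: a == 1 and cc == c) / \
                      pr(lambda a, b, cc: cc == c)
        print(c, round(p_A_given_BC, 6), round(p_A_given_C, 6))   # equal in each row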

The concept of conditional independence is essential to graph-based theories of statistical inference, as it establishes a mathematical relation between a collection of conditional statements and a graphoid.

Conditional independence of events

Let $A$, $B$, and $C$ be events. $A$ and $B$ are said to be conditionally independent given $C$ if and only if $\Pr(C) > 0$ and:

$$\Pr(A \mid B, C) = \Pr(A \mid C)$$

This property is often written: $(A \perp\!\!\!\perp B \mid C)$, which should be read $((A \perp\!\!\!\perp B) \mid C)$.

Equivalently, conditional independence may be stated as:

$$\Pr(A, B \mid C) = \Pr(A \mid C)\Pr(B \mid C)$$

where $\Pr(A, B \mid C)$ is the joint probability of $A$ and $B$ given $C$. This alternate formulation states that $A$ and $B$ are independent events, given $C$.

It demonstrates that $(A \perp\!\!\!\perp B \mid C)$ is equivalent to $(B \perp\!\!\!\perp A \mid C)$.

Proof of the equivalent definition

$\Pr(A, B \mid C) = \Pr(A \mid C)\Pr(B \mid C)$
iff $\dfrac{\Pr(A, B, C)}{\Pr(C)} = \left(\dfrac{\Pr(A, C)}{\Pr(C)}\right)\!\left(\dfrac{\Pr(B, C)}{\Pr(C)}\right)$      (definition of conditional probability)
iff $\Pr(A, B, C) = \dfrac{\Pr(A, C)\Pr(B, C)}{\Pr(C)}$      (multiply both sides by $\Pr(C)$)
iff $\dfrac{\Pr(A, B, C)}{\Pr(B, C)} = \dfrac{\Pr(A, C)}{\Pr(C)}$      (divide both sides by $\Pr(B, C)$)
iff $\Pr(A \mid B, C) = \Pr(A \mid C)$      (definition of conditional probability)
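
The equivalence proved above can also be sanity-checked numerically: for any joint distribution over three events, the product form and the conditional form are either both satisfied or both violated. A rough sketch (Python; the random constructions and tolerances are illustrative assumptions):

    # Sanity check that the two formulations agree: joints built to satisfy
    # conditional independence pass both tests, and generic random joints
    # pass or fail them together.
    import itertools, random
    random.seed(0)

    def pr(joint, pred):
        return sum(p for abc, p in joint.items() if pred(*abc))

    def both_tests(joint):
        pC   = pr(joint, lambda a, b, c: c)
        pAC  = pr(joint, lambda a, b, c: a and c)
        pBC  = pr(joint, lambda a, b, c: b and c)
        pABC = pr(joint, lambda a, b, c: a and b and c)
        product_form     = abs(pABC / pC  - (pAC / pC) * (pBC / pC)) < 1e-9
        conditional_form = abs(pABC / pBC - pAC / pC) < 1e-9
        return product_form, conditional_form

    for trial in range(2000):
        if trial % 2:   # generic random joint: usually not conditionally independent
            w = [random.random() for _ in range(8)]
            joint = {abc: wi / sum(w)
                     for abc, wi in zip(itertools.product((0, 1), repeat=3), w)}
        else:           # joint constructed so that A and B are independent given C
            pc, pa0, pa1, pb0, pb1 = (random.uniform(0.1, 0.9) for _ in range(5))
            joint = {(a, b, c): (pc if c else 1 - pc)
                                 * ((pa1 if a else 1 - pa1) if c else (pa0 if a else 1 - pa0))
                                 * ((pb1 if b else 1 - pb1) if c else (pb0 if b else 1 - pb0))
                     for a, b, c in itertools.product((0, 1), repeat=3)}
        f1, f2 = both_tests(joint)
        assert f1 == f2            # the two definitions always agree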

Examples

Coloured boxes

Each cell represents a possible outcome. The events $R$, $B$ and $Y$ are represented by the areas shaded red, blue and yellow respectively. The overlap between the events $R$ and $B$ is shaded purple.

[Figure: Conditional independence.svg]

The probabilities of these events are shaded areas with respect to the total area. In both examples $R$ and $B$ are conditionally independent given $Y$ because:

$$\Pr(R, B \mid Y) = \Pr(R \mid Y)\Pr(B \mid Y)$$ [1]

but not conditionally independent given $[\text{not } Y]$ because:

$$\Pr(R, B \mid \text{not } Y) \neq \Pr(R \mid \text{not } Y)\Pr(B \mid \text{not } Y)$$
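
Using the cell counts quoted in note [1] for the left-hand picture, the conditional independence given $Y$ reduces to a one-line arithmetic check (Python, exact fractions):

    from fractions import Fraction as F

    # Cell counts inside the Y region, taken from note [1] (left-hand picture):
    # R and B overlap in 2 of the 12 Y-cells, R covers 4 of them, B covers 6.
    pr_RB_given_Y = F(2, 12)   # 1/6
    pr_R_given_Y  = F(4, 12)   # 1/3
    pr_B_given_Y  = F(6, 12)   # 1/2

    assert pr_RB_given_Y == pr_R_given_Y * pr_B_given_Y   # 1/6 == 1/3 * 1/2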

Proximity and delays

Let events A and B be that person A and person B, respectively, will be home in time for dinner, where both people are randomly sampled from the entire world. Events A and B can be assumed to be independent, i.e. knowledge that A is late changes the probability that B will be late little, if at all. However, if a third event is introduced (person A and person B live in the same neighborhood), the two events are no longer conditionally independent: traffic conditions and weather-related events that might delay person A might delay person B as well. Given the third event and knowledge that person A was late, the probability that person B will be late does change meaningfully. [2]
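
A minimal numerical sketch of this effect, with invented probabilities chosen purely for illustration (a shared local disruption $S$ delays both people once the same-neighborhood event is given):

    # Illustrative sketch (invented numbers): within the event "A and B live in
    # the same neighborhood", a shared disruption S (e.g. a local storm) delays
    # both people, so learning that A is late raises the probability that B is late.

    p_S = 0.2                      # Pr(disruption in the shared neighborhood)
    p_late_given_S   = 0.7         # Pr(a given person is late | disruption)
    p_late_given_noS = 0.1         # Pr(a given person is late | no disruption)

    def pr_joint(a_late, b_late):
        # A and B are assumed independent *given* S; summing S out couples them.
        total = 0.0
        for s, ps in ((1, p_S), (0, 1 - p_S)):
            p = p_late_given_S if s else p_late_given_noS
            pa = p if a_late else 1 - p
            pb = p if b_late else 1 - p
            total += ps * pa * pb
        return total

    p_B_late = pr_joint(True, True) + pr_joint(False, True)
    p_B_late_given_A_late = pr_joint(True, True) / (pr_joint(True, True) + pr_joint(True, False))
    print(round(p_B_late, 3), round(p_B_late_given_A_late, 3))   # 0.22 versus about 0.482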

Dice rolling

Conditional independence depends on the nature of the third event. If you roll two dice, one may assume that the two dice behave independently of each other: looking at the result of one die will not tell you about the result of the other. (That is, the two dice are independent.) If, however, the first die's result is a 3, and someone tells you about a third event, that the sum of the two results is even, then this extra unit of information restricts the options for the second result to an odd number. In other words, two events can be independent, but not conditionally independent. [2]
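
Enumerating the 36 equally likely outcomes makes the contrast explicit; the sketch below (Python, exact fractions) shows the dice are independent but not conditionally independent given an even sum:

    # Enumerating all 36 equally likely outcomes of two dice: the dice are
    # independent, but not conditionally independent given "the sum is even".
    from itertools import product
    from fractions import Fraction as F

    outcomes = list(product(range(1, 7), repeat=2))

    def pr(pred, given=lambda d1, d2: True):
        cond = [o for o in outcomes if given(*o)]
        return F(sum(1 for o in cond if pred(*o)), len(cond))

    # Unconditional independence: the first die tells us nothing about the second.
    assert pr(lambda d1, d2: d2 == 5) == pr(lambda d1, d2: d2 == 5,
                                            given=lambda d1, d2: d1 == 3)   # both 1/6

    # Conditioning on an even sum: now the first die restricts the second.
    even_sum = lambda d1, d2: (d1 + d2) % 2 == 0
    print(pr(lambda d1, d2: d2 == 5, given=even_sum))                                    # 1/6
    print(pr(lambda d1, d2: d2 == 5, given=lambda d1, d2: even_sum(d1, d2) and d1 == 3)) # 1/3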

Height and vocabulary

Height and vocabulary are dependent, since very small people tend to be children, known for their more basic vocabularies. But knowing that two people are 19 years old (i.e., conditional on age), there is no reason to think that the taller person has the larger vocabulary.

Conditional independence of random variables

Two discrete random variables $X$ and $Y$ are conditionally independent given a third discrete random variable $Z$ if and only if they are independent in their conditional probability distribution given $Z$. That is, $X$ and $Y$ are conditionally independent given $Z$ if and only if, given any value of $Z$, the probability distribution of $X$ is the same for all values of $Y$ and the probability distribution of $Y$ is the same for all values of $X$. Formally:

$$(X \perp\!\!\!\perp Y) \mid Z \quad \iff \quad F_{X,Y \mid Z=z}(x, y) = F_{X \mid Z=z}(x) \cdot F_{Y \mid Z=z}(y) \quad \text{for all } x, y, z$$ (Eq.2)

where $F_{X,Y \mid Z=z}(x, y) = \Pr(X \leq x, Y \leq y \mid Z = z)$ is the conditional cumulative distribution function of $X$ and $Y$ given $Z = z$.
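
For discrete variables, Eq. 2 can be verified directly by enumeration. The sketch below (Python; the conditional pmfs are randomly generated illustrative tables) builds a joint distribution with $X$ and $Y$ conditionally independent given $Z$ and checks that the conditional CDF factorises at every point:

    # Check Eq. 2 on a discrete joint pmf constructed so that X and Y are
    # conditionally independent given Z.
    import random
    random.seed(1)

    vals = (0, 1, 2)
    p_z = {z: 1 / 3 for z in vals}
    p_x_given_z = {z: [random.random() for _ in vals] for z in vals}
    p_y_given_z = {z: [random.random() for _ in vals] for z in vals}
    for table in (p_x_given_z, p_y_given_z):          # normalise each conditional pmf
        for z in vals:
            s = sum(table[z])
            table[z] = [v / s for v in table[z]]

    joint = {(x, y, z): p_z[z] * p_x_given_z[z][x] * p_y_given_z[z][y]
             for x in vals for y in vals for z in vals}

    def cdf_xy(x, y, z):      # F_{X,Y|Z=z}(x, y)
        return sum(p for (xx, yy, zz), p in joint.items()
                   if xx <= x and yy <= y and zz == z) / p_z[z]

    def cdf_x(x, z):          # F_{X|Z=z}(x)
        return sum(p for (xx, yy, zz), p in joint.items() if xx <= x and zz == z) / p_z[z]

    def cdf_y(y, z):          # F_{Y|Z=z}(y)
        return sum(p for (xx, yy, zz), p in joint.items() if yy <= y and zz == z) / p_z[z]

    for x in vals:
        for y in vals:
            for z in vals:
                assert abs(cdf_xy(x, y, z) - cdf_x(x, z) * cdf_y(y, z)) < 1e-12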

Two events $R$ and $B$ are conditionally independent given a σ-algebra $\Sigma$ if

$$\Pr(R, B \mid \Sigma) = \Pr(R \mid \Sigma)\Pr(B \mid \Sigma) \ \text{a.s.}$$

where $\Pr(A \mid \Sigma)$ denotes the conditional expectation of the indicator function of the event $A$, $\chi_A$, given the σ-algebra $\Sigma$. That is,

$$\Pr(A \mid \Sigma) := \operatorname{E}[\chi_A \mid \Sigma].$$

Two random variables $X$ and $Y$ are conditionally independent given a σ-algebra $\Sigma$ if the above equation holds for all $R$ in $\sigma(X)$ and $B$ in $\sigma(Y)$.

Two random variables $X$ and $Y$ are conditionally independent given a random variable $W$ if they are independent given $\sigma(W)$: the σ-algebra generated by $W$. This is commonly written:

$$X \perp\!\!\!\perp Y \mid W$$

or

$$X \perp Y \mid W$$

This is read "$X$ is independent of $Y$, given $W$"; the conditioning applies to the whole statement: "($X$ is independent of $Y$) given $W$".

This notation extends $X \perp\!\!\!\perp Y$ for "$X$ is independent of $Y$".

If $W$ assumes a countable set of values, this is equivalent to the conditional independence of $X$ and $Y$ for the events of the form $[W = w]$. Conditional independence of more than two events, or of more than two random variables, is defined analogously.

The following two examples show that $X \perp\!\!\!\perp Y$ neither implies nor is implied by $X \perp\!\!\!\perp Y \mid W$.

First, suppose $W$ is 0 with probability 0.5 and 1 otherwise. When $W = 0$ take $X$ and $Y$ to be independent, each having the value 0 with probability 0.99 and the value 1 otherwise. When $W = 1$, $X$ and $Y$ are again independent, but this time they take the value 1 with probability 0.99. Then $X \perp\!\!\!\perp Y \mid W$. But $X$ and $Y$ are dependent, because Pr(X = 0) < Pr(X = 0 | Y = 0). This is because Pr(X = 0) = 0.5, but if Y = 0 then it is very likely that W = 0 and thus that X = 0 as well, so Pr(X = 0 | Y = 0) > 0.5.
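
Computed exactly, the example looks as follows (Python, exact fractions):

    # The example above: X and Y are conditionally independent given W,
    # yet Pr(X = 0) = 0.5 while Pr(X = 0 | Y = 0) is much larger.
    from fractions import Fraction as F

    p_w = {0: F(1, 2), 1: F(1, 2)}
    p0  = {0: F(99, 100), 1: F(1, 100)}   # Pr(X = x | W = 0) = Pr(Y = x | W = 0)
    p1  = {0: F(1, 100), 1: F(99, 100)}   # Pr(X = x | W = 1) = Pr(Y = x | W = 1)

    joint = {(x, y, w): p_w[w] * (p0 if w == 0 else p1)[x] * (p0 if w == 0 else p1)[y]
             for x in (0, 1) for y in (0, 1) for w in (0, 1)}

    pr_X0    = sum(p for (x, y, w), p in joint.items() if x == 0)
    pr_Y0    = sum(p for (x, y, w), p in joint.items() if y == 0)
    pr_X0_Y0 = sum(p for (x, y, w), p in joint.items() if x == 0 and y == 0)

    print(pr_X0)                  # 1/2
    print(pr_X0_Y0 / pr_Y0)       # 4901/5000 = 0.9802 > 1/2, so X and Y are dependent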

For the second example, suppose $X \perp\!\!\!\perp Y$, each taking the values 0 and 1 with probability 0.5. Let $W$ be the product $X \cdot Y$. Then, conditional on $W = 0$, Pr(X = 0 | W = 0) = 2/3, but Pr(X = 0 | Y = 0, W = 0) = 1/2, so $X \perp\!\!\!\perp Y \mid W$ is false. This is also an example of explaining away. See Kevin Murphy's tutorial [3] where $X$ and $Y$ take the values "brainy" and "sporty".
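
The same calculation for this example (Python, exact fractions) shows the conditional dependence directly:

    # The "explaining away" example: X and Y are independent fair bits, W = X*Y.
    # Given W = 0, learning Y = 0 changes the probability that X = 0.
    from fractions import Fraction as F
    from itertools import product

    joint = {(x, y): F(1, 4) for x, y in product((0, 1), repeat=2)}   # X, Y independent

    def pr(pred):
        return sum(p for (x, y), p in joint.items() if pred(x, y, x * y))

    pr_X0_given_W0    = pr(lambda x, y, w: x == 0 and w == 0) / pr(lambda x, y, w: w == 0)
    pr_X0_given_Y0_W0 = pr(lambda x, y, w: x == 0 and y == 0 and w == 0) / \
                        pr(lambda x, y, w: y == 0 and w == 0)

    print(pr_X0_given_W0)       # 2/3
    print(pr_X0_given_Y0_W0)    # 1/2, so X and Y are not independent given W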

Conditional independence of random vectors

Two random vectors $\mathbf{X} = (X_1, \ldots, X_l)^{\mathrm{T}}$ and $\mathbf{Y} = (Y_1, \ldots, Y_m)^{\mathrm{T}}$ are conditionally independent given a third random vector $\mathbf{Z} = (Z_1, \ldots, Z_n)^{\mathrm{T}}$ if and only if they are independent in their conditional cumulative distribution given $\mathbf{Z}$. Formally:

$$(\mathbf{X} \perp\!\!\!\perp \mathbf{Y}) \mid \mathbf{Z} \quad \iff \quad F_{\mathbf{X},\mathbf{Y} \mid \mathbf{Z}=\mathbf{z}}(\mathbf{x}, \mathbf{y}) = F_{\mathbf{X} \mid \mathbf{Z}=\mathbf{z}}(\mathbf{x}) \cdot F_{\mathbf{Y} \mid \mathbf{Z}=\mathbf{z}}(\mathbf{y}) \quad \text{for all } \mathbf{x}, \mathbf{y}, \mathbf{z}$$ (Eq.3)

where $\mathbf{x} = (x_1, \ldots, x_l)^{\mathrm{T}}$, $\mathbf{y} = (y_1, \ldots, y_m)^{\mathrm{T}}$ and $\mathbf{z} = (z_1, \ldots, z_n)^{\mathrm{T}}$, and the conditional cumulative distributions are defined by $F_{\mathbf{X},\mathbf{Y} \mid \mathbf{Z}=\mathbf{z}}(\mathbf{x}, \mathbf{y}) = \Pr(\mathbf{X} \leq \mathbf{x}, \mathbf{Y} \leq \mathbf{y} \mid \mathbf{Z} = \mathbf{z})$, and similarly for $F_{\mathbf{X} \mid \mathbf{Z}=\mathbf{z}}(\mathbf{x})$ and $F_{\mathbf{Y} \mid \mathbf{Z}=\mathbf{z}}(\mathbf{y})$.

Uses in Bayesian inference

Let p be the proportion of voters who will vote "yes" in an upcoming referendum. In taking an opinion poll, one chooses n voters randomly from the population. For i = 1, ..., n, let Xi = 1 or 0 according to whether or not the i-th chosen voter will vote "yes".

In a frequentist approach to statistical inference one would not attribute any probability distribution to p (unless the probabilities could be somehow interpreted as relative frequencies of occurrence of some event or as proportions of some population) and one would say that X1, ..., Xn are independent random variables.

By contrast, in a Bayesian approach to statistical inference, one would assign a probability distribution to p regardless of the non-existence of any such "frequency" interpretation, and one would construe the probabilities as degrees of belief that p is in any interval to which a probability is assigned. In that model, the random variables X1, ..., Xn are not independent, but they are conditionally independent given the value of p. In particular, if a large number of the Xs are observed to be equal to 1, that would imply a high conditional probability, given that observation, that p is near 1, and thus a high conditional probability, given that observation, that the next X to be observed will be equal to 1.
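
A small numerical illustration of the polling example, assuming a uniform Beta(1, 1) prior on p (the article does not fix a prior; this choice gives the classical rule of succession (k + 1)/(n + 2) for the predictive probability):

    # X_1, ..., X_n are conditionally i.i.d. Bernoulli(p) given p, with a uniform
    # prior on p. The predictive probability Pr(X_{n+1} = 1 | data) rises with the
    # observed count of "yes" answers, via the Beta-Binomial posterior mean.

    def predictive_next_yes(n_observed, k_yes):
        # Posterior of p under a uniform Beta(1, 1) prior is Beta(k + 1, n - k + 1);
        # its mean is the predictive probability that the next X equals 1.
        return (k_yes + 1) / (n_observed + 2)

    print(predictive_next_yes(0, 0))      # 0.5  (prior predictive)
    print(predictive_next_yes(10, 9))     # ~0.83
    print(predictive_next_yes(100, 90))   # ~0.89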

Rules of conditional independence

A set of rules governing statements of conditional independence has been derived from the basic definition. [4] [5]

These rules were termed "Graphoid Axioms" by Pearl and Paz, [6] because they hold in graphs, where $X \perp\!\!\!\perp A \mid B$ is interpreted to mean: "All paths from X to A are intercepted by the set B". [7]

Symmetry

$$X \perp\!\!\!\perp Y \quad \Rightarrow \quad Y \perp\!\!\!\perp X$$

Decomposition

$$X \perp\!\!\!\perp A, B \quad \Rightarrow \quad X \perp\!\!\!\perp A \ \text{ and } \ X \perp\!\!\!\perp B$$

Proof

$$\Pr(X, A) = \sum_{b} \Pr(X, A, B = b) = \sum_{b} \Pr(X)\Pr(A, B = b) = \Pr(X)\Pr(A)$$

The first equality marginalizes over $B$, the second uses the assumption $X \perp\!\!\!\perp A, B$, and the result is the definition of $X \perp\!\!\!\perp A$. A similar proof shows the independence of $X$ and $B$.

Weak union

$$X \perp\!\!\!\perp A, B \quad \Rightarrow \quad X \perp\!\!\!\perp A \mid B \ \text{ and } \ X \perp\!\!\!\perp B \mid A$$

Proof

By assumption, $\Pr(X) = \Pr(X \mid A, B)$. By the decomposition property, $\Pr(X) = \Pr(X \mid B)$. Combining the two equalities gives $\Pr(X \mid B) = \Pr(X \mid A, B)$, which establishes $X \perp\!\!\!\perp A \mid B$.

The second condition can be proved similarly.
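
Decomposition and weak union can be exercised numerically by constructing a joint distribution in which $X$ is independent of the pair $(A, B)$ and checking the implied statements; a minimal sketch (Python, illustrative random tables):

    # Construct a joint pmf with X independent of the pair (A, B), then verify
    # decomposition (X independent of A) and weak union (X independent of A given B).
    import random
    from itertools import product
    random.seed(2)

    p_x  = {0: 0.3, 1: 0.7}
    w    = [random.random() for _ in range(4)]
    p_ab = {ab: wi / sum(w) for ab, wi in zip(product((0, 1), repeat=2), w)}

    joint = {(x, a, b): p_x[x] * p_ab[(a, b)]
             for x in (0, 1) for a in (0, 1) for b in (0, 1)}

    def pr(pred):
        return sum(p for xab, p in joint.items() if pred(*xab))

    def indep_given(b=None):
        # Does Pr(X=1, A=1 | B=b) equal Pr(X=1 | B=b) * Pr(A=1 | B=b)?
        # (b=None means no conditioning.)
        cond = (lambda bb: True) if b is None else (lambda bb: bb == b)
        pz  = pr(lambda x, a, bb: cond(bb))
        pxa = pr(lambda x, a, bb: x == 1 and a == 1 and cond(bb)) / pz
        px  = pr(lambda x, a, bb: x == 1 and cond(bb)) / pz
        pa  = pr(lambda x, a, bb: a == 1 and cond(bb)) / pz
        return abs(pxa - px * pa) < 1e-12

    assert indep_given()                            # decomposition: X independent of A
    assert indep_given(b=0) and indep_given(b=1)    # weak union: X independent of A given B
    # The same check with the roles of A and B swapped gives the remaining statements.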

Contraction

$$X \perp\!\!\!\perp A \mid B \ \text{ and } \ X \perp\!\!\!\perp B \quad \Rightarrow \quad X \perp\!\!\!\perp A, B$$

Proof

This property can be proved by noticing $\Pr(X \mid A, B) = \Pr(X \mid B) = \Pr(X)$, each equality of which is asserted by $X \perp\!\!\!\perp A \mid B$ and $X \perp\!\!\!\perp B$, respectively.

Intersection

For strictly positive probability distributions, [5] the following also holds:

$$X \perp\!\!\!\perp Y \mid Z, W \ \text{ and } \ X \perp\!\!\!\perp W \mid Z, Y \quad \Rightarrow \quad X \perp\!\!\!\perp W, Y \mid Z$$

Proof

By assumption:

$$\Pr(X \mid Z, W, Y) = \Pr(X \mid Z, W) \quad \text{and} \quad \Pr(X \mid Z, W, Y) = \Pr(X \mid Z, Y), \quad \text{therefore} \quad \Pr(X \mid Z, W) = \Pr(X \mid Z, Y).$$

Using this equality, together with the Law of total probability applied to $\Pr(X \mid Z)$:

$$\Pr(X \mid Z) = \sum_{w} \Pr(X \mid Z, W = w)\Pr(W = w \mid Z) = \sum_{w} \Pr(X \mid Z, Y)\Pr(W = w \mid Z) = \Pr(X \mid Z, Y)$$

Since $\Pr(X \mid Z, W, Y) = \Pr(X \mid Z, Y)$ and $\Pr(X \mid Z, Y) = \Pr(X \mid Z)$, it follows that $\Pr(X \mid Z, W, Y) = \Pr(X \mid Z)$, which is equivalent to $X \perp\!\!\!\perp W, Y \mid Z$.
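
The intersection property can likewise be exercised numerically. The sketch below (Python) uses a deliberately factorised, strictly positive joint distribution, an illustrative special case in which both premises hold by construction, and confirms the conclusion:

    # Consistency check of the intersection property on a strictly positive joint:
    # p(x, y, w, z) = p(z) p(x|z) p(y, w|z), every entry positive, satisfies both
    # premises, and the conclusion X independent of (W, Y) given Z follows.
    import random
    from itertools import product
    random.seed(3)

    def norm(ws):
        s = sum(ws)
        return [v / s for v in ws]

    p_z    = norm([random.uniform(0.1, 1) for _ in range(2)])
    p_x_z  = {z: norm([random.uniform(0.1, 1) for _ in range(2)]) for z in (0, 1)}
    p_yw_z = {z: dict(zip(product((0, 1), repeat=2),
                          norm([random.uniform(0.1, 1) for _ in range(4)])))
              for z in (0, 1)}

    joint = {(x, y, w, z): p_z[z] * p_x_z[z][x] * p_yw_z[z][(y, w)]
             for x, y, w, z in product((0, 1), repeat=4)}

    def pr(pred):
        return sum(p for xywz, p in joint.items() if pred(*xywz))

    def factorises(event, given):
        # Does Pr(X=1, event | given) equal Pr(X=1 | given) * Pr(event | given)?
        pg  = pr(lambda x, y, w, z: given(x, y, w, z))
        pxe = pr(lambda x, y, w, z: x == 1 and event(x, y, w, z) and given(x, y, w, z)) / pg
        px  = pr(lambda x, y, w, z: x == 1 and given(x, y, w, z)) / pg
        pe  = pr(lambda x, y, w, z: event(x, y, w, z) and given(x, y, w, z)) / pg
        return abs(pxe - px * pe) < 1e-12

    for z in (0, 1):
        for y in (0, 1):
            for w in (0, 1):
                # premise 1: X independent of Y given (Z, W)
                assert factorises(lambda xx, yy, ww, zz: yy == y,
                                  lambda xx, yy, ww, zz: zz == z and ww == w)
                # premise 2: X independent of W given (Z, Y)
                assert factorises(lambda xx, yy, ww, zz: ww == w,
                                  lambda xx, yy, ww, zz: zz == z and yy == y)
                # conclusion: X independent of (W, Y) given Z
                assert factorises(lambda xx, yy, ww, zz: ww == w and yy == y,
                                  lambda xx, yy, ww, zz: zz == z)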

Technical note: since these implications hold for any probability space, they will still hold if one considers a sub-universe by conditioning everything on another variable, say K. For example, $X \perp\!\!\!\perp Y \Rightarrow Y \perp\!\!\!\perp X$ would also mean that $X \perp\!\!\!\perp Y \mid K \Rightarrow Y \perp\!\!\!\perp X \mid K$.


References

  1. To see that this is the case, one needs to realise that Pr(R, B | Y) is the probability of an overlap of R and B (the purple shaded area) in the Y area. Since, in the picture on the left, there are two squares where R and B overlap within the Y area, and the Y area has twelve squares, Pr(R, B | Y) = 2/12 = 1/6. Similarly, Pr(R | Y) = 4/12 = 1/3 and Pr(B | Y) = 6/12 = 1/2.
  2. "Could someone explain conditional independence?"
  3. "Graphical Models".
  4. Dawid, A. P. (1979). "Conditional Independence in Statistical Theory". Journal of the Royal Statistical Society, Series B. 41 (1): 1–31. JSTOR 2984718. MR 0535541.
  5. Pearl, Judea (2000). Causality: Models, Reasoning, and Inference. Cambridge University Press.
  6. Pearl, Judea; Paz, Azaria (1985). "Graphoids: A Graph-Based Logic for Reasoning About Relevance Relations".
  7. Pearl, Judea (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann. ISBN 9780934613736.