In probability theory, the chain rule [1] (also called the general product rule [2][3]) describes how to calculate the probability of the intersection of events that are not necessarily independent, or the joint distribution of random variables, using conditional probabilities. The rule expresses a joint probability purely in terms of conditional probabilities. [4] It is notably used in the context of discrete stochastic processes and in applications such as the study of Bayesian networks, which describe a probability distribution in terms of conditional probabilities.
For two events $A$ and $B$, the chain rule states that
$$\mathbb P(A \cap B) = \mathbb P(B \mid A)\,\mathbb P(A),$$
where $\mathbb P(B \mid A)$ denotes the conditional probability of $B$ given $A$.
Jar A has 1 black ball and 2 white balls, and Jar B has 1 black ball and 3 white balls. Suppose we pick a jar at random and then select a ball from that jar. Let event $A$ be choosing the first jar, i.e. $\mathbb P(A) = \mathbb P(\overline{A}) = 1/2$, where $\overline{A}$ is the complementary event of $A$. Let event $B$ be choosing a white ball. The chance of choosing a white ball, given that we have chosen the first jar, is $\mathbb P(B \mid A) = 2/3$. The intersection $A \cap B$ then describes choosing the first jar and a white ball from it. The probability can be calculated by the chain rule as follows:
$$\mathbb P(A \cap B) = \mathbb P(B \mid A)\,\mathbb P(A) = \frac{2}{3} \cdot \frac{1}{2} = \frac{1}{3}.$$
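Event definition: the containers are j1 (Jar A) and j2 (Jar B); b denotes withdrawing a black ball and w withdrawing a white ball. Counting balls by container and color, with es denoting the event sum of a row or column, j1 holds 1 black and 2 white balls (es = 3), j2 holds 1 black and 3 white balls (es = 4), and the color sums are 2 black and 5 white (7 balls in total). P(w|j1) is the probability of withdrawing a white ball from j1; from the counts, the ratio is 2/3. P(w∩j1) is the probability of picking j1 and withdrawing a white ball. The count ratio 2/7 equals this probability only if each of the 7 balls is equally likely to be drawn; when a container is first picked with probability 1/2, the chain rule gives P(w∩j1) = P(w|j1) P(j1) = 2/3 · 1/2 = 1/3.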
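The two-container example can be checked with exact rational arithmetic. The following is a minimal sketch, assuming each container is picked with probability 1/2; the names `jars`, `p_jar`, `p_color_given_jar` and `p_joint` are introduced here purely for illustration.

```python
from fractions import Fraction

# Ball counts per jar, as given in the example.
jars = {
    "A": {"black": 1, "white": 2},
    "B": {"black": 1, "white": 3},
}

# Assumption: each jar is picked with equal probability.
p_jar = {name: Fraction(1, len(jars)) for name in jars}


def p_color_given_jar(color, jar):
    """P(color | jar): share of balls of that color inside the jar."""
    counts = jars[jar]
    return Fraction(counts[color], sum(counts.values()))


def p_joint(jar, color):
    """Chain rule: P(jar and color) = P(color | jar) * P(jar)."""
    return p_color_given_jar(color, jar) * p_jar[jar]


p_white_and_a = p_joint("A", "white")
print(p_white_and_a)  # 1/3

# The joint probabilities over all (jar, color) pairs sum to 1.
total = sum(p_joint(j, c) for j in jars for c in jars[j])
print(total)  # 1
```

Using `Fraction` rather than floating point keeps the result exactly comparable with the hand calculation.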
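For events $A_1, \ldots, A_n$ whose intersection has nonzero probability, the chain rule states
$$\begin{aligned}
\mathbb P(A_1 \cap A_2 \cap \ldots \cap A_n)
&= \mathbb P(A_n \mid A_1 \cap \ldots \cap A_{n-1})\,\mathbb P(A_1 \cap \ldots \cap A_{n-1}) \\
&= \mathbb P(A_1)\,\mathbb P(A_2 \mid A_1)\,\mathbb P(A_3 \mid A_1 \cap A_2) \cdots \mathbb P(A_n \mid A_1 \cap \ldots \cap A_{n-1}) \\
&= \prod_{k=1}^{n} \mathbb P\Big(A_k \,\Big|\, \bigcap_{j=1}^{k-1} A_j\Big).
\end{aligned}$$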
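For $n = 4$, i.e. four events, the chain rule reads
$$\begin{aligned}
\mathbb P(A_1 \cap A_2 \cap A_3 \cap A_4)
&= \mathbb P(A_4 \mid A_3 \cap A_2 \cap A_1)\,\mathbb P(A_3 \cap A_2 \cap A_1) \\
&= \mathbb P(A_4 \mid A_3 \cap A_2 \cap A_1)\,\mathbb P(A_3 \mid A_2 \cap A_1)\,\mathbb P(A_2 \cap A_1) \\
&= \mathbb P(A_4 \mid A_3 \cap A_2 \cap A_1)\,\mathbb P(A_3 \mid A_2 \cap A_1)\,\mathbb P(A_2 \mid A_1)\,\mathbb P(A_1).
\end{aligned}$$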
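The product formula for finitely many events can be verified numerically on a small finite sample space. The sketch below assumes a toy model of two fair dice with the uniform measure; the four events and the helper names `prob`, `cond_prob` and `chain_rule` are hypothetical choices made for illustration, not part of the statement above.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered outcomes of two fair six-sided dice (an assumed toy model).
omega = set(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of an event (a subset of omega) under the uniform measure."""
    return Fraction(len(event & omega), len(omega))


def cond_prob(a, b):
    """P(a | b), taken to be 0 when P(b) = 0."""
    return prob(a & b) / prob(b) if prob(b) > 0 else Fraction(0)


def chain_rule(events):
    """Product of P(A_k | A_1 ∩ ... ∩ A_{k-1}) for k = 1..n."""
    result = Fraction(1)
    seen = set(omega)  # the empty intersection is the whole sample space
    for a in events:
        result *= cond_prob(a, seen)
        seen &= a
    return result


a1 = {w for w in omega if w[0] % 2 == 0}   # first die is even
a2 = {w for w in omega if sum(w) >= 8}     # total is at least 8
a3 = {w for w in omega if w[1] > w[0]}     # second die beats the first
a4 = {w for w in omega if w[1] == 6}       # second die shows six

events = [a1, a2, a3, a4]
direct = prob(a1 & a2 & a3 & a4)
print(chain_rule(events), direct)  # 1/18 1/18
```

Here the factors are 1/2, 1/2, 1/3 and 2/3, whose product matches the directly counted probability of the intersection.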
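We randomly draw 4 cards (one at a time) without replacement from a deck of 52 cards. What is the probability that we have picked 4 aces?
First, we set $A_n := \{\text{draw an ace in the } n\text{-th try}\}$. Since each ace drawn leaves one fewer ace and one fewer card in the deck, we get the following probabilities:
$$\mathbb P(A_1) = \frac{4}{52}, \qquad \mathbb P(A_2 \mid A_1) = \frac{3}{51}, \qquad \mathbb P(A_3 \mid A_1 \cap A_2) = \frac{2}{50}, \qquad \mathbb P(A_4 \mid A_1 \cap A_2 \cap A_3) = \frac{1}{49}.$$
Applying the chain rule,
$$\mathbb P(A_1 \cap A_2 \cap A_3 \cap A_4) = \frac{4}{52} \cdot \frac{3}{51} \cdot \frac{2}{50} \cdot \frac{1}{49} = \frac{24}{6497400} = \frac{1}{270725}.$$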
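The four-ace calculation can be reproduced in a few lines. This is a small sketch rather than a general card-probability tool; the variable names are illustrative, and the cross-check relies on the fact that exactly one of the $\binom{52}{4}$ equally likely four-card hands consists of the four aces.

```python
from fractions import Fraction
from math import comb

deck_size, aces, draws = 52, 4, 4

# Chain rule: multiply P(ace on draw k | aces on all earlier draws).
p_four_aces = Fraction(1)
for k in range(draws):
    p_four_aces *= Fraction(aces - k, deck_size - k)

print(p_four_aces)  # 1/270725

# Cross-check by counting: one favourable 4-card hand out of C(52, 4).
assert p_four_aces == Fraction(1, comb(deck_size, draws))
```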
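Let $(\Omega, \mathcal A, \mathbb P)$ be a probability space. Recall that the conditional probability of an event $A \in \mathcal A$ given $B \in \mathcal A$ is defined as
$$\mathbb P(A \mid B) := \begin{cases} \dfrac{\mathbb P(A \cap B)}{\mathbb P(B)}, & \mathbb P(B) > 0, \\ 0, & \mathbb P(B) = 0. \end{cases}$$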
Then we have the following theorem.
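Chain rule: Let $(\Omega, \mathcal A, \mathbb P)$ be a probability space. Let $A_1, \ldots, A_n \in \mathcal A$. Then
$$\begin{aligned}
\mathbb P(A_1 \cap A_2 \cap \ldots \cap A_n)
&= \mathbb P(A_1)\,\mathbb P(A_2 \mid A_1)\,\mathbb P(A_3 \mid A_1 \cap A_2) \cdots \mathbb P(A_n \mid A_1 \cap \ldots \cap A_{n-1}) \\
&= \mathbb P(A_1) \prod_{j=2}^{n} \mathbb P(A_j \mid A_1 \cap \ldots \cap A_{j-1}).
\end{aligned}$$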
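The formula follows immediately by recursion:
$$\begin{aligned}
(1)\quad & \mathbb P(A_1)\,\mathbb P(A_2 \mid A_1) = \mathbb P(A_1 \cap A_2), \\
(2)\quad & \mathbb P(A_1)\,\mathbb P(A_2 \mid A_1)\,\mathbb P(A_3 \mid A_1 \cap A_2) = \mathbb P(A_1 \cap A_2)\,\mathbb P(A_3 \mid A_1 \cap A_2) = \mathbb P(A_1 \cap A_2 \cap A_3),
\end{aligned}$$
and so on for each further factor, where we used the definition of the conditional probability in the first step.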
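The recursion can also be written as a single induction step from $k$ to $k+1$ events. The following is a sketch under two stated assumptions: the empty intersection $\bigcap_{j=1}^{0} A_j$ is read as $\Omega$, and $\mathbb P(A \mid B)$ is taken to be $0$ whenever $\mathbb P(B) = 0$.

```latex
\begin{align*}
\mathbb P\Big(\bigcap_{j=1}^{k+1} A_j\Big)
  &= \mathbb P\Big(A_{k+1} \,\Big|\, \bigcap_{j=1}^{k} A_j\Big)\,
     \mathbb P\Big(\bigcap_{j=1}^{k} A_j\Big)
     && \text{(definition of conditional probability)} \\
  &= \mathbb P\Big(A_{k+1} \,\Big|\, \bigcap_{j=1}^{k} A_j\Big)\,
     \prod_{i=1}^{k} \mathbb P\Big(A_i \,\Big|\, \bigcap_{j=1}^{i-1} A_j\Big)
     && \text{(induction hypothesis)}
\end{align*}
```

If $\mathbb P\big(\bigcap_{j=1}^{k} A_j\big) = 0$, the left-hand side is $0$ by monotonicity and the right-hand side is $0$ because its leading factor vanishes under the convention above, so the step holds in that case as well.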
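For two discrete random variables $X, Y$, we use the events $A := \{X = x\}$ and $B := \{Y = y\}$ in the definition above, and find the joint distribution as
$$\mathbb P(X = x, Y = y) = \mathbb P(X = x \mid Y = y)\,\mathbb P(Y = y),$$
or
$$\mathbb P_{(X,Y)}(x, y) = \mathbb P_{X \mid Y}(x \mid y)\,\mathbb P_Y(y),$$
where $\mathbb P_X(x) := \mathbb P(X = x)$ is the probability distribution of $X$ and $\mathbb P_{X \mid Y}(x \mid y)$ is the conditional probability distribution of $X$ given $Y$.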
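Let $X_1, \ldots, X_n$ be random variables and $x_1, \ldots, x_n \in \mathbb R$. By the definition of the conditional probability,
$$\mathbb P(X_n = x_n, \ldots, X_1 = x_1) = \mathbb P(X_n = x_n \mid X_{n-1} = x_{n-1}, \ldots, X_1 = x_1)\,\mathbb P(X_{n-1} = x_{n-1}, \ldots, X_1 = x_1),$$
and using the chain rule, where we set $A_k := \{X_k = x_k\}$, we can find the joint distribution as
$$\begin{aligned}
\mathbb P(X_1 = x_1, \ldots, X_n = x_n)
&= \mathbb P(X_1 = x_1)\,\mathbb P(X_2 = x_2 \mid X_1 = x_1) \cdots \mathbb P(X_n = x_n \mid X_1 = x_1, \ldots, X_{n-1} = x_{n-1}) \\
&= \mathbb P(X_1 = x_1) \prod_{j=2}^{n} \mathbb P(X_j = x_j \mid X_1 = x_1, \ldots, X_{j-1} = x_{j-1}).
\end{aligned}$$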
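For $n = 3$, i.e. three random variables, the chain rule reads
$$\begin{aligned}
\mathbb P_{(X_1, X_2, X_3)}(x_1, x_2, x_3)
&= \mathbb P(X_1 = x_1, X_2 = x_2, X_3 = x_3) \\
&= \mathbb P(X_3 = x_3 \mid X_2 = x_2, X_1 = x_1)\,\mathbb P(X_2 = x_2, X_1 = x_1) \\
&= \mathbb P(X_3 = x_3 \mid X_2 = x_2, X_1 = x_1)\,\mathbb P(X_2 = x_2 \mid X_1 = x_1)\,\mathbb P(X_1 = x_1) \\
&= \mathbb P_{X_3 \mid X_2, X_1}(x_3 \mid x_2, x_1)\,\mathbb P_{X_2 \mid X_1}(x_2 \mid x_1)\,\mathbb P_{X_1}(x_1).
\end{aligned}$$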
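The factorization for three random variables can be illustrated by splitting a small joint distribution into its conditional factors and multiplying them back together. The sketch below uses a made-up joint pmf of three binary variables; the weights and the helper names `marginal`, `conditional` and `chain_rule` are hypothetical, chosen only so that every conditioning event has positive probability.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint pmf of three binary random variables (X1, X2, X3);
# the weights are made up for illustration and sum to 16.
weights = {
    (0, 0, 0): 1, (0, 0, 1): 3, (0, 1, 0): 2, (0, 1, 1): 2,
    (1, 0, 0): 4, (1, 0, 1): 1, (1, 1, 0): 1, (1, 1, 1): 2,
}
joint = {xs: Fraction(w, 16) for xs, w in weights.items()}


def marginal(prefix):
    """P(X1 = prefix[0], ..., Xk = prefix[k-1]), summing out the remaining variables."""
    k = len(prefix)
    return sum((p for xs, p in joint.items() if xs[:k] == prefix), Fraction(0))


def conditional(prefix):
    """P(Xk = prefix[-1] | earlier variables = prefix[:-1]); assumes the condition has positive probability."""
    return marginal(prefix) / marginal(prefix[:-1])


def chain_rule(xs):
    """P(X1=x1) * P(X2=x2 | X1=x1) * P(X3=x3 | X1=x1, X2=x2)."""
    result = Fraction(1)
    for k in range(1, len(xs) + 1):
        result *= conditional(xs[:k])
    return result


# The product of conditionals reproduces every entry of the joint pmf.
for xs in product((0, 1), repeat=3):
    assert chain_rule(xs) == joint[xs]

print(chain_rule((1, 0, 1)))  # 1/16
```

Conditioning on the empty prefix returns the plain marginal of $X_1$, so the first factor needs no special case.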