Berkson's paradox

An example of Berkson's paradox (figure: Collider bias.png):
  • Top: a graph where talent and attractiveness are uncorrelated in the population.
  • Bottom: the same graph truncated to include only celebrities (where a person must be talented and attractive, in some combination, to have become a celebrity). Someone sampling this population may wrongly infer that talent is negatively correlated with attractiveness.

Berkson's paradox, also known as Berkson's bias, collider bias, or Berkson's fallacy, is a result in conditional probability and statistics which is often found to be counterintuitive, and hence a veridical paradox. It is a complicating factor arising in statistical tests of proportions. Specifically, it arises when there is an ascertainment bias inherent in a study design. The effect is related to the explaining away phenomenon in Bayesian networks, and conditioning on a collider in graphical models.

It is often described in the fields of medical statistics or biostatistics, as in the original description of the problem by Joseph Berkson.

Examples

Overview

An illustration of Berkson's paradox (figure: Berkson.png). The top graph represents the actual distribution, in which a positive correlation between the quality of burgers and of fries is observed. However, an individual who does not eat at any location where both are bad observes only the distribution in the bottom graph, which appears to show a negative correlation.

The most common example of Berkson's paradox is a false observation of a negative correlation between two desirable traits: members of a population who have one desirable trait appear to tend to lack the other. The paradox arises when the two traits are in fact unrelated (or even positively correlated), but the members of the population who lack both are not observed. For example, a person may notice from their own experience that fast food restaurants in their area which serve good hamburgers tend to serve bad fries, and vice versa; but because they would likely never eat anywhere where both were bad, they fail to account for the large number of restaurants in that category, which would weaken or even flip the apparent correlation.
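This selection effect is easy to reproduce numerically. The following is a minimal simulation sketch in Python; the uniform quality scores and the 0.5 "bad" threshold are hypothetical values chosen purely for illustration, not figures from any source.

```python
import random

random.seed(0)

# Hypothetical model: burger and fries quality are independent scores in [0, 1).
restaurants = [(random.random(), random.random()) for _ in range(10_000)]

# The diner never eats anywhere where both are bad, so they only ever
# observe restaurants where at least one score is decent (>= 0.5).
visited = [(b, f) for b, f in restaurants if b >= 0.5 or f >= 0.5]

def correlation(pairs):
    """Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs) / n
    sd_x = (sum((x - mean_x) ** 2 for x, _ in pairs) / n) ** 0.5
    sd_y = (sum((y - mean_y) ** 2 for _, y in pairs) / n) ** 0.5
    return cov / (sd_x * sd_y)

print("all restaurants:", round(correlation(restaurants), 3))  # approximately 0
print("visited only:   ", round(correlation(visited), 3))      # clearly negative
```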

Original illustration

Berkson's original illustration involves a retrospective study examining a risk factor for a disease in a statistical sample from a hospital in-patient population. Because samples are taken from a hospital in-patient population, rather than from the general public, this can result in a spurious negative association between the disease and the risk factor. [1]

For example, if the risk factor is diabetes and the disease is cholecystitis, a hospital patient without diabetes is more likely to have cholecystitis than a member of the general population, since the patient must have had some non-diabetes (possibly cholecystitis-causing) reason to enter the hospital in the first place. That result will be obtained regardless of whether there is any association between diabetes and cholecystitis in the general population.
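A rough simulation of this scenario is sketched below. The disease rates and the admission rule are hypothetical numbers chosen only to illustrate the mechanism; they are not figures from Berkson's study.

```python
import random

random.seed(1)

# Hypothetical population rates: diabetes and cholecystitis are independent.
P_DIABETES, P_CHOLECYSTITIS, P_OTHER_ILLNESS = 0.05, 0.03, 0.10

population = []
for _ in range(200_000):
    diabetes = random.random() < P_DIABETES
    cholecystitis = random.random() < P_CHOLECYSTITIS
    other = random.random() < P_OTHER_ILLNESS
    # Assumed admission rule: a person enters hospital if they have any condition.
    admitted = diabetes or cholecystitis or other
    population.append((diabetes, cholecystitis, admitted))

def cholecystitis_rate(sample):
    """Fraction of people in the sample who have cholecystitis."""
    return sum(c for _, c, _ in sample) / len(sample)

inpatients = [p for p in population if p[2]]
print("P(cholecystitis | diabetic in-patient):    ",
      round(cholecystitis_rate([p for p in inpatients if p[0]]), 3))
print("P(cholecystitis | non-diabetic in-patient):",
      round(cholecystitis_rate([p for p in inpatients if not p[0]]), 3))
# Among in-patients, the non-diabetics show the higher cholecystitis rate,
# even though the two conditions are independent in the full population.
```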

Dating pool example

Ellenberg's "Great Square of Men", with a person's acceptable dating pool of somewhat nice or handsome men in one corner Ellenberg Square of Men.svg
Ellenberg's "Great Square of Men", with a person's acceptable dating pool of somewhat nice or handsome men in one corner

An example presented by mathematician Jordan Ellenberg is that of a dating pool, measured on axes of niceness and handsomeness. A person might conclude from their own dating experience that "the handsome ones tend not to be nice, and the nice ones tend not to be handsome". [2]

Suppose Alex will only date a man if his niceness plus his handsomeness exceeds some threshold. Then nicer men do not have to be as handsome to qualify for Alex's dating pool. So, among the men that Alex dates, the nicer ones are less handsome on average (and vice versa), even if these traits are uncorrelated in the general population.

This does not mean that men in the dating pool compare unfavorably with men in the population. On the contrary, the selection criterion for the pool means that Alex has high standards. The average nice man that Alex dates is actually more handsome than the average man in the population (since even among nice men, the least handsome portion of the population is skipped). Berkson's negative correlation is an effect that arises within the dating pool: the rude men that Alex dates must have been even more handsome to qualify, and the ugly men even more nice.
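The threshold model can be checked with a short simulation. In the sketch below, the uniform scores, the cut-off of 1.0, and the split at niceness 0.5 are all illustrative assumptions, not values from Ellenberg's essay.

```python
import random

random.seed(2)

# Hypothetical model: niceness and handsomeness are independent uniform(0, 1) scores.
population = [(random.random(), random.random()) for _ in range(100_000)]

THRESHOLD = 1.0  # assumed rule: Alex dates a man only if niceness + handsomeness > 1.0
pool = [(n, h) for n, h in population if n + h > THRESHOLD]

def mean(xs):
    return sum(xs) / len(xs)

# Within the pool, the nicer men are less handsome on average than the ruder men...
nicer = [h for n, h in pool if n > 0.5]
ruder = [h for n, h in pool if n <= 0.5]
print("mean handsomeness of nicer men in pool:", round(mean(nicer), 3))
print("mean handsomeness of ruder men in pool:", round(mean(ruder), 3))

# ...yet even the nicer men in the pool are more handsome than the population average.
print("mean handsomeness, whole population:   ",
      round(mean([h for _, h in population]), 3))
```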

Quantitative example

As a quantitative example, suppose a collector has 1000 postage stamps, of which 300 are pretty and 100 are rare, with 30 being both pretty and rare. 30% of all his stamps are pretty and 10% of his pretty stamps are rare, so prettiness tells nothing about rarity. He puts the 370 stamps which are pretty or rare on display. Just over 27% of the stamps on display are rare (100/370), but still only 10% (30/300) of the pretty stamps are rare (and 100% of the 70 not-pretty stamps on display are rare). If an observer only considers stamps on display, they will observe a spurious negative relationship between prettiness and rarity as a result of the selection bias (that is, not-prettiness strongly indicates rarity in the display, but not in the total collection).
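The arithmetic above can be verified directly; a short sketch:

```python
# Stamp counts from the example above.
total, pretty, rare, pretty_and_rare = 1000, 300, 100, 30

# In the full collection, prettiness carries no information about rarity:
print(rare / total)                # 0.10  P(rare)
print(pretty_and_rare / pretty)    # 0.10  P(rare | pretty)

# On display: every stamp that is pretty or rare.
on_display = pretty + rare - pretty_and_rare   # 370
rare_on_display = rare                         # all 100 rare stamps are shown
print(rare_on_display / on_display)            # ~0.27  P(rare | on display)
print(pretty_and_rare / pretty)                # 0.10   P(rare | pretty, on display)
# Among displayed stamps, prettiness now appears to predict non-rarity.
```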

Statement

Two independent events become conditionally dependent given that at least one of them occurs. Symbolically:

If 0 < P(A) < 1, 0 < P(B) < 1, and P(A|B) = P(A) (that is, A and B are independent), then P(A|B, A∪B) < P(A|A∪B).

Proof: Note that P(A|A∪B) = P(A)/P(A∪B) and P(A|B, A∪B) = P(A|B) = P(A), which, together with P(A∪B) < 1 (since P(~A & ~B) = P(~A)·P(~B) > 0) and P(A) > 0 (so that P(A)/P(A∪B) > P(A)), implies that

    P(A|B, A∪B) = P(A) < P(A)/P(A∪B) = P(A|A∪B).
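For independent events with P(A) = p and P(B) = q, both conditional probabilities have closed forms, and a quick numeric check (with arbitrary illustrative values of p and q, not taken from the source) confirms the inequality:

```python
# Numeric check of the statement for independent events A and B.
def conditional_probs(p, q):
    p_union = p + q - p * q              # P(A∪B) under independence
    p_a_given_union = p / p_union        # P(A | A∪B)
    p_a_given_b_and_union = p            # P(A | B, A∪B) = P(A | B) = P(A)
    return p_a_given_union, p_a_given_b_and_union

for p, q in [(0.5, 0.5), (0.1, 0.9), (0.3, 0.2)]:
    given_union, given_both = conditional_probs(p, q)
    print(p, q, round(given_union, 3), round(given_both, 3), given_both < given_union)
    # The final column is always True: conditioning on A∪B makes B "explain away" A.
```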
One can see this in tabular form as follows: the outcomes where at least one event occurs are the three cells other than the ~A & ~B cell (and ~A means "not A").

         A         ~A
  B      A & B     ~A & B
  ~B     A & ~B    ~A & ~B

For instance, if one has a sample of 100, and both A and B occur independently half the time (P(A) = P(B) = 1/2), one obtains:

         A     ~A
  B      25    25
  ~B     25    25

So in 75 outcomes, either A or B occurs, of which 50 have A occurring. By comparing the conditional probability of A given A∪B to the unconditional probability of A:

    P(A|A∪B) = 50/75 = 2/3 ≈ 0.67 > P(A) = 50/100 = 1/2

We see that the probability of A is higher (2/3) in the subset of outcomes where A (or B) occurs than in the overall population (1/2). On the other hand, the probability of A given both B and (A or B) is simply the conditional probability of A given B, which equals the unconditional probability P(A), since A is independent of B. In the numerical example, we have conditioned on being in the top row:

         A     ~A
  B      25    25     (row conditioned on)
  ~B     25    25     (excluded)

Here the probability of A is 25/50 = 1/2, the same as its unconditional probability.

Berkson's paradox arises because the conditional probability of A given B within the three-cell subset equals the conditional probability in the overall population, but the unconditional probability of A within the subset is inflated relative to the unconditional probability in the overall population. Hence, within the subset, the presence of B decreases the conditional probability of A (back to its overall unconditional probability):

    P(A|B, A∪B) = P(A|B) = P(A) < P(A|A∪B).

Because the effect of conditioning on A∪B derives from the relative size of P(A) and P(A∪B), the effect is particularly large when A is rare (P(A) small) but very strongly correlated to B (P(A|B) close to 1). For example, consider the case below where N is very large:

         A     ~A
  B      1     0
  ~B     0     N

For the case without conditioning on A∪B we have

    P(A) = 1/(N+1)   and   P(A|B) = 1/1 = 1

So A occurs rarely unless B is present, in which case A always occurs. Thus B dramatically increases the likelihood of A.

For the case with conditioning on A∪B we have

    P(A|A∪B) = 1/1 = 1   and   P(A|B, A∪B) = 1/1 = 1

Now A occurs always, whether B is present or not. So B has no impact on the likelihood of A. Thus we see that, for highly correlated data, a huge positive correlation of B on A can be effectively removed when one conditions on A∪B.
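The tabular calculations above can be reproduced mechanically. The helper below is an illustrative sketch (the function name and argument order are not from the source) that computes the probabilities discussed in this section from the four cell counts of a 2x2 table:

```python
def conditional_rates(a_and_b, a_not_b, b_not_a, neither):
    """Return P(A), P(A|B), P(A|A∪B), P(A|B, A∪B) for a 2x2 count table."""
    total = a_and_b + a_not_b + b_not_a + neither
    union = a_and_b + a_not_b + b_not_a          # outcomes where A or B occurs
    p_a = (a_and_b + a_not_b) / total
    p_a_given_b = a_and_b / (a_and_b + b_not_a)
    p_a_given_union = (a_and_b + a_not_b) / union
    p_a_given_b_union = p_a_given_b              # B already implies A∪B
    return p_a, p_a_given_b, p_a_given_union, p_a_given_b_union

# The 25/25/25/25 table: independence overall, negative association after conditioning.
print(conditional_rates(25, 25, 25, 25))   # (0.5, 0.5, 0.667, 0.5)

# The rare-A table with large N: a strong positive association disappears.
N = 10_000
print(conditional_rates(1, 0, 0, N))       # (~0.0001, 1.0, 1.0, 1.0)
```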

See also

  • Simpson's paradox
  • Selection bias
  • Conditional probability


References

  1. Berkson, Joseph (June 1946). "Limitations of the Application of Fourfold Table Analysis to Hospital Data". Biometrics Bulletin. 2 (3): 47–53. doi:10.2307/3002000. JSTOR 3002000. PMID 21001024. (The paper is frequently miscited as Berkson, J. (1949) Biological Bulletin 2, 47–53.)
  2. Ellenberg, Jordan (2 October 2014). "Why Are Handsome Men Such Jerks?". Medium. Retrieved 16 January 2025.