Birnbaum's Theorem is a pivotal result in the foundations of statistics, formulated by the American statistician Allan Birnbaum in 1962. The theorem formally demonstrates that the likelihood principle is logically equivalent to the combination of two more widely accepted statistical principles: the sufficiency principle and the conditionality principle.
The publication of the theorem in the Journal of the American Statistical Association was a landmark event that sparked intense debate between frequentist and Bayesian statisticians, because the likelihood principle implies that many standard frequentist methods (such as p-values and confidence intervals) depend on features of the experimental design, like the stopping rule, that the principle regards as evidentially irrelevant. [1]
Birnbaum's theorem concerns the "evidential meaning" of an experiment and its observed outcome, denoted $\mathrm{Ev}(E, x)$, where $E$ is the experiment and $x$ is the observed data.
The sufficiency principle states that if $T$ is a sufficient statistic for a parameter $\theta$, then the evidential meaning of the data $x$ is the same as the evidential meaning of the statistic $T(x)$. Formally:

$$\mathrm{Ev}(E, x) = \mathrm{Ev}\big(E^{T}, T(x)\big),$$

where $E^{T}$ denotes the derived experiment in which only the value of $T$ is recorded.
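As a concrete illustration (a standard textbook example, not drawn from Birnbaum's paper): for independent Bernoulli trials the number of successes is sufficient, so the evidential meaning of a sequence of outcomes depends only on that count, not on the order of successes and failures:

$$X_1, \dots, X_n \overset{\text{iid}}{\sim} \mathrm{Bernoulli}(\theta), \qquad T(x) = \sum_{i=1}^{n} x_i, \qquad p_\theta(x) = \theta^{T(x)} (1-\theta)^{\,n - T(x)}.$$

Given $T(x) = t$, each of the $\binom{n}{t}$ possible sequences is equally likely regardless of $\theta$, so the conditional distribution of the data given $T$ is free of $\theta$, $T$ is sufficient, and the principle gives $\mathrm{Ev}(E, x) = \mathrm{Ev}\big(E^{T}, T(x)\big)$.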
The sufficiency principle is accepted by almost all schools of statistical thought.
The conditionality principle states that if an experiment is chosen by a random mechanism (such as a coin flip) that does not depend on the parameter $\theta$, then the evidence provided by the result depends only on the experiment actually performed. For example, if a researcher decides to perform either experiment $E_1$ or $E_2$ based on a fair coin toss, and $E_1$ is chosen, the evidence should not be affected by the fact that $E_2$ "could have" been performed.
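In the mixture-experiment notation often used to state the principle (the symbols $E^{*}$ and $(E_i, x_i)$ are introduced here for clarity), this reads:

$$\mathrm{Ev}\big(E^{*}, (E_i, x_i)\big) = \mathrm{Ev}(E_i, x_i),$$

where $E^{*}$ is the mixture experiment that first selects $E_1$ or $E_2$ by a randomization not depending on $\theta$ and then carries out the selected experiment, and $(E_i, x_i)$ is the outcome in which $E_i$ was selected and yielded data $x_i$.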
The likelihood principle states that all the information about $\theta$ from an experiment is contained in the likelihood function $L(\theta \mid x)$. Two different experiments yielding the same likelihood function (up to a multiplicative constant) should result in the same inference about $\theta$.
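In the $\mathrm{Ev}$ notation above, the principle can be written as follows (a standard formalization, with symbols chosen for consistency with the rest of the article):

$$L_{E_1}(\theta \mid x_1) = c\, L_{E_2}(\theta \mid x_2) \ \text{ for all } \theta \text{ and some constant } c > 0 \;\Longrightarrow\; \mathrm{Ev}(E_1, x_1) = \mathrm{Ev}(E_2, x_2).$$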
Birnbaum's theorem states:
The likelihood principle (L) is equivalent to the conjunction of the sufficiency principle (S) and the conditionality principle (C).
Symbolically:

$$(\mathrm{S}) \wedge (\mathrm{C}) \iff (\mathrm{L})$$
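In outline, the forward direction of the proof (a sketch of the standard argument, not quoted from Birnbaum's paper) runs as follows. Suppose $E_1$ and $E_2$ produce outcomes $x_1$ and $x_2$ whose likelihood functions are proportional, and form the mixture experiment

$$E^{*}: \ \text{flip a fair coin, then perform } E_1 \text{ or } E_2 \text{ according to the result.}$$

By (C), $\mathrm{Ev}\big(E^{*}, (E_i, x_i)\big) = \mathrm{Ev}(E_i, x_i)$ for $i = 1, 2$. Because the two likelihoods are proportional, the statistic on $E^{*}$ that maps $(E_1, x_1)$ and $(E_2, x_2)$ to a single common value (and leaves all other outcomes distinct) is sufficient, so (S) gives $\mathrm{Ev}\big(E^{*}, (E_1, x_1)\big) = \mathrm{Ev}\big(E^{*}, (E_2, x_2)\big)$. Chaining the equalities yields $\mathrm{Ev}(E_1, x_1) = \mathrm{Ev}(E_2, x_2)$, which is (L). The converse, that (L) implies both (S) and (C), is comparatively routine.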
The theorem is considered a paradox by many frequentists. While (S) and (C) are viewed as intuitively obvious and "safe" principles of scientific practice, their logical consequence (L) invalidates most frequentist techniques. For instance, (L) implies that the stopping rule of an experiment (for example, whether a researcher decided to stop after 10 trials or after seeing 3 successes) should not affect the final inference, in direct conflict with the way p-values are calculated. [2]
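A standard numerical illustration of the stopping-rule conflict (the specific numbers below are the classic coin-tossing example, used here for concreteness; they are not taken from the sources cited above): suppose 9 successes and 3 failures are observed, either because the number of trials was fixed in advance at $n = 12$, or because sampling continued until the third failure, which happened to occur on trial 12. Testing $H_0\colon \theta = \tfrac12$ against $H_1\colon \theta > \tfrac12$, the two designs give proportional likelihoods but different one-sided p-values:

$$L_{\text{binomial}}(\theta) = \binom{12}{9}\,\theta^{9}(1-\theta)^{3}, \qquad L_{\text{negative binomial}}(\theta) = \binom{11}{2}\,\theta^{9}(1-\theta)^{3},$$

which differ only by a constant factor, yet

$$p_{\text{binomial}} = \sum_{k=9}^{12}\binom{12}{k}\left(\tfrac12\right)^{12} = \frac{299}{4096} \approx 0.073, \qquad p_{\text{negative binomial}} = \sum_{j=0}^{2}\binom{11}{j}\left(\tfrac12\right)^{11} = \frac{67}{2048} \approx 0.033.$$

Under (L) the two data sets carry identical evidence about $\theta$, whereas the p-value criterion rejects $H_0$ at the 5% level under the second design but not the first.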
Following Birnbaum's original paper, several statisticians challenged the proof.