Rare disease assumption

Last updated October 11, 2025

The rare disease assumption is a mathematical assumption used in epidemiologic case-control studies that test the association between an exposure and a disease. It states that, if the prevalence of the disease is low, the odds ratio (OR) approximates the relative risk (RR). The idea was first demonstrated by Jerome Cornfield.^[1]

Case-control studies are relatively inexpensive and less time-consuming than cohort studies.^[citation needed] Because case-control studies select participants by disease status rather than following a population over time, they cannot estimate risks directly and so cannot establish the relative risk. A case-control study can, however, calculate the exposure odds ratio, which mathematically approaches the relative risk as prevalence falls.

Sander Greenland showed that if the prevalence is 10% or less, the disease can be considered rare enough to allow the rare disease assumption.^[2] Unfortunately, the magnitude of the discrepancy between the odds ratio and the relative risk depends not only on the prevalence but also, to a great degree, on two other factors.^[3]^[4] Thus, any reliance on the rare disease assumption when interpreting odds ratios as relative risks should be explicitly stated and discussed.

Mathematical proof

The rare disease assumption can be demonstrated mathematically using the definitions for relative risk and odds ratio.^[1]

	Disease Positive	Disease Negative
Exposure	a	b
No Exposure	c	d

With regard to the table above,^[5]

$RelativeRisk={a/(a+b) \over c/(c+d)}$ and $OddsRatio={{a/(a+c) \over c/(a+c)} \over {b/(b+d) \over d/(b+d)}}={a/c \over b/d}={ad \over bc}$
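The two definitions translate directly into code. The following is a minimal Python sketch and is not taken from the cited sources; the function names `relative_risk` and `odds_ratio` and the example counts are illustrative assumptions.

```python
def relative_risk(a, b, c, d):
    """Risk of disease in the exposed divided by risk in the unexposed.

    a, b: exposed subjects with / without the disease
    c, d: unexposed subjects with / without the disease
    """
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Cross-product ratio ad / bc from the same 2x2 table."""
    return (a * d) / (b * c)


# Hypothetical counts, chosen only to illustrate a rare outcome.
a, b, c, d = 10, 990, 5, 995
print("RR:", relative_risk(a, b, c, d))
print("OR:", odds_ratio(a, b, c, d))
```

With counts like these, where cases are a small fraction of each exposure group, the two printed values should be close to each other.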

As prevalence decreases, the number of positive cases $(a+c)$ decreases. As $(a+c)$ approaches 0, $a$ and $c$ individually also approach 0, so that $a+b\approx b$ and $c+d\approx d$. In other words, as $(a+c)$ approaches 0,

$RelativeRisk={a/(a+b) \over c/(c+d)}\approx {a/(0+b) \over c/(0+d)}={a/b \over c/d}={ad \over bc}=OddsRatio$.
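The size of the gap between the two measures can also be written exactly. As a sketch, let $p_{1}=a/(a+b)$ be the risk in the exposed group and $p_{0}=c/(c+d)$ the risk in the unexposed group (these two symbols are introduced here for illustration and do not appear in the cited sources). Since $a/b=p_{1}/(1-p_{1})$ and $c/d=p_{0}/(1-p_{0})$,

$OddsRatio={p_{1}/(1-p_{1}) \over p_{0}/(1-p_{0})}=RelativeRisk\times {1-p_{0} \over 1-p_{1}}$

The correction factor $(1-p_{0})/(1-p_{1})$ is close to 1 only when both group-specific risks are small, which suggests that low overall prevalence alone need not be enough for the approximation to hold.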

Examples

The following example illustrates one such problem, which arises when the effect is large enough that the disease is common within the exposed or the unexposed group. Consider the following contingency table.

	Disease Positive	Disease Negative
Exposure	4	6
No Exposure	5	85

$RR={4/(4+6) \over 5/(5+85)}=7.2$ and $OR={4/6 \over 5/85}\approx 11.3$

While the overall prevalence is only 9% (9/100), the odds ratio (OR) is about 11.3 and the relative risk (RR) is 7.2. Despite fulfilling the rare disease assumption overall, the OR and RR can hardly be considered approximately the same. The reason is that the prevalence in the exposed group is 40% (4/10), which means $a$ is not sufficiently small compared to $b$ and therefore $b\not \approx (a+b)$.

	Disease Positive	Disease Negative
Exposure	4	96
No Exposure	5	895

$RR={4/(4+96) \over 5/(5+895)}=7.2$ and $OR={4/96 \over 5/895}\approx 7.46$

With a prevalence of 0.9% (9/1000) and no change to the effect size (same RR as above), the estimates for RR and OR converge. The prevalence threshold below which the rare disease assumption holds may therefore sometimes be much lower than the 10% cited above.
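The convergence in these two tables can be reproduced numerically. The sketch below is an illustration only, not part of the cited sources: it keeps the 4 exposed and 5 unexposed cases fixed and multiplies the group sizes (10 exposed, 90 unexposed) by a hypothetical scale factor `k`, so the RR stays constant while the prevalence falls. The helper names `rr_and_or` and `scaled_table` are assumptions made for this example.

```python
def rr_and_or(a, b, c, d):
    """Return (relative risk, odds ratio) for a 2x2 table."""
    rr = (a / (a + b)) / (c / (c + d))
    odds = (a * d) / (b * c)
    return rr, odds


def scaled_table(k):
    """Keep 4 exposed and 5 unexposed cases; scale group sizes by k.

    k = 1 gives the first table above, k = 10 the second.
    """
    a, c = 4, 5
    return a, 10 * k - a, c, 90 * k - c


for k in (1, 10, 100, 1000):
    a, b, c, d = scaled_table(k)
    rr, odds = rr_and_or(a, b, c, d)
    prevalence = (a + c) / (a + b + c + d)
    print(f"prevalence={prevalence:.4%}  RR={rr:.2f}  OR={odds:.2f}")
```

Holding the case counts fixed is only one of several ways to lower prevalence; it is used here because it matches how the second table was derived from the first.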

References

[1] Cornfield, Jerome (1951-06-01). "A Method of Estimating Comparative Rates from Clinical Data. Applications to Cancer of the Lung, Breast, and Cervix". JNCI: Journal of the National Cancer Institute. doi:10.1093/jnci/11.6.1269. ISSN 1460-2105.
[2] Greenland, Sander; Thomas, D. C. (1982). "On the need for the rare disease assumption in case-control studies". American Journal of Epidemiology. 116 (3): 547–553. doi:10.1093/oxfordjournals.aje.a113439. ISSN 0002-9262. PMID 7124721.
[3] Greenland, S.; Thomas, D. C.; Morgenstern, H. (1986). "The rare-disease assumption revisited. A critique of 'estimators of relative risk for case-control studies'". American Journal of Epidemiology. 124 (6): 869–883. doi:10.1093/oxfordjournals.aje.a114476. ISSN 0002-9262. PMID 3776970.
[4] Knol, Mirjam J.; Vandenbroucke, Jan P.; Scott, Pippa; Egger, Matthias (2008). "What Do Case-Control Studies Estimate? Survey of Methods and Assumptions in Published Case-Control Research". American Journal of Epidemiology. 168 (9): 1073–1081. doi:10.1093/aje/kwn217. PMID 18794220.
[5] Fletcher, Robert H.; Fletcher, Suzanne W.; Fletcher, Grant S. (8 January 2013). Clinical epidemiology: the essentials (5th ed.). Philadelphia. ISBN 978-1-4698-2625-7. OCLC 859337100.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
