Rare disease assumption

The rare disease assumption is a mathematical assumption used in epidemiologic case-control studies that test the association between an exposure and a disease. It holds that, if the prevalence of the disease is low, the odds ratio (OR) approximates the relative risk (RR). The idea was first demonstrated by Jerome Cornfield. [1]

Case-control studies are relatively inexpensive and less time-consuming than cohort studies. [citation needed] Because case-control studies do not track patients over time, they cannot establish relative risk directly. A case-control study can, however, calculate the exposure-odds ratio, which mathematically approaches the relative risk as prevalence falls.

Sander Greenland showed that if the prevalence is 10% or less, the disease can be considered rare enough to allow the rare disease assumption. [2] Unfortunately, the magnitude of the discrepancy between the odds ratio and the relative risk depends not only on the prevalence but also, to a great degree, on two other factors. [3] [4] Thus, any reliance on the rare disease assumption when interpreting odds ratios as relative risks should be explicitly stated and discussed.
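The size of the gap can be explored numerically. The following is a minimal sketch, not taken from the cited papers: it assumes a known risk in the unexposed group (called `baseline_risk` here, which is not the same as overall prevalence) and a true relative risk, and computes the odds ratio those two numbers imply. The function name and the chosen input values are illustrative assumptions.

```python
def odds_ratio_from_risks(baseline_risk: float, relative_risk: float) -> float:
    """Odds ratio implied by an unexposed-group risk and a true relative risk."""
    exposed_risk = baseline_risk * relative_risk
    if not 0 < baseline_risk < 1 or not 0 < exposed_risk < 1:
        raise ValueError("both risks must lie strictly between 0 and 1")
    # Odds = risk / (1 - risk), computed separately in each exposure group.
    exposed_odds = exposed_risk / (1 - exposed_risk)
    baseline_odds = baseline_risk / (1 - baseline_risk)
    return exposed_odds / baseline_odds


# Illustrative grid: the same relative risks evaluated at several baseline risks.
for baseline_risk in (0.001, 0.01, 0.05, 0.10):
    for relative_risk in (1.5, 3.0):
        implied_or = odds_ratio_from_risks(baseline_risk, relative_risk)
        print(f"baseline risk {baseline_risk:.3f}, RR {relative_risk:.1f} -> OR {implied_or:.2f}")
```

In this sketch the implied odds ratio stays close to the relative risk when the baseline risk is very small and drifts further from it as either the baseline risk or the relative risk grows.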

Mathematical Proof

Figure: Risk ratio vs. odds ratio depending on the baseline risk (prevalence) in the low-risk group.

The rare disease assumption can be demonstrated mathematically using the definitions for relative risk and odds ratio. [1]

| | Disease Positive | Disease Negative |
|---|---|---|
| Exposure | a | b |
| No Exposure | c | d |

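With regard to the table above, [5]

$$RR = \frac{a/(a+b)}{c/(c+d)} = \frac{a(c+d)}{c(a+b)}$$

and

$$OR = \frac{a/b}{c/d} = \frac{ad}{bc}.$$

As prevalence decreases, the number of positive cases decreases. As $(a+c)$ approaches 0, $a$ and $c$ individually also approach 0, so that $a+b \approx b$ and $c+d \approx d$. In other words, as $(a+c)$ approaches 0,

$$RR = \frac{a(c+d)}{c(a+b)} \approx \frac{ad}{bc} = OR.$$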
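One way to gauge how good the approximation is, offered here as a short worked step rather than as part of the cited derivation, is to take the ratio of the two measures. It uses the same cell labels as the table: $a$ and $b$ for exposed cases and non-cases, $c$ and $d$ for unexposed cases and non-cases.

$$\frac{OR}{RR} = \frac{ad/(bc)}{a(c+d)/\big(c(a+b)\big)} = \frac{d\,(a+b)}{b\,(c+d)} = \frac{1 + a/b}{1 + c/d}$$

The ratio is close to 1 only when both $a/b$ and $c/d$ are small, that is, when the disease is rare within each exposure group and not merely in the study population overall.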

Examples

The following example illustrates one of the problems, which occurs when the effects are large because the disease is common in the exposed or unexposed group. Consider the following contingency table.

| | Disease Positive | Disease Negative |
|---|---|---|
| Exposure | 4 | 6 |
| No Exposure | 5 | 85 |

$$RR = \frac{4/(4+6)}{5/(5+85)} = 7.2$$

and

$$OR = \frac{4/6}{5/85} = 11.3$$

While the prevalence is only 9% (9/100), the odds ratio (OR) is equal to 11.3 and the relative risk (RR) is equal to 7.2. Despite fulfilling the rare disease assumption overall, the OR and RR can hardly be considered approximately the same. However, the prevalence in the exposed group is 40%, which means $a$ is not sufficiently small compared to $b$ and therefore $a + b \not\approx b$.

| | Disease Positive | Disease Negative |
|---|---|---|
| Exposure | 4 | 96 |
| No Exposure | 5 | 895 |

$$RR = \frac{4/(4+96)}{5/(5+895)} = 7.2$$

and

$$OR = \frac{4/96}{5/895} = 7.46$$

With a prevalence of 0.9% (9/1000) and no change to the effect size (same RR as above), the estimates for RR and OR converge. The prevalence threshold below which the rare disease assumption holds may therefore be much lower than the overall figure suggests.
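The arithmetic for both example tables can be checked with a few lines of code. This is a minimal sketch under the standard 2x2 definitions; the `TwoByTwo` type and the function names are illustrative and are not taken from the cited sources.

```python
from typing import NamedTuple


class TwoByTwo(NamedTuple):
    """Counts from a 2x2 table: rows are exposure, columns are disease status."""
    a: int  # exposed, disease positive
    b: int  # exposed, disease negative
    c: int  # unexposed, disease positive
    d: int  # unexposed, disease negative


def relative_risk(t: TwoByTwo) -> float:
    # Risk among the exposed divided by risk among the unexposed.
    return (t.a / (t.a + t.b)) / (t.c / (t.c + t.d))


def odds_ratio(t: TwoByTwo) -> float:
    # Cross-product form of (a/b) / (c/d).
    return (t.a * t.d) / (t.b * t.c)


def prevalence(t: TwoByTwo) -> float:
    # Share of all subjects who are disease positive.
    return (t.a + t.c) / (t.a + t.b + t.c + t.d)


first_table = TwoByTwo(a=4, b=6, c=5, d=85)
second_table = TwoByTwo(a=4, b=96, c=5, d=895)

for label, table in (("first", first_table), ("second", second_table)):
    print(
        f"{label} table: prevalence {prevalence(table):.1%}, "
        f"RR {relative_risk(table):.2f}, OR {odds_ratio(table):.2f}"
    )
```

Keeping the four cell counts in a named structure makes it harder to swap rows and columns by accident, which is a common source of error when odds ratios are computed by hand.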

References

  1. Cornfield, Jerome (1951-06-01). "A Method of Estimating Comparative Rates from Clinical Data. Applications to Cancer of the Lung, Breast, and Cervix". JNCI: Journal of the National Cancer Institute. doi:10.1093/jnci/11.6.1269. ISSN 1460-2105.
  2. Greenland, Sander; Thomas, D. C. (1982). "On the need for the rare disease assumption in case-control studies". American Journal of Epidemiology. 116 (3): 547–553. doi:10.1093/oxfordjournals.aje.a113439. ISSN 0002-9262. PMID 7124721.
  3. Greenland, S.; Thomas, D. C.; Morgenstern, H. (1986). "The rare-disease assumption revisited. A critique of 'estimators of relative risk for case-control studies'". American Journal of Epidemiology. 124 (6): 869–883. doi:10.1093/oxfordjournals.aje.a114476. ISSN 0002-9262. PMID 3776970.
  4. Knol, Mirjam J.; Vandenbroucke, Jan P.; Scott, Pippa; Egger, Matthias (2008). "What Do Case-Control Studies Estimate? Survey of Methods and Assumptions in Published Case-Control Research". American Journal of Epidemiology. 168 (9): 1073–1081. doi:10.1093/aje/kwn217. PMID 18794220.
  5. Fletcher, Robert H.; Fletcher, Suzanne W.; Fletcher, Grant S. (8 January 2013). Clinical epidemiology: the essentials (5th ed.). Philadelphia. ISBN 978-1-4698-2625-7. OCLC 859337100.