Likelihood ratios in diagnostic testing

In evidence-based medicine, likelihood ratios are used for assessing the value of performing a diagnostic test. They use the sensitivity and specificity of the test to determine whether a test result usefully changes the probability that a condition (such as a disease state) exists. The first description of the use of likelihood ratios for decision rules was made at a symposium on information theory in 1954. [1] In medicine, likelihood ratios were introduced between 1975 and 1980. [2] [3] [4]

Calculation

Two versions of the likelihood ratio exist, one for positive and one for negative test results. Respectively, they are known as the positive likelihood ratio (LR+, likelihood ratio positive, likelihood ratio for positive results) and negative likelihood ratio (LR–, likelihood ratio negative, likelihood ratio for negative results).

The positive likelihood ratio is calculated as

LR+ = sensitivity / (1 − specificity)

which is equivalent to

LR+ = Pr(T+ | D+) / Pr(T+ | D−)

or "the probability of a person who has the disease testing positive divided by the probability of a person who does not have the disease testing positive." Here "T+" and "T−" denote a positive and a negative test result, respectively. Likewise, "D+" and "D−" denote that the disease is present and absent, respectively. So "true positives" are those who test positive (T+) and have the disease (D+), and "false positives" are those who test positive (T+) but do not have the disease (D−).

The negative likelihood ratio is calculated as [5]

LR− = (1 − sensitivity) / specificity

which is equivalent to [5]

LR− = Pr(T− | D+) / Pr(T− | D−)

or "the probability of a person who has the disease testing negative divided by the probability of a person who does not have the disease testing negative."

The calculation of likelihood ratios for tests with continuous values or more than two outcomes is similar to the calculation for dichotomous outcomes: a separate likelihood ratio is simply calculated for every level of test result; these are called interval or stratum-specific likelihood ratios. [6]

The pretest odds of a particular diagnosis, multiplied by the likelihood ratio, give the post-test odds:

post-test odds = pretest odds × likelihood ratio

This calculation is based on Bayes' theorem. (Note that odds can be calculated from, and then converted back to, probability.)
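
The relationships above can be written as a minimal sketch in code. The following Python snippet (the function names are illustrative, not taken from any library) computes both likelihood ratios from sensitivity and specificity and applies the odds form of Bayes' theorem:

```python
# Minimal sketch of the formulas above; function names are illustrative.

def positive_lr(sensitivity: float, specificity: float) -> float:
    """LR+ = sensitivity / (1 - specificity)."""
    return sensitivity / (1.0 - specificity)

def negative_lr(sensitivity: float, specificity: float) -> float:
    """LR- = (1 - sensitivity) / specificity."""
    return (1.0 - sensitivity) / specificity

def posttest_probability(pretest_probability: float, lr: float) -> float:
    """Bayes' theorem in odds form: post-test odds = pretest odds x LR."""
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    posttest_odds = pretest_odds * lr
    return posttest_odds / (1.0 + posttest_odds)

# Example values (sensitivity 67%, specificity 91%, prevalence 1.48%,
# matching the worked example later in the article):
lr_pos = positive_lr(0.67, 0.91)              # about 7.4
lr_neg = negative_lr(0.67, 0.91)              # about 0.36
print(lr_pos, lr_neg)
print(posttest_probability(0.0148, lr_pos))   # about 0.10 after a positive result
```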

Application to medicine

Pretest probability refers to the chance that an individual in a given population has a disorder or condition; this is the baseline probability prior to the use of a diagnostic test. Post-test probability refers to the probability that a condition is truly present given a positive test result. For a good test in a population, the post-test probability will be meaningfully higher or lower than the pretest probability. A high likelihood ratio indicates a good test for a population, and a likelihood ratio close to one indicates that a test may not be appropriate for a population.

For a screening test, the population of interest might be the general population of an area. For diagnostic testing, the ordering clinician will have observed some symptom or other factor that raises the pretest probability relative to the general population. A likelihood ratio of greater than 1 for a test in a population indicates that a positive test result is evidence that a condition is present. If the likelihood ratio for a test in a population is not clearly better than one, the test will not provide good evidence: the post-test probability will not be meaningfully different from the pretest probability. Knowing or estimating the likelihood ratio for a test in a population allows a clinician to better interpret the result. [7]

Research suggests that physicians rarely make these calculations in practice, however, [8] and when they do, they often make errors. [9] A randomized controlled trial that compared how well physicians interpreted diagnostic tests presented as sensitivity and specificity, a likelihood ratio, or an inexact graphic of the likelihood ratio found no difference between the three modes in interpretation of test results. [10]

Estimation table

This table provides examples of how changes in the likelihood ratio affect the post-test probability of disease.

Likelihood ratio | Approximate* change in probability [11] | Effect on post-test probability of disease [12]

Values between 0 and 1 decrease the probability of disease (−LR):
0.1 | −45% | Large decrease
0.2 | −30% | Moderate decrease
0.5 | −15% | Slight decrease
1   | −0%  | None

Values greater than 1 increase the probability of disease (+LR):
1   | +0%  | None
2   | +15% | Slight increase
5   | +30% | Moderate increase
10  | +45% | Large increase

*These estimates are accurate to within 10% of the calculated answer for all pre-test probabilities between 10% and 90%. The average error is only 4%. For polar extremes of pre-test probability (>90% and <10%), see the Estimation of pre- and post-test probability section below.

Estimation example

  1. Pre-test probability: For example, if about 2 out of every 5 patients with abdominal distension have ascites, then the pretest probability is 40%.
  2. Likelihood Ratio: An example "test" is that the physical exam finding of bulging flanks has a positive likelihood ratio of 2.0 for ascites.
  3. Estimated change in probability: Based on the table above, a likelihood ratio of 2.0 corresponds to an approximately +15% increase in probability.
  4. Final (post-test) probability: Therefore, bulging flanks increases the probability of ascites from 40% to about 55% (i.e., 40% + 15% = 55%, which is within 2% of the exact probability of 57%).
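
The rule-of-thumb estimate in step 4 can be checked against the exact odds calculation. A minimal Python sketch (variable names are illustrative) reproduces the 57% figure:

```python
# Check of the ascites estimate above using the exact odds form of
# Bayes' theorem instead of the +15% rule of thumb.

pretest_probability = 0.40    # about 2 of every 5 patients with abdominal distension
lr_bulging_flanks = 2.0       # positive likelihood ratio of bulging flanks for ascites

pretest_odds = pretest_probability / (1 - pretest_probability)    # 0.667
posttest_odds = pretest_odds * lr_bulging_flanks                  # 1.333
posttest_probability = posttest_odds / (1 + posttest_odds)        # 0.571

print(f"Exact post-test probability: {posttest_probability:.0%}")  # ~57%
# The table's shortcut (40% + 15% = 55%) is within 2% of this exact value.
```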

Calculation example

A medical example is the likelihood that a given test result would be expected in a patient with a certain disorder compared to the likelihood that the same result would occur in a patient without the target disorder.

Some sources distinguish between LR+ and LR−. [13] A worked example is shown below.

A worked example
A diagnostic test with sensitivity 67% and specificity 91% is applied to 2030 people to look for a disorder with a population prevalence of 1.48% (fecal occult blood screen test outcome).

Total population (pop.) = 2030

Patients with bowel cancer (as confirmed on endoscopy):
  Actual condition positive (AP) = 30  (2030 × 1.48%)
    True positive (TP) = 20  (2030 × 1.48% × 67%)
    False negative (FN) = 10  (2030 × 1.48% × (100% − 67%))
    True positive rate (TPR), recall, sensitivity = TP / AP = 20 / 30 ≈ 66.7%
    False negative rate (FNR), miss rate = FN / AP = 10 / 30 ≈ 33.3%
  Actual condition negative (AN) = 2000  (2030 × (100% − 1.48%))
    False positive (FP) = 180  (2030 × (100% − 1.48%) × (100% − 91%))
    True negative (TN) = 1820  (2030 × (100% − 1.48%) × 91%)
    False positive rate (FPR), fall-out, probability of false alarm = FP / AN = 180 / 2000 = 9.0%
    Specificity, selectivity, true negative rate (TNR) = TN / AN = 1820 / 2000 = 91%

Summary measures:
  Prevalence = AP / pop. = 30 / 2030 ≈ 1.48%
  Accuracy (ACC) = (TP + TN) / pop. = (20 + 1820) / 2030 ≈ 90.64%
  Positive predictive value (PPV), precision = TP / (TP + FP) = 20 / (20 + 180) = 10%
  False discovery rate (FDR) = FP / (TP + FP) = 180 / (20 + 180) = 90.0%
  Negative predictive value (NPV) = TN / (FN + TN) = 1820 / (10 + 1820) ≈ 99.45%
  False omission rate (FOR) = FN / (FN + TN) = 10 / (10 + 1820) ≈ 0.55%
  F1 score = 2 × precision × recall / (precision + recall) ≈ 0.174
  Positive likelihood ratio (LR+) = TPR / FPR = (20 / 30) / (180 / 2000) ≈ 7.41
  Negative likelihood ratio (LR−) = FNR / TNR = (10 / 30) / (1820 / 2000) ≈ 0.366
  Diagnostic odds ratio (DOR) = LR+ / LR− ≈ 20.2
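
All of the figures above follow from the four cell counts. The following Python sketch (illustrative only, not tied to any particular library) recomputes the main quantities:

```python
# Recomputes the worked example's figures from the four cell counts;
# variable names are illustrative.

tp, fn, fp, tn = 20, 10, 180, 1820
pop = tp + fn + fp + tn                      # 2030

sensitivity = tp / (tp + fn)                 # TPR, about 0.667
specificity = tn / (fp + tn)                 # TNR, 0.91
fpr = fp / (fp + tn)                         # 0.09
fnr = fn / (tp + fn)                         # about 0.333

lr_pos = sensitivity / fpr                   # about 7.41
lr_neg = fnr / specificity                   # about 0.366
dor = lr_pos / lr_neg                        # about 20.2

ppv = tp / (tp + fp)                         # 0.10
npv = tn / (fn + tn)                         # about 0.9945
accuracy = (tp + tn) / pop                   # about 0.9064
prevalence = (tp + fn) / pop                 # about 0.0148

print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.3f}, DOR = {dor:.1f}")
print(f"PPV = {ppv:.1%}, NPV = {npv:.2%}, accuracy = {accuracy:.2%}")
```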

Related calculations

This hypothetical screening test (fecal occult blood test) correctly identified two-thirds (66.7%) of patients with colorectal cancer. [note 1] Unfortunately, factoring in prevalence rates reveals that this hypothetical test has a high false positive rate, and it does not reliably identify colorectal cancer in the overall population of asymptomatic people (PPV = 10%).

On the other hand, this hypothetical test demonstrates very accurate detection of cancer-free individuals (NPV ≈ 99.5%). Therefore, when used for routine colorectal cancer screening with asymptomatic adults, a negative result supplies important data for the patient and doctor, such as ruling out cancer as the cause of gastrointestinal symptoms or reassuring patients worried about developing colorectal cancer.

Confidence intervals for all the predictive parameters involved can be calculated, giving the range of values within which the true value lies at a given confidence level (e.g. 95%). [16]
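
As an illustration only: one commonly used approach builds the interval on the log scale, treating ln(LR) as approximately normally distributed. The sketch below applies this assumed log method to LR+ from the worked example; it is not necessarily the method used by the cited calculator.

```python
# Illustrative 95% confidence interval for LR+ using the log method
# (ln LR+ treated as approximately normal). This is a commonly used
# large-sample approximation, assumed here for illustration.
import math

tp, fn, fp, tn = 20, 10, 180, 1820            # counts from the worked example

sens = tp / (tp + fn)
fpr = fp / (fp + tn)
lr_pos = sens / fpr                           # about 7.41

# Standard error of ln(LR+) under the usual large-sample approximation
se_log = math.sqrt(1/tp - 1/(tp + fn) + 1/fp - 1/(fp + tn))
z = 1.96                                      # 95% confidence level

lower = math.exp(math.log(lr_pos) - z * se_log)
upper = math.exp(math.log(lr_pos) + z * se_log)
print(f"LR+ = {lr_pos:.2f}, 95% CI about ({lower:.2f}, {upper:.2f})")
```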

Estimation of pre- and post-test probability

The likelihood ratio of a test provides a way to estimate the pre- and post-test probabilities of having a condition.

With the pre-test probability and the likelihood ratio given, the post-test probability can be calculated in the following three steps: [17]

pretest odds = pretest probability / (1 − pretest probability)

post-test odds = pretest odds × likelihood ratio

post-test probability = post-test odds / (post-test odds + 1)

In the equations above, the positive post-test probability is calculated using the likelihood ratio positive, and the negative post-test probability is calculated using the likelihood ratio negative.

Odds are converted to probabilities as follows: [18]

(1)  odds = probability / (1 − probability)

multiply equation (1) by (1 − probability):

(2)  odds − probability × odds = probability

add (probability × odds) to equation (2):

(3)  odds = probability + probability × odds = probability × (1 + odds)

divide equation (3) by (1 + odds):

(4)  probability = odds / (1 + odds)

hence

post-test probability = post-test odds / (1 + post-test odds)

Alternatively, post-test probability can be calculated directly from the pre-test probability and the likelihood ratio using the equation:

post-test probability = (pretest probability × LR) / ((1 − pretest probability) + pretest probability × LR)

In fact, the post-test probability estimated from the likelihood ratio and the pre-test probability is generally more accurate than an estimate based on the test's positive predictive value whenever the tested individual's pre-test probability differs from the prevalence of the condition in the population.
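
A minimal Python sketch (function names are illustrative) shows that the three-step odds calculation and the direct formula give the same result:

```python
# The three-step odds calculation and the direct formula agree;
# function names are illustrative.

def posttest_via_odds(pretest_probability: float, lr: float) -> float:
    pretest_odds = pretest_probability / (1 - pretest_probability)  # step 1
    posttest_odds = pretest_odds * lr                               # step 2
    return posttest_odds / (1 + posttest_odds)                      # step 3

def posttest_direct(pretest_probability: float, lr: float) -> float:
    p = pretest_probability
    return (p * lr) / (1 - p + p * lr)                              # single equation

p, lr = 0.0148, 7.41              # prevalence and LR+ from the worked example
print(posttest_via_odds(p, lr))   # about 0.10
print(posttest_direct(p, lr))     # same value
```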

Example

Taking the medical example from above (20 true positives, 10 false negatives, and 2030 total patients), the positive post-test probability is calculated as:

pretest probability = (20 + 10) / 2030 = 0.0148

pretest odds = 0.0148 / (1 − 0.0148) = 0.015

positive post-test odds = 0.015 × 7.4 = 0.111

positive post-test probability = 0.111 / (0.111 + 1) = 0.1

As demonstrated, the positive post-test probability is numerically equal to the positive predictive value; the negative post-test probability is numerically equal to (1 − negative predictive value).

Notes

  1. There are advantages and disadvantages for all medical screening tests. Clinical practice guidelines, such as those for colorectal cancer screening, describe these risks and benefits. [14] [15]

Related Research Articles

The likelihood function is the joint probability of the observed data viewed as a function of the parameters of a statistical model. Intuitively, the likelihood function is the probability of observing the data under the assumption that a given value of the parameter is the actual parameter.

In statistics, the likelihood-ratio test assesses the goodness of fit of two competing statistical models, specifically one found by maximization over the entire parameter space and another found after imposing some constraint, based on the ratio of their likelihoods. If the constraint is supported by the observed data, the two likelihoods should not differ by more than sampling error. Thus the likelihood-ratio test tests whether this ratio is significantly different from one, or equivalently whether its natural logarithm is significantly different from zero.

In probability theory and statistics, Bayes' theorem, named after Thomas Bayes, describes the probability of an event, based on prior knowledge of conditions that might be related to the event. For example, if the risk of developing health problems is known to increase with age, Bayes' theorem allows the risk to an individual of a known age to be assessed more accurately by conditioning it relative to their age, rather than assuming that the individual is typical of the population as a whole.

In probability theory, odds provide a measure of the likelihood of a particular outcome. When specific events are equally likely, odds are calculated as the ratio of the number of events that produce that outcome to the number that do not. Odds are commonly used in gambling and statistics.

<span class="mw-page-title-main">Logistic regression</span> Statistical model for a binary dependent variable

In statistics, the logistic model is a statistical model that models the log-odds of an event as a linear combination of one or more independent variables. In regression analysis, logistic regression estimates the parameters of a logistic model. Formally, in binary logistic regression there is a single binary dependent variable, coded by an indicator variable, where the two values are labeled "0" and "1", while the independent variables can each be a binary variable or a continuous variable. The corresponding probability of the value labeled "1" can vary between 0 and 1, hence the labeling; the function that converts log-odds to probability is the logistic function, hence the name. The unit of measurement for the log-odds scale is called a logit, from logistic unit, hence the alternative names.

In statistics, the power of a binary hypothesis test is the probability that the test correctly rejects the null hypothesis when a specific alternative hypothesis is true. It is commonly denoted by 1 − β, and represents the chances of a true positive detection conditional on the actual existence of an effect to detect. Statistical power ranges from 0 to 1, and as the power of a test increases, the probability of making a type II error by wrongly failing to reject the null hypothesis decreases.

An odds ratio (OR) is a statistic that quantifies the strength of the association between two events, A and B. The odds ratio is defined as the ratio of the odds of A in the presence of B and the odds of A in the absence of B, or equivalently, the ratio of the odds of B in the presence of A and the odds of B in the absence of A. Two events are independent if and only if the OR equals 1, i.e., the odds of one event are the same in either the presence or absence of the other event. If the OR is greater than 1, then A and B are associated (correlated) in the sense that, compared to the absence of B, the presence of B raises the odds of A, and symmetrically the presence of A raises the odds of B. Conversely, if the OR is less than 1, then A and B are negatively correlated, and the presence of one event reduces the odds of the other event.

In statistics, an effect size is a value measuring the strength of the relationship between two variables in a population, or a sample-based estimate of that quantity. It can refer to the value of a statistic calculated from a sample of data, the value of a parameter for a hypothetical population, or to the equation that operationalizes how statistics or parameters lead to the effect size value. Examples of effect sizes include the correlation between two variables, the regression coefficient in a regression, the mean difference, or the risk of a particular event happening. Effect sizes complement statistical hypothesis testing, and play an important role in power analyses, sample size planning, and in meta-analyses. The cluster of data-analysis methods concerning effect sizes is referred to as estimation statistics.

In healthcare, a differential diagnosis (DDx) is a method of analysis that distinguishes a particular disease or condition from others that present with similar clinical features. Differential diagnostic procedures are used by clinicians to diagnose the specific disease in a patient, or, at least, to consider any imminently life-threatening conditions. Often, each individual option of a possible disease is called a differential diagnosis.

<span class="mw-page-title-main">Positive and negative predictive values</span> In biostatistics, proportion of true positive and true negative results

The positive and negative predictive values are the proportions of positive and negative results in statistics and diagnostic tests that are true positive and true negative results, respectively. The PPV and NPV describe the performance of a diagnostic test or other statistical measure. A high result can be interpreted as indicating the accuracy of such a statistic. The PPV and NPV are not intrinsic to the test; they also depend on the prevalence. Both PPV and NPV can be derived using Bayes' theorem.

Given a population whose members each belong to one of a number of different sets or classes, a classification rule or classifier is a procedure by which the elements of the population set are each predicted to belong to one of the classes. A perfect classification is one for which every element in the population is assigned to the class it really belongs to. The Bayes classifier is the classifier that assigns classes optimally based on the known attributes of the elements to be classified.

<span class="mw-page-title-main">Sensitivity and specificity</span> Statistical measures of the performance of a binary classification test

In medicine and statistics, sensitivity and specificity mathematically describe the accuracy of a test that reports the presence or absence of a medical condition. If individuals who have the condition are considered "positive" and those who do not are considered "negative", then sensitivity is a measure of how well a test can identify true positives and specificity is a measure of how well a test can identify true negatives.

<span class="mw-page-title-main">Precision and recall</span> Pattern-recognition performance metrics

In pattern recognition, information retrieval, object detection and classification, precision and recall are performance metrics that apply to data retrieved from a collection, corpus or sample space.

In statistics, the multinomial test is the test of the null hypothesis that the parameters of a multinomial distribution equal specified values; it is used for categorical data.

Confusion of the inverse, also called the conditional probability fallacy or the inverse fallacy, is a logical fallacy whereupon a conditional probability is equated with its inverse; that is, given two events A and B, the probability of A happening given that B has happened is assumed to be about the same as the probability of B given A, when there is actually no evidence for this assumption. More formally, P(A|B) is assumed to be approximately equal to P(B|A).

Statistical proof is the rational demonstration of degree of certainty for a proposition, hypothesis or theory that is used to convince others subsequent to a statistical test of the supporting evidence and the types of inferences that can be drawn from the test scores. Statistical methods are used to increase the understanding of the facts and the proof demonstrates the validity and logic of inference with explicit reference to a hypothesis, the experimental data, the facts, the test, and the odds. Proof has two essential aims: the first is to convince and the second is to explain the proposition through peer and public review.

In statistics, when performing multiple comparisons, a false positive ratio is the probability of falsely rejecting the null hypothesis for a particular test. The false positive rate is calculated as the ratio between the number of negative events wrongly categorized as positive and the total number of actual negative events.

Pre-test probability and post-test probability are the probabilities of the presence of a condition before and after a diagnostic test, respectively. Post-test probability, in turn, can be positive or negative, depending on whether the test result is positive or negative. In some cases, it is used for the probability of developing the condition of interest in the future.

<span class="mw-page-title-main">Diagnostic odds ratio</span>

In medical testing with binary classification, the diagnostic odds ratio (DOR) is a measure of the effectiveness of a diagnostic test. It is defined as the ratio of the odds of the test being positive if the subject has a disease relative to the odds of the test being positive if the subject does not have the disease.

<span class="mw-page-title-main">Evaluation of binary classifiers</span>

The evaluation of binary classifiers compares two methods of assigning a binary attribute, one of which is usually a standard method and the other of which is being investigated. There are many metrics that can be used to measure the performance of a classifier or predictor; different fields have different preferences for specific metrics due to different goals. For example, in medicine sensitivity and specificity are often used, while in computer science precision and recall are preferred. An important distinction is between metrics that are independent of the prevalence and metrics that depend on the prevalence; both types are useful, but they have very different properties.

References

  1. Swets JA. (1973). "The relative operating characteristic in Psychology". Science. 182 (4116): 990–1000. Bibcode:1973Sci...182..990S. doi:10.1126/science.182.4116.990. PMID   17833780.
  2. Pauker SG, Kassirer JP (1975). "Therapeutic Decision Making: A Cost-Benefit Analysis". NEJM. 293 (5): 229–34. doi:10.1056/NEJM197507312930505. PMID   1143303.
  3. Thornbury JR, Fryback DG, Edwards W (1975). "Likelihood ratios as a measure of the diagnostic usefulness of excretory urogram information". Radiology. 114 (3): 561–5. doi:10.1148/114.3.561. PMID   1118556.
  4. van der Helm HJ, Hische EA (1979). "Application of Bayes's theorem to results of quantitative clinical chemical determinations". Clin Chem. 25 (6): 985–8. PMID   445835.
  5. Gardner, M.; Altman, Douglas G. (2000). Statistics with confidence: confidence intervals and statistical guidelines. London: BMJ Books. ISBN   978-0-7279-1375-3.
  6. Brown MD, Reeves MJ (2003). "Evidence-based emergency medicine/skills for evidence-based emergency care. Interval likelihood ratios: another advantage for the evidence-based diagnostician". Ann Emerg Med. 42 (2): 292–297. doi: 10.1067/mem.2003.274 . PMID   12883521.
  7. Harrell F, Califf R, Pryor D, Lee K, Rosati R (1982). "Evaluating the Yield of Medical Tests". JAMA. 247 (18): 2543–2546. doi:10.1001/jama.247.18.2543. PMID   7069920.
  8. Reid MC, Lane DA, Feinstein AR (1998). "Academic calculations versus clinical judgments: practicing physicians' use of quantitative measures of test accuracy". Am. J. Med. 104 (4): 374–80. doi:10.1016/S0002-9343(98)00054-0. PMID   9576412.
  9. Steurer J, Fischer JE, Bachmann LM, Koller M, ter Riet G (2002). "Communicating accuracy of tests to general practitioners: a controlled study". The BMJ. 324 (7341): 824–6. doi:10.1136/bmj.324.7341.824. PMC   100792 . PMID   11934776.
  10. Puhan MA, Steurer J, Bachmann LM, ter Riet G (2005). "A randomized trial of ways to describe test accuracy: the effect on physicians' post-test probability estimates". Ann. Intern. Med. 143 (3): 184–9. doi:10.7326/0003-4819-143-3-200508020-00004. PMID   16061916.
  11. McGee, Steven (1 August 2002). "Simplifying likelihood ratios". Journal of General Internal Medicine. 17 (8): 647–650. doi:10.1046/j.1525-1497.2002.10750.x. ISSN   0884-8734. PMC   1495095 . PMID   12213147.
  12. Henderson, Mark C.; Tierney, Lawrence M.; Smetana, Gerald W. (2012). The Patient History (2nd ed.). McGraw-Hill. p. 30. ISBN   978-0-07-162494-7.
  13. "Likelihood ratios". Archived from the original on 20 August 2002. Retrieved 4 April 2009.
  14. Lin, Jennifer S.; Piper, Margaret A.; Perdue, Leslie A.; Rutter, Carolyn M.; Webber, Elizabeth M.; O’Connor, Elizabeth; Smith, Ning; Whitlock, Evelyn P. (21 June 2016). "Screening for Colorectal Cancer". JAMA. 315 (23): 2576–2594. doi:10.1001/jama.2016.3332. ISSN   0098-7484. PMID   27305422.
  15. Bénard, Florence; Barkun, Alan N.; Martel, Myriam; Renteln, Daniel von (7 January 2018). "Systematic review of colorectal cancer screening guidelines for average-risk adults: Summarizing the current global recommendations". World Journal of Gastroenterology. 24 (1): 124–138. doi: 10.3748/wjg.v24.i1.124 . PMC   5757117 . PMID   29358889.
  16. Online calculator of confidence intervals for predictive parameters
  17. Likelihood Ratios, archived 22 December 2010 at the Wayback Machine, from CEBM (Centre for Evidence-Based Medicine). Page last edited: 1 February 2009.
  18. Australian Bureau of Statistics (June 2012). A Comparison of Volunteering Rates from the 2006 Census of Population and Housing and the 2006 General Social Survey.
Medical likelihood ratio repositories