In evidence-based medicine, likelihood ratios are used for assessing the value of performing a diagnostic test. They use the sensitivity and specificity of the test to determine whether a test result usefully changes the probability that a condition (such as a disease state) exists. The first description of the use of likelihood ratios for decision rules was made at a symposium on information theory in 1954. [1] In medicine, likelihood ratios were introduced between 1975 and 1980. [2] [3] [4]
Two versions of the likelihood ratio exist, one for positive and one for negative test results. Respectively, they are known as the positive likelihood ratio (LR+, likelihood ratio positive, likelihood ratio for positive results) and negative likelihood ratio (LR–, likelihood ratio negative, likelihood ratio for negative results).
The positive likelihood ratio is calculated as

LR+ = sensitivity / (1 − specificity)

which is equivalent to

LR+ = Pr(T+ | D+) / Pr(T+ | D−)

or "the probability of a person who has the disease testing positive divided by the probability of a person who does not have the disease testing positive." Here "T+" or "T−" denote that the result of the test is positive or negative, respectively. Likewise, "D+" or "D−" denote that the disease is present or absent, respectively. So "true positives" are those that test positive (T+) and have the disease (D+), and "false positives" are those that test positive (T+) but do not have the disease (D−).
The negative likelihood ratio is calculated as [5]

LR− = (1 − sensitivity) / specificity

which is equivalent to [5]

LR− = Pr(T− | D+) / Pr(T− | D−)

or "the probability of a person who has the disease testing negative divided by the probability of a person who does not have the disease testing negative."
The calculation of likelihood ratios for tests with continuous values or more than two outcomes is similar to the calculation for dichotomous outcomes: a separate likelihood ratio is simply calculated for every level of test result; these are called interval- or stratum-specific likelihood ratios. [6]
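For a test reported in several intervals, the stratum-specific ratio for each interval divides the fraction of diseased patients falling in that interval by the fraction of non-diseased patients falling in it. The sketch below uses entirely hypothetical counts:

```python
# Hypothetical numbers of diseased and non-diseased patients per result interval.
strata = {
    "low":    {"diseased": 5,  "non_diseased": 70},
    "medium": {"diseased": 15, "non_diseased": 25},
    "high":   {"diseased": 30, "non_diseased": 5},
}

total_diseased = sum(s["diseased"] for s in strata.values())
total_non_diseased = sum(s["non_diseased"] for s in strata.values())

# Interval-specific LR = P(result in interval | disease) / P(result in interval | no disease)
for name, counts in strata.items():
    lr = (counts["diseased"] / total_diseased) / (counts["non_diseased"] / total_non_diseased)
    print(f"{name}: LR = {lr:.2f}")
```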
The pretest odds of a particular diagnosis, multiplied by the likelihood ratio, determines the post-test odds. This calculation is based on Bayes' theorem. (Note that odds can be calculated from, and then converted to, probability.)
Pretest probability refers to the chance that an individual in a given population has a disorder or condition; this is the baseline probability prior to the use of a diagnostic test. Post-test probability refers to the probability that a condition is truly present given a positive test result. For a good test in a population, the post-test probability will be meaningfully higher or lower than the pretest probability. A high likelihood ratio indicates a good test for a population, and a likelihood ratio close to one indicates that a test may not be appropriate for a population.
For a screening test, the population of interest might be the general population of an area. For diagnostic testing, the ordering clinician will have observed some symptom or other factor that raises the pretest probability relative to the general population. A likelihood ratio of greater than 1 for a test in a population indicates that a positive test result is evidence that a condition is present. If the likelihood ratio for a test in a population is not clearly better than one, the test will not provide good evidence: the post-test probability will not be meaningfully different from the pretest probability. Knowing or estimating the likelihood ratio for a test in a population allows a clinician to better interpret the result. [7]
Research suggests that physicians rarely make these calculations in practice, however, [8] and when they do, they often make errors. [9] A randomized controlled trial that compared how well physicians interpreted diagnostic tests presented as either sensitivity and specificity, a likelihood ratio, or an inexact graphic of the likelihood ratio found no difference among the three modes in interpretation of test results. [10]
This table provides examples of how changes in the likelihood ratio affect the post-test probability of disease.
| Likelihood ratio | Approximate* change in probability [11] | Effect on post-test probability of disease [12] |
|---|---|---|
| Values between 0 and 1 decrease the probability of disease (−LR) | | |
| 0.1 | −45% | Large decrease |
| 0.2 | −30% | Moderate decrease |
| 0.5 | −15% | Slight decrease |
| 1 | −0% | None |
| Values greater than 1 increase the probability of disease (+LR) | | |
| 1 | +0% | None |
| 2 | +15% | Slight increase |
| 5 | +30% | Moderate increase |
| 10 | +45% | Large increase |
*These estimates are accurate to within 10% of the calculated answer for all pre-test probabilities between 10% and 90%. The average error is only 4%. For polar extremes of pre-test probability (>90% and <10%), see the Estimation of pre- and post-test probability section below.
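As a rough check of these rules of thumb (a sketch, assuming a hypothetical pre-test probability of 50%), the exact post-test probability can be computed through odds and compared with the tabulated shift:

```python
def posttest_probability(pretest_prob: float, lr: float) -> float:
    """Exact update: convert probability to odds, multiply by the LR, convert back."""
    pretest_odds = pretest_prob / (1.0 - pretest_prob)
    posttest_odds = pretest_odds * lr
    return posttest_odds / (1.0 + posttest_odds)


pretest = 0.50  # hypothetical pre-test probability
approximate_shift = {0.1: -0.45, 0.2: -0.30, 0.5: -0.15, 1: 0.0, 2: 0.15, 5: 0.30, 10: 0.45}

for lr, shift in approximate_shift.items():
    exact = posttest_probability(pretest, lr)
    print(f"LR {lr:>4}: exact {exact:.2f}, heuristic {pretest + shift:.2f}")
```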
A medical example is the likelihood that a given test result would be expected in a patient with a certain disorder compared to the likelihood that the same result would occur in a patient without the target disorder.
Some sources distinguish between LR+ and LR−. [13] A worked example is shown below.
Fecal occult blood screen test outcome (total population = 2030)

| | Test outcome positive | Test outcome negative | Row total |
|---|---|---|---|
| Patients with bowel cancer (as confirmed on endoscopy) | True positive (TP) = 20 (2030 × 1.48% × 67%) | False negative (FN) = 10 (2030 × 1.48% × (100% − 67%)) | Actual condition positive (AP) = 30 (2030 × 1.48%) |
| Patients without bowel cancer | False positive (FP) = 180 (2030 × (100% − 1.48%) × (100% − 91%)) | True negative (TN) = 1820 (2030 × (100% − 1.48%) × 91%) | Actual condition negative (AN) = 2000 (2030 × (100% − 1.48%)) |

| Measure | Calculation | Value |
|---|---|---|
| Prevalence | AP / pop. = 30 / 2030 | ≈ 1.48% |
| True positive rate (TPR), recall, sensitivity | TP / AP = 20 / 30 | ≈ 66.7% |
| False negative rate (FNR), miss rate | FN / AP = 10 / 30 | ≈ 33.3% |
| Specificity, selectivity, true negative rate (TNR) | TN / AN = 1820 / 2000 | = 91% |
| False positive rate (FPR), fall-out, probability of false alarm | FP / AN = 180 / 2000 | = 9.0% |
| Positive predictive value (PPV), precision | TP / (TP + FP) = 20 / (20 + 180) | = 10% |
| False discovery rate (FDR) | FP / (TP + FP) = 180 / (20 + 180) | = 90.0% |
| Negative predictive value (NPV) | TN / (FN + TN) = 1820 / (10 + 1820) | ≈ 99.45% |
| False omission rate (FOR) | FN / (FN + TN) = 10 / (10 + 1820) | ≈ 0.55% |
| Positive likelihood ratio (LR+) | TPR / FPR = (20 / 30) / (180 / 2000) | ≈ 7.41 |
| Negative likelihood ratio (LR−) | FNR / TNR = (10 / 30) / (1820 / 2000) | ≈ 0.366 |
| Diagnostic odds ratio (DOR) | LR+ / LR− | ≈ 20.2 |
| Accuracy (ACC) | (TP + TN) / pop. = (20 + 1820) / 2030 | ≈ 90.64% |
| F1 score | 2 × precision × recall / (precision + recall) | ≈ 0.174 |
Related calculations

False positive rate (α) = type I error rate = 1 − specificity = FP / (FP + TN) = 180 / (180 + 1820) = 9%
False negative rate (β) = type II error rate = 1 − sensitivity = FN / (TP + FN) = 10 / (20 + 10) ≈ 33%
Power = sensitivity = 1 − β ≈ 67%
Positive likelihood ratio = sensitivity / (1 − specificity) ≈ 0.67 / (1 − 0.91) ≈ 7.4
Negative likelihood ratio = (1 − sensitivity) / specificity ≈ (1 − 0.67) / 0.91 ≈ 0.37
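The figures in the tables and the related calculations above can be reproduced from the four cell counts alone; a brief sketch:

```python
# Cell counts from the hypothetical fecal occult blood example above.
TP, FN, FP, TN = 20, 10, 180, 1820

sensitivity = TP / (TP + FN)                   # ~0.667
specificity = TN / (TN + FP)                   # 0.91
lr_positive = sensitivity / (1 - specificity)  # ~7.41
lr_negative = (1 - sensitivity) / specificity  # ~0.366
ppv = TP / (TP + FP)                           # 0.10
npv = TN / (TN + FN)                           # ~0.9945
dor = lr_positive / lr_negative                # ~20.2

print(f"LR+ = {lr_positive:.2f}, LR- = {lr_negative:.3f}, DOR = {dor:.1f}")
print(f"PPV = {ppv:.2%}, NPV = {npv:.2%}")
```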
This hypothetical screening test (fecal occult blood test) correctly identified two-thirds (66.7%) of patients with colorectal cancer. [a] Unfortunately, factoring in prevalence rates reveals that this hypothetical test has a high false positive rate, and it does not reliably identify colorectal cancer in the overall population of asymptomatic people (PPV = 10%).
On the other hand, this hypothetical test demonstrates very accurate detection of cancer-free individuals (NPV ≈ 99.5%). Therefore, when used for routine colorectal cancer screening with asymptomatic adults, a negative result supplies important data for the patient and doctor, such as ruling out cancer as the cause of gastrointestinal symptoms or reassuring patients worried about developing colorectal cancer.
Confidence intervals for all the predictive parameters involved can be calculated, giving the range of values within which the true value lies at a given confidence level (e.g. 95%). [16]
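One commonly used approach builds the interval on the log scale, using the standard error for the logarithm of a ratio of two proportions. The sketch below applies that approximation to LR+ from the four cell counts; it is an illustration under those assumptions rather than a definitive implementation of reference [16]:

```python
import math


def lr_positive_ci(tp: int, fn: int, fp: int, tn: int, z: float = 1.96):
    """Approximate 95% CI for LR+ via a log transformation of the ratio of two proportions."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    lr_pos = sens / (1 - spec)
    # Standard error of ln(LR+) for a ratio of two independent binomial proportions.
    se_log = math.sqrt(1 / tp - 1 / (tp + fn) + 1 / fp - 1 / (fp + tn))
    lower = math.exp(math.log(lr_pos) - z * se_log)
    upper = math.exp(math.log(lr_pos) + z * se_log)
    return lr_pos, lower, upper


# Worked example counts: LR+ ~ 7.4 with an approximate 95% CI of roughly 5.5 to 9.9.
print(lr_positive_ci(20, 10, 180, 1820))
```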
The likelihood ratio of a test provides a way to estimate the pre- and post-test probabilities of having a condition.
With the pre-test probability and the likelihood ratio given, the post-test probability can be calculated in the following three steps: [17]

pretest odds = pretest probability / (1 − pretest probability)
posttest odds = pretest odds × likelihood ratio
posttest probability = posttest odds / (posttest odds + 1)

In the equations above, the positive post-test probability is calculated using the positive likelihood ratio, and the negative post-test probability is calculated using the negative likelihood ratio.
Odds are converted to probabilities as follows: [18]

(1) odds = probability / (1 − probability)

Multiply equation (1) by (1 − probability):

(2) odds × (1 − probability) = probability, that is, odds − probability × odds = probability

Add (probability × odds) to equation (2):

(3) odds = probability + probability × odds = probability × (1 + odds)

Divide equation (3) by (1 + odds):

probability = odds / (1 + odds)

hence

posttest probability = posttest odds / (posttest odds + 1)
Alternatively, the post-test probability can be calculated directly from the pre-test probability and the likelihood ratio using the equation:

posttest probability = (likelihood ratio × pretest probability) / (1 + pretest probability × (likelihood ratio − 1))
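A minimal sketch of both routes (the three-step odds calculation and the direct equation), using a hypothetical pre-test probability of 20% and a likelihood ratio of 5:

```python
def posttest_via_odds(pretest_prob: float, lr: float) -> float:
    """Three-step route: probability -> odds, multiply by the LR, odds -> probability."""
    pretest_odds = pretest_prob / (1 - pretest_prob)
    posttest_odds = pretest_odds * lr
    return posttest_odds / (posttest_odds + 1)


def posttest_direct(pretest_prob: float, lr: float) -> float:
    """Direct route: LR x P_pre / (1 + P_pre x (LR - 1))."""
    return (lr * pretest_prob) / (1 + pretest_prob * (lr - 1))


# The two routes give the same answer for hypothetical inputs.
print(posttest_via_odds(0.20, 5.0))   # ~0.556
print(posttest_direct(0.20, 5.0))     # ~0.556
```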
In fact, the post-test probability, as estimated from the likelihood ratio and pre-test probability, is generally more accurate than an estimate based on the positive predictive value of the test whenever the tested individual's pre-test probability differs from the prevalence of the condition in the population.
Taking the medical example from above (20 true positives, 10 false negatives, and 2030 total patients), the positive post-test probability is calculated as follows:

pretest probability = (20 + 10) / 2030 ≈ 0.0148
pretest odds = 0.0148 / (1 − 0.0148) ≈ 0.015
posttest odds = 0.015 × 7.4 ≈ 0.111
posttest probability = 0.111 / (0.111 + 1) ≈ 0.1, or 10%
As demonstrated, the positive post-test probability is numerically equal to the positive predictive value; the negative post-test probability is numerically equal to (1 − negative predictive value).
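A quick numerical check of this equivalence, using the counts from the worked example (a sketch; the pre-test probability is taken to be the prevalence):

```python
TP, FN, FP, TN = 20, 10, 180, 1820
pop = TP + FN + FP + TN

prevalence = (TP + FN) / pop                        # pre-test probability ~0.0148
lr_positive = (TP / (TP + FN)) / (FP / (FP + TN))   # ~7.41

pretest_odds = prevalence / (1 - prevalence)
posttest_prob = (pretest_odds * lr_positive) / (pretest_odds * lr_positive + 1)

ppv = TP / (TP + FP)
print(round(posttest_prob, 3), round(ppv, 3))  # both ~0.10
```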
A likelihood function measures how well a statistical model explains observed data by calculating the probability of seeing that data under different parameter values of the model. It is constructed from the joint probability distribution of the random variable that (presumably) generated the observations. When evaluated on the actual data points, it becomes a function solely of the model parameters.
In statistics, the likelihood-ratio test is a hypothesis test that involves comparing the goodness of fit of two competing statistical models, typically one found by maximization over the entire parameter space and another found after imposing some constraint, based on the ratio of their likelihoods. If the more constrained model is supported by the observed data, the two likelihoods should not differ by more than sampling error. Thus the likelihood-ratio test tests whether this ratio is significantly different from one, or equivalently whether its natural logarithm is significantly different from zero.
Bayes' theorem gives a mathematical rule for inverting conditional probabilities, allowing us to find the probability of a cause given its effect. For example, if the risk of developing health problems is known to increase with age, Bayes' theorem allows the risk to an individual of a known age to be assessed more accurately by conditioning it relative to their age, rather than assuming that the individual is typical of the population as a whole. Based on Bayes' law, both the prevalence of a disease in a given population and the error rate of an infectious disease test must be taken into account to evaluate the meaning of a positive test result correctly and to avoid the base-rate fallacy.
In probability theory, odds provide a measure of the probability of a particular outcome. Odds are commonly used in gambling and statistics. For example, for an event that is 40% probable, one could say that the odds are "2 in 5", "2 to 3 in favor", or "3 to 2 against".
In statistics, the logistic model is a statistical model that models the log-odds of an event as a linear combination of one or more independent variables. In regression analysis, logistic regression estimates the parameters of a logistic model. In binary logistic regression there is a single binary dependent variable, coded by an indicator variable, where the two values are labeled "0" and "1", while the independent variables can each be a binary variable or a continuous variable. The corresponding probability of the value labeled "1" can vary between 0 and 1, hence the labeling; the function that converts log-odds to probability is the logistic function, hence the name. The unit of measurement for the log-odds scale is called a logit, from logistic unit, hence the alternative names.
In frequentist statistics, power is a measure of the ability of an experimental design and hypothesis testing setup to detect a particular effect if it is truly present. In typical use, it is a function of the test used, the assumed distribution of the test, and the effect size of interest. High statistical power is related to low variability, large sample sizes, large effects being looked for, and less stringent requirements for statistical significance.
An odds ratio (OR) is a statistic that quantifies the strength of the association between two events, A and B. The odds ratio is defined as the ratio of the odds of event A taking place in the presence of B, and the odds of A in the absence of B. Due to symmetry, odds ratio reciprocally calculates the ratio of the odds of B occurring in the presence of A, and the odds of B in the absence of A. Two events are independent if and only if the OR equals 1, i.e., the odds of one event are the same in either the presence or absence of the other event. If the OR is greater than 1, then A and B are associated (correlated) in the sense that, compared to the absence of B, the presence of B raises the odds of A, and symmetrically the presence of A raises the odds of B. Conversely, if the OR is less than 1, then A and B are negatively correlated, and the presence of one event reduces the odds of the other event occurring.
In statistics, an effect size is a value measuring the strength of the relationship between two variables in a population, or a sample-based estimate of that quantity. It can refer to the value of a statistic calculated from a sample of data, the value of one parameter for a hypothetical population, or to the equation that operationalizes how statistics or parameters lead to the effect size value. Examples of effect sizes include the correlation between two variables, the regression coefficient in a regression, the mean difference, or the risk of a particular event happening. Effect sizes are a complementary tool for statistical hypothesis testing, and play an important role in power analyses to assess the sample size required for new experiments. Effect sizes are fundamental in meta-analyses, which aim to provide the combined effect size based on data from multiple studies. The cluster of data-analysis methods concerning effect sizes is referred to as estimation statistics.
In healthcare, a differential diagnosis (DDx) is a method of analysis that distinguishes a particular disease or condition from others that present with similar clinical features. Differential diagnostic procedures are used by clinicians to diagnose the specific disease in a patient, or, at least, to consider any imminently life-threatening conditions. Often, each individual option of a possible disease is called a differential diagnosis.
The positive and negative predictive values are the proportions of positive and negative results in statistics and diagnostic tests that are true positive and true negative results, respectively. The PPV and NPV describe the performance of a diagnostic test or other statistical measure. A high result can be interpreted as indicating the accuracy of such a statistic. The PPV and NPV are not intrinsic to the test; they also depend on the prevalence. Both the PPV and NPV can be derived using Bayes' theorem.
Given a population whose members each belong to one of a number of different sets or classes, a classification rule or classifier is a procedure by which the elements of the population set are each predicted to belong to one of the classes. A perfect classification is one for which every element in the population is assigned to the class it really belongs to. The Bayes classifier is the classifier that assigns classes optimally based on the known attributes of the elements to be classified.
In medicine and statistics, sensitivity and specificity mathematically describe the accuracy of a test that reports the presence or absence of a medical condition. If individuals who have the condition are considered "positive" and those who do not are considered "negative", then sensitivity is a measure of how well a test can identify true positives and specificity is a measure of how well a test can identify true negatives: sensitivity is the probability of a positive test result given that the individual truly has the condition, and specificity is the probability of a negative test result given that the individual truly does not.
In pattern recognition, information retrieval, object detection and classification, precision and recall are performance metrics that apply to data retrieved from a collection, corpus or sample space.
A multinomial test is a statistical test of the null hypothesis that the parameters of a multinomial distribution equal specified values; it is used for categorical data.
Confusion of the inverse, also called the conditional probability fallacy or the inverse fallacy, is a logical fallacy whereby a conditional probability is equated with its inverse; that is, given two events A and B, the probability of A happening given that B has happened is assumed to be about the same as the probability of B given A, when there is actually no evidence for this assumption. More formally, P(A|B) is assumed to be approximately equal to P(B|A).
Statistical proof is the rational demonstration of degree of certainty for a proposition, hypothesis or theory that is used to convince others subsequent to a statistical test of the supporting evidence and the types of inferences that can be drawn from the test scores. Statistical methods are used to increase the understanding of the facts and the proof demonstrates the validity and logic of inference with explicit reference to a hypothesis, the experimental data, the facts, the test, and the odds. Proof has two essential aims: the first is to convince and the second is to explain the proposition through peer and public review.
In statistics, when performing multiple comparisons, a false positive ratio is the probability of falsely rejecting the null hypothesis for a particular test. The false positive rate is calculated as the ratio between the number of negative events wrongly categorized as positive and the total number of actual negative events.
Pre-test probability and post-test probability are the probabilities of the presence of a condition before and after a diagnostic test, respectively. Post-test probability, in turn, can be positive or negative, depending on whether the test falls out as a positive test or a negative test, respectively. In some cases, it is used for the probability of developing the condition of interest in the future.
In medical testing with binary classification, the diagnostic odds ratio (DOR) is a measure of the effectiveness of a diagnostic test. It is defined as the ratio of the odds of the test being positive if the subject has a disease relative to the odds of the test being positive if the subject does not have the disease.
Evaluation of a binary classifier typically assigns a numerical value, or values, to a classifier that represent its accuracy. An example is error rate, which measures how frequently the classifier makes a mistake.