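Net benefit |
---|
$\text{Net benefit} = \dfrac{\text{true positives}}{N} - \dfrac{\text{false positives}}{N} \times \dfrac{p_t}{1 - p_t}$ |
Net benefit is calculated as a weighted combination of true and false positives, where $p_t$ is the threshold probability, true and false positives are count variables and $N$ is the total number of observations. |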
Decision curve analysis evaluates a predictor for an event as a probability threshold is varied, typically by showing a graphical plot of net benefit against threshold probability. By convention, the default strategies of assuming that all or no observations are positive are also plotted.
Decision curve analysis is distinguished from other statistical methods like receiver operating characteristic (ROC) curves by the ability to assess the clinical value of a predictor. Applying decision curve analysis can determine whether using a predictor to make clinical decisions like performing biopsy will provide benefit over alternative decision criteria, given a specified threshold probability.
Threshold probability is defined as the minimum probability of an event at which a decision-maker would take a given action, for instance, the probability of cancer at which a doctor would order a biopsy. A lower threshold probability implies a greater concern about the event (e.g. a patient worried about cancer), while a higher threshold implies greater concern about the action to be taken (e.g. a patient averse to the biopsy procedure). Net benefit is a weighted combination of true and false positives, where the weight is derived from the threshold probability. The predictor could be a binary classifier, or a percentage risk from a prediction model, in which case a positive classification is defined by whether predicted probability is at least as great as the threshold probability.
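As a rough illustration of the definition above, the following Python sketch computes net benefit at a single threshold. The function name `net_benefit` and the ten-patient data set are made up for this example; only the rule "positive if predicted probability is at least the threshold" and the weighting by the threshold odds come from the text.

```python
def net_benefit(predicted_risks, outcomes, threshold):
    """Net benefit of acting on everyone whose predicted risk >= threshold.

    predicted_risks: predicted probabilities of the event (0..1)
    outcomes: 1 if the event occurred, 0 otherwise
    threshold: threshold probability, must be in [0, 1)
    """
    if not 0 <= threshold < 1:
        raise ValueError("threshold must be in [0, 1)")
    n = len(outcomes)
    pairs = list(zip(predicted_risks, outcomes))
    # A positive classification is a predicted risk at least as great as the threshold.
    true_pos = sum(1 for risk, event in pairs if risk >= threshold and event == 1)
    false_pos = sum(1 for risk, event in pairs if risk >= threshold and event == 0)
    # False positives are weighted by the odds at the threshold probability.
    weight = threshold / (1 - threshold)
    return true_pos / n - (false_pos / n) * weight


# Hypothetical data: ten patients, four of whom have the event.
risks = [0.05, 0.10, 0.20, 0.40, 0.60, 0.80, 0.15, 0.30, 0.02, 0.70]
events = [0, 0, 0, 1, 1, 1, 0, 0, 0, 1]
nb = net_benefit(risks, events, 0.20)
print(round(nb, 4))
```

At a threshold of 20%, six of these patients are classified positive (four true positives, two false positives), and the false positives are weighted by 0.20 / 0.80 = 0.25.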
The threshold probability compares the relative harm of unnecessary treatment (false positives) to the benefit of indicated treatment (true positives). The use of threshold probability to weight true and false positives derives from decision theory, in which the expected value of a decision can be calculated from the utilities and probabilities associated with decision outcomes. In the case of predicting an event, there are four possible outcomes: true positive, true negative, false positive and false negative. This means that to conduct a decision analysis, the analyst must specify four different utilities, which is often challenging. In decision curve analysis, the strategy of considering all observations as negative is defined as having a value of zero. This means that only true positives (event identified and appropriately managed) and false positives (unnecessary action) are considered. [1] Furthermore, it is easily shown that the ratio of the utility of avoiding a false positive to the utility of a true positive equals the odds at the threshold probability. [2] For instance, a doctor whose threshold probability to order a biopsy for cancer is 10% (odds of 1:9) believes that the utility of finding cancer early is 9 times greater than that of avoiding the harm of unnecessary biopsy. As in the calculation of expected value, weighting false positive outcomes by the odds at the threshold probability yields an estimate of net benefit that incorporates decision consequences and preferences. [3]
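The link between the threshold probability and the utilities can be sketched as follows. The notation is assumed for this illustration and does not appear in the sources cited above: $U_{TP}$, $U_{FP}$, $U_{TN}$ and $U_{FN}$ denote the utilities of the four outcomes and $p_t$ the threshold probability. At the threshold, the decision-maker is indifferent between acting and not acting, so the two expected utilities are equal.

```latex
\begin{align*}
p_t\,U_{TP} + (1 - p_t)\,U_{FP} &= p_t\,U_{FN} + (1 - p_t)\,U_{TN} \\
p_t\,(U_{TP} - U_{FN}) &= (1 - p_t)\,(U_{TN} - U_{FP}) \\
\frac{U_{TP} - U_{FN}}{U_{TN} - U_{FP}} &= \frac{1 - p_t}{p_t}
\end{align*}
% Example: p_t = 0.10 gives (1 - 0.10) / 0.10 = 9, i.e. the benefit of a
% true positive is 9 times the benefit of avoiding a false positive.
```

Here $U_{TP} - U_{FN}$ is the benefit of acting on a true case and $U_{TN} - U_{FP}$ is the benefit of avoiding an unnecessary action, which matches the 10% biopsy example.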
A decision curve analysis graph is drawn by plotting threshold probability on the horizontal axis and net benefit on the vertical axis, illustrating the trade-offs between benefit (true positives) and harm (false positives) as the threshold probability (preference) is varied across a range of reasonable threshold probabilities. [2]
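The values behind such a graph can be tabulated with a short Python sketch like the one below. The function `decision_curve`, the row layout and the data are assumptions for illustration; the "treat all" and "treat none" columns correspond to the default strategies that are plotted by convention.

```python
def decision_curve(predicted_risks, outcomes, thresholds):
    """Return one row per threshold with net benefit for the model,
    for treating everyone, and for treating no one."""
    n = len(outcomes)
    n_events = sum(outcomes)
    pairs = list(zip(predicted_risks, outcomes))
    rows = []
    for pt in thresholds:
        weight = pt / (1 - pt)  # odds at the threshold probability
        tp = sum(1 for risk, event in pairs if risk >= pt and event == 1)
        fp = sum(1 for risk, event in pairs if risk >= pt and event == 0)
        rows.append({
            "threshold": pt,
            "model": tp / n - (fp / n) * weight,
            # Treat all: every event is a true positive, every non-event a false positive.
            "treat_all": n_events / n - ((n - n_events) / n) * weight,
            # Treat none is defined as a net benefit of zero.
            "treat_none": 0.0,
        })
    return rows


# Hypothetical data: ten patients, four of whom have the event.
risks = [0.05, 0.10, 0.20, 0.40, 0.60, 0.80, 0.15, 0.30, 0.02, 0.70]
events = [0, 0, 0, 1, 1, 1, 0, 0, 0, 1]
thresholds = [i / 100 for i in range(5, 30, 5)]  # 0.05, 0.10, ..., 0.25

curve = decision_curve(risks, events, thresholds)
for row in curve:
    print(row)
```

Plotting the `threshold` column on the horizontal axis against each of the other three columns on the vertical axis gives the decision curve; any plotting library could be used for that step.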
The calculation of net benefit from true positives and false positives is analogous to profit. Consider a wine importer who pays €1m to buy wine in France and sells it for $1.5m in the United States. To calculate the profit, an exchange rate between euros and dollars must be used to put cost and revenue on the same scale. Similarly, the costs (false positives) and revenue (true positives) of the predictor must be compared on the same scale to calculate net benefit. The factor $p_t / (1 - p_t)$, the odds at the threshold probability $p_t$, expresses the relative harms and benefits of the different clinical consequences of a decision and is therefore used as the exchange rate in net benefit.
The figure gives a hypothetical example of biopsy for cancer. Given the relative benefits and harms of early cancer detection and avoidable biopsy, we would consider it unreasonable to opt for a biopsy if the risk of cancer was less than 5% or, alternatively, to refuse biopsy if given a risk of more than 25%. Hence the best strategy is the one with the highest net benefit across the range of threshold probabilities from 5% to 25%, in this case Model A. If no strategy has the highest net benefit across the full range, that is, if the decision curves cross, then the decision curve analysis is equivocal. [4]
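The selection rule described above can be sketched in Python. The function `best_strategy` and all net benefit values below are hypothetical and are not read off the figure; the sketch returns a strategy only if it has the highest net benefit at every threshold in the chosen range, and `None` when the curves cross.

```python
def best_strategy(thresholds, curves, low, high):
    """Return the strategy with the highest net benefit at every threshold
    in [low, high], or None if no single strategy wins throughout.

    curves maps a strategy name to a list of net benefits, one per threshold.
    Exact ties at a threshold go to the strategy listed first.
    """
    in_range = [i for i, pt in enumerate(thresholds) if low <= pt <= high]
    if not in_range:
        raise ValueError("no thresholds fall inside the requested range")
    winners = {max(curves, key=lambda name: curves[name][i]) for i in in_range}
    return winners.pop() if len(winners) == 1 else None


# Hypothetical net benefit values for illustration only.
thresholds = [0.05, 0.10, 0.15, 0.20, 0.25]
curves = {
    "Model A": [0.180, 0.160, 0.140, 0.120, 0.100],
    "Model B": [0.170, 0.140, 0.120, 0.100, 0.070],
    "Treat all": [0.158, 0.111, 0.059, 0.000, -0.067],
    "Treat none": [0.0, 0.0, 0.0, 0.0, 0.0],
}
print(best_strategy(thresholds, curves, 0.05, 0.25))

# If the curves cross inside the range, the result is equivocal.
crossing = dict(curves, **{"Model B": [0.170, 0.140, 0.120, 0.100, 0.110]})
print(best_strategy(thresholds, crossing, 0.05, 0.25))
```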
The default strategies of assuming all or no observations are positive are often interpreted as “Treat all” (or “Intervention for all”) and “Treat none” (or “Intervention for none”) respectively. The curve for “Treat none” is fixed at a net benefit of 0. The curve for “Treat all” crosses both axes at the event prevalence: its net benefit equals the prevalence at a threshold probability of 0 and falls to 0 when the threshold probability equals the prevalence. [2]
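The behaviour of the “Treat all” curve follows directly from the net benefit formula: when everyone is treated, the true positive rate is the prevalence and the false positive rate is one minus the prevalence. A small Python check, with an assumed prevalence of 20% and a hypothetical helper name, is shown below.

```python
def treat_all_net_benefit(prevalence, threshold):
    """Net benefit of treating everyone, for a threshold in [0, 1)."""
    # All events are true positives; all non-events are false positives.
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


prevalence = 0.20  # assumed value for illustration
at_zero = treat_all_net_benefit(prevalence, 0.0)                # equals the prevalence
at_prevalence = treat_all_net_benefit(prevalence, prevalence)   # approximately zero
above = treat_all_net_benefit(prevalence, 0.30)                 # negative

print(at_zero, at_prevalence, above)
```

Above the prevalence, “Treat all” has negative net benefit, so it is worse than “Treat none”.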
Net benefit on the vertical axis is expressed in units of true positives per person. [4] For instance, a difference in net benefit of 0.025 at a given threshold probability between two predictors of cancer, Model A and Model B, could be interpreted as “using Model A instead of Model B to order biopsies increases the number of cancers detected by 25 per 1000 patients, without changing the number of unnecessary biopsies.”
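Because net benefit is in units of true positives per person, the conversion in the example above is a single multiplication. The sketch below uses a hypothetical helper name and two made-up net benefit values that differ by 0.025.

```python
def extra_true_positives_per_1000(net_benefit_a, net_benefit_b):
    """Difference in net benefit, expressed as additional true positives
    per 1000 patients with no change in false positives."""
    return (net_benefit_a - net_benefit_b) * 1000


# Hypothetical net benefits for Model A and Model B at one threshold.
difference = extra_true_positives_per_1000(0.125, 0.100)
print(round(difference, 1))
```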
Additional resources and a complete tutorial for decision curve analysis are available at decisioncurveanalysis.org.