Base rate


In probability and statistics, the base rate (also known as prior probabilities) is the class of probabilities unconditional on "featural evidence" (likelihoods).


It is the proportion of individuals in a population who have a certain characteristic or trait. For example, if 1% of the population were medical professionals and the remaining 99% were not, then the base rate of medical professionals would be 1%. The method for integrating base rates and featural evidence is given by Bayes' rule.

In the sciences, including medicine, the base rate is critical for comparison.[1] In medicine, a treatment's effectiveness becomes clear when it is compared against the base rate. For example, if a control group receiving no treatment at all had a base rate of 1/20 recoveries within 1 day, and a treated group had a recovery rate of 1/100 within 1 day, the comparison shows that the treatment actually decreases recovery.
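To make the comparison concrete, the sketch below computes the ratio and the difference between the treated group's recovery rate and the control group's base rate, using the figures from the example. The helper name `rate_ratio` is hypothetical, and exact fractions are used only to keep the arithmetic readable.

```python
from fractions import Fraction


def rate_ratio(treated_rate: Fraction, control_rate: Fraction) -> Fraction:
    """Ratio of the treated group's recovery rate to the control group's base rate.

    A ratio below 1 means recovery is less common with the treatment than without it.
    """
    return treated_rate / control_rate


# Figures from the example: 1/20 of untreated patients recover within 1 day,
# versus 1/100 of treated patients.
control = Fraction(1, 20)
treated = Fraction(1, 100)

ratio = rate_ratio(treated, control)   # treated patients recover one fifth as often
difference = treated - control         # four fewer recoveries per 100 patients
print(ratio, difference)               # prints: 1/5 -1/25
```

Without the control group's base rate, the treated group's 1/100 recovery rate alone would say nothing about whether the treatment helps or harms.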

The base rate is an important concept in statistical inference, particularly in Bayesian statistics.[2] In Bayesian analysis, the base rate is combined with the observed data to update our belief about the probability of the characteristic or trait of interest. The updated probability is known as the posterior probability and is denoted P(A|B), where A is the characteristic of interest and B represents the observed data. For example, suppose we are interested in the prevalence of a disease in a population. The base rate is the proportion of individuals in the population who have the disease. If we observe a positive test result for a particular individual, we can use Bayesian analysis to update our belief about the probability that this individual has the disease. The updated probability combines the base rate with the likelihood of the test result given the individual's disease status.
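A minimal sketch of this update for a binary test, assuming the test is summarized by its sensitivity P(positive | disease) and its false positive rate P(positive | no disease). The function name and the figures in the loop (90% sensitivity, 5% false positive rate) are hypothetical, chosen only to show how strongly the posterior depends on the base rate.

```python
def posterior_positive(base_rate: float, sensitivity: float, false_positive_rate: float) -> float:
    """Posterior probability of the disease given a positive test, via Bayes' rule.

    base_rate:           P(D), the prevalence before the test result is seen
    sensitivity:         P(+ | D)
    false_positive_rate: P(+ | not D)
    """
    true_pos = sensitivity * base_rate                  # P(+ and D)
    false_pos = false_positive_rate * (1.0 - base_rate)  # P(+ and not D)
    return true_pos / (true_pos + false_pos)            # P(D | +)


# The same hypothetical test applied in populations with different base rates.
for prior in (0.001, 0.01, 0.1, 0.5):
    print(f"base rate {prior:>5}: P(D | +) = {posterior_positive(prior, 0.9, 0.05):.3f}")
```

The test itself never changes in the loop; only the base rate does, and the posterior moves with it.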

The base rate is also important in decision-making, particularly when the costs of false positives and false negatives differ.[3] For example, in medical testing, a false negative (failing to diagnose a disease) could be much more costly than a false positive (incorrectly diagnosing a disease). In such cases, the base rate can help inform the choice of threshold for a positive test result.
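One common way to turn unequal error costs into a threshold is an expected-cost rule: act on a result when the posterior probability of the condition exceeds C_FP / (C_FP + C_FN), since above that point the expected cost of acting, (1 − p)·C_FP, is lower than that of not acting, p·C_FN. The sketch below assumes correct decisions cost nothing and both costs are on the same scale; it is a standard decision-theoretic rule rather than necessarily the method of the cited source, and all names and numbers are hypothetical.

```python
def decision_threshold(cost_false_positive: float, cost_false_negative: float) -> float:
    """Posterior probability above which acting has the lower expected cost.

    Assumes correct decisions cost 0 and both costs share one scale.
    """
    return cost_false_positive / (cost_false_positive + cost_false_negative)


def should_act(base_rate: float, sensitivity: float, false_positive_rate: float,
               c_fp: float, c_fn: float) -> bool:
    """Decide whether to act after a positive result, given the base rate and costs."""
    tp = sensitivity * base_rate
    fp = false_positive_rate * (1.0 - base_rate)
    posterior = tp / (tp + fp)  # P(condition | positive), by Bayes' rule
    return posterior > decision_threshold(c_fp, c_fn)


# Hypothetical test (90% sensitivity, 5% false positives) where a missed case
# costs 20 times as much as a false alarm: the decision flips with the base rate.
for prior in (0.001, 0.01, 0.1):
    print(prior, should_act(prior, 0.9, 0.05, c_fp=1.0, c_fn=20.0))
```

With equal costs the threshold would be 0.5; making false negatives more expensive lowers it, so a positive result is acted on at smaller posterior probabilities.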

Base rate fallacy

Many psychological studies have examined a phenomenon called base-rate neglect or the base rate fallacy, in which category base rates are not integrated with presented evidence in a normative manner,[4] although not all evidence is consistent regarding how common this fallacy is.[5] Mathematician Keith Devlin illustrates the risks with a hypothetical type of cancer that afflicts 1% of all people. Suppose a doctor then says there is a test for this cancer that is approximately 80% reliable: the test gives a positive result for 100% of people who have the cancer, but it also gives a false positive for 20% of people who do not. Testing positive may therefore lead people to believe that it is 80% likely that they have cancer. Devlin explains that the probability is instead less than 5%, because the false positives among the 99% without cancer far outnumber the true positives among the 1% with it. What is missing from these statistics is the relevant base rate information. The doctor should be asked, "Out of the number of people who test positive, how many have cancer?"[6]

In assessing the probability that a given individual is a member of a particular class, information other than the base rate also needs to be accounted for, especially featural evidence. For example, when a person wearing a white doctor's coat and stethoscope is seen prescribing medication, there is evidence that the probability of this particular individual being a medical professional is considerably higher than the category base rate of 1%.[citation needed]
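Devlin's figures can be checked directly. The sketch below applies Bayes' rule with exact fractions and then repeats the calculation as counts in a hypothetical group of 1,000 people, a frequency format that is often easier to follow.

```python
from fractions import Fraction

# Figures from Devlin's example, kept as exact fractions.
base_rate = Fraction(1, 100)             # P(cancer)
sensitivity = Fraction(1)                # P(positive | cancer)
false_positive_rate = Fraction(20, 100)  # P(positive | no cancer)

# Bayes' rule: P(cancer | positive) = P(positive | cancer) P(cancer) / P(positive)
p_positive = sensitivity * base_rate + false_positive_rate * (1 - base_rate)
p_cancer_given_positive = sensitivity * base_rate / p_positive

# The same calculation as counts in a hypothetical group of 1,000 people.
population = 1000
with_cancer = population * base_rate                               # 10 people
true_positives = with_cancer * sensitivity                          # all 10 test positive
false_positives = (population - with_cancer) * false_positive_rate  # 20% of 990 = 198

print(p_cancer_given_positive)  # 5/104, about 0.048
print(f"{true_positives} of {true_positives + false_positives} positives have cancer")
```

Both routes give 10 true positives out of 208 positives, just under 5%.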


Related Research Articles

A statistical hypothesis test is a method of statistical inference used to decide whether the data at hand sufficiently support a particular hypothesis. Hypothesis testing allows us to make probabilistic statements about population parameters.

The phrase "correlation does not imply causation" refers to the inability to legitimately deduce a cause-and-effect relationship between two events or variables solely on the basis of an observed association or correlation between them. The idea that "correlation implies causation" is an example of a questionable-cause logical fallacy, in which two events occurring together are taken to have established a cause-and-effect relationship. This fallacy is also known by the Latin phrase cum hoc ergo propter hoc. This differs from the fallacy known as post hoc ergo propter hoc, in which an event following another is seen as a necessary consequence of the former event, and from conflation, the errant merging of two events, ideas, databases, etc., into one.

In probability theory and statistics, Bayes' theorem, named after Thomas Bayes, describes the probability of an event, based on prior knowledge of conditions that might be related to the event. For example, if the risk of developing health problems is known to increase with age, Bayes' theorem allows the risk to an individual of a known age to be assessed more accurately by conditioning it relative to their age, rather than simply assuming that the individual is typical of the population as a whole.
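For reference, the theorem for an event A and evidence B is usually stated as follows, with the denominator expanded by the law of total probability; P(A) plays the role of the base rate.

```latex
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)},
\qquad
P(B) = P(B \mid A)\,P(A) + P(B \mid \neg A)\,P(\neg A)
```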

Bayesian inference is a method of statistical inference in which Bayes' theorem is used to update the probability for a hypothesis as more evidence or information becomes available. Bayesian inference is an important technique in statistics, and especially in mathematical statistics. Bayesian updating is particularly important in the dynamic analysis of a sequence of data. Bayesian inference has found application in a wide range of activities, including science, engineering, philosophy, medicine, sport, and law. In the philosophy of decision theory, Bayesian inference is closely related to subjective probability, often called "Bayesian probability".
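Sequential updating can be sketched by feeding each posterior back in as the prior for the next observation. This minimal sketch assumes a binary hypothesis and observations that are conditionally independent given the hypothesis; the function names are hypothetical, and the example reuses the test characteristics from Devlin's cancer example (P(+ | cancer) = 1, P(+ | no cancer) = 0.2, base rate 1%).

```python
def update(prior: float, likelihood_if_true: float, likelihood_if_false: float) -> float:
    """One step of Bayesian updating for a binary hypothesis H."""
    numerator = likelihood_if_true * prior
    return numerator / (numerator + likelihood_if_false * (1.0 - prior))


def sequential_update(prior, observations):
    """Apply update() once per observation, using each posterior as the next prior.

    observations: iterable of (P(obs | H), P(obs | not H)) pairs. Processing them
    one at a time like this assumes conditional independence given H.
    Returns the list of probabilities, starting with the original prior.
    """
    history = [prior]
    for p_true, p_false in observations:
        prior = update(prior, p_true, p_false)
        history.append(prior)
    return history


# Three positive results in a row, starting from a 1% base rate.
for step, p in enumerate(sequential_update(0.01, [(1.0, 0.2)] * 3)):
    print(step, round(p, 4))
```

Under the independence assumption, each positive result multiplies the odds by the same factor (here 5), so the final answer does not depend on the order of the observations.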

In statistics, the power of a binary hypothesis test is the probability that the test correctly rejects the null hypothesis when a specific alternative hypothesis is true. It is commonly denoted by $1 - \beta$, and represents the chances of a true positive detection conditional on the actual existence of an effect to detect. Statistical power ranges from 0 to 1, and as the power of a test increases, the probability of making a type II error by wrongly failing to reject the null hypothesis decreases.
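As an illustration, the power of a one-sided, one-sample z-test with known standard deviation has a closed form. This sketch assumes that setting, a hypothetical standardized effect size of 0.3 and α = 0.05, and uses Python's `statistics.NormalDist`.

```python
from statistics import NormalDist


def z_test_power(effect_size: float, n: int, alpha: float = 0.05) -> float:
    """Power (1 - beta) of a one-sided, one-sample z-test with known sigma.

    effect_size: (true mean - null mean) / sigma, assumed >= 0
    n:           sample size
    """
    std_normal = NormalDist()                    # standard normal N(0, 1)
    z_crit = std_normal.inv_cdf(1.0 - alpha)     # reject H0 when Z > z_crit
    shift = effect_size * n ** 0.5               # mean of Z when the alternative is true
    return 1.0 - std_normal.cdf(z_crit - shift)  # P(Z > z_crit | alternative)


# Hypothetical standardized effect of 0.3: power grows with sample size.
for n in (10, 30, 100):
    print(n, round(z_test_power(0.3, n), 3))
```

When the effect size is zero the formula returns α itself, which is the rejection rate under the null hypothesis.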

Base rate fallacy: error in thinking that involves undervaluing base rate information

The base rate fallacy, also called base rate neglect or base rate bias, is a type of fallacy in which people tend to ignore the base rate in favor of the individuating information. Base rate neglect is a specific form of the more general extension neglect.

A statistical syllogism is a non-deductive syllogism. It argues, using inductive reasoning, from a generalization true for the most part to a particular case.

In evidence-based medicine, likelihood ratios are used for assessing the value of performing a diagnostic test. They use the sensitivity and specificity of the test to determine whether a test result usefully changes the probability that a condition exists. The first description of the use of likelihood ratios for decision rules was made at a symposium on information theory in 1954. In medicine, likelihood ratios were introduced between 1975 and 1980.
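The odds form of Bayes' rule makes the role of likelihood ratios explicit: post-test odds equal pre-test odds multiplied by the likelihood ratio of the observed result. A minimal sketch, assuming 0 < specificity < 1; the function names are hypothetical, and the check reuses the cancer test from Devlin's example (sensitivity 100%, specificity 80%, base rate 1%).

```python
from __future__ import annotations


def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Positive and negative likelihood ratios of a binary test (0 < specificity < 1)."""
    lr_pos = sensitivity / (1.0 - specificity)  # P(+ | D) / P(+ | not D)
    lr_neg = (1.0 - sensitivity) / specificity  # P(- | D) / P(- | not D)
    return lr_pos, lr_neg


def post_test_probability(pre_test: float, lr: float) -> float:
    """Convert probability to odds, multiply by the likelihood ratio, convert back."""
    pre_odds = pre_test / (1.0 - pre_test)
    post_odds = pre_odds * lr
    return post_odds / (1.0 + post_odds)


lr_pos, lr_neg = likelihood_ratios(1.0, 0.8)   # LR+ = 5, LR- = 0
print(post_test_probability(0.01, lr_pos))     # about 0.048, as in Devlin's example
```

A likelihood ratio of 1 leaves the probability unchanged, which is one way to see when a test result "usefully" changes it.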

Given a population whose members each belong to one of a number of different sets or classes, a classification rule or classifier is a procedure by which the elements of the population set are each predicted to belong to one of the classes. A perfect classification is one for which every element in the population is assigned to the class it really belongs to. An imperfect classification is one in which some errors appear, and then statistical analysis must be applied to analyse the classification.

The use of evidence under Bayes' theorem relates to the probability of finding evidence in relation to the accused, where Bayes' theorem concerns the probability of an event and its inverse. Specifically, it compares the probability of finding particular evidence if the accused were guilty, versus if they were not guilty. An example would be the probability of finding a person's hair at the scene, if guilty, versus if just passing through the scene. Another issue would be finding a person's DNA where they lived, regardless of committing a crime there.

Sensitivity and specificity: statistical measures of the performance of a binary classification test

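Sensitivity and specificity mathematically describe the accuracy of a test that reports the presence or absence of a condition. If individuals who have the condition are considered "positive" and those who do not are considered "negative", then sensitivity is a measure of how well a test can identify true positives and specificity is a measure of how well a test can identify true negatives. In terms of counts of true positives (TP), false negatives (FN), true negatives (TN) and false positives (FP), $\text{sensitivity} = \frac{TP}{TP + FN}$ and $\text{specificity} = \frac{TN}{TN + FP}$.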
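Both measures can be computed from a 2×2 table of counts. The class and method names below are hypothetical, and the example counts are the frequencies implied by Devlin's cancer example (10 true positives and 198 false positives among 1,000 people). The sketch also computes the positive predictive value, P(condition | positive), to show that this quantity, unlike sensitivity and specificity, depends on the base rate of the tested group.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # condition present, test positive
    fn: int  # condition present, test negative
    fp: int  # condition absent, test positive
    tn: int  # condition absent, test negative

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def positive_predictive_value(self) -> float:
        # P(condition | positive): changes with the base rate even if the test does not.
        return self.tp / (self.tp + self.fp)


counts = ConfusionCounts(tp=10, fn=0, fp=198, tn=792)
print(counts.sensitivity(), counts.specificity(), counts.positive_predictive_value())
```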

In statistical hypothesis testing, a type I error is the mistaken rejection of an actually true null hypothesis, while a type II error is the failure to reject a null hypothesis that is actually false. Much of statistical theory revolves around the minimization of one or both of these errors, though the complete elimination of either is a statistical impossibility if the outcome is not determined by a known, observable causal process. By selecting a low threshold (cut-off) value and modifying the alpha (α) level, the quality of the hypothesis test can be increased. The knowledge of type I errors and type II errors is widely used in medical science, biometrics and computer science.
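The trade-off between the two error types can be seen by moving a rejection threshold. This sketch assumes a hypothetical setup in which the test statistic follows N(0, 1) under the null hypothesis and N(2, 1) under the alternative: raising the threshold lowers the type I error rate α but raises the type II error rate β, and so lowers the power 1 − β.

```python
from __future__ import annotations

from statistics import NormalDist

# Hypothetical distributions of the test statistic under each hypothesis.
null = NormalDist(mu=0.0, sigma=1.0)
alternative = NormalDist(mu=2.0, sigma=1.0)


def error_rates(threshold: float) -> tuple[float, float]:
    """(alpha, beta) when H0 is rejected for statistics above the threshold."""
    alpha = 1.0 - null.cdf(threshold)  # type I: reject H0 although H0 is true
    beta = alternative.cdf(threshold)  # type II: keep H0 although H1 is true
    return alpha, beta


for t in (0.5, 1.0, 1.645, 2.5):
    a, b = error_rates(t)
    print(f"t={t}: alpha={a:.3f}, beta={b:.3f}")
```

With the sample (and hence the two distributions) fixed, no threshold makes both rates small at once.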

Multiple comparisons problem: statistical interpretation with many tests

In statistics, the multiple comparisons, multiplicity or multiple testing problem occurs when one considers a set of statistical inferences simultaneously or infers a subset of parameters selected based on the observed values.
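A short sketch of why multiplicity matters: if m independent tests are each run at level α and every null hypothesis is true, the chance of at least one false positive is 1 − (1 − α)^m. The Bonferroni correction, one standard remedy named here for illustration, tests each hypothesis at α/m. The independence assumption applies only to the first formula; the Bonferroni bound holds regardless of dependence. Function names are hypothetical.

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """P(at least one false positive) over m independent tests when every null is true."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_alpha(alpha: float, m: int) -> float:
    """Per-test level that keeps the family-wise error rate at or below alpha."""
    return alpha / m


for m in (1, 5, 20, 100):
    print(m, round(familywise_error_rate(0.05, m), 3), bonferroni_alpha(0.05, m))
```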

Confusion of the inverse, also called the conditional probability fallacy or the inverse fallacy, is a logical fallacy in which a conditional probability is equated with its inverse; that is, given two events A and B, the probability of A happening given that B has happened is assumed to be about the same as the probability of B given A, when there is actually no evidence for this assumption. More formally, P(A|B) is assumed to be approximately equal to P(B|A).
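A small numeric sketch with hypothetical counts shows why the two conditionals generally differ: they share the numerator P(A ∩ B) but divide by different probabilities, so P(A|B) = P(B|A) only when P(A) = P(B) (assuming P(A ∩ B) > 0). The function name and counts are illustrative only.

```python
from __future__ import annotations


def both_conditionals(n_a_and_b: int, n_a: int, n_b: int) -> tuple[float, float]:
    """Return (P(A|B), P(B|A)) from counts of outcomes."""
    return n_a_and_b / n_b, n_a_and_b / n_a


# Hypothetical population of 1,000: A occurs 50 times, B 200 times, both together 40 times.
total = 1000
p_a_given_b, p_b_given_a = both_conditionals(40, 50, 200)
print(p_a_given_b, p_b_given_a)  # 0.2 vs 0.8

# Bayes' theorem links the two: P(A|B) = P(B|A) * P(A) / P(B).
print(p_b_given_a * (50 / total) / (200 / total))
```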

Conditional probability: probability of an event occurring, given that another event has already occurred

In probability theory, conditional probability is a measure of the probability of an event occurring, given that another event (by assumption, presumption, assertion or evidence) has already occurred. This method relies on event A having some relationship with another event B; in that case, event A can be analyzed by a conditional probability with respect to B. If the event of interest is A and the event B is known or assumed to have occurred, "the conditional probability of A given B", or "the probability of A under the condition B", is usually written as P(A|B) or occasionally $P_B(A)$. This can also be understood as the fraction of probability B that intersects with A, or the ratio of the probability of both events happening to the probability of the "given" one happening (how often A occurs among the cases in which B has occurred), provided P(B) > 0: $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$.

Pre-test probability and post-test probability are the probabilities of the presence of a condition before and after a diagnostic test, respectively. Post-test probability, in turn, can be positive or negative, depending on whether the test falls out as a positive test or a negative test, respectively. In some cases, it is used for the probability of developing the condition of interest in the future.
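A minimal sketch computing both post-test probabilities from the pre-test probability, assuming the test is described by its sensitivity and specificity. The function name and the example figures (pre-test probability 20%, sensitivity and specificity both 90%) are hypothetical.

```python
from __future__ import annotations


def post_test_probabilities(pre_test: float, sensitivity: float,
                            specificity: float) -> tuple[float, float]:
    """Return (P(D | positive), P(D | negative)) by Bayes' rule."""
    pos_and_d = sensitivity * pre_test                  # true positives
    pos_and_not_d = (1.0 - specificity) * (1.0 - pre_test)  # false positives
    neg_and_d = (1.0 - sensitivity) * pre_test          # false negatives
    neg_and_not_d = specificity * (1.0 - pre_test)      # true negatives
    positive = pos_and_d / (pos_and_d + pos_and_not_d)
    negative = neg_and_d / (neg_and_d + neg_and_not_d)
    return positive, negative


pos, neg = post_test_probabilities(0.2, 0.9, 0.9)
print(f"pre-test 0.200 -> positive {pos:.3f}, negative {neg:.3f}")
```

A positive result raises the probability above the pre-test value and a negative one lowers it; a test with sensitivity and specificity both at 50% leaves it unchanged.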

The frequency format hypothesis is the idea that the brain understands and processes information better when presented in frequency formats rather than a numerical or probability format. Thus according to the hypothesis, presenting information as 1 in 5 people rather than 20% leads to better comprehension. The idea was proposed by German scientist Gerd Gigerenzer, after compilation and comparison of data collected between 1976 and 1997.

Misuse of p-values is common in scientific research and scientific education. p-values are often used or interpreted incorrectly; the American Statistical Association states that p-values can indicate how incompatible the data are with a specified statistical model. In the Neyman–Pearson approach to hypothesis testing, comparing the p-value obtained from the data with a significance level yields one of two results: either the null hypothesis is rejected, or it cannot be rejected at that significance level. In the Fisherian approach to statistical testing, a low p-value means either that the null hypothesis is true and a highly improbable event has occurred, or that the null hypothesis is false.

Forensic epidemiology

The discipline of forensic epidemiology (FE) is a hybrid of principles and practices common to both forensic medicine and epidemiology. FE is directed at filling the gap between clinical judgment and epidemiologic data for determinations of causality in civil lawsuits and criminal prosecution and defense.

Intuitive statistics, or folk statistics, refers to the cognitive phenomenon where organisms use data to make generalizations and predictions about the world. This can be a small amount of sample data or training instances, which in turn contribute to inductive inferences about either population-level properties, future data, or both. Inferences can involve revising hypotheses, or beliefs, in light of probabilistic data that inform and motivate future predictions. The informal tendency for cognitive animals to intuitively generate statistical inferences, when formalized with certain axioms of probability theory, constitutes statistics as an academic discipline.

References

  1. Stephanie (2015-08-18). "Base Rates and the Base Rate Fallacy: Definition, Examples". Statistics How To. Retrieved 2022-10-07.
  2. Birnbaum, Michael H. (Spring 1983). "Base Rates in Bayesian Inference: Signal Detection Analysis of the Cab Problem". The American Journal of Psychology. 96 (1): 85–94. doi:10.2307/1422211. JSTOR 1422211.
  3. Darling, John A.; Jerde, Christopher L.; Sepulveda, Adam J. (September 2021). "What do you mean by false positive?". Environmental DNA. 3 (5): 879–883. doi:10.1002/edn3.194. ISSN 2637-4943. PMC 8941663. PMID 35330629.
  4. Bar-Hillel, Maya (1980). "The base-rate fallacy in probability judgments". Acta Psychologica. 44 (3): 211–233. doi:10.1016/0001-6918(80)90046-3.
  5. Koehler, Jonathan J. (1996). "The base rate fallacy reconsidered: Descriptive, normative, and methodological challenges". Behavioral and Brain Sciences. 19 (1): 1–17. doi:10.1017/S0140525X00041157. ISSN 0140-525X. S2CID 53343238.
  6. "Edge.org". Edge.org. Retrieved 2021-03-22.