Confusion of the inverse

Last updated April 14, 2024

Confusion of the inverse, also called the conditional probability fallacy or the inverse fallacy, is a logical fallacy in which a conditional probability is equated with its inverse; that is, given two events A and B, the probability of A given that B has happened is assumed to be about the same as the probability of B given A, when there is actually no evidence for this assumption.^[1]^[2] More formally, P(A|B) is assumed to be approximately equal to P(B|A).
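
One way to see why the two quantities generally differ is to write out Bayes' theorem, which links them through the unconditional probabilities of A and B (assuming here that P(A) and P(B) are both non-zero):

P(A|B)={\frac {P(B|A)\,P(A)}{P(B)}}

Under that assumption, P(A|B) equals P(B|A) only when P(A) = P(B), or when both conditional probabilities are zero.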

Examples

Example 1

Relative size (%)	Malignant	Benign	Total
Test positive	0.8 (true positive)	9.9 (false positive)	10.7
Test negative	0.2 (false negative)	89.1 (true negative)	89.3
Total	1	99	100

In one study, physicians were asked to estimate the chance of malignancy in a case where the prior probability of malignancy is 1%, using a test that detects 80% of malignancies and has a 10% false positive rate: what is the probability of malignancy given a positive test result?^[3] Approximately 95 out of 100 physicians responded that the probability of malignancy would be about 75%, apparently because they believed that the chance of malignancy given a positive test result was approximately the same as the chance of a positive test result given malignancy.^[4]

The correct probability of malignancy given a positive test result, under the figures stated above, is about 7.5%, derived via Bayes' theorem:

{\begin{aligned}&{}\qquad P({\text{malignant}}|{\text{positive}})\\[8pt]&={\frac {P({\text{positive}}|{\text{malignant}})P({\text{malignant}})}{P({\text{positive}}|{\text{malignant}})P({\text{malignant}})+P({\text{positive}}|{\text{benign}})P({\text{benign}})}}\\[8pt]&={\frac {(0.80\cdot 0.01)}{(0.80\cdot 0.01)+(0.10\cdot 0.99)}}\approx 0.075\end{aligned}}
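
The same arithmetic can be checked with a few lines of Python. This is a minimal sketch: the function name `posterior` and its parameter names are illustrative choices, not taken from the cited study; only the three input figures (1% prior, 80% detection rate, 10% false positive rate) come from the example.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """Return P(condition | positive test) using Bayes' theorem."""
    # Probability of having the condition AND testing positive.
    true_positive = sensitivity * prior
    # Probability of not having the condition AND testing positive.
    false_positive = false_positive_rate * (1.0 - prior)
    return true_positive / (true_positive + false_positive)


# Figures from Example 1: 1% prior, 80% detection rate, 10% false positives.
p_malignant_given_positive = posterior(
    prior=0.01, sensitivity=0.80, false_positive_rate=0.10
)
print(f"P(malignant | positive) = {p_malignant_given_positive:.3f}")  # 0.075
```

The result is roughly a tenth of the 75% estimate, because the 10% false positive rate applies to the much larger benign group.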

Other examples of confusion include:

Hard drug users tend to use marijuana; therefore, marijuana users tend to use hard drugs (the first probability is marijuana use given hard drug use, the second is hard drug use given marijuana use).^[5]
Most accidents occur within 25 miles from home; therefore, you are safest when you are far from home.^[5]
Terrorists tend to have an engineering background; so, engineers have a tendency towards terrorism.^[6]

For other errors in conditional probability, see the Monty Hall problem and the base rate fallacy. Compare to illicit conversion.

Example 2

Relative size (%)	Ill	Well	Total
Test positive	0.99 (true positive)	0.99 (false positive)	1.98
Test negative	0.01 (false negative)	98.01 (true negative)	98.02
Total	1	99	100

In order to identify individuals having a serious disease in an early curable form, one may consider screening a large group of people. While the benefits are obvious, an argument against such screenings is the disturbance caused by false positive screening results: If a person not having the disease is incorrectly found to have it by the initial test, they will most likely be distressed, and even if they subsequently take a more careful test and are told they are well, their lives may still be affected negatively. If they undertake unnecessary treatment for the disease, they may be harmed by the treatment's side effects and costs.

The magnitude of this problem is best understood in terms of conditional probabilities.

Suppose 1% of the group suffer from the disease, and the rest are well. Choosing an individual at random,

P({\text{ill}})=1\%=0.01{\text{ and }}P({\text{well}})=99\%=0.99.

Suppose that when the screening test is applied to a person not having the disease, there is a 1% chance of getting a false positive result (and hence 99% chance of getting a true negative result, a number known as the specificity of the test), i.e.

P({\text{positive}}|{\text{well}})=1\%,{\text{ and }}P({\text{negative}}|{\text{well}})=99\%.

Finally, suppose that when the test is applied to a person having the disease, there is a 1% chance of a false negative result (and 99% chance of getting a true positive result, known as the sensitivity of the test), i.e.

P({\text{negative}}|{\text{ill}})=1\%{\text{ and }}P({\text{positive}}|{\text{ill}})=99\%.

Calculations

The fraction of individuals in the whole group who are well and test negative (true negative):

P({\text{well}}\cap {\text{negative}})=P({\text{well}})\times P({\text{negative}}|{\text{well}})=99\%\times 99\%=98.01\%.

The fraction of individuals in the whole group who are ill and test positive (true positive):

P({\text{ill}}\cap {\text{positive}})=P({\text{ill}})\times P({\text{positive}}|{\text{ill}})=1\%\times 99\%=0.99\%.

The fraction of individuals in the whole group who have false positive results:

P({\text{well}}\cap {\text{positive}})=P({\text{well}})\times P({\text{positive}}|{\text{well}})=99\%\times 1\%=0.99\%.

The fraction of individuals in the whole group who have false negative results:

P({\text{ill}}\cap {\text{negative}})=P({\text{ill}})\times P({\text{negative}}|{\text{ill}})=1\%\times 1\%=0.01\%.

Furthermore, the fraction of individuals in the whole group who test positive:

{\begin{aligned}P({\text{positive}})&{}=P({\text{well}}\cap {\text{positive}})+P({\text{ill}}\cap {\text{positive}})\\&{}=0.99\%+0.99\%=1.98\%.\end{aligned}}

Finally, the probability that an individual actually has the disease, given that the test result is positive:

P({\text{ill}}|{\text{positive}})={\frac {P({\text{ill}}\cap {\text{positive}})}{P({\text{positive}})}}={\frac {0.99\%}{1.98\%}}=50\%.
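
The calculations above can be reproduced exactly with rational arithmetic. The sketch below is illustrative only: the variable names and the dictionary layout are assumptions made for this example, while the three input probabilities are those assumed in the text.

```python
from fractions import Fraction

# Inputs assumed in Example 2.
p_ill = Fraction(1, 100)         # prevalence
sensitivity = Fraction(99, 100)  # P(positive | ill)
specificity = Fraction(99, 100)  # P(negative | well)

p_well = 1 - p_ill

# Joint probabilities: the four cells of the table.
joint = {
    ("ill", "positive"): p_ill * sensitivity,          # true positive
    ("ill", "negative"): p_ill * (1 - sensitivity),    # false negative
    ("well", "positive"): p_well * (1 - specificity),  # false positive
    ("well", "negative"): p_well * specificity,        # true negative
}

# Marginal probability of a positive result, then the conditional of interest.
p_positive = joint[("ill", "positive")] + joint[("well", "positive")]
p_ill_given_positive = joint[("ill", "positive")] / p_positive

for (state, result), p in joint.items():
    print(f"P({state} and {result}) = {float(p):.2%}")
print(f"P(positive) = {float(p_positive):.2%}")
print(f"P(ill | positive) = {float(p_ill_given_positive):.0%}")
```

Using `Fraction` avoids floating-point rounding, so the final ratio comes out as exactly 1/2.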

Conclusion

This example makes the difference between the two conditional probabilities concrete: P(positive | ill), which with the assumed probabilities is 99%, and P(ill | positive), which is 50%. The first is the probability that an individual who has the disease tests positive; the second is the probability that an individual who tests positive actually has the disease. Thus, with the probabilities picked in this example, roughly the same number of individuals receive the benefits of early treatment as are distressed by false positives. These positive and negative effects can then be weighed in deciding whether to carry out the screening, or, if possible, whether to adjust the test criteria to decrease the number of false positives (possibly at the expense of more false negatives).
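
To illustrate how reducing false positives would change P(ill | positive), the sketch below recomputes it for a few false positive rates. The rates other than 1% are hypothetical values chosen for illustration, and the sensitivity is held at 99% as a simplifying assumption; as noted above, a real adjustment of the test criteria might also increase false negatives.

```python
def prob_ill_given_positive(prevalence, sensitivity, false_positive_rate):
    """P(ill | positive) from prevalence, sensitivity and false positive rate."""
    true_positive = prevalence * sensitivity
    false_positive = (1.0 - prevalence) * false_positive_rate
    return true_positive / (true_positive + false_positive)


# 1% is the rate used in Example 2; 0.5% and 0.1% are hypothetical.
for fpr in (0.01, 0.005, 0.001):
    p = prob_ill_given_positive(
        prevalence=0.01, sensitivity=0.99, false_positive_rate=fpr
    )
    print(f"false positive rate {fpr:.1%}: P(ill | positive) = {p:.1%}")
```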

References

[1] Plous, Scott (1993). The Psychology of Judgment and Decisionmaking. pp. 131–134. ISBN 978-0-07-050477-6.
[2] Villejoubert, Gaëlle; Mandel, David (2002). "The inverse fallacy: An account of deviations from Bayes's Theorem and the additivity principle". Memory & Cognition. 30 (5): 171–178. doi:10.3758/BF03195278. PMID 12035879.
[3] Eddy, David M. (1982). "Probabilistic reasoning in clinical medicine: Problems and opportunities". In Kahneman, D.; Slovic, P.; Tversky, A. (eds.). Judgment under uncertainty: Heuristics and biases. New York: Cambridge University Press. pp. 249–267. ISBN 0-521-24064-6. Description simplified as in Plous (1993).
[4] Eddy (1982, p. 253). "Unfortunately, most physicians (approximately 95 out of 100 in an informal sample taken by the author) misinterpret the statements about the accuracy of the test and estimate P(ca|pos) to be about 75%."
[5] Hastie, Reid; Dawes, Robyn (2001). Rational Choice in an Uncertain World. pp. 122–123. ISBN 978-0-7619-2275-9.
[6] "Engineers make good terrorists?". Slashdot. 2008-04-03. Retrieved 2008-04-25.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
