Regression fallacy

The regression (or regressive) fallacy is an informal fallacy. It assumes that something returned to normal because of corrective actions taken while it was abnormal, failing to account for natural fluctuations. It is frequently a special case of the post hoc fallacy.

Explanation

Things like golf scores and chronic back pain fluctuate naturally and usually regress toward the mean. The logical flaw is to make predictions that expect exceptional results to continue as if they were average (see representativeness heuristic). People are most likely to take action when variance is at its peak; then, after results return to more normal levels, they believe that their action caused the change when in fact it had no effect.
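
To make the mechanism concrete, here is a minimal Python sketch (the golfer, the numbers, and the noise model are illustrative assumptions, not from the original text). It selects rounds that were exceptionally good and shows that the following round is merely ordinary, even though nothing about the golfer changed in between.

```python
import random

random.seed(0)

TRUE_SKILL = 85   # the golfer's long-run average score (illustrative assumption)
NOISE = 5         # round-to-round variation (illustrative assumption)

def play_round():
    """One round: underlying skill plus random day-to-day fluctuation."""
    return random.gauss(TRUE_SKILL, NOISE)

# Keep only pairs of rounds whose first round was exceptionally good (well below
# average, since lower golf scores are better), then look at the next round.
follow_ups = []
for _ in range(100_000):
    first, second = play_round(), play_round()
    if first < TRUE_SKILL - 2 * NOISE:
        follow_ups.append(second)

print(sum(follow_ups) / len(follow_ups))  # close to 85: the next round is merely average
```

The selected rounds look exceptional only because of favorable random variation; the follow-up rounds simply revert to the golfer's underlying average.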

This use of the word "regression" was coined by Sir Francis Galton in a study from 1885 called "Regression Toward Mediocrity in Hereditary Stature". He showed that the heights of children of very short or very tall parents tend to move toward the average. In fact, in any situation where two variables are less than perfectly correlated, an exceptional score on one variable is unlikely to be matched by an equally exceptional score on the other. The imperfect correlation between parents and children (height is not entirely heritable) means that the distribution of heights of their children will be centered somewhere between the average of the parents and the average of the population as a whole. Thus, any single child can be more extreme than the parents, but the odds are against it.
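
In modern notation (a standard formulation, not Galton's own, and assuming for simplicity that parent and child heights share the same mean μ and spread), an imperfect correlation ρ < 1 pulls the expected child height toward the population mean:

```latex
\[
\mathbb{E}\bigl[\,H_{\text{child}} \mid H_{\text{parent}} = x\,\bigr]
  \;=\; \mu + \rho\,(x - \mu), \qquad 0 \le \rho < 1,
\]
```

so a parent 10 cm above the mean is expected, on average, to have a child only ρ × 10 cm above it.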

Examples

When his pain got worse, he went to a doctor, after which the pain subsided a little. Therefore, he benefited from the doctor's treatment.

The pain subsiding a little after it has gotten worse is more easily explained by regression toward the mean. Assuming the pain relief was caused by the doctor is fallacious.

The student did exceptionally poorly last semester, so I punished him. He did much better this semester. Clearly, punishment is effective in improving students' grades.

Often exceptional performances are followed by more normal performances, so the change in performance might better be explained by regression toward the mean. Incidentally, some experiments have shown that people may develop a systematic bias for punishment and against reward because of reasoning analogous to this example of the regression fallacy. [1]
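
A small simulation (a sketch under assumed numbers, not a reconstruction of the experiments cited in the notes) shows how this bias can arise even when punishment and reward have no effect at all:

```python
import random

random.seed(1)

ABILITY = 70   # the student's long-run average grade (illustrative assumption)
NOISE = 10     # exam-to-exam variation (illustrative assumption)

def exam():
    return random.gauss(ABILITY, NOISE)

change_after_punishment, change_after_reward = [], []
for _ in range(100_000):
    first, second = exam(), exam()      # neither punishment nor reward affects the second exam
    if first < ABILITY - NOISE:         # unusually bad exam -> "punish"
        change_after_punishment.append(second - first)
    elif first > ABILITY + NOISE:       # unusually good exam -> "reward"
        change_after_reward.append(second - first)

print(sum(change_after_punishment) / len(change_after_punishment))  # positive: grades "improve"
print(sum(change_after_reward) / len(change_after_reward))          # negative: grades "decline"
```

Punishment appears to work and reward appears to backfire, purely because both were triggered by extreme results that were bound to regress.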

The frequency of accidents on a road fell after a speed camera was installed. Therefore, the speed camera has improved road safety.

Speed cameras are often installed after a road incurs an exceptionally high number of accidents, and this value usually falls (regression to the mean) immediately afterward. Many speed camera proponents attribute this fall in accidents to the speed camera, without observing the overall trend.
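
The selection effect can be sketched with a simple model in which every site has the same long-run accident rate and the cameras do nothing (the rate, threshold, and site count below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)

SITES = 10_000
MEAN_RATE = 4.0   # long-run yearly accident rate per site (illustrative assumption)

year1 = rng.poisson(MEAN_RATE, SITES)   # accidents in the year before any cameras
year2 = rng.poisson(MEAN_RATE, SITES)   # accidents the next year; cameras change nothing here

camera_sites = year1 >= 8               # cameras go only to the worst-looking sites
print(year1[camera_sites].mean())       # well above 4, by construction
print(year2[camera_sites].mean())       # back near 4: the apparent improvement is pure regression
```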

Some authors use the Sports Illustrated cover jinx as an example of a regression effect: extremely good performances are likely to be followed by less extreme ones, and athletes are chosen to appear on the cover of Sports Illustrated only after extreme performances. Attributing this to a "jinx" rather than regression, as some athletes reportedly believe, is an example of committing the regression fallacy. [2]

Misapplication

On the other hand, dismissing valid explanations can lead to a worse situation. For example:

After the Western Allies invaded Normandy, creating a second major front, German control of Europe waned. Clearly, the combination of the Western Allies and the USSR drove the Germans back.

Fallacious evaluation: "Given that the counterattacks against Germany occurred only after it had brought the greatest amount of territory under its control, regression toward the mean can explain the retreat of German forces from occupied territories as a purely random fluctuation that would have happened without any intervention on the part of the USSR or the Western Allies." However, this was not the case. The reason is that political power and occupation of territories are not primarily determined by random events, making the concept of regression toward the mean inapplicable (on the large scale).

In essence, misapplication of regression toward the mean can reduce all events to a just-so story, without cause or effect. (Such misapplication takes as a premise that all events are random, as they must be for the concept of regression toward the mean to be validly applied.)

Notes

  1. Schaffner 1985; Gilovich 1991, pp. 27–28
  2. Gilovich 1991, pp. 26–27; Plous 1993, p. 118

Related Research Articles

The law of averages is the commonly held belief that a particular outcome or event will, over certain periods of time, occur at a frequency that is similar to its probability. Depending on context or application it can be considered a valid common-sense observation or a misunderstanding of probability. This notion can lead to the gambler's fallacy when one becomes convinced that a particular outcome must come soon simply because it has not occurred recently.

In probability theory, the central limit theorem (CLT) establishes that, in many situations, when independent random variables are summed up, their properly normalized sum tends toward a normal distribution even if the original variables themselves are not normally distributed.
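
As a rough illustration (a sketch, not part of the original article), sums of uniform random variables already behave nearly normally for modest sample sizes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Sum 30 independent Uniform(0, 1) variables, repeated 100,000 times.
sums = rng.uniform(0.0, 1.0, size=(100_000, 30)).sum(axis=1)

# Standardize using the exact mean (30 * 0.5) and variance (30 * 1/12) of the sum.
z = (sums - 15.0) / np.sqrt(30.0 / 12.0)

# For a standard normal, about 68% of the mass lies within one standard deviation.
print((np.abs(z) < 1).mean())   # close to 0.68, even though the summands are uniform
```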

The phrase "correlation does not imply causation" refers to the inability to legitimately deduce a cause-and-effect relationship between two events or variables solely on the basis of an observed association or correlation between them. The idea that "correlation implies causation" is an example of a questionable-cause logical fallacy, in which two events occurring together are taken to have established a cause-and-effect relationship. This fallacy is also known by the Latin phrase cum hoc ergo propter hoc. This differs from the fallacy known as post hoc ergo propter hoc, in which an event following another is seen as a necessary consequence of the former event, and from conflation, the errant merging of two events, ideas, databases, etc., into one.

Begging the question: logic founded on unproven premises

In classical rhetoric and logic, begging the question or assuming the conclusion is an informal fallacy that occurs when an argument's premises assume the truth of the conclusion, instead of supporting it.

Francis Galton: English polymath (1822–1911)

Sir Francis Galton, FRS FRAI, was an English Victorian era polymath: a statistician, sociologist, psychologist, anthropologist, tropical explorer, geographer, inventor, meteorologist, proto-geneticist, psychometrician and a proponent of social Darwinism, eugenics, and scientific racism. He was knighted in 1909.

Law of large numbers: averages of repeated trials converge to the expected value

In probability theory, the law of large numbers (LLN) is a theorem that describes the result of performing the same experiment a large number of times. According to the law, the average of the results obtained from a large number of trials should be close to the expected value and tends to become closer to the expected value as more trials are performed.

Regression toward the mean: statistical phenomenon

In statistics, regression toward the mean is a concept that refers to the fact that if one sample of a random variable is extreme, the next sampling of the same random variable is likely to be closer to its mean. Furthermore, when many random variables are sampled and the most extreme results are intentionally picked out, it refers to the fact that a second sampling of these picked-out variables will result in "less extreme" results, closer to the initial mean of all of the variables.
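
A minimal sketch of the second formulation, under an assumed model in which each observation is a stable individual value plus independent measurement noise (the model and numbers are illustrative, not from the original text):

```python
import numpy as np

rng = np.random.default_rng(3)

N = 100_000
true_value = rng.normal(0.0, 1.0, N)            # each individual's stable underlying level
first = true_value + rng.normal(0.0, 1.0, N)    # first noisy measurement
second = true_value + rng.normal(0.0, 1.0, N)   # second noisy measurement of the same individuals

top = first > np.quantile(first, 0.99)          # intentionally pick out the most extreme results
print(first[top].mean())                        # far out in the upper tail
print(second[top].mean())                       # roughly half as far out: closer to the overall mean
```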

Pearson correlation coefficient: measure of linear correlation

In statistics, the Pearson correlation coefficient ― also known as Pearson's r, the Pearson product-moment correlation coefficient (PPMCC), the bivariate correlation, or colloquially simply as the correlation coefficient ― is a measure of linear correlation between two sets of data. It is the ratio between the covariance of two variables and the product of their standard deviations; thus, it is essentially a normalized measurement of the covariance, such that the result always has a value between −1 and 1. As with covariance itself, the measure can only reflect a linear correlation of variables, and ignores many other types of relationships or correlations. As a simple example, one would expect the age and height of a sample of teenagers from a high school to have a Pearson correlation coefficient significantly greater than 0, but less than 1.
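
In symbols (the standard sample formulation, not drawn from the original text):

```latex
\[
r_{XY} \;=\; \frac{\operatorname{cov}(X, Y)}{\sigma_X\,\sigma_Y}
       \;=\; \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}
                  {\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,
                   \sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}
\]
```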

The representativeness heuristic is used when making judgments about the probability of an event under uncertainty. It is one of a group of heuristics proposed by psychologists Amos Tversky and Daniel Kahneman in the early 1970s as "the degree to which [an event] (i) is similar in essential characteristics to its parent population, and (ii) reflects the salient features of the process by which it is generated". Heuristics are described as "judgmental shortcuts that generally get us where we need to go – and quickly – but at the cost of occasionally sending us off course." Heuristics are useful because they use effort-reduction and simplification in decision-making.

Thomas Gilovich: American psychologist (born 1954)

Thomas Dashiff Gilovich is an American psychologist who is the Irene Blecker Rosenfeld Professor of Psychology at Cornell University. He has conducted research in social psychology, decision making, and behavioral economics, and has written popular books on these subjects. Gilovich has collaborated with Daniel Kahneman, Richard Nisbett, Lee Ross and Amos Tversky. His articles in peer-reviewed journals on subjects such as cognitive biases have been widely cited. In addition, Gilovich has been quoted in the media on subjects ranging from the effect of purchases on happiness to perception of judgment in social situations. Gilovich is a fellow of the Committee for Skeptical Inquiry.

In statistics, imputation is the process of replacing missing data with substituted values. When substituting for a data point, it is known as "unit imputation"; when substituting for a component of a data point, it is known as "item imputation". There are three main problems that missing data causes: missing data can introduce a substantial amount of bias, make the handling and analysis of the data more arduous, and create reductions in efficiency. Because missing data can create problems for analyzing data, imputation is seen as a way to avoid pitfalls involved with listwise deletion of cases that have missing values. That is to say, when one or more values are missing for a case, most statistical packages default to discarding any case that has a missing value, which may introduce bias or affect the representativeness of the results. Imputation preserves all cases by replacing missing data with an estimated value based on other available information. Once all missing values have been imputed, the data set can then be analysed using standard techniques for complete data. There have been many theories embraced by scientists to account for missing data but the majority of them introduce bias. A few of the well known attempts to deal with missing data include: hot deck and cold deck imputation; listwise and pairwise deletion; mean imputation; non-negative matrix factorization; regression imputation; last observation carried forward; stochastic imputation; and multiple imputation.
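
As one concrete (and deliberately simple) instance, mean imputation replaces each missing value in a column with the mean of that column's observed values; a minimal numpy sketch:

```python
import numpy as np

# A numeric column with two missing entries, encoded as NaN.
x = np.array([2.0, np.nan, 4.0, 6.0, np.nan, 8.0])

col_mean = np.nanmean(x)                      # mean of the observed values only (5.0 here)
imputed = np.where(np.isnan(x), col_mean, x)  # fill each missing entry with that mean
print(imputed)                                # [2. 5. 4. 6. 5. 8.]
```

This simple approach understates the variability of the imputed column, which is one reason such methods can introduce bias.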

Coefficient of determination: indicator of how well data points fit a line or curve

In statistics, the coefficient of determination, denoted R2 or r2 and pronounced "R squared", is the proportion of the variation in the dependent variable that is predictable from the independent variable(s).
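
In the usual notation (residual sum of squares over total sum of squares; a standard formulation, not from the original text):

```latex
\[
R^2 \;=\; 1 - \frac{\sum_{i}\,(y_i - \hat{y}_i)^2}{\sum_{i}\,(y_i - \bar{y})^2}
\]
```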

Image noise: visible interference in an image

Image noise is random variation of brightness or color information in images, and is usually an aspect of electronic noise. It can be produced by the image sensor and circuitry of a scanner or digital camera. Image noise can also originate in film grain and in the unavoidable shot noise of an ideal photon detector. Image noise is an undesirable by-product of image capture that obscures the desired information.

Quasi-experiment: empirical interventional study

A quasi-experiment is an empirical interventional study used to estimate the causal impact of an intervention on a target population without random assignment. Quasi-experimental research shares similarities with the traditional experimental design or randomized controlled trial, but it specifically lacks the element of random assignment to treatment or control. Instead, quasi-experimental designs typically allow the researcher to control the assignment to the treatment condition, but using some criterion other than random assignment.

In time series data, seasonality is the presence of variations that occur at specific regular intervals less than a year, such as weekly, monthly, or quarterly. Seasonality may be caused by various factors, such as weather, vacation, and holidays and consists of periodic, repetitive, and generally regular and predictable patterns in the levels of a time series.

The "hot hand" is a phenomenon, previously considered a cognitive social bias, that a person who experiences a successful outcome has a greater chance of success in further attempts. The concept is often applied to sports and skill-based tasks in general and originates from basketball, where a shooter is more likely to score if their previous attempts were successful, i.e. while having the "hot hand.” While previous success at a task can indeed change the psychological attitude and subsequent success rate of a player, researchers for many years did not find evidence for a "hot hand" in practice, dismissing it as fallacious. However, later research questioned whether the belief is indeed a fallacy. Some recent studies using modern statistical analysis have observed evidence for the "hot hand" in some sporting activities; however, other recent studies have not observed evidence of the "hot hand". Moreover, evidence suggests that only a small subset of players may show a "hot hand" and, among those who do, the magnitude of the "hot hand" tends to be small.

In statistics, linear regression is a linear approach for modelling the relationship between a scalar response and one or more explanatory variables. The case of one explanatory variable is called simple linear regression; for more than one, the process is called multiple linear regression. This term is distinct from multivariate linear regression, where multiple correlated dependent variables are predicted, rather than a single scalar variable.
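
For the simple case, the ordinary least squares fit has a closed form; a short sketch on synthetic data (the line, noise level, and sample size are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic data from a known line (intercept 1, slope 2) plus noise, for illustration only.
x = rng.uniform(0.0, 10.0, 200)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, 200)

# Ordinary least squares for simple linear regression, in closed form.
slope = np.cov(x, y, bias=True)[0, 1] / np.var(x)
intercept = y.mean() - slope * x.mean()
print(slope, intercept)   # close to 2 and 1
```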

Gary Nance Smith is the Fletcher Jones Professor of Economics at Pomona College. His research on financial markets, statistical reasoning, and artificial intelligence often involves stock market anomalies, statistical fallacies, and the misuse of data, and has been widely cited.
