Lord's paradox

In statistics, Lord's paradox raises the issue of when it is appropriate to control for baseline status. In three papers, Frederic M. Lord gave examples in which statisticians could reach different conclusions depending on whether they adjust for pre-existing differences. [1] [2] [3] Holland & Rubin (1983) use these examples to illustrate that there may be multiple valid descriptive comparisons in the data, but that causal conclusions require an underlying (untestable) causal model. [4] Pearl uses the same examples to illustrate how graphical causal models resolve the question of when control for baseline status is appropriate. [5] [6] [7]

Lord's formulation

Sketch of Lord's paradox.

The most famous formulation of Lord's paradox comes from his 1967 paper: [1]

“A large university is interested in investigating the effects on the students of the diet provided in the university dining halls and any sex differences in these effects. Various types of data are gathered. In particular, the weight of each student at the time of his arrival in September and his weight the following June are recorded.” (Lord 1967, p. 304)

In both September and June, the overall distribution of male weights is the same, although individuals' weights have changed, and likewise for the distribution of female weights.

Lord imagines two statisticians who use different common statistical methods but reach opposite conclusions.

One statistician does not adjust for initial weight; he uses a t-test to compare gain scores (each individual's final weight minus initial weight) between the sexes. The first statistician claims no significant difference between the sexes: "[A]s far as these data are concerned, there is no evidence of any interesting effect of diet (or of anything else) on student weights. In particular, there is no evidence of any differential effect on the two sexes, since neither group shows any systematic change." (pg. 305) Visually, the first statistician sees that neither group mean ('A' and 'B') has changed, and concludes that the diet had no causal impact.

The second statistician adjusts for initial weight, using analysis of covariance (ANCOVA) with adjusted final weight as the outcome. He finds a significant difference between the sexes. Visually, the second statistician fits a regression model (green dotted lines), finds that the intercept differs for males and females, and concludes that the diet had a larger effect on males.

Lord concluded: "there simply is no logical or statistical procedure that can be counted on to make proper allowance for uncontrolled preexisting differences between groups."
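Lord's setup can be reproduced numerically. The sketch below is a hypothetical simulation (the group means of 130 lb and 160 lb and the regression-toward-the-mean coefficient are illustrative, not from Lord's paper): each sex's weight distribution is unchanged between September and June, so the gain-score comparison finds no sex difference, while ANCOVA finds a substantial adjusted difference.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000          # students per sex
rho = 0.5         # assumed regression-toward-the-mean coefficient

# Illustrative group means (not from Lord's paper).
sex = np.repeat([0, 1], n)                       # 0 = female, 1 = male
group_mean = np.where(sex == 1, 160.0, 130.0)
w_init = group_mean + rng.normal(0, 10, 2 * n)

# June weight regresses toward each group's own mean, so each sex's
# weight distribution is unchanged on average.
w_final = group_mean + rho * (w_init - group_mean) + rng.normal(0, 5, 2 * n)

# Statistician 1: compare mean gain scores (final minus initial) by sex.
gain = w_final - w_init
gain_diff = gain[sex == 1].mean() - gain[sex == 0].mean()   # ~0

# Statistician 2: ANCOVA, i.e. regress final weight on initial weight
# and sex; the sex coefficient is the adjusted difference.
X = np.column_stack([np.ones(2 * n), w_init, sex])
beta, *_ = np.linalg.lstsq(X, w_final, rcond=None)
sex_adjusted_effect = beta[2]    # ~(1 - rho) * 30 = 15
```

Both computations are correct summaries of the same data; they disagree because they answer different questions.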

Responses

There have been many attempted resolutions and interpretations of the paradox, along with analyses of its relationship to other statistical paradoxes. Though initially framed as a paradox, later authors have used the example to clarify the importance of untestable assumptions in causal inference.

Importance of modeling assumptions

Bock (1975)

Bock responded to the paradox by positing that both statisticians in the scenario are correct once the question being asked is clarified. The first statistician (who compares group means and distributions) is asking "are there differences in average weight gain?", whereas the second is asking "what are the differences in individual weight gain?" [5]

Cox and McCullagh (1982)

Cox and McCullagh interpret the problem by constructing a model of what could have happened had the students not dined in the dining hall, where they assume that a student's weight would have stayed constant. They conclude that in fact the first statistician was right when asking about group differences, while the second was right when asking about the effect on an individual. [5]

Holland and Rubin (1983)

Holland & Rubin (1983) [4] argue that both statisticians have captured accurate descriptive features of the data: Statistician 1 accurately finds no difference in average weight change between the two sexes, while Statistician 2 accurately finds a larger average weight gain for boys conditional on a boy and a girl having the same starting weight. However, when turning these descriptions into causal statements, the statisticians implicitly assert either that weight would otherwise have stayed constant (Statistician 1) or that it would have followed the posited linear model (Statistician 2).

“In summary, we believe that the following views resolve Lord's Paradox. If both statisticians made only descriptive statements, they would both be correct. Statistician 1 makes their unconditional descriptive statements that the average weight gains for males and females are equal; Statistician 2 makes the conditional (on X) statement that for males and females of equal September weight, the males gain more than the females. In contrast, if the statisticians turned these descriptive statements into causal statements, neither would be correct or incorrect because untestable assumptions determine the correctness of causal statements... Statistician 1 is wrong because he makes a causal statement without specifying the assumption needed to make it true. Statistician 2 is more cautious, since he makes only a descriptive statement. However, unless he too makes further assumptions, his descriptive statement is completely irrelevant to the campus dietician's interest in the effect of the dining hall diet." (pg. 19)

Moreover, the underlying assumptions necessary to turn descriptive statements into causal statements are untestable. Unlike descriptive statements (e.g. "the average height in the US is X"), causal statements involve a comparison between what happened and what would have happened absent an intervention. The latter is unobservable in the real world, a fact that Holland & Rubin term "the fundamental problem of causal inference" (pg. 10). This explains why researchers often turn to experiments: although we still never observe both counterfactuals for a single subject, experiments let us make statistical claims about these differences in the population under minimal assumptions. Absent an experiment, modelers should carefully describe the model they use to make causal statements and justify those models as strongly as possible.
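The potential-outcomes framing can be made concrete in a short simulation. In this hypothetical sketch, both counterfactual outcomes are generated for every subject (something impossible in the real world, and all numbers are illustrative), which makes visible why randomization lets the observed difference in group means estimate the average effect:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Hypothetical potential outcomes: each subject's response without
# (y0) and with (y1) the intervention. Only a simulation can hold both.
y0 = rng.normal(0.0, 2.0, n)
y1 = y0 + 3.0                     # true individual effect: +3 for everyone
true_ate = (y1 - y0).mean()       # average treatment effect, known here only

# The fundamental problem: in reality we observe one outcome per subject.
treated = rng.integers(0, 2, n).astype(bool)    # randomized assignment
y_obs = np.where(treated, y1, y0)

# Randomization makes the simple difference in group means an unbiased
# estimate of the average treatment effect.
est_ate = y_obs[treated].mean() - y_obs[~treated].mean()
```

No single subject's causal effect is ever observed, yet the randomized comparison recovers the population average.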

Pearl (2016)

Pearl (2016) [5] agrees with Lord's conclusion that the answer cannot be found in the data, but he finds Holland and Rubin's account incomplete. In his view, a complete resolution of the paradox should answer Lord's essential question: "How to allow for preexisting differences between groups?" Moreover, since the answer depends on the causal model assumed, we should explain (1) why people find Lord's story "paradoxical" rather than merely "in need of more information", and (2) how to properly utilize causal models to answer Lord's question, whether or not they are testable.

To this end, Pearl used a simplified version of Lord’s Paradox, proposed by Wainer and Brown, [8] in which gender differences are not considered. Instead, the quantity of interest is the effect of diet on weight gain, as shown in Figure 2(a).

Revised version of Lord's paradox and its causal diagram.

The two ellipses represent two dining halls, each serving a different diet, and each point represents a student's initial and final weights. Note that students who weigh more at the beginning tend to eat in dining hall B, while those who weigh less eat in dining hall A. The first statistician claims that switching from Diet A to B would have no effect on weight gain, since the gain WF − WI has the same distribution in both ellipses. The second statistician compares the final weights under Diet A to those under Diet B for a group of students with the same initial weight WI, and finds that the latter is larger than the former at every level of WI. He therefore concludes that students on Diet B gain more than those on Diet A. As before, the data cannot tell us whom to believe, and a causal model must be assumed to settle the issue. One plausible model is shown in Figure 2(b). In this model, WI is the only confounder of D and WF, so controlling for WI is essential for deconfounding the needed causal effect. Assuming this model, the second statistician would be correct and the first would be wrong.

This analysis also unveils why Lord’s story appears paradoxical, and why generations of statisticians have found it perplexing.

According to Pearl, the data trigger a clash between two strong intuitions, both of which are valid in causal thinking, but not in the non-causal analysis invoked by the first statistician.

One intuition holds that, to obtain the needed effect, we must make "proper allowances" for uncontrolled preexisting differences between groups (i.e. initial weights). The second holds that the overall effect (of diet on gain) is just the average of the stratum-specific effects. Both intuitions are valid, but they seem to clash when we interpret the first statistician's finding as a zero effect when, in fact, his finding merely entails equality of distributions and says nothing about effects. [fn 1] This can also be seen from Figure 2(b), which allows D to causally affect the gain while simultaneously being statistically independent of it (due to path cancellations).
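The path cancellation of Figure 2(b) can be simulated directly. In this hypothetical sketch (the linear model and all coefficients are illustrative), the diet's true effect tau is sized so that regression toward the mean exactly offsets it in the marginal gain comparison: Statistician 1 sees no gap between halls, while adjusting for initial weight recovers tau.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000
mu, sigma, rho = 150.0, 10.0, 0.5   # illustrative parameters

w_init = rng.normal(mu, sigma, n)
diet = (w_init > mu).astype(float)  # heavier students choose hall B

# True causal effect of Diet B, sized so that regression toward the
# mean exactly cancels it in the marginal gain comparison.
tau = 2 * (1 - rho) * sigma * np.sqrt(2 / np.pi)    # ~7.98

w_final = mu + rho * (w_init - mu) + tau * diet + rng.normal(0, 3, n)
gain = w_final - w_init

# Statistician 1: mean gains look the same in both halls.
gain_gap = gain[diet == 1].mean() - gain[diet == 0].mean()  # ~0

# Statistician 2: adjusting for initial weight recovers tau.
X = np.column_stack([np.ones(n), w_init, diet])
beta, *_ = np.linalg.lstsq(X, w_final, rcond=None)          # beta[2] ~ tau
```

Here the diet genuinely affects final weight, yet hall and mean gain are unrelated: the causal effect and the confounding path cancel, exactly the situation the first statistician's comparison cannot detect.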

This resolution of Lord’s Paradox answers both questions: (1) How to allow for preexisting differences between groups and (2) Why the data appear paradoxical. Pearl's do-calculus [6] further answers question (1) for any causal model assumed, including models with multiple unobserved confounders.

Initial weight as a mediator

Going back to Lord's original problem of comparing boys and girls, Pearl (2016) [5] posits another causal model, in which sex and initial weight both influence the final weight. Moreover, since sex also influences the initial weight, initial weight becomes a mediating variable: sex influences final weight both directly and indirectly (by influencing initial weight, which then influences final weight). Note that none of these variables is a confounder, so controls are not strictly necessary in this model. However, the choice of whether to control for initial weight dictates which effect the researcher measures: the first statistician does not control and measures a total effect, while the second does control and measures a direct effect.

"Cases where total and direct effects differ in sign are commonplace. For example, we are not at all surprised when smallpox inoculation carries risks of fatal reaction, yet reduces overall mortality by eradicating smallpox. The direct effect (fatal reaction) in this case is negative for every stratum of the population, yet the total effect (on mortality) is positive for the population as a whole." (pg 4)
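The mediation model can be sketched with illustrative coefficients (all numbers below are made up for the example): sex shifts initial weight, and final weight depends on both. Regressing final weight on sex alone returns the total effect, while adding initial weight as a covariate isolates the direct effect.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000

def ols(covariates, y):
    """Return OLS coefficients for an intercept plus the given covariates."""
    X = np.column_stack([np.ones(len(y)), *covariates])
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Illustrative linear mediation model: sex -> initial weight -> final
# weight, plus a direct sex -> final weight path (coefficients made up).
sex = rng.integers(0, 2, n).astype(float)          # 0 = female, 1 = male
w_init = 130 + 30 * sex + rng.normal(0, 10, n)     # sex shifts initial weight
w_final = 20 + 0.8 * w_init + 5 * sex + rng.normal(0, 5, n)

# Unadjusted regression gives the TOTAL effect: 5 + 0.8 * 30 = 29.
total = ols([sex], w_final)[1]

# Adjusting for the mediator gives the DIRECT effect only: 5.
direct = ols([w_init, sex], w_final)[2]
```

Both estimands are legitimate; which one answers the research question depends on whether the indirect path through initial weight should count as part of the effect.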

Tu, Gunnell, and Gilthorpe (2008) use a similar causal framework, but counter that the conceptualization of direct and total effects is often not the best framework, because many different variables could be controlled for without any experimental basis for treating them as separate causal paths. [9]

Relation to other paradoxes

According to Tu, Gunnell, and Gilthorpe, Lord's paradox is the continuous version of Simpson's paradox. [10] Those authors state that Lord's Paradox, Simpson's Paradox, and the suppression of covariates by uncorrelated predictor variables are all the same thing, namely a reversal paradox.
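The reversal these authors describe can be demonstrated with continuous variables. In this hypothetical sketch (coefficients are illustrative), the covariate z induces a negative unadjusted slope of y on x even though the within-stratum (adjusted) effect of x is positive:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50_000

# Hypothetical continuous data: z raises x but lowers y, so the
# unadjusted slope of y on x reverses the within-stratum effect.
z = rng.normal(0, 1, n)
x = 2 * z + rng.normal(0, 1, n)
y = -1.5 * z + 0.5 * x + rng.normal(0, 1, n)    # true effect of x: +0.5

# Unadjusted slope of y on x: negative (~ -0.1).
slope_marginal = np.linalg.lstsq(
    np.column_stack([np.ones(n), x]), y, rcond=None)[0][1]

# Slope of y on x after adjusting for z: positive (~ +0.5).
slope_adjusted = np.linalg.lstsq(
    np.column_stack([np.ones(n), x, z]), y, rcond=None)[0][1]
```

The sign flip on aggregation is the continuous analogue of Simpson's paradox, i.e. the "reversal paradox" the authors describe.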

Importance

Broadly, the "fundamental problem of causal inference" [4] and related aggregation phenomena such as Simpson's paradox play major roles in applied statistics. Lord's paradox and the associated analyses provide a powerful teaching tool for understanding these fundamental statistical concepts.

More directly, Lord's paradox may have implications for both education and health policies that attempt to reward educators or hospitals for the improvements the children or patients in their care made, a logic underlying No Child Left Behind. [11] It has also been suspected to be at work in studies linking IQ to eye defects. [12]


References

  1. Lord, F. M. (1967). "A paradox in the interpretation of group comparisons". Psychological Bulletin, 68, 304–305. doi:10.1037/h0025105
  2. Lord, F. M. (1969). "Statistical adjustments when comparing preexisting groups". Psychological Bulletin, 72, 336–337. doi:10.1037/h0028108
  3. Lord, F. M. (1975). "Lord's paradox". In S. B. Anderson, S. Ball, R. T. Murphy, & Associates, Encyclopedia of Educational Evaluation (pp. 232–236). San Francisco, CA: Jossey-Bass.
  4. Holland, Paul W.; Rubin, Donald B. (1983). "On Lord's paradox". In Principles of Modern Psychological Measurement. Routledge. pp. 3–25. ISBN 9780898592771.
  5. Pearl, Judea (2016). "Lord's Paradox Revisited – (Oh Lord! Kumbaya!)". Journal of Causal Inference. 4 (2): 20160021. doi:10.1515/jci-2016-0021. S2CID 17506239.
  6. Pearl, Judea; Mackenzie, Dana (2018). The Book of Why: The New Science of Cause and Effect. New York, NY: Basic Books.
  7. Pearl, Judea. "Lord's Paradox: The Power of Causal Thinking". Causal Analysis in Theory and Practice. Retrieved August 13, 2019.
  8. Wainer, H.; Brown, L. (2007). "Three statistical paradoxes in the interpretation of group differences: Illustrated with medical school admission and licensing data". In Rao, C.; Sinharay, S. (Eds.), Handbook of Statistics 26: Psychometrics. North Holland: Elsevier B.V., pp. 893–918.
  9. From the text:
    "Whilst the total effect of birth weight on BP is not affected by the numbers of intermediate body size variables in the model, the estimation of 'direct' effect differs when different intermediate variables are adjusted for. Unless there is experimental evidence to support the notion that there are indeed different paths of direct and indirect effects from birth weight to BP, we are cautious of using such terminology to label the results from multiple regression, as with model 3. In other words, to determine whether the unconditional or conditional relationship reflects the true physiological relationship between birth weight and blood pressure, experiments in which birth weight and current weight can be manipulated are required in order to estimate the impact of birth weight on blood pressure." (pg. 8)
    Tu, Yu-Kang; Gunnell, David; Gilthorpe, Mark S. (2008). "Simpson's Paradox, Lord's Paradox, and Suppression Effects are the same phenomenon – the reversal paradox". Emerging Themes in Epidemiology. 5: 2. doi:10.1186/1742-7622-5-2. PMC 2254615.
  10. Tu, Yu-Kang; Gunnell, David; Gilthorpe, Mark S. (2008). "Simpson's Paradox, Lord's Paradox, and Suppression Effects are the same phenomenon – the reversal paradox". Emerging Themes in Epidemiology. 5: 2. doi:10.1186/1742-7622-5-2. PMC 2254615.
  11. Rubin, D. B.; Stuart, E. A.; Zanutto, E. L. (2004). "A potential outcomes view of value-added assessment in education". Journal of Educational and Behavioral Statistics, 29(1), Value-Added Assessment Special Issue, 103–116. doi:10.3102/10769986029001103
  12. Sorjonen, K.; et al. (2017). "Refractive state, intelligence, education, and Lord's paradox". Intelligence, 61, 115–119. doi:10.1016/j.intell.2017.01.011

Notes

  1. An identical clash surfaces in Simpson's paradox when we give causal interpretations to statistical associations; [6] the sure-thing principle may fail in non-causal relations. In other words, statistical associations may disappear or reverse upon aggregation when strata are of different sizes.