Confounding

Figure: Whereas a mediator is a factor in the causal chain (top), a confounder is a spurious factor incorrectly implying causation (bottom).

In causal inference, a confounder (also confounding variable, confounding factor, extraneous determinant or lurking variable) is a variable that influences both the dependent variable and the independent variable, causing a spurious association. Confounding is a causal concept, and as such, cannot be described in terms of correlations or associations. [1] [2] [3] The existence of confounders is an important quantitative explanation of why correlation does not imply causation. Some notations are explicitly designed to identify the existence, possible existence, or non-existence of confounders in causal relationships between elements of a system.

Confounds are threats to internal validity. [4]

Definition

Confounding is defined in terms of the data generating model. Let X be some independent variable, and Y some dependent variable. To estimate the effect of X on Y, the statistician must suppress the effects of extraneous variables that influence both X and Y. We say that X and Y are confounded by some other variable Z whenever Z causally influences both X and Y.

Let $P(y \mid \text{do}(x))$ be the probability of event Y = y under the hypothetical intervention X = x. X and Y are not confounded if and only if the following holds:

$$P(y \mid \text{do}(x)) = P(y \mid x) \qquad (1)$$

for all values X = x and Y = y, where $P(y \mid x)$ is the conditional probability upon seeing X = x. Intuitively, this equality states that X and Y are not confounded whenever the observationally witnessed association between them is the same as the association that would be measured in a controlled experiment, with x randomized.

In principle, the defining equality $P(y \mid \text{do}(x)) = P(y \mid x)$ can be verified from the data generating model, assuming we have all the equations and probabilities associated with the model. This is done by simulating an intervention $\text{do}(X = x)$ (see Bayesian network) and checking whether the resulting probability of Y equals the conditional probability $P(y \mid x)$. It turns out, however, that graph structure alone is sufficient for verifying the equality $P(y \mid \text{do}(x)) = P(y \mid x)$.
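The check described above can be sketched for a small discrete model. The following is a minimal illustration, not a general implementation: it assumes a hypothetical three-variable model with Z → X, Z → Y and X → Y, all probabilities are made up for the example, and the intervention is simulated by deleting the factor P(x | z) from the joint distribution (the usual truncated factorization for a Bayesian network).

```python
from fractions import Fraction as F

# Hypothetical binary model Z -> X, Z -> Y, X -> Y (all numbers are illustrative).
p_z = {0: F(1, 2), 1: F(1, 2)}                      # P(Z = z)
p_y1_given_xz = {(0, 0): F(1, 5), (1, 0): F(2, 5),  # P(Y = 1 | X = x, Z = z), keyed by (x, z)
                 (0, 1): F(3, 5), (1, 1): F(4, 5)}

def p_y1_seeing(x, p_x1_given_z):
    """P(Y = 1 | X = x): condition on X = x in the joint P(z) P(x | z) P(y | x, z)."""
    def p_x(z):
        return p_x1_given_z[z] if x == 1 else 1 - p_x1_given_z[z]
    num = sum(p_z[z] * p_x(z) * p_y1_given_xz[(x, z)] for z in p_z)
    den = sum(p_z[z] * p_x(z) for z in p_z)
    return num / den

def p_y1_doing(x):
    """P(Y = 1 | do(X = x)): delete the factor P(x | z) and fix X = x."""
    return sum(p_z[z] * p_y1_given_xz[(x, z)] for z in p_z)

confounded = {0: F(1, 4), 1: F(3, 4)}   # P(X = 1 | Z = z) depends on Z
randomized = {0: F(1, 2), 1: F(1, 2)}   # X assigned by a fair coin, ignoring Z

print(p_y1_seeing(1, confounded), p_y1_doing(1))   # 7/10 3/5  (equality (1) fails)
print(p_y1_seeing(1, randomized), p_y1_doing(1))   # 3/5 3/5   (equality (1) holds)
```

With the Z-dependent assignment the observational and interventional quantities differ, so X and Y are confounded in this toy model; once X is assigned independently of Z they coincide.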

Control

Consider a researcher attempting to assess the effectiveness of drug X, from population data in which drug usage was a patient's choice. The data shows that gender (Z) influences a patient's choice of drug as well as their chances of recovery (Y). In this scenario, gender Z confounds the relation between X and Y since Z is a cause of both X and Y:

Figure: Causal diagram of Gender as common cause of Drug use and Recovery

We have that

$$P(y \mid \text{do}(x)) \neq P(y \mid x) \qquad (2)$$

because the observational quantity contains information about the correlation between X and Z, and the interventional quantity does not (since X is not correlated with Z in a randomized experiment). It can be shown [5] that, in cases where only observational data is available, an unbiased estimate of the desired quantity $P(y \mid \text{do}(x))$ can be obtained by "adjusting" for all confounding factors, namely, conditioning on their various values and averaging the result. In the case of a single confounder Z, this leads to the "adjustment formula":

$$P(y \mid \text{do}(x)) = \sum_{z} P(y \mid x, z)\, P(z) \qquad (3)$$

which gives an unbiased estimate for the causal effect of X on Y. The same adjustment formula works when there are multiple confounders except that, in this case, the set Z of variables that would guarantee unbiased estimates must be chosen with caution. The criterion for a proper choice of variables is called the Back-Door criterion [5] [6] and requires that the chosen set Z "blocks" (or intercepts) every path between X and Y that contains an arrow into X. Such sets are called "Back-Door admissible" and may include variables which are not common causes of X and Y, but merely proxies thereof.
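As a sketch of why a Back-Door admissible set licenses the adjustment formula, the derivation below conditions on Z and then uses two properties. Both are assumptions stated here rather than proved: no member of Z is a descendant of X (part of the standard statement of the criterion), and within each stratum Z = z, seeing X = x carries the same information about Y as setting X = x.

```latex
\begin{align*}
P(y \mid \text{do}(x))
  &= \sum_{z} P(y \mid \text{do}(x), z)\, P(z \mid \text{do}(x))
     && \text{(law of total probability)} \\
  &= \sum_{z} P(y \mid \text{do}(x), z)\, P(z)
     && \text{($Z$ is not affected by intervening on $X$)} \\
  &= \sum_{z} P(y \mid x, z)\, P(z)
     && \text{($Z$ blocks every Back-Door path from $X$ to $Y$)}
\end{align*}
```

The last line contains only observational quantities, which is what makes estimation from non-experimental data possible.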

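Returning to the drug use example, since Z complies with the Back-Door requirement (i.e., it intercepts the one Back-Door path $X \leftarrow Z \rightarrow Y$), the Back-Door adjustment formula is valid:

$$\begin{aligned} P(Y = \text{recovered} \mid \text{do}(X = \text{give drug})) = {} & P(Y = \text{recovered} \mid X = \text{give drug}, Z = \text{male})\, P(Z = \text{male}) \\ & + P(Y = \text{recovered} \mid X = \text{give drug}, Z = \text{female})\, P(Z = \text{female}) \end{aligned} \qquad (4)$$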

In this way the physician can predict the likely effect of administering the drug from observational studies in which the conditional probabilities appearing on the right-hand side of the equation can be estimated by regression.
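A minimal sketch of this computation follows, under stated assumptions: the counts are illustrative and not real study data, the names `counts`, `naive` and `backdoor_adjusted` are hypothetical, and the conditional probabilities are estimated by simple stratified frequencies rather than by regression. It also assumes every gender stratum contains both treated and untreated patients.

```python
from fractions import Fraction

# Illustrative counts only: (gender, took_drug, recovered) -> number of patients.
counts = {
    ("male", 1, 1): 81,    ("male", 1, 0): 6,
    ("male", 0, 1): 234,   ("male", 0, 0): 36,
    ("female", 1, 1): 192, ("female", 1, 0): 71,
    ("female", 0, 1): 55,  ("female", 0, 0): 25,
}

def naive(counts, x, y=1):
    """Unadjusted estimate of P(Y = y | X = x), ignoring gender."""
    num = sum(n for (_, xi, yi), n in counts.items() if xi == x and yi == y)
    den = sum(n for (_, xi, _), n in counts.items() if xi == x)
    return Fraction(num, den)

def backdoor_adjusted(counts, x, y=1):
    """Estimate of P(Y = y | do(X = x)) as the sum over z of P(y | x, z) P(z)."""
    total = sum(counts.values())
    strata = sorted({z for (z, _, _) in counts})
    result = Fraction(0)
    for z in strata:
        n_z = sum(n for (zi, _, _), n in counts.items() if zi == z)
        n_xz = sum(n for (zi, xi, _), n in counts.items() if zi == z and xi == x)
        n_yxz = counts.get((z, x, y), 0)
        # P(y | x, z) * P(z); requires n_xz > 0 in every stratum
        result += Fraction(n_yxz, n_xz) * Fraction(n_z, total)
    return result

for x in (1, 0):
    print(x, float(naive(counts, x)), float(backdoor_adjusted(counts, x)))
```

With these particular counts the naive comparison makes the drug look harmful while the adjusted comparison makes it look beneficial, because gender influences both drug uptake and recovery.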

Contrary to common belief, adding covariates to the adjustment set Z can introduce bias. [7] A typical counterexample occurs when Z is a common effect of X and Y, [8] a case in which Z is not a confounder (i.e., the null set is Back-Door admissible) and adjusting for Z would create bias known as "collider bias" or "Berkson's paradox". Controls that are not good confounders are sometimes called bad controls.
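The counterexample can be made concrete with a small exact calculation. This is a sketch under invented assumptions: X and Y are independent fair coins, so X has no effect on Y, and Z is a noisy common effect of both with made-up probabilities. Applying the adjustment formula to Z nevertheless produces an apparent effect.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical model: X and Y are independent fair coins; Z is a common effect.
p_z1_given_xy = {(0, 0): F(1, 10), (1, 0): F(1, 2),
                 (0, 1): F(1, 2),  (1, 1): F(9, 10)}   # P(Z = 1 | x, y), illustrative

# Full joint distribution over (x, y, z).
joint = {}
for x, y, z in product((0, 1), repeat=3):
    pz = p_z1_given_xy[(x, y)] if z == 1 else 1 - p_z1_given_xy[(x, y)]
    joint[(x, y, z)] = F(1, 4) * pz

def prob(pred):
    """Probability of the event described by pred(x, y, z)."""
    return sum(p for (x, y, z), p in joint.items() if pred(x, y, z))

def unadjusted(x):
    """P(Y = 1 | X = x); the empty adjustment set is admissible here."""
    return prob(lambda a, b, c: a == x and b == 1) / prob(lambda a, b, c: a == x)

def adjusted_for_collider(x):
    """Sum over z of P(Y = 1 | x, z) P(z), wrongly adjusting for the common effect Z."""
    total = F(0)
    for z in (0, 1):
        p_y1 = (prob(lambda a, b, c: (a, b, c) == (x, 1, z))
                / prob(lambda a, b, c: a == x and c == z))
        total += p_y1 * prob(lambda a, b, c: c == z)
    return total

print(unadjusted(1), unadjusted(0))                        # 1/2 1/2
print(adjusted_for_collider(1), adjusted_for_collider(0))  # 17/42 25/42
```

The unadjusted comparison correctly shows no effect, while adjusting for Z suggests that X lowers the probability of Y, which is purely an artifact of conditioning on the collider.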

In general, confounding can be controlled by adjustment if and only if there is a set of observed covariates that satisfies the Back-Door condition. Moreover, if Z is such a set, then the adjustment formula of Eq. (3) is valid. [5] [6] Pearl's do-calculus provides all possible conditions under which $P(y \mid \text{do}(x))$ can be estimated, not necessarily by adjustment. [9]

History

According to Morabia (2011), [10] the word confounding derives from the Medieval Latin verb "confundere", which meant "mixing", and was probably chosen to represent the confusion (from Latin: con=with + fusus=mix or fuse together) between the cause one wishes to assess and other causes that may affect the outcome and thus confuse, or stand in the way of the desired assessment. Greenland, Robins and Pearl [11] note an early use of the term "confounding" in causal inference by John Stuart Mill in 1843.

Fisher introduced the word "confounding" in his 1935 book "The Design of Experiments" [12] to refer specifically to a consequence of blocking (i.e., partitioning) the set of treatment combinations in a factorial experiment, whereby certain interactions may be "confounded with blocks". This popularized the notion of confounding in statistics, although Fisher was concerned with the control of heterogeneity in experimental units, not with causal inference.

According to Vandenbroucke (2004) [13] it was Kish [14] who used the word "confounding" in the sense of "incomparability" of two or more groups (e.g., exposed and unexposed) in an observational study. Formal conditions defining what makes certain groups "comparable" and others "incomparable" were later developed in epidemiology by Greenland and Robins (1986) [15] using the counterfactual language of Neyman (1935) [16] and Rubin (1974). [17] These were later supplemented by graphical criteria such as the Back-Door condition (Pearl 1993; Greenland, Robins and Pearl 1999). [11] [5]

Graphical criteria were shown to be formally equivalent to the counterfactual definition [18] but more transparent to researchers relying on process models.

Types

In the case of risk assessments evaluating the magnitude and nature of risk to human health, it is important to control for confounding to isolate the effect of a particular hazard such as a food additive, pesticide, or new drug. For prospective studies, it is difficult to recruit and screen for volunteers with the same background (age, diet, education, geography, etc.), and in historical studies, there can be similar variability. Due to the inability to control for variability of volunteers and human studies, confounding is a particular challenge. For these reasons, experiments offer a way to avoid most forms of confounding.

In some disciplines, confounding is categorized into different types. In epidemiology, one type is "confounding by indication", [19] which relates to confounding from observational studies. Because prognostic factors may influence treatment decisions (and bias estimates of treatment effects), controlling for known prognostic factors may reduce this problem, but it is always possible that a forgotten or unknown factor was not included or that factors interact complexly. Confounding by indication has been described as the most important limitation of observational studies. Randomized trials are not affected by confounding by indication due to random assignment.

Confounding variables may also be categorised according to their source: the choice of measurement instrument (operational confound), situational characteristics (procedural confound), or inter-individual differences (person confound).

Examples

Say one is studying the relation between birth order (1st child, 2nd child, etc.) and the presence of Down Syndrome in the child. In this scenario, maternal age would be a confounding variable:

  1. Higher maternal age is directly associated with Down Syndrome in the child
  2. Higher maternal age is directly associated with Down Syndrome, regardless of birth order (a mother having her 1st vs 3rd child at age 50 confers the same risk)
  3. Maternal age is directly associated with birth order (the 2nd child, except in the case of twins, is born when the mother is older than she was for the birth of the 1st child)
  4. Maternal age is not a consequence of birth order (having a 2nd child does not change the mother's age)

In risk assessments, factors such as age, gender, and educational levels often affect health status and so should be controlled. Beyond these factors, researchers may not consider or have access to data on other causal factors. An example is the study of the effects of smoking tobacco on human health. Smoking, drinking alcohol, and diet are related lifestyle activities. A risk assessment that looks at the effects of smoking but does not control for alcohol consumption or diet may overestimate the risk of smoking. [22] Smoking and confounding are reviewed in occupational risk assessments such as the safety of coal mining. [23] When there is not a large sample population of non-smokers or non-drinkers in a particular occupation, the risk assessment may be biased towards finding a negative effect on health.

Decreasing the potential for confounding

A reduction in the potential for the occurrence and effect of confounding factors can be obtained by increasing the types and numbers of comparisons performed in an analysis. If measures or manipulations of core constructs are confounded (i.e. operational or procedural confounds exist), subgroup analysis may not reveal problems in the analysis. Additionally, increasing the number of comparisons can create other problems (see multiple comparisons).

Peer review is a process that can assist in reducing instances of confounding, either before study implementation or after analysis has occurred. Peer review relies on collective expertise within a discipline to identify potential weaknesses in study design and analysis, including ways in which results may depend on confounding. Similarly, replication can test for the robustness of findings from one study under alternative study conditions or alternative analyses (e.g., controlling for potential confounds not identified in the initial study).

Confounding effects may be less likely to occur and act similarly at multiple times and locations. In selecting study sites, the environment can be characterized in detail at the study sites to ensure sites are ecologically similar and therefore less likely to have confounding variables. Lastly, the relationship between the environmental variables that possibly confound the analysis and the measured parameters can be studied. The information pertaining to environmental variables can then be used in site-specific models to identify residual variance that may be due to real effects. [24]

Depending on the type of study design in place, there are various ways to modify that design to actively exclude or control confounding variables: [25]

All these methods have their drawbacks:

  1. The best available defense against the possibility of spurious results due to confounding is often to dispense with efforts at stratification and instead conduct a randomized study of a sufficiently large sample taken as a whole, such that all potential confounding variables (known and unknown) will be distributed by chance across all study groups and hence will be uncorrelated with the binary variable for inclusion/exclusion in any group.
  2. Ethical considerations: In double-blind and randomized controlled trials, participants are not aware that they are recipients of sham treatments and may be denied effective treatments. [26] There is a possibility that patients only agree to invasive surgery (which carries real medical risks) under the understanding that they are receiving treatment. Although this is an ethical concern, it is not a complete account of the situation. For surgeries that are currently being performed regularly, but for which there is no concrete evidence of a genuine effect, there may be ethical issues with continuing such surgeries. In such circumstances, many people are exposed to the real risks of surgery, yet these treatments may offer no discernible benefit. Sham-surgery control is a method that may allow medical science to determine whether a surgical procedure is efficacious. Given that there are known risks associated with medical operations, it is ethically questionable to allow unverified surgeries to be conducted ad infinitum into the future.

Artifacts

Artifacts are variables that should have been systematically varied, either within or across studies, but that were accidentally held constant. Artifacts are thus threats to external validity. Artifacts are factors that covary with the treatment and the outcome. Campbell and Stanley [27] identify several artifacts. The major threats to internal validity are history, maturation, testing, instrumentation, statistical regression, selection, experimental mortality, and selection-history interactions.

One way to minimize the influence of artifacts is to use a pretest-posttest control group design. Within this design, "groups of people who are initially equivalent (at the pretest phase) are randomly assigned to receive the experimental treatment or a control condition and then assessed again after this differential experience (posttest phase)". [28] Thus, any effects of artifacts are (ideally) equally distributed in participants in both the treatment and control conditions.

References

  1. Pearl, J. (2009). "Simpson's Paradox, Confounding, and Collapsibility". In Causality: Models, Reasoning and Inference (2nd ed.). New York: Cambridge University Press.
  2. VanderWeele, T.J.; Shpitser, I. (2013). "On the definition of a confounder". Annals of Statistics. 41 (1): 196–220. arXiv:1304.0564. doi:10.1214/12-aos1058. PMC 4276366. PMID 25544784.
  3. Greenland, S.; Robins, J. M.; Pearl, J. (1999). "Confounding and Collapsibility in Causal Inference". Statistical Science. 14 (1): 29–46. doi:10.1214/ss/1009211805.
  4. Shadish, W. R.; Cook, T. D.; Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Boston, MA: Houghton-Mifflin.
  5. Pearl, J. (1993). "Aspects of Graphical Models Connected With Causality". In Proceedings of the 49th Session of the International Statistical Science Institute, pp. 391–401.
  6. Pearl, J. (2009). "Causal Diagrams and the Identification of Causal Effects". In Causality: Models, Reasoning and Inference (2nd ed.). New York, NY, US: Cambridge University Press.
  7. Cinelli, C.; Forney, A.; Pearl, J. (March 2022). "A Crash Course in Good and Bad Controls" (PDF). UCLA Cognitive Systems Laboratory, Technical Report (R-493).
  8. Lee, P. H. (2014). "Should We Adjust for a Confounder if Empirical and Theoretical Criteria Yield Contradictory Results? A Simulation Study". Sci Rep. 4: 6085. Bibcode:2014NatSR...4E6085L. doi:10.1038/srep06085. PMC 5381407. PMID 25124526.
  9. Shpitser, I.; Pearl, J. (2008). "Complete identification methods for the causal hierarchy". The Journal of Machine Learning Research. 9: 1941–1979.
  10. Morabia, A. (2011). "History of the modern epidemiological concept of confounding" (PDF). Journal of Epidemiology and Community Health. 65 (4): 297–300. doi:10.1136/jech.2010.112565. PMID 20696848. S2CID 9068532.
  11. Greenland, S.; Robins, J. M.; Pearl, J. (1999). "Confounding and Collapsibility in Causal Inference". Statistical Science. 14 (1): 31. doi:10.1214/ss/1009211805.
  12. Fisher, R. A. (1935). The design of experiments (pp. 114–145).
  13. Vandenbroucke, J. P. (2004). "The history of confounding". Soz Praventivmed. 47 (4): 216–224. doi:10.1007/BF01326402. PMID 12415925. S2CID 198174446.
  14. Kish, L. (1959). "Some statistical problems in research design". Am Sociol. 26 (3): 328–338. doi:10.2307/2089381. JSTOR 2089381.
  15. Greenland, S.; Robins, J. M. (1986). "Identifiability, exchangeability, and epidemiological confounding". International Journal of Epidemiology. 15 (3): 413–419. CiteSeerX 10.1.1.157.6445. doi:10.1093/ije/15.3.413. PMID 3771081.
  16. Neyman, J., with cooperation of K. Iwaskiewics and St. Kolodziejczyk (1935). Statistical problems in agricultural experimentation (with discussion). Suppl J Roy Statist Soc Ser B 2 107-180.
  17. Rubin, D. B. (1974). "Estimating causal effects of treatments in randomized and nonrandomized studies". Journal of Educational Psychology. 66 (5): 688–701. doi:10.1037/h0037350. S2CID 52832751.
  18. Pearl, J. (2009). Causality: Models, Reasoning and Inference (2nd ed.). New York, NY, US: Cambridge University Press.
  19. Johnston, S. C. (2001). "Identifying Confounding by Indication through Blinded Prospective Review". American Journal of Epidemiology. 154 (3): 276–284. doi:10.1093/aje/154.3.276. PMID 11479193.
  20. Pelham, Brett (2006). Conducting Research in Psychology. Belmont: Wadsworth. ISBN 978-0-534-53294-9.
  21. Steg, L.; Buunk, A. P.; Rothengatter, T. (2008). "Chapter 4". Applied Social Psychology: Understanding and managing social problems. Cambridge, UK: Cambridge University Press.
  22. Tjønneland, Anne; Grønbæk, Morten; Stripp, Connie; Overvad, Kim (January 1999). "Wine intake and diet in a random sample of 48763 Danish men and women". The American Journal of Clinical Nutrition. 69 (1): 49–54. doi:10.1093/ajcn/69.1.49. PMID 9925122.
  23. Axelson, O. (1989). "Confounding from smoking in occupational epidemiology". British Journal of Industrial Medicine. 46 (8): 505–07. doi:10.1136/oem.46.8.505. PMC 1009818. PMID 2673334.
  24. Calow, Peter P. (2009). Handbook of Environmental Risk Assessment and Management. Wiley.
  25. Mayrent, Sherry L. (1987). Epidemiology in Medicine. Lippincott Williams & Wilkins. ISBN 978-0-316-35636-7.
  26. Emanuel, Ezekiel J; Miller, Franklin G (Sep 20, 2001). "The Ethics of Placebo-Controlled Trials—A Middle Ground". New England Journal of Medicine . 345 (12): 915–9. doi:10.1056/nejm200109203451211. PMID   11565527.
  27. Campbell, D. T.; Stanley, J. C. (1966). Experimental and quasi-experimental designs for research. Chicago: Rand McNally.
  28. Crano, W. D.; Brewer, M. B. (2002). Principles and methods of social research (2nd ed.). Mahwah, NJ: Lawrence Erlbaum Associates. p. 28.

Further reading