Ignorability

In statistics, ignorability is a feature of an experiment design whereby the method of data collection (and the nature of missing data) does not depend on the missing data. A missing data mechanism such as a treatment assignment or survey sampling strategy is "ignorable" if the missing data matrix, which indicates which variables are observed or missing, is independent of the missing data conditional on the observed data.
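As a sketch of the condition just described, write $M$ for the missing data indicator matrix, $Y_{\mathrm{obs}}$ for the observed data and $Y_{\mathrm{mis}}$ for the missing data (these symbols are chosen here for illustration and do not appear in the text above). The requirement that $M$ is independent of the missing data conditional on the observed data can then be written as:

```latex
\Pr(M \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}) = \Pr(M \mid Y_{\mathrm{obs}})
```

In words, once the observed data are known, learning the missing values would not change the probability of any particular pattern of missingness.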

This idea is part of the Rubin Causal Inference Model, developed by Donald Rubin in collaboration with Paul Rosenbaum in the early 1970s. The exact definition differs between their articles in that period. In a 1978 article, Rubin discusses ignorable assignment mechanisms, [1] which can be understood to mean that the way individuals are assigned to treatment groups is irrelevant for the data analysis, given everything that is recorded about each individual. Later, in 1983, [2] Rubin and Rosenbaum instead define strongly ignorable treatment assignment, which is a stronger condition, mathematically formulated as $(r_1, r_0) \perp Z \mid v$ with $0 < \Pr(Z = 1 \mid v) < 1$ for all $v$, where $r_t$ is a potential outcome given treatment $t$, $v$ is some covariates and $Z$ is the actual treatment.
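The practical content of strong ignorability can be illustrated with a small exact calculation. The following is a minimal sketch, not taken from the cited articles: it builds a hypothetical finite population of records `(v, z, y0, y1)`, where `v` is a covariate, `z` the actual treatment and `y0`, `y1` the two potential outcomes. Within each stratum of `v` the treated and control groups hold the same mix of potential outcomes and both groups are non-empty, so treatment is ignorable given `v` but not unconditionally. All names and numbers are assumptions made for illustration.

```python
from fractions import Fraction


def mean(values):
    """Exact arithmetic mean of an iterable of integers or Fractions."""
    values = list(values)
    return Fraction(sum(values), len(values))


def build_population():
    """Hypothetical population of (v, z, y0, y1) records.

    Given v, the potential outcomes have the same distribution among treated
    and control units, and every stratum contains both groups (overlap).
    """
    rows = []
    for y0, y1 in [(1, 2), (3, 4)]:      # stratum v = 0: 1 in 4 treated
        rows += [(0, 1, y0, y1)] * 1 + [(0, 0, y0, y1)] * 3
    for y0, y1 in [(5, 8), (7, 10)]:     # stratum v = 1: 3 in 4 treated
        rows += [(1, 1, y0, y1)] * 3 + [(1, 0, y0, y1)] * 1
    return rows


def observed(row):
    """Only the potential outcome matching the actual treatment is observed."""
    v, z, y0, y1 = row
    return y1 if z == 1 else y0


def naive_difference(rows):
    """Difference in observed mean outcomes, treated minus control."""
    treated = mean(observed(r) for r in rows if r[1] == 1)
    control = mean(observed(r) for r in rows if r[1] == 0)
    return treated - control


def stratified_difference(rows):
    """Within-stratum differences, weighted by each stratum's population share."""
    total = Fraction(0)
    for v in sorted({r[0] for r in rows}):
        stratum = [r for r in rows if r[0] == v]
        total += naive_difference(stratum) * Fraction(len(stratum), len(rows))
    return total


def true_ate(rows):
    """Average of y1 - y0 over everyone; computable only because the data are made up."""
    return mean(r[3] - r[2] for r in rows)


population = build_population()
print(naive_difference(population), stratified_difference(population), true_ate(population))
```

In this constructed population the unadjusted comparison gives 9/2, while the comparison made within strata of `v` and then averaged gives 2, which equals the true average effect. The agreement is exact only because the example was built to satisfy the condition; with real data the condition is an assumption that cannot be checked from the observed outcomes alone.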

Pearl [3] devised a simple graphical criterion, called back-door, that entails ignorability and identifies sets of covariates that achieve this condition.
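As a sketch of how such a covariate set is used (the symbols $S$, $X$ and $Y$ are chosen here for illustration): if a set of covariates $S$ satisfies the back-door criterion relative to a treatment $X$ and an outcome $Y$, the effect of setting $X = x$ can be computed from observational quantities with the adjustment formula:

```latex
P(y \mid do(x)) = \sum_{s} P(y \mid x, s)\, P(s)
```

The right-hand side compares outcomes within levels of $S$ and then averages over the population distribution of $S$, which is the same operation as conditioning on covariates under which treatment assignment is ignorable.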

Ignorability means we can ignore how one ended up in one vs. the other group (‘treated’ $T_i = 1$, or ‘control’ $T_i = 0$) when it comes to the potential outcome (say $Y$). It has also been called unconfoundedness, selection on the observables, or no omitted variable bias. [4]

Formally it has been written as $[Y_i^1, Y_i^0] \perp T_i$, or in words: the potential outcome $Y$ of person $i$, had they been treated or not, does not depend on whether they have really been (observably) treated or not. In other words, we can ignore how people ended up in one vs. the other condition, and treat their potential outcomes as exchangeable. While this seems dense, it becomes clear if we add subscripts for the ‘realized’ and superscripts for the ‘ideal’ (potential) worlds (notation suggested by David Freedman). So $Y_1^1$ and ${}^{*}Y_0^1$ are potential $Y$ outcomes had the person been treated (superscript 1), when in reality they have actually been treated ($Y_1^1$, subscript 1) or not (${}^{*}Y_0^1$; the $*$ signals that this quantity can never be realized or observed, that is, it is fully contrary to fact, or counterfactual, CF).

Similarly, ${}^{*}Y_1^0$ and $Y_0^0$ are potential $Y$ outcomes had the person not been treated (superscript 0), when in reality they have been treated (${}^{*}Y_1^0$, subscript 1) or not ($Y_0^0$, subscript 0).

Only one of each pair of potential outcomes (PO) can be realized; the other cannot, for the same assignment to condition. So when we try to estimate treatment effects, we need something to replace the fully contrary-to-fact ones with observables (or a way to estimate them). When ignorability/exogeneity holds, as when people are randomized to be treated or not, we can ‘replace’ ${}^{*}Y_0^1$ with its observable counterpart $Y_1^1$, and ${}^{*}Y_1^0$ with its observable counterpart $Y_0^0$, not at the level of the individual $Y_i$’s, but when it comes to averages like $E[Y_i^1 - Y_i^0]$, which is exactly the causal treatment effect (TE) one tries to recover.

Because of the ‘consistency rule’, the potential outcomes are the values actually realized, so we can write $Y_i^0 = Y_{i0}^0$ and $Y_i^1 = Y_{i1}^1$ (“the consistency rule states that an individual’s potential outcome under a hypothetical condition that happened to materialize is precisely the outcome experienced by that individual”, [5] p. 872). Hence $TE = E[Y_i^1 - Y_i^0] = E[Y_{i1}^1 - Y_{i0}^0]$. Now, by simply adding and subtracting the same fully counterfactual quantity ${}^{*}Y_{i1}^0$ we get: $E[Y_{i1}^1 - Y_{i0}^0] = E[Y_{i1}^1 - {}^{*}Y_{i1}^0 + {}^{*}Y_{i1}^0 - Y_{i0}^0] = E[Y_{i1}^1 - {}^{*}Y_{i1}^0] + E[{}^{*}Y_{i1}^0 - Y_{i0}^0] = ATT + \{\text{Selection Bias}\}$, where ATT = average treatment effect on the treated [6] and the second term is the bias introduced when people have the choice to belong to either the ‘treated’ or the ‘control’ group. Ignorability, either plain or conditional on some other variables, implies that such selection bias can be ignored, so one can recover (or estimate) the causal effect.
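The decomposition into ATT plus selection bias can be checked on a toy example. The sketch below is an illustration under stated assumptions, not part of the cited sources: it uses six hypothetical units whose two potential outcomes `(y0, y1)` are both known, which is never the case with real data. It first evaluates a self-selected assignment and then averages over every equally likely way of treating three of the six units, which is what complete randomization does in expectation.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical potential outcomes (y0, y1) for six units. In real data only
# one of the two values would ever be observed for each unit.
units = [(1, 3), (2, 3), (2, 5), (4, 6), (5, 6), (6, 9)]


def mean(values):
    """Exact arithmetic mean of an iterable of integers or Fractions."""
    values = list(values)
    return Fraction(sum(values), len(values))


def decompose(treated):
    """Return (naive difference, ATT, selection bias) for a set of treated indices."""
    control = [i for i in range(len(units)) if i not in treated]
    y1_treated = mean(units[i][1] for i in treated)   # treated outcome of the treated (observable)
    y0_treated = mean(units[i][0] for i in treated)   # untreated outcome of the treated (counterfactual)
    y0_control = mean(units[i][0] for i in control)   # untreated outcome of the controls (observable)
    naive = y1_treated - y0_control                   # what a raw group comparison measures
    att = y1_treated - y0_treated                     # average treatment effect on the treated
    bias = y0_treated - y0_control                    # selection bias
    return naive, att, bias


# Self-selection: the three units with the highest untreated outcome opt in.
naive, att, bias = decompose({3, 4, 5})
assert naive == att + bias

# Complete randomization: average over all equally likely ways to treat 3 of 6.
assignments = [set(a) for a in combinations(range(len(units)), 3)]
avg_naive = mean(decompose(a)[0] for a in assignments)
avg_bias = mean(decompose(a)[2] for a in assignments)
ate = mean(y1 - y0 for y0, y1 in units)

print(naive, att, bias)
print(avg_naive, avg_bias, ate)
```

Under self-selection the raw comparison is 16/3, which splits into an ATT of 2 and a selection bias of 10/3. Averaged over all random assignments the selection bias is exactly 0 and the raw comparison equals the average effect of 2. The identity `naive == att + bias` holds for every assignment; only the size of the bias term depends on how units were assigned.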

References

  1. Rubin, Donald (1978). "Bayesian Inference for Causal Effects: The Role of Randomization". The Annals of Statistics. 6 (1): 34–58. doi:10.1214/aos/1176344064.
  2. Rubin, Donald B.; Rosenbaum, Paul R. (1983). "The Central Role of the Propensity Score in Observational Studies for Causal Effects". Biometrika. 70 (1): 41–55. doi:10.2307/2335942. JSTOR 2335942.
  3. Pearl, Judea (2000). Causality: models, reasoning, and inference. Cambridge, U.K.: Cambridge University Press. ISBN 978-0-521-89560-6.
  4. Yamamoto, Teppei (2012). "Understanding the Past: Statistical Analysis of Causal Attribution". American Journal of Political Science. 56 (1): 237–256. doi:10.1111/j.1540-5907.2011.00539.x. hdl:1721.1/85887. S2CID 15961756.
  5. Pearl, Judea (2010). "On the consistency rule in causal inference: axiom, definition, assumption, or theorem?". Epidemiology. 21 (6): 872–875. doi:10.1097/EDE.0b013e3181f5d3fd. PMID 20864888. S2CID 4648801.
  6. Imai, Kosuke; King, Gary; Stuart, Elizabeth A. (2008). "Misunderstandings between experimentalists and observationalists about causal inference". Journal of the Royal Statistical Society, Series A (Statistics in Society). 171 (2): 481–502. doi:10.1111/j.1467-985X.2007.00527.x. S2CID 17852724.
