Local average treatment effect

The local average treatment effect (LATE), also known as the complier average causal effect (CACE), was first introduced into the econometrics literature by Guido W. Imbens and Joshua D. Angrist in 1994. [1] It is the treatment effect for the subset of the sample that takes the treatment if and only if they were assigned to the treatment, otherwise known as the compliers. It is not to be confused with the average treatment effect (ATE), which is the average subject-level treatment effect; the LATE is only the ATE among the compliers. The LATE can be estimated by a ratio of the estimated intent-to-treat effect and the estimated proportion of compliers, or alternatively through an instrumental variable estimator.

General definition

In the terminology of the potential outcomes framework, $Y_i(d_i)$ denotes the potential outcome of subject $i$, where $d_i$ is the binary indicator of subject $i$'s treatment status. Let $\mathbf{d}$ be the vector of all $d_i$s. $Y_i(1)$ denotes the treated potential outcome for subject $i$, while $Y_i(0)$ denotes the untreated potential outcome.

The ATE is the difference between the expected outcome of the treatment group and the expected outcome of the control group. In an experimental setting, random assignment allows us to assume that the treatment group and control group have the same expected potential outcomes when treated (or untreated). This can be expressed as:

$$E[Y_i(1) \mid d_i = 1] = E[Y_i(1) \mid d_i = 0] = E[Y_i(1)]$$

$$E[Y_i(0) \mid d_i = 1] = E[Y_i(0) \mid d_i = 0] = E[Y_i(0)]$$

In an ideal experiment, all subjects assigned to treatment are treated, while those that are assigned to control will remain untreated. In reality, however, the compliance rate is often imperfect, which prevents researchers from identifying the ATE. In such cases, estimating the LATE becomes the more feasible option. The LATE is the average treatment effect among a specific subset of the subjects, who in this case would be the compliers.

Potential outcome framework

The causal effect of the treatment on subject $i$ is $Y_i(1) - Y_i(0)$. However, we can never observe both $Y_i(1)$ and $Y_i(0)$ for the same subject. At any given time, we can only observe a subject in its treated or untreated state.

Through random assignment, the expected untreated potential outcome of the control group is the same as that of the treatment group, and the expected treated potential outcome of the treatment group is the same as that of the control group. The random assignment assumption thus allows us to take the difference between the average outcome in the treatment group and the average outcome in the control group as the overall average treatment effect, such that:

$$ATE = E[Y_i(1)] - E[Y_i(0)] = E[Y_i(1) \mid d_i = 1] - E[Y_i(0) \mid d_i = 0]$$
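As a small numerical check of this argument, the sketch below enumerates every equally likely way of assigning two of four subjects to treatment and averages the resulting difference-in-means estimates. The four subjects and their potential outcomes are hypothetical values chosen for illustration, and perfect compliance is assumed.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical potential outcomes for four subjects (illustrative values only).
y0 = [4, 3, 1, 5]  # untreated potential outcomes Y_i(0)
y1 = [7, 5, 5, 8]  # treated potential outcomes Y_i(1)
n = len(y0)


def mean(values):
    """Exact arithmetic mean using rational numbers."""
    values = list(values)
    return Fraction(sum(values), len(values))


# True ATE: average of the subject-level effects Y_i(1) - Y_i(0).
true_ate = mean(a - b for a, b in zip(y1, y0))

# Enumerate every assignment that treats exactly two of the four subjects.
estimates = []
for treated in combinations(range(n), 2):
    control = [i for i in range(n) if i not in treated]
    # Only Y_i(1) is observed for treated subjects and Y_i(0) for controls.
    diff_in_means = mean(y1[i] for i in treated) - mean(y0[i] for i in control)
    estimates.append(diff_in_means)

# Averaging over all equally likely assignments gives the estimator's expectation.
expected_estimate = mean(estimates)
print(true_ate, expected_estimate)
```

Individual assignments give estimates that differ from the true ATE; it is only their average over the randomization that coincides with it.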

Noncompliance framework

Researchers frequently encounter noncompliance problems in their experiments, whereby subjects fail to comply with their experimental assignments. Some subjects assigned to the treatment do not take the treatment, so their potential outcome $Y_i(1)$ will not be revealed; and some subjects assigned to the control group will take the treatment, so they will not reveal their $Y_i(0)$.

Given noncompliance, the experimental subjects can be divided into four subgroups: compliers, always-takers, never-takers and defiers. Let $z_i$ indicate experimental assignment, such that when $z_i = 1$, subject $i$ is assigned to treatment, and when $z_i = 0$, subject $i$ is assigned to control. Thus, $d_i(z)$ represents whether subject $i$ is actually treated or not when treatment assignment is $z$.

Compliers are subjects who will take the treatment if and only if they were assigned to the treatment group, i.e. the subpopulation with $d_i(1) = 1$ and $d_i(0) = 0$.

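Noncompliers are composed of the three remaining subgroups:

  • Always-takers are subjects who will take the treatment even if they were assigned to the control group, i.e. the subpopulation with $d_i(1) = 1$ and $d_i(0) = 1$.
  • Never-takers are subjects who will not take the treatment even if they were assigned to the treatment group, i.e. the subpopulation with $d_i(1) = 0$ and $d_i(0) = 0$.
  • Defiers are subjects who will do the opposite of their assignment, i.e. the subpopulation with $d_i(1) = 0$ and $d_i(0) = 1$.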

Non-compliance can take two forms. In the case of one-sided non-compliance, a number of the subjects who were assigned to the treatment group remain untreated. Subjects are thus divided into compliers and never-takers, such that $d_i(0) = 0$ for all $i$, while $d_i(1) = 0$ or $1$. In the case of two-sided non-compliance, a number of the subjects assigned to the treatment group fail to receive the treatment, while a number of the subjects assigned to the control group receive the treatment. In this case, subjects are divided into the four subgroups, such that both $d_i(0)$ and $d_i(1)$ can be 0 or 1.
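The subgroup definitions can be written as a lookup on the pair of potential treatment statuses. The sketch below is illustrative only: the function names are hypothetical, and in real data the pair $(d_i(0), d_i(1))$ is never observed jointly for one subject.

```python
def compliance_type(d0: int, d1: int) -> str:
    """Map potential treatment statuses (d_i(0), d_i(1)) to a subgroup label."""
    labels = {
        (0, 1): "complier",      # treated only when assigned to treatment
        (1, 1): "always-taker",  # treated under either assignment
        (0, 0): "never-taker",   # untreated under either assignment
        (1, 0): "defier",        # does the opposite of the assignment
    }
    return labels[(d0, d1)]


def noncompliance_form(population):
    """Classify a list of (d_i(0), d_i(1)) pairs by the form of noncompliance."""
    untreated_if_assigned = any(d1 == 0 for _, d1 in population)
    treated_if_control = any(d0 == 1 for d0, _ in population)
    if untreated_if_assigned and treated_if_control:
        return "two-sided"
    if untreated_if_assigned:
        return "one-sided"
    if treated_if_control:
        # This case is not described in the text; the label is an assumption.
        return "one-sided (control group only)"
    return "full compliance"


# Hypothetical population of four subjects.
population = [(0, 1), (0, 0), (0, 1), (1, 1)]
print([compliance_type(d0, d1) for d0, d1 in population])
print(noncompliance_form(population))
```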

Given non-compliance, we require certain assumptions to estimate the LATE. Under one-sided non-compliance, we assume non-interference and excludability. Under two-sided non-compliance, we assume non-interference, excludability, and monotonicity.

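Assumptions under one-sided non-compliance

  • Non-interference (the stable unit treatment value assumption, SUTVA): a subject's treatment status and potential outcomes depend only on that subject's own assignment and treatment, not on those of other subjects.
  • Excludability: assignment affects the outcome only through the treatment actually received, so that $Y_i(z, d) = Y_i(d)$.

Assumptions under two-sided non-compliance

  • Non-interference and excludability, as in the one-sided case.
  • Monotonicity: $d_i(1) \geq d_i(0)$ for every subject $i$, i.e. there are no defiers.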

Identification

The $LATE = \frac{ITT}{ITT_D}$, whereby

$$ITT = E[Y_i(z = 1)] - E[Y_i(z = 0)]$$

$$ITT_D = E[d_i(z = 1)] - E[d_i(z = 0)]$$

The $ITT$ measures the average effect of experimental assignment on outcomes without accounting for the proportion of the group that was actually treated (i.e. the average outcome of those assigned to treatment minus the average outcome of those assigned to control). In experiments with full compliance, the $ITT = ATE$.

The $ITT_D$ measures the proportion of subjects who are treated when they are assigned to the treatment group, minus the proportion who would have been treated even if they had been assigned to the control group, i.e. $ITT_D$ = the share of compliers.
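The ratio can be computed directly from observed assignment, treatment received and outcome. The following is a minimal sketch; the function name `wald_late` and the eight data points are hypothetical and only illustrate the arithmetic.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)


def wald_late(z, d, y):
    """Estimate ITT, ITT_D and their ratio from observed assignment z,
    treatment received d, and outcome y (lists of equal length)."""
    itt = mean(yi for zi, yi in zip(z, y) if zi == 1) - mean(
        yi for zi, yi in zip(z, y) if zi == 0
    )
    itt_d = mean(di for zi, di in zip(z, d) if zi == 1) - mean(
        di for zi, di in zip(z, d) if zi == 0
    )
    if itt_d == 0:
        raise ValueError("ITT_D is zero, so the ratio ITT / ITT_D is undefined")
    return itt, itt_d, itt / itt_d


# Hypothetical observed data with one-sided noncompliance (illustrative only):
# one of the four subjects assigned to treatment remains untreated.
z = [1, 1, 1, 1, 0, 0, 0, 0]
d = [1, 1, 0, 1, 0, 0, 0, 0]
y = [7, 5, 3, 10, 4, 3, 2, 6]

itt, itt_d, late = wald_late(z, d, y)
print(itt, itt_d, late)
```

The guard on `itt_d` reflects that the ratio is only defined when assignment actually changes treatment uptake for some subjects.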

Proof

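Under one-sided noncompliance, all subjects assigned to the control group will not take the treatment, therefore: [3]

$$E[d_i(z = 0)] = 0,$$

so that

$$ITT_D = E[d_i(z = 1)] = P[d_i(1) = 1]$$

Write $Y_i(z, d)$ for the potential outcome of subject $i$ under assignment $z$ and treatment status $d$. If all subjects were assigned to treatment, the expected potential outcomes would be a weighted average of the treated potential outcomes among compliers, and the untreated potential outcomes among never-takers, such that

$$E[Y_i(z = 1)] = E[Y_i(z = 1, d = 1) \mid d_i(1) = 1] \cdot P[d_i(1) = 1] + E[Y_i(z = 1, d = 0) \mid d_i(1) = 0] \cdot (1 - P[d_i(1) = 1])$$

If all subjects were assigned to control, however, the expected potential outcomes would be a weighted average of the untreated potential outcomes among compliers and never-takers, such that

$$E[Y_i(z = 0)] = E[Y_i(z = 0, d = 0) \mid d_i(1) = 1] \cdot P[d_i(1) = 1] + E[Y_i(z = 0, d = 0) \mid d_i(1) = 0] \cdot (1 - P[d_i(1) = 1])$$

Through substitution, we can express the ITT as a weighted average of the ITT among the two subpopulations (compliers and never-takers), such that

$$ITT = E[Y_i(z = 1, d = 1) - Y_i(z = 0, d = 0) \mid d_i(1) = 1] \cdot P[d_i(1) = 1] + E[Y_i(z = 1, d = 0) - Y_i(z = 0, d = 0) \mid d_i(1) = 0] \cdot (1 - P[d_i(1) = 1])$$

Given the exclusion and monotonicity assumptions, the second half of this equation should be zero.

As such,

$$\frac{ITT}{ITT_D} = \frac{E[Y_i(z = 1, d = 1) - Y_i(z = 0, d = 0) \mid d_i(1) = 1] \cdot P[d_i(1) = 1]}{P[d_i(1) = 1]} = E[Y_i(d = 1) - Y_i(d = 0) \mid d_i(1) = 1] = LATE$$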

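Application: hypothetical schedule of potential outcomes under two-sided noncompliance

The table below lays out the hypothetical schedule of potential outcomes under two-sided noncompliance.

Hypothetical Schedule of Potential Outcomes under Two-sided Noncompliance

| Observation | $Y_i(0)$ | $Y_i(1)$ | $Y_i(1) - Y_i(0)$ | $d_i(z=0)$ | $d_i(z=1)$ | Type |
|---|---|---|---|---|---|---|
| 1 | 4 | 7 | 3 | 0 | 1 | Complier |
| 2 | 3 | 5 | 2 | 0 | 0 | Never-taker |
| 3 | 1 | 5 | 4 | 0 | 1 | Complier |
| 4 | 5 | 8 | 3 | 1 | 1 | Always-taker |
| 5 | 4 | 10 | 6 | 0 | 1 | Complier |
| 6 | 2 | 8 | 6 | 0 | 0 | Never-taker |
| 7 | 6 | 10 | 4 | 0 | 1 | Complier |
| 8 | 5 | 9 | 4 | 0 | 1 | Complier |
| 9 | 2 | 5 | 3 | 1 | 1 | Always-taker |

The ATE is calculated by the average of $Y_i(1) - Y_i(0)$:

$$ATE = \frac{3 + 2 + 4 + 3 + 6 + 6 + 4 + 4 + 3}{9} = \frac{35}{9} \approx 3.9$$

LATE is calculated by the ATE among compliers, so

$$LATE = \frac{3 + 4 + 6 + 4 + 4}{5} = 4.2$$

ITT is calculated by the average of $Y_i(z = 1) - Y_i(z = 0)$, which equals $Y_i(1) - Y_i(0)$ for compliers and zero for never-takers and always-takers,

so

$$ITT = \frac{3 + 0 + 4 + 0 + 6 + 0 + 4 + 4 + 0}{9} = \frac{21}{9} \approx 2.3$$

$ITT_D$ is the share of compliers:

$$ITT_D = \frac{5}{9}$$

so that $\frac{ITT}{ITT_D} = \frac{21/9}{5/9} = 4.2$, which equals the LATE.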
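The quantities in this example can be recomputed from the nine rows of the schedule. The sketch below reads each row as (observation, $Y_i(0)$, $Y_i(1)$, $d_i(z=0)$, $d_i(z=1)$); that column interpretation is an assumption inferred from the rows and their subgroup labels.

```python
from fractions import Fraction

# (observation, Y_i(0), Y_i(1), d_i(z=0), d_i(z=1)), read off the schedule.
schedule = [
    (1, 4, 7, 0, 1),
    (2, 3, 5, 0, 0),
    (3, 1, 5, 0, 1),
    (4, 5, 8, 1, 1),
    (5, 4, 10, 0, 1),
    (6, 2, 8, 0, 0),
    (7, 6, 10, 0, 1),
    (8, 5, 9, 0, 1),
    (9, 2, 5, 1, 1),
]


def mean(values):
    """Exact arithmetic mean using rational numbers."""
    values = list(values)
    return Fraction(sum(values), len(values))


def realized(y0, y1, d):
    """Outcome revealed when the subject's treatment status is d."""
    return y1 if d == 1 else y0


# ATE: average subject-level effect over everyone.
ate = mean(y1 - y0 for _, y0, y1, _, _ in schedule)

# LATE: average subject-level effect among compliers only.
compliers = [row for row in schedule if row[3] == 0 and row[4] == 1]
late = mean(y1 - y0 for _, y0, y1, _, _ in compliers)

# ITT: effect of assignment on the outcome; ITT_D: effect of assignment on uptake.
itt = mean(
    realized(y0, y1, d1) - realized(y0, y1, d0) for _, y0, y1, d0, d1 in schedule
)
itt_d = mean(d1 - d0 for _, _, _, d0, d1 in schedule)

print(float(ate), float(late), float(itt), float(itt_d), float(itt / itt_d))
```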

Others: LATE in instrumental variable framework

We can also think of LATE through an IV framework. [5] Treatment assignment $z_i$ is the instrument that drives the causal effect on outcome $Y_i$ through the variable of interest $d_i$, such that $z_i$ only influences $Y_i$ through the endogenous variable $d_i$, and through no other path. This would produce the treatment effect for compliers.

In addition to the potential outcomes framework mentioned above, LATE can also be estimated through the Structural Equation Modeling (SEM) framework, originally developed for econometric applications.

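SEM is derived through the following equations, where $Z_i$, $D_i$ and $Y_i$ denote the observed assignment, treatment received and outcome of subject $i$:

$$D_i = \alpha_0 + \alpha_1 Z_i + \xi_{1i}$$

$$Y_i = \beta_0 + \beta_1 Z_i + \xi_{2i}$$

The first equation captures the first stage effect of $Z_i$ on $D_i$, adjusting for variance, where

$$\alpha_1 = \frac{Cov(D, Z)}{Var(Z)}$$

The second equation captures the reduced form effect of $Z_i$ on $Y_i$,

$$\beta_1 = \frac{Cov(Y, Z)}{Var(Z)}$$

The covariate-adjusted IV estimator is the ratio

$$\tau_{LATE} = \frac{\beta_1}{\alpha_1} = \frac{Cov(Y, Z) / Var(Z)}{Cov(D, Z) / Var(Z)} = \frac{Cov(Y, Z)}{Cov(D, Z)}$$

Similar to the nonzero compliance assumption, the coefficient $\alpha_1$ in the first stage regression must be nonzero (in practice, statistically significant) for $Z$ to be a valid instrument.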
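The covariance form of the estimator can be illustrated on simulated data. The sketch below is built on assumptions that are not part of the article: the subgroup shares (50% compliers, 20% always-takers, 30% never-takers), the effect sizes (2 for compliers, 5 for always-takers) and the noise distribution are all hypothetical. Under this setup the ratio should land near the complier effect of 2 rather than near any population-wide average.

```python
import random


def covariance(a, b):
    """Sample covariance of two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


rng = random.Random(0)  # fixed seed so the run is reproducible
z, d, y = [], [], []
for _ in range(50_000):
    zi = rng.randint(0, 1)  # random assignment to treatment (1) or control (0)
    u = rng.random()
    if u < 0.5:      # complier: takes the treatment iff assigned to it
        di, effect = zi, 2.0
    elif u < 0.7:    # always-taker: treated regardless of assignment
        di, effect = 1, 5.0
    else:            # never-taker: effect is never realized because d = 0
        di, effect = 0, 0.0
    z.append(zi)
    d.append(di)
    y.append(rng.gauss(0.0, 1.0) + effect * di)

alpha_1 = covariance(d, z) / covariance(z, z)  # first stage: effect of Z on D
beta_1 = covariance(y, z) / covariance(z, z)   # reduced form: effect of Z on Y
tau_late = beta_1 / alpha_1                    # equals Cov(Y, Z) / Cov(D, Z)
print(round(alpha_1, 3), round(beta_1, 3), round(tau_late, 3))
```

Because the always-takers are treated under both assignments, their larger effect cancels out of the reduced form, which is why the ratio isolates the compliers.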

However, because of SEM’s strict assumption of constant effect on every individual, the potential outcomes framework is in more prevalent use today.

Generalizing LATE

The primary goal of running an experiment is to obtain causal leverage, and it does so by randomly assigning subjects to experimental conditions, which sets it apart from observational studies. In an experiment with perfect compliance, the average treatment effect can be obtained easily. However, many experiments are likely to experience either one-sided or two-sided non-compliance. In the presence of non-compliance, the ATE can no longer be recovered. Instead, what is recovered is the average treatment effect for a certain subpopulation known as the compliers, which is the LATE.

When there may exist heterogeneous treatment effects across groups, the LATE is unlikely to be equivalent to the ATE. In one example, Angrist (1989) [6] attempts to estimate the causal effect of serving in the military on earnings, using the draft lottery as an instrument. The compliers are those who were induced by the draft lottery to serve in the military. If the research interest is on how to compensate those involuntarily taxed by the draft, LATE would be useful, since the research targets compliers. However, if researchers are concerned about a more universal draft for future interpretation, then the ATE would be more important (Imbens 2009). [1]

Generalizing from the LATE to the ATE thus becomes an important issue when the research interest lies with the causal treatment effect on a broader population, not just the compliers. In these cases, the LATE may not be the parameter of interest, and researchers have questioned its utility. [7] [8] Other researchers, however, have countered this criticism by proposing new methods to generalize from the LATE to the ATE. [9] [10] [11] Most of these involve some form of reweighting from the LATE, under certain key assumptions that allow for extrapolation from the compliers.

Reweighting

The intuition behind reweighting comes from the notion that, within a given stratum, the distribution among the compliers may not reflect the distribution of the broader population. Thus, to retrieve the ATE, it is necessary to reweight based on the information gleaned from compliers. There are a number of ways that reweighting can be used to try to get at the ATE from the LATE.

Reweighting by ignorability assumption

Using an instrumental variable, Aronow and Carnegie (2013) [9] propose a reweighting method called Inverse Compliance Score Weighting (ICSW), whose intuition is similar to that of inverse probability weighting (IPW). The method assumes that compliance propensity is a pre-treatment covariate and that compliers have the same average treatment effect as the other subjects within their stratum. ICSW first estimates each subject's conditional probability of being a complier given covariates (the compliance score) by maximum likelihood, then weights each unit by the inverse of its compliance score, so that the compliers' covariate distribution matches that of the full population. ICSW is applicable under both one-sided and two-sided noncompliance.

Although a unit's compliance score cannot be directly observed, the probability of compliance can be estimated by observing compliance among units in the same stratum, in other words those that share the same covariate profile. The compliance score is treated as a latent pretreatment covariate, which is independent of treatment assignment $Z$. For each unit $i$, the compliance score is denoted as $P_{Ci} = \Pr(D_1 > D_0 \mid X = x_i)$, where $D_1$ and $D_0$ are the potential treatment statuses under assignment to treatment and to control, and $x_i$ is the covariate vector for unit $i$.

In the one-sided noncompliance case, the population consists of only compliers and never-takers. All units assigned to the treatment group that take the treatment are compliers. Thus, a simple bivariate regression of D on X among the units assigned to treatment can predict the probability of compliance.
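With a single discrete covariate, regressing D on stratum indicators among the units assigned to treatment reduces to taking the share of treated units in each stratum. The sketch below uses that simplification; the function name, the strata "A" and "B" and the data are hypothetical.

```python
from collections import defaultdict


def one_sided_compliance_scores(x, z, d):
    """Return {stratum: share of units with d == 1 among those with z == 1}.

    Under one-sided noncompliance nobody assigned to control is treated,
    so every treated unit in the treatment arm is a complier.
    """
    assigned = defaultdict(int)
    treated = defaultdict(int)
    for xi, zi, di in zip(x, z, d):
        if zi == 1:  # only the treatment arm is informative about compliance
            assigned[xi] += 1
            treated[xi] += di
    return {stratum: treated[stratum] / assigned[stratum] for stratum in assigned}


# Hypothetical data: two covariate strata with eight units each.
x = ["A"] * 8 + ["B"] * 8
z = [1, 1, 1, 1, 0, 0, 0, 0] * 2
d = [1, 1, 0, 0, 0, 0, 0, 0] + [1, 1, 1, 1, 0, 0, 0, 0]

scores = one_sided_compliance_scores(x, z, d)
print(scores)
```

With continuous or high-dimensional covariates, a fitted model would replace these stratum shares.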

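In the two-sided noncompliance case, the compliance score is estimated using maximum likelihood estimation.

Assuming a probit model for compliance and a Bernoulli distribution for $D$, the estimated compliance score is

$$\hat{P}_{Ci} = \hat{\Pr}(D_1 > D_0 \mid X = x_i) = F(\hat{\theta}_{A,C}\, x_i)\,\bigl(1 - F(\hat{\theta}_{A|A,C}\, x_i)\bigr),$$

where $F(\hat{\theta}_{A,C}\, x_i)$ is the estimated probability that unit $i$ is an always-taker or a complier, $F(\hat{\theta}_{A|A,C}\, x_i)$ is the estimated probability that it is an always-taker given that it is one of those two types,

and $\theta$ is a vector of coefficients to be estimated, while $F(\cdot)$ is the cumulative distribution function for a probit model.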

  • ICSW estimator

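By the LATE theorem, [1] the average treatment effect for compliers can be estimated with the equation:

$$\hat{\tau}_{LATE} = \frac{\dfrac{\sum_{i=1}^{n} Z_i Y_i}{\sum_{i=1}^{n} Z_i} - \dfrac{\sum_{i=1}^{n} (1 - Z_i) Y_i}{\sum_{i=1}^{n} (1 - Z_i)}}{\dfrac{\sum_{i=1}^{n} Z_i D_i}{\sum_{i=1}^{n} Z_i} - \dfrac{\sum_{i=1}^{n} (1 - Z_i) D_i}{\sum_{i=1}^{n} (1 - Z_i)}}$$

Define $\hat{w}_{Ci} = 1 / \hat{P}_{Ci}$, where $\hat{P}_{Ci}$ is the estimated compliance score of unit $i$. The ICSW estimator is simply the same ratio weighted by $\hat{w}_{Ci}$:

$$\hat{\tau}_{ATE} = \frac{\dfrac{\sum_{i=1}^{n} \hat{w}_{Ci} Z_i Y_i}{\sum_{i=1}^{n} \hat{w}_{Ci} Z_i} - \dfrac{\sum_{i=1}^{n} \hat{w}_{Ci} (1 - Z_i) Y_i}{\sum_{i=1}^{n} \hat{w}_{Ci} (1 - Z_i)}}{\dfrac{\sum_{i=1}^{n} \hat{w}_{Ci} Z_i D_i}{\sum_{i=1}^{n} \hat{w}_{Ci} Z_i} - \dfrac{\sum_{i=1}^{n} \hat{w}_{Ci} (1 - Z_i) D_i}{\sum_{i=1}^{n} \hat{w}_{Ci} (1 - Z_i)}}$$

This estimator is equivalent to using the 2SLS estimator with weight $\hat{w}_{Ci}$.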
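The effect of the weights can be seen on a small constructed data set. In the sketch below everything is hypothetical: stratum A has a compliance score of 0.5 and a treatment effect of 2 for all of its units, stratum B has a compliance score of 1 and a treatment effect of 6, and the scores are taken as known rather than estimated. The two strata are equally large, so the population ATE is 4.

```python
def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def weighted_wald(z, d, y, w):
    """Weighted ratio of the assignment effect on y to the assignment effect on d."""

    def arm_mean(values, arm):
        kept = [(v, wi) for v, zi, wi in zip(values, z, w) if zi == arm]
        return weighted_mean([v for v, _ in kept], [wi for _, wi in kept])

    numerator = arm_mean(y, 1) - arm_mean(y, 0)
    denominator = arm_mean(d, 1) - arm_mean(d, 0)
    return numerator / denominator


# Stratum A (first 8 units): half of the treatment arm complies, effect = 2.
# Stratum B (last 8 units): everyone complies, effect = 6.
z = [1, 1, 1, 1, 0, 0, 0, 0] * 2
d = [1, 1, 0, 0, 0, 0, 0, 0] + [1, 1, 1, 1, 0, 0, 0, 0]
y = [3, 3, 1, 1, 1, 1, 1, 1] + [9, 9, 9, 9, 3, 3, 3, 3]
scores = [0.5] * 8 + [1.0] * 8  # compliance scores, assumed known here

late = weighted_wald(z, d, y, [1.0] * len(z))         # unweighted ratio
icsw = weighted_wald(z, d, y, [1 / s for s in scores])  # inverse-score weights
print(late, icsw)
```

The unweighted ratio over-represents stratum B, where compliance is higher, and so sits above 4; the inverse-score weights restore each stratum's population share.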

  • Core assumptions under reweighting

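An essential assumption of ICSW is treatment effect homogeneity within strata, which means the treatment effect should on average be the same for everyone in a stratum, not just for the compliers. If this assumption holds, the LATE is equal to the ATE within each covariate profile. Writing $Y_1$ and $Y_0$ for the treated and untreated potential outcomes, this can be denoted as:

$$E[Y_1 - Y_0 \mid D_1 > D_0, X = x] = E[Y_1 - Y_0 \mid X = x] \quad \text{for all } x \in Supp(X)$$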

Note that this is a less restrictive assumption than the traditional ignorability assumption: it concerns only the covariates that are relevant to the compliance score, and hence to treatment effect heterogeneity, rather than all sets of covariates.

The second assumption is consistency of $\hat{P}_{Ci}$ for $P_{Ci}$, and the third assumption is nonzero compliance in each stratum, which extends the IV assumption of nonzero compliance in the population. This is a reasonable requirement: if the compliance score were zero for a certain stratum, its inverse would be infinite.

The ICSW estimator is more sensitive than the IV estimator because it incorporates more covariate information, so it may have higher variance. This is a general problem for IPW-style estimation. The problem is exacerbated when a stratum contains only a small population and the compliance rate is low. One remedy is to winsorize the estimated compliance scores; Aronow and Carnegie set the threshold at 0.275, so that any compliance score lower than 0.275 is replaced by this value. Bootstrapping the entire procedure is also recommended to account for estimation uncertainty (Abadie 2002). [12]
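Winsorizing from below amounts to flooring each estimated score before inverting it. A minimal sketch, using the 0.275 threshold mentioned above; the function name and the example scores are hypothetical.

```python
def winsorize_scores(scores, floor=0.275):
    """Replace any compliance score below `floor` with `floor`."""
    return [max(score, floor) for score in scores]


# Hypothetical estimated compliance scores.
scores = [0.05, 0.275, 0.60, 1.00]
floored = winsorize_scores(scores)

# Inverse-score weights before and after flooring.
raw_weights = [1 / s for s in scores]
weights = [1 / s for s in floored]
print(floored)
print(max(raw_weights), max(weights))
```

Flooring caps the largest possible weight at 1 / 0.275, which limits the influence of any single unit at the cost of some bias.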

Reweighting under monotonicity assumption

In another approach, one might assume that an underlying utility model links the never-takers, compliers, and always-takers. The ATE can be estimated by reweighting based on an extrapolation of the complier treated and untreated potential outcomes to the never-takers and always-takers. The following method is one that has been proposed by Amanda Kowalski. [11]

First, all subjects are assumed to have a utility function, determined by their individual gains from treatment and costs from treatment. Based on an underlying assumption of monotonicity, the never-takers, compliers, and always-takers can be arranged on the same continuum based on their utility function. This assumes that the always-takers have such a high utility from taking the treatment that they will take it even without encouragement. On the other hand, the never-takers have such a low utility function that they will not take the treatment despite encouragement. Thus, the never-takers can be aligned with the compliers with the lowest utilities, and the always-takers with the compliers with the highest utility functions.

In an experimental population, several aspects can be observed: the treated potential outcomes of the always-takers (those who are treated in the control group); the untreated potential outcomes of the never-takers (those who remain untreated in the treatment group); the treated potential outcomes of the always-takers and compliers (those who are treated in the treatment group); and the untreated potential outcomes of the compliers and never-takers (those who are untreated in the control group). However, the treated and untreated potential outcomes of the compliers should be extracted from the latter two observations. To do so, the LATE must be extracted from the treated population.

Assuming no defiers, it can be assumed that the treated group in the treatment condition consists of both always-takers and compliers. From the observations of the treated outcomes in the control group, the average treated outcome for always-takers can be extracted, as well as their share of the overall population. As such, the weighted average can be undone and the treated potential outcome for the compliers can be obtained; then, the LATE is subtracted to get the untreated potential outcomes for the compliers. This move will then allow extrapolation from the compliers to obtain the ATE.
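The step of undoing the weighted average can be written out explicitly. The sketch below is one reading of the paragraph above, not code from the cited paper; the function name and the input numbers are hypothetical (they are consistent with a nine-subject population containing five compliers and two always-takers).

```python
def complier_potential_outcome_means(
    share_treated_z1,   # Pr(D = 1 | Z = 1): always-takers plus compliers
    share_treated_z0,   # Pr(D = 1 | Z = 0): always-takers only (no defiers)
    mean_y_treated_z1,  # average outcome of treated units in the treatment arm
    mean_y_treated_z0,  # average outcome of treated units in the control arm
    late,               # local average treatment effect
):
    """Back out the compliers' average treated and untreated outcomes."""
    complier_share = share_treated_z1 - share_treated_z0
    if complier_share <= 0:
        raise ValueError("complier share must be positive")
    # Treated units in the treatment arm mix always-takers and compliers;
    # remove the always-taker part and rescale by the complier share.
    treated_mean = (
        share_treated_z1 * mean_y_treated_z1
        - share_treated_z0 * mean_y_treated_z0
    ) / complier_share
    # Subtracting the LATE gives the compliers' average untreated outcome.
    untreated_mean = treated_mean - late
    return treated_mean, untreated_mean


treated_mean, untreated_mean = complier_potential_outcome_means(
    share_treated_z1=7 / 9,
    share_treated_z0=2 / 9,
    mean_y_treated_z1=54 / 7,
    mean_y_treated_z0=6.5,
    late=4.2,
)
print(treated_mean, untreated_mean)
```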

Returning to the weak monotonicity assumption, which assumes that the utility function always runs in one direction, the utility of a marginal complier would be similar to the utility of a never-taker on one end, and that of an always-taker on the other end. The always-takers will have the same untreated potential outcomes as the compliers, which is its maximum untreated potential outcome. Again, this is based on the underlying utility model linking the subgroups, which assumes that the utility function of an always-taker would not be lower than the utility function of a complier. The same logic would apply to the never-takers, who are assumed to have a utility function that will always be lower than that of a complier.

Given this, extrapolation is possible by projecting the untreated potential outcomes of the compliers to the always-takers, and the treated potential outcomes of the compliers to the never-takers. In other words, if it is assumed that the untreated compliers are informative about always-takers, and the treated compliers are informative about never-takers, then comparison is now possible among the treated always-takers to their “as-if” untreated always-takers, and the untreated never-takers can be compared to their “as-if” treated counterparts. This will then allow the calculation of the overall treatment effect. Extrapolation under the weak monotonicity assumption will provide a bound, rather than a point-estimate.

Limitations

Extrapolating from the LATE to the ATE requires certain key assumptions, which may vary from one approach to another. While some approaches assume homogeneity within covariates, and thus extrapolate based on strata, [9] others instead assume monotonicity. [11] All assume the absence of defiers within the experimental population. Some of these assumptions may be weaker than others; for example, the monotonicity assumption is weaker than the ignorability assumption. However, there are other trade-offs to consider, such as whether the estimates produced are point estimates or bounds. Ultimately, the literature on generalizing the LATE relies entirely on key assumptions. It is not a design-based approach per se, and the field of experiments is not usually in the habit of comparing groups unless they are randomly assigned. Even when the assumptions are difficult to verify, researchers can probe them through the design of the experiment. For example, in a typical field experiment where the instrument is "encouragement to treatment", treatment effect heterogeneity could be detected by varying the intensity of encouragement. If the compliance rate remains stable under different intensities, it could be a signal of homogeneity across groups. Thus, it is important to be a smart consumer of this line of literature, and to examine whether the key assumptions are valid in each experimental case.

References

  1. Imbens, Guido W.; Angrist, Joshua D. (March 1994). "Identification and Estimation of Local Average Treatment Effects" (PDF). Econometrica. 62 (2): 467. doi:10.2307/2951620. ISSN 0012-9682. JSTOR 2951620.
  2. Rubin, Donald B. (January 1978). "Bayesian Inference for Causal Effects: The Role of Randomization". The Annals of Statistics. 6 (1): 34–58. doi:10.1214/aos/1176344064. ISSN 0090-5364.
  3. Angrist, Joshua D.; Imbens, Guido W.; Rubin, Donald B. (June 1996). "Identification of Causal Effects Using Instrumental Variables" (PDF). Journal of the American Statistical Association. 91 (434): 444–455. doi:10.1080/01621459.1996.10476902. ISSN 0162-1459.
  4. Imbens, G. W.; Rubin, D. B. (1997-10-01). "Estimating Outcome Distributions for Compliers in Instrumental Variables Models". The Review of Economic Studies. 64 (4): 555–574. doi:10.2307/2971731. ISSN 0034-6527. JSTOR 2971731.
  5. Hanck, Christoph (2009-10-24). "Joshua D. Angrist and Jörn-Steffen Pischke (2009): Mostly Harmless Econometrics: An Empiricist's Companion". Statistical Papers. 52 (2): 503–504. doi:10.1007/s00362-009-0284-y. ISSN 0932-5026.
  6. Angrist, Joshua (September 1990). "The Draft Lottery and Voluntary Enlistment in the Vietnam Era". Cambridge, MA. doi:10.3386/w3514.
  7. Deaton, Angus (January 2009). "Instruments of development: Randomization in the tropics, and the search for the elusive keys to economic development". Cambridge, MA. doi:10.3386/w14690.
  8. Heckman, James J.; Urzúa, Sergio (May 2010). "Comparing IV with structural models: What simple IV can and cannot identify". Journal of Econometrics. 156 (1): 27–37. doi:10.1016/j.jeconom.2009.09.006. ISSN 0304-4076. PMC 2861784. PMID 20440375.
  9. Aronow, Peter M.; Carnegie, Allison (2013). "Beyond LATE: Estimation of the Average Treatment Effect with an Instrumental Variable". Political Analysis. 21 (4): 492–506. doi:10.1093/pan/mpt013. ISSN 1047-1987.
  10. Imbens, Guido W. (June 2010). "Better LATE Than Nothing: Some Comments on Deaton (2009) and Heckman and Urzua (2009)" (PDF). Journal of Economic Literature. 48 (2): 399–423. doi:10.1257/jel.48.2.399. ISSN 0022-0515.
  11. Kowalski, Amanda (2016). "Doing More When You're Running LATE: Applying Marginal Treatment Effect Methods to Examine Treatment Effect Heterogeneity in Experiments". NBER Working Paper No. 22363. doi:10.3386/w22363.
  12. Abadie, Alberto (March 2002). "Bootstrap Tests for Distributional Treatment Effects in Instrumental Variable Models". Journal of the American Statistical Association. 97 (457): 284–292. CiteSeerX 10.1.1.337.3129. doi:10.1198/016214502753479419. ISSN 0162-1459.
