# Average treatment effect

Last updated

The average treatment effect (ATE) is a measure used to compare treatments (or interventions) in randomized experiments, evaluation of policy interventions, and medical trials. The ATE measures the difference in mean (average) outcomes between units assigned to the treatment and units assigned to the control. In a randomized trial (i.e., an experimental study), the average treatment effect can be estimated from a sample using a comparison in mean outcomes for treated and untreated units. However, the ATE is generally understood as a causal parameter (i.e., an estimate or property of a population) that a researcher desires to know, defined without reference to the study design or estimation procedure. Both observational studies and experimental study designs with random assignment may enable one to estimate an ATE in a variety of ways.

## General definition

Originating from early statistical analysis in the fields of agriculture and medicine, the term "treatment" is now applied, more generally, to other fields of natural and social science, especially psychology, political science, and economics such as, for example, the evaluation of the impact of public policies. The nature of a treatment or outcome is relatively unimportant in the estimation of the ATE—that is to say, calculation of the ATE requires that a treatment be applied to some units and not others, but the nature of that treatment (e.g., a pharmaceutical, an incentive payment, a political advertisement) is irrelevant to the definition and estimation of the ATE.

The expression "treatment effect" refers to the causal effect of a given treatment or intervention (for example, the administering of a drug) on an outcome variable of interest (for example, the health of the patient). In the Neyman-Rubin "potential outcomes framework" of causality a treatment effect is defined for each individual unit in terms of two "potential outcomes." Each unit has one outcome that would manifest if the unit were exposed to the treatment and another outcome that would manifest if the unit were exposed to the control. The "treatment effect" is the difference between these two potential outcomes. However, this individual-level treatment effect is unobservable because individual units can only receive the treatment or the control, but not both. Random assignment to treatment ensures that units assigned to the treatment and units assigned to the control are identical (over a large number of iterations of the experiment). Indeed, units in both groups have identical distributions of covariates and potential outcomes. Thus the average outcome among the treatment units serves as a counterfactual for the average outcome among the control units. The differences between these two averages is the ATE, which is an estimate of the central tendency of the distribution of unobservable individual-level treatment effects. [1] If a sample is randomly constituted from a population, the sample ATE (abbreviated SATE) is also an estimate of the population ATE (abbreviated PATE). [2]

While an experiment ensures, in expectation, that potential outcomes (and all covariates) are equivalently distributed in the treatment and control groups, this is not the case in an observational study. In an observational study, units are not assigned to treatment and control randomly, so their assignment to treatment may depend on unobserved or unobservable factors. Observed factors can be statistically controlled (e.g., through regression or matching), but any estimate of the ATE could be confounded by unobservable factors that influenced which units received the treatment versus the control.

## Formal definition

In order to define formally the ATE, we define two potential outcomes : ${\displaystyle y_{0}(i)}$ is the value of the outcome variable for individual ${\displaystyle i}$ if they are not treated, ${\displaystyle y_{1}(i)}$ is the value of the outcome variable for individual ${\displaystyle i}$ if they are treated. For example, ${\displaystyle y_{0}(i)}$ is the health status of the individual if they are not administered the drug under study and ${\displaystyle y_{1}(i)}$ is the health status if they are administered the drug.

The treatment effect for individual ${\displaystyle i}$ is given by ${\displaystyle y_{1}(i)-y_{0}(i)=\beta (i)}$. In the general case, there is no reason to expect this effect to be constant across individuals. The average treatment effect is given by

${\displaystyle {\text{ATE}}={\frac {1}{N}}\sum _{i}(y_{1}(i)-y_{0}(i))}$

where the summation occurs over all ${\displaystyle N}$ individuals in the population.

If we could observe, for each individual, ${\displaystyle y_{1}(i)}$ and ${\displaystyle y_{0}(i)}$ among a large representative sample of the population, we could estimate the ATE simply by taking the average value of ${\displaystyle y_{1}(i)-y_{0}(i)}$ across the sample. However, we can not observe both ${\displaystyle y_{1}(i)}$ and ${\displaystyle y_{0}(i)}$ for each individual since an individual cannot be both treated and not treated. For example, in the drug example, we can only observe ${\displaystyle y_{1}(i)}$ for individuals who have received the drug and ${\displaystyle y_{0}(i)}$ for those who did not receive it. This is the main problem faced by scientists in the evaluation of treatment effects and has triggered a large body of estimation techniques.

## Estimation

Depending on the data and its underlying circumstances, many methods can be used to estimate the ATE. The most common ones are:

## An example

Consider an example where all units are unemployed individuals, and some experience a policy intervention (the treatment group), while others do not (the control group). The causal effect of interest is the impact a job search monitoring policy (the treatment) has on the length of an unemployment spell: On average, how much shorter would one's unemployment be if they experienced the intervention? The ATE, in this case, is the difference in expected values (means) of the treatment and control groups' length of unemployment.

A positive ATE, in this example, would suggest that the job policy increased the length of unemployment. A negative ATE would suggest that the job policy decreased the length of unemployment. An ATE estimate equal to zero would suggest that there was no advantage or disadvantage to providing the treatment in terms of the length of unemployment. Determining whether an ATE estimate is distinguishable from zero (either positively or negatively) requires statistical inference.

Because the ATE is an estimate of the average effect of the treatment, a positive or negative ATE does not indicate that any particular individual would benefit or be harmed by the treatment. Thus the average treatment effect neglects the distribution of the treatment effect. Some parts of the population might be worse off with the treatment even if the mean effect is positive.

## Heterogenous treatment effects

Some researchers call a treatment effect "heterogenous" if it affects different individuals differently (heterogeneously). For example, perhaps the above treatment of a job search monitoring policy affected men and women differently, or people who live in different states differently. ATE requires a strong assumption known as the stable unit treatment value assumption (SUTVA) which requires the value of the potential outcome ${\displaystyle y(i)}$ be unaffected by the mechanism used to assign the treatment and the treatment exposure of all other individuals. Let ${\displaystyle d}$ be the treatment, the treatment effect for individual ${\displaystyle i}$ is given by ${\displaystyle y_{1}(i,d)-y_{0}(i,d)}$. The SUTVA assumption allows us to declare ${\displaystyle y_{1}(i,d)=y_{1}(i),y_{0}(i,d)=y_{0}(i)}$.

One way to look for heterogeneous treatment effects is to divide the study data into subgroups (e.g., men and women, or by state), and see if the average treatment effects are different by subgroup. If the average treatment effects are different, SUTVA is violated. A per-subgroup ATE is called a "conditional average treatment effect" (CATE), i.e. the ATE conditioned on membership in the subgroup. CATE can be used as an estimate if SUTVA does not hold.

A challenge with this approach is that each subgroup may have substantially less data than the study as a whole, so if the study has been powered to detect the main effects without subgroup analysis, there may not be enough data to properly judge the effects on subgroups.

There is some work on detecting heterogenous treatment effects using random forests. [3] [4]

## Related Research Articles

Econometrics is the application of statistical methods to economic data in order to give empirical content to economic relationships. More precisely, it is "the quantitative analysis of actual economic phenomena based on the concurrent development of theory and observation, related by appropriate methods of inference". An introductory economics textbook describes econometrics as allowing economists "to sift through mountains of data to extract simple relationships". Jan Tinbergen is one of the two founding fathers of econometrics. The other, Ragnar Frisch, also coined the term in the sense in which it is used today.

In statistics, an interaction may arise when considering the relationship among three or more variables, and describes a situation in which the effect of one causal variable on an outcome depends on the state of a second causal variable. Although commonly thought of in terms of causal relationships, the concept of an interaction can also describe non-causal associations. Interactions are often considered in the context of regression analyses or factorial experiments.

In statistics, econometrics, epidemiology and related disciplines, the method of instrumental variables (IV) is used to estimate causal relationships when controlled experiments are not feasible or when a treatment is not successfully delivered to every unit in a randomized experiment. Intuitively, IVs are used when an explanatory variable of interest is correlated with the error term, in which case ordinary least squares and ANOVA give biased results. A valid instrument induces changes in the explanatory variable but has no independent effect on the dependent variable, allowing a researcher to uncover the causal effect of the explanatory variable on the dependent variable.

External validity is the validity of applying the conclusions of a scientific study outside the context of that study. In other words, it is the extent to which the results of a study can be generalized to and across other situations, people, stimuli, and times. In contrast, internal validity is the validity of conclusions drawn within the context of a particular study. Because general conclusions are almost always a goal in research, external validity is an important property of any study. Mathematical analysis of external validity concerns a determination of whether generalization across heterogeneous populations is feasible, and devising statistical and computational methods that produce valid generalizations.

The following is a glossary of terms used in the mathematical sciences statistics and probability.

In statistics, a confounder is a variable that influences both the dependent variable and independent variable, causing a spurious association. Confounding is a causal concept, and as such, cannot be described in terms of correlations or associations.

Difference in differences is a statistical technique used in econometrics and quantitative research in the social sciences that attempts to mimic an experimental research design using observational study data, by studying the differential effect of a treatment on a 'treatment group' versus a 'control group' in a natural experiment. It calculates the effect of a treatment on an outcome by comparing the average change over time in the outcome variable for the treatment group to the average change over time for the control group. Although it is intended to mitigate the effects of extraneous factors and selection bias, depending on how the treatment group is chosen, this method may still be subject to certain biases.

The Rubin causal model (RCM), also known as the Neyman–Rubin causal model, is an approach to the statistical analysis of cause and effect based on the framework of potential outcomes, named after Donald Rubin. The name "Rubin causal model" was first coined by Paul W. Holland. The potential outcomes framework was first proposed by Jerzy Neyman in his 1923 Master's thesis, though he discussed it only in the context of completely randomized experiments. Rubin extended it into a general framework for thinking about causation in both observational and experimental studies.

In statistics, ignorability is a feature of an experiment design whereby the method of data collection do not depend on the missing data. A missing data mechanism such as a treatment assignment or survey sampling strategy is "ignorable" if the missing data matrix, which indicates which variables are observed or missing, is independent of the missing data conditional on the observed data.

In the philosophy of science, a causal model is a conceptual model that describes the causal mechanisms of a system. Causal models can improve study designs by providing clear rules for deciding which independent variables need to be included/controlled for.

In statistics, a mediation model seeks to identify and explain the mechanism or process that underlies an observed relationship between an independent variable and a dependent variable via the inclusion of a third hypothetical variable, known as a mediator variable. Rather than a direct causal relationship between the independent variable and the dependent variable, a mediation model proposes that the independent variable influences the (non-observable) mediator variable, which in turn influences the dependent variable. Thus, the mediator variable serves to clarify the nature of the relationship between the independent and dependent variables.

Impact evaluation assesses the changes that can be attributed to a particular intervention, such as a project, program or policy, both the intended ones, as well as ideally the unintended ones. In contrast to outcome monitoring, which examines whether targets have been achieved, impact evaluation is structured to answer the question: how would outcomes such as participants' well-being have changed if the intervention had not been undertaken? This involves counterfactual analysis, that is, "a comparison between what actually happened and what would have happened in the absence of the intervention." Impact evaluations seek to answer cause-and-effect questions. In other words, they look for the changes in outcome that are directly attributable to a program.

In statistics, econometrics, political science, epidemiology, and related disciplines, a regression discontinuity design (RDD) is a quasi-experimental pretest-posttest design that aims to determine the causal effects of interventions by assigning a cutoff or threshold above or below which an intervention is assigned. By comparing observations lying closely on either side of the threshold, it is possible to estimate the average treatment effect in environments in which randomisation is unfeasible. However, it remains impossible to make true causal inference with this method alone, as it does not automatically reject causal effects by any potential confounding variable. First applied by Donald Thistlethwaite and Donald Campbell to the evaluation of scholarship programs, the RDD has become increasingly popular in recent years. Recent study comparisons of randomised controlled trials (RCTs) and RDDs have empirically demonstrated the internal validity of the design.

In the statistical analysis of observational data, propensity score matching (PSM) is a statistical matching technique that attempts to estimate the effect of a treatment, policy, or other intervention by accounting for the covariates that predict receiving the treatment. PSM attempts to reduce the bias due to confounding variables that could be found in an estimate of the treatment effect obtained from simply comparing outcomes among units that received the treatment versus those that did not. Paul R. Rosenbaum and Donald Rubin introduced the technique in 1983.

Principal stratification is a statistical technique used in causal inference when adjusting results for post-treatment covariates. The idea is to identify underlying strata and then compute causal effects only within strata. It is a generalization of the local average treatment effect (LATE).

Control functions are statistical methods to correct for endogeneity problems by modelling the endogeneity in the error term. The approach thereby differs in important ways from other models that try to account for the same econometric problem. Instrumental variables, for example, attempt to model the endogenous variable X as an often invertible model with respect to a relevant and exogenous instrument Z. Panel analysis uses special data properties to difference out unobserved heterogeneity that is assumed to be fixed over time.

A dynamic unobserved effects model is a statistical model used in econometrics for panel analysis. It is characterized by the influence of previous values of the dependent variable on its present value, and by the presence of unobservable explanatory variables.

In statistics, in particular in the design of experiments, a multi-valued treatment is a treatment that can take on more than two values. It is related to the dose-response model in the medical literature.

In experiments, a spillover is an indirect effect on a subject not directly treated by the experiment. These effects are useful for policy analysis but complicate the statistical analysis of experiments.

The local average treatment effect (LATE), also known as the complier average causal effect (CACE), was first introduced into the econometrics literature by Guido W. Imbens and Joshua D. Angrist in 1994. It is the treatment effect for the subset of the sample that takes the treatment if and only if they were assigned to the treatment, otherwise known as the compliers. It is not to be confused with the average treatment effect (ATE), which is the average subject-level treatment effect; the LATE is only the ATE among the compliers. The LATE can be estimated by a ratio of the estimated intent-to-treat effect and the estimated proportion of compliers, or alternatively through an instrumental variable estimator.

## References

1. Holland, Paul W. (1986). "Statistics and Causal Inference". J. Amer. Statist. Assoc. 81 (396): 945–960. doi:10.1080/01621459.1986.10478354. JSTOR   2289064.
2. Imai, Kosuke; King, Gary; Stuart, Elizabeth A. (2008). "Misunderstandings Between Experimentalists and Observationalists About Causal Inference". J. R. Stat. Soc. Ser. A . 171 (2): 481–502. doi:10.1111/j.1467-985X.2007.00527.x.
3. Wager, Stefan; Athey, Susan (2015). "Estimation and Inference of Heterogeneous Treatment Effects using Random Forests". arXiv: [stat.ME].
• Wooldridge, Jeffrey M. (2013). "Policy Analysis with Pooled Cross Sections". Introductory Econometrics: A Modern Approach. Mason, OH: Thomson South-Western. pp. 438–443. ISBN   978-1-111-53104-1.