Synthetic control method

Figure: Comparison of per-capita GDP in West Germany before and after the 1990 German reunification, and the hypothetical trajectory had the reunification not taken place.

The synthetic control method is an econometric method used to evaluate the effect of large-scale interventions. It was proposed in a series of articles by Alberto Abadie and his coauthors. [2] [3] [4] A synthetic control is a weighted average of several units (such as regions or companies) combined to recreate the trajectory that the outcome of a treated unit would have followed in the absence of the intervention. The weights are selected in a data-driven manner to ensure that the resulting synthetic control closely resembles the treated unit in terms of key predictors of the outcome variable. [2] Unlike difference-in-differences approaches, this method can account for the effects of confounders changing over time, by weighting the control group to better match the treatment group before the intervention. [5] Another advantage of the synthetic control method is that it allows researchers to systematically select comparison groups. It has been applied to the fields of economics, [6] political science, [1] health policy, [5] criminology, [7] and others.

The synthetic control method combines elements from matching and difference-in-differences techniques. Difference-in-differences methods are often-used policy evaluation tools that estimate the effect of an intervention at an aggregate level (e.g. state, country, age group) by averaging over a set of unaffected units. Famous examples include studies of the employment effects of a minimum-wage raise in New Jersey fast food restaurants, which compared them to fast food restaurants just across the border in Pennsylvania that were unaffected by the raise, [8] and studies that look at crime rates in southern cities to evaluate the impact of the Mariel Boatlift on crime. [9] The control group in this setting can be interpreted as a weighted average, where some units effectively receive zero weight while others get an equal, non-zero weight.
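
To make the weighted-average interpretation concrete, the canonical two-group, two-period difference-in-differences estimator can be written as follows (the notation is generic and not taken from the cited studies):

$$\hat{\tau}_{\text{DiD}} = \left(\bar{Y}_{\text{treated, post}} - \bar{Y}_{\text{treated, pre}}\right) - \left(\bar{Y}_{\text{control, post}} - \bar{Y}_{\text{control, pre}}\right),$$

where each control-group mean gives every included control unit the same weight $1/N_{\text{control}}$ and every excluded unit a weight of zero.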

The synthetic control method tries to offer a more systematic way to assign weights to the control group. It typically uses a relatively long time series of the outcome prior to the intervention and estimates weights in such a way that the control group mirrors the treatment group as closely as possible. In particular, assume we have $J+1$ units observed over $T$ time periods, where the relevant treatment occurs at time $T_0$ with $T_0 < T$. Let

$$\alpha_{it} = Y_{it} - Y_{it}^{N}$$

be the treatment effect for unit $i$ at time $t$, where $Y_{it}^{N}$ is the outcome in the absence of the treatment. Without loss of generality, if unit 1 receives the relevant treatment, only $Y_{1t}^{N}$ is not observed for $t > T_0$. We aim to estimate $(\alpha_{1,T_0+1}, \ldots, \alpha_{1,T})$.

Imposing some structure

$$Y_{it}^{N} = \delta_t + \theta_t Z_i + \lambda_t \mu_i + \varepsilon_{it}$$

and assuming there exist some optimal weights $w_2, \ldots, w_{J+1}$ such that

$$Y_{1t} = \sum_{j=2}^{J+1} w_j Y_{jt} \quad \text{and} \quad Z_1 = \sum_{j=2}^{J+1} w_j Z_j$$

for $t \in \{1, \ldots, T_0\}$, the synthetic control approach suggests using these weights to estimate the counterfactual

$$Y_{1t}^{N} = \sum_{j=2}^{J+1} w_j Y_{jt}$$

for $t \in \{T_0+1, \ldots, T\}$. So, under some regularity conditions, such weights would provide estimators for the treatment effects of interest. In essence, the method uses the idea of matching: the pre-intervention data serve as training data for setting up the weights, and hence a relevant control for the post-intervention period. [2]
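
A minimal numerical sketch of this weighting step is given below. For simplicity it chooses weights only to match the treated unit's pre-intervention outcome path, whereas the estimator proposed by Abadie and coauthors additionally matches pre-intervention predictors of the outcome; the data, variable names, and solver used here are illustrative assumptions rather than part of the method as published.

    import numpy as np
    from scipy.optimize import minimize

    # Hypothetical panel: J = 10 control units observed for T0 = 20 pre-intervention
    # periods. Y0[t, j] is the outcome of control unit j at time t, and y1[t] is the
    # outcome of the treated unit. A real application would load actual data here.
    rng = np.random.default_rng(0)
    T0, J = 20, 10
    Y0 = rng.normal(size=(T0, J)).cumsum(axis=0)
    y1 = Y0[:, :3].mean(axis=1) + rng.normal(scale=0.1, size=T0)

    def pre_period_loss(w):
        # Squared distance between the treated unit and the weighted controls
        # over the pre-intervention periods.
        return float(np.sum((y1 - Y0 @ w) ** 2))

    # Weights are restricted to be non-negative and to sum to one.
    constraints = ({"type": "eq", "fun": lambda w: np.sum(w) - 1.0},)
    bounds = [(0.0, 1.0)] * J
    w_start = np.full(J, 1.0 / J)
    result = minimize(pre_period_loss, w_start, method="SLSQP",
                      bounds=bounds, constraints=constraints)
    weights = result.x

    # The estimated counterfactual for a post-intervention period is the same weighted
    # average applied to the controls' post-intervention outcomes, e.g. Y0_post @ weights.
    print(np.round(weights, 3))

In practice, researchers typically rely on dedicated implementations, such as the Synth packages that accompany the original papers, rather than a hand-rolled optimizer.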

Synthetic controls have been used in a number of empirical applications, including studies of natural catastrophes and growth, [10] the effect of vaccine mandates on childhood immunization, [11] and the link between political murders and house prices. [12]

Related Research Articles

Analysis of variance (ANOVA) is a collection of statistical models and their associated estimation procedures used to analyze the differences among means. ANOVA was developed by the statistician Ronald Fisher. ANOVA is based on the law of total variance, where the observed variance in a particular variable is partitioned into components attributable to different sources of variation. In its simplest form, ANOVA provides a statistical test of whether two or more population means are equal, and therefore generalizes the t-test beyond two means. In other words, the ANOVA is used to test the difference between two or more means.
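
As a brief illustration (standard one-way ANOVA notation, not taken from the article), the test statistic compares the between-group and within-group variance components:

$$F = \frac{\mathrm{SS}_{\text{between}}/(k-1)}{\mathrm{SS}_{\text{within}}/(N-k)},$$

where $k$ is the number of groups and $N$ the total number of observations; large values of $F$ are evidence against the hypothesis that all group means are equal.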

<span class="mw-page-title-main">Design of experiments</span> Design of tasks

The design of experiments, also known as experiment design or experimental design, is the design of any task that aims to describe and explain the variation of information under conditions that are hypothesized to reflect the variation. The term is generally associated with experiments in which the design introduces conditions that directly affect the variation, but may also refer to the design of quasi-experiments, in which natural conditions that influence the variation are selected for observation.

<span class="mw-page-title-main">Meta-analysis</span> Statistical method that summarizes and/or integrates data from multiple sources

Meta-analysis is a method of synthesis of quantitative data from multiple independent studies addressing a common research question. An important part of this method involves computing a combined effect size across all of the studies. As such, this statistical approach involves extracting effect sizes and variance measures from various studies. By combining these effect sizes the statistical power is improved and can resolve uncertainties or discrepancies found in individual studies. Meta-analyses are integral in supporting research grant proposals, shaping treatment guidelines, and influencing health policies. They are also pivotal in summarizing existing research to guide future studies, thereby cementing their role as a fundamental methodology in metascience. Meta-analyses are often, but not always, important components of a systematic review.
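
For example, under a common fixed-effect model (generic notation, assumed here for illustration), the combined effect size is the inverse-variance weighted mean of the study-level estimates:

$$\hat{\theta} = \frac{\sum_k w_k \hat{\theta}_k}{\sum_k w_k}, \qquad w_k = \frac{1}{\widehat{\operatorname{Var}}(\hat{\theta}_k)},$$

where $\hat{\theta}_k$ is the effect size reported by study $k$.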

Analysis of covariance (ANCOVA) is a general linear model that blends ANOVA and regression. ANCOVA evaluates whether the means of a dependent variable (DV) are equal across levels of one or more categorical independent variables (IVs) and across one or more continuous variables. For example, the categorical variable(s) might describe treatment and the continuous variable(s) might be covariates (CVs), typically nuisance variables; or vice versa. Mathematically, ANCOVA decomposes the variance in the DV into variance explained by the CV(s), variance explained by the categorical IV, and residual variance. Intuitively, ANCOVA can be thought of as 'adjusting' the DV by the group means of the CV(s).

In statistics, an effect size is a value measuring the strength of the relationship between two variables in a population, or a sample-based estimate of that quantity. It can refer to the value of a statistic calculated from a sample of data, the value of a parameter for a hypothetical population, or to the equation that operationalizes how statistics or parameters lead to the effect size value. Examples of effect sizes include the correlation between two variables, the regression coefficient in a regression, the mean difference, or the risk of a particular event happening. Effect sizes complement statistical hypothesis testing, and play an important role in power analyses to assess the sample size required for new experiments. Effect sizes are fundamental in meta-analyses, which aim to provide a combined effect size based on data from multiple studies. The cluster of data-analysis methods concerning effect sizes is referred to as estimation statistics.

<span class="mw-page-title-main">Interaction (statistics)</span> Statistical term

In statistics, an interaction may arise when considering the relationship among three or more variables, and describes a situation in which the effect of one causal variable on an outcome depends on the state of a second causal variable. Although commonly thought of in terms of causal relationships, the concept of an interaction can also describe non-causal associations. Interactions are often considered in the context of regression analyses or factorial experiments.

Student's t-test is a statistical test used to test whether the difference between the response of two groups is statistically significant or not. It is any statistical hypothesis test in which the test statistic follows a Student's t-distribution under the null hypothesis. It is most commonly applied when the test statistic would follow a normal distribution if the value of a scaling term in the test statistic were known. When the scaling term is estimated based on the data, the test statistic—under certain conditions—follows a Student's t distribution. The t-test's most common application is to test whether the means of two populations are significantly different. In many cases, a Z-test will yield very similar results to a t-test because the latter converges to the former as the size of the dataset increases.
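
For example, the two-sample statistic with a pooled variance estimate (standard notation, shown here for illustration) is

$$t = \frac{\bar{X}_1 - \bar{X}_2}{s_p \sqrt{\tfrac{1}{n_1} + \tfrac{1}{n_2}}}, \qquad s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2},$$

which follows a Student's t distribution with $n_1 + n_2 - 2$ degrees of freedom under the null hypothesis of equal means, given the usual normality and equal-variance assumptions.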

Alberto Abadie is a Spanish economist who has served as a professor of economics at the Massachusetts Institute of Technology since 2016, where he is also Associate Director of the Institute for Data, Systems, and Society (IDSS). He is principally known for his work in econometrics and empirical microeconomics, and is a specialist in causal inference and program evaluation. He has made fundamental contributions to important areas in econometrics and statistics, including treatment effect models, instrumental variable estimation, matching estimators, difference in differences, and synthetic controls.

In machine learning, backpropagation is a gradient estimation method commonly used for training neural networks to compute the network parameter updates.

In the statistical theory of the design of experiments, blocking is the arranging of experimental units that are similar to one another in groups (blocks) based on one or more variables. These variables are chosen carefully to minimize the impact of their variability on the observed outcomes. There are different ways that blocking can be implemented, resulting in different confounding effects. However, the different methods share the same purpose: to control variability introduced by specific factors that could influence the outcome of an experiment. The roots of blocking originated with the statistician Ronald Fisher, following his development of ANOVA.

<span class="mw-page-title-main">Baumol effect</span> Rise of salaries in jobs that have seen little rise of productivity

In economics, the Baumol effect, also known as Baumol's cost disease, first described by William J. Baumol and William G. Bowen in the 1960s, is the tendency for wages in jobs that have experienced little or no increase in labor productivity to rise in response to rising wages in other jobs that did experience high productivity growth. In turn, these sectors of the economy become more expensive over time, because their input costs increase while productivity does not. Typically, this affects services more than manufactured goods, and in particular health, education, arts and culture.

Difference in differences is a statistical technique used in econometrics and quantitative research in the social sciences that attempts to mimic an experimental research design using observational study data, by studying the differential effect of a treatment on a 'treatment group' versus a 'control group' in a natural experiment. It calculates the effect of a treatment on an outcome by comparing the average change over time in the outcome variable for the treatment group to the average change over time for the control group. Although it is intended to mitigate the effects of extraneous factors and selection bias, depending on how the treatment group is chosen, this method may still be subject to certain biases.
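
A toy numerical sketch of that calculation, using made-up group means:

    # Hypothetical average outcomes for each group and period.
    pre_treated, post_treated = 10.0, 14.0    # treatment group
    pre_control, post_control = 9.0, 11.0     # control group

    # Change in the treated group beyond the change in the control group.
    did_estimate = (post_treated - pre_treated) - (post_control - pre_control)
    print(did_estimate)  # 2.0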

The Rubin causal model (RCM), also known as the Neyman–Rubin causal model, is an approach to the statistical analysis of cause and effect based on the framework of potential outcomes, named after Donald Rubin. The name "Rubin causal model" was first coined by Paul W. Holland. The potential outcomes framework was first proposed by Jerzy Neyman in his 1923 Master's thesis, though he discussed it only in the context of completely randomized experiments. Rubin extended it into a general framework for thinking about causation in both observational and experimental studies.

The average treatment effect (ATE) is a measure used to compare treatments in randomized experiments, evaluation of policy interventions, and medical trials. The ATE measures the difference in mean (average) outcomes between units assigned to the treatment and units assigned to the control. In a randomized trial, the average treatment effect can be estimated from a sample using a comparison in mean outcomes for treated and untreated units. However, the ATE is generally understood as a causal parameter that a researcher desires to know, defined without reference to the study design or estimation procedure. Both observational studies and experimental study designs with random assignment may enable one to estimate an ATE in a variety of ways.
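
In potential-outcomes notation (used here for illustration), the ATE and its randomized-trial estimator are

$$\mathrm{ATE} = \mathbb{E}\left[Y(1) - Y(0)\right], \qquad \widehat{\mathrm{ATE}} = \bar{Y}_{\text{treated}} - \bar{Y}_{\text{control}},$$

where $Y(1)$ and $Y(0)$ are the outcomes a unit would experience with and without the treatment.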

In statistics, econometrics, political science, epidemiology, and related disciplines, a regression discontinuity design (RDD) is a quasi-experimental pretest–posttest design that aims to determine the causal effects of interventions by assigning a cutoff or threshold above or below which an intervention is assigned. By comparing observations lying closely on either side of the threshold, it is possible to estimate the average treatment effect in environments in which randomisation is unfeasible. However, it remains impossible to make true causal inference with this method alone, as it does not automatically reject causal effects by any potential confounding variable. First applied by Donald Thistlethwaite and Donald Campbell (1960) to the evaluation of scholarship programs, the RDD has become increasingly popular in recent years. Recent study comparisons of randomised controlled trials (RCTs) and RDDs have empirically demonstrated the internal validity of the design.
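
In a sharp design with cutoff $c$ and running variable $X$ (generic notation, for illustration), the estimand is the jump in the conditional mean of the outcome at the cutoff:

$$\tau_{\mathrm{RD}} = \lim_{x \downarrow c} \mathbb{E}[Y \mid X = x] \;-\; \lim_{x \uparrow c} \mathbb{E}[Y \mid X = x],$$

which is typically estimated by fitting separate local regressions on each side of the cutoff.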

In the statistical analysis of observational data, propensity score matching (PSM) is a statistical matching technique that attempts to estimate the effect of a treatment, policy, or other intervention by accounting for the covariates that predict receiving the treatment. PSM attempts to reduce the bias due to confounding variables that could be found in an estimate of the treatment effect obtained from simply comparing outcomes among units that received the treatment versus those that did not.
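
A rough sketch of that two-step logic (estimate the propensity score, then match on it) is shown below; the data are simulated and the one-to-one nearest-neighbour rule is just one simple matching choice among many.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Simulated observational data for illustration: X are covariates, d is a
    # treatment indicator that depends on X, and y is the outcome.
    rng = np.random.default_rng(1)
    n = 500
    X = rng.normal(size=(n, 3))
    d = (X[:, 0] + rng.normal(size=n) > 0).astype(int)
    y = X @ np.array([1.0, 0.5, -0.5]) + 2.0 * d + rng.normal(size=n)

    # Step 1: estimate propensity scores, i.e. P(treatment | covariates).
    ps = LogisticRegression().fit(X, d).predict_proba(X)[:, 1]

    # Step 2: match each treated unit to the control unit with the closest score.
    treated = np.where(d == 1)[0]
    controls = np.where(d == 0)[0]
    matches = controls[np.abs(ps[controls][None, :] - ps[treated][:, None]).argmin(axis=1)]

    # Step 3: average outcome difference across matched pairs.
    att_estimate = np.mean(y[treated] - y[matches])
    print(round(att_estimate, 2))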

<span class="mw-page-title-main">Errors-in-variables models</span> Regression models accounting for possible errors in independent variables

In statistics, errors-in-variables models or measurement error models are regression models that account for measurement errors in the independent variables. In contrast, standard regression models assume that those regressors have been measured exactly, or observed without error; as such, those models account only for errors in the dependent variables, or responses.

In medicine, a stepped-wedge trial is a type of randomised controlled trial (RCT). An RCT is a scientific experiment that is designed to reduce bias when testing a new medical treatment, a social intervention, or another testable hypothesis.

In experiments, a spillover is an indirect effect on a subject not directly treated by the experiment. These effects are useful for policy analysis but complicate the statistical analysis of experiments.

In econometrics and related empirical fields, the local average treatment effect (LATE), also known as the complier average causal effect (CACE), is the effect of a treatment for subjects who comply with the experimental treatment assigned to their sample group. It is not to be confused with the average treatment effect (ATE), which includes compliers and non-compliers together. Compliance refers to the human-subject response to a proposed experimental treatment condition. The LATE is calculated similarly to the ATE, but it excludes non-compliant parties. If the goal is to evaluate the effect of a treatment on ideal, compliant subjects, the LATE will give a more precise estimate. However, it may lack external validity by ignoring the effect of non-compliance that is likely to occur in the real-world deployment of a treatment method. The LATE can be estimated by the ratio of the estimated intent-to-treat effect to the estimated proportion of compliers, or alternatively through an instrumental variable estimator.
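
The ratio described above is often written as the Wald estimator (generic notation, for illustration):

$$\widehat{\mathrm{LATE}} = \frac{\bar{Y}_{Z=1} - \bar{Y}_{Z=0}}{\bar{D}_{Z=1} - \bar{D}_{Z=0}},$$

where $Z$ denotes random assignment, $D$ the treatment actually taken, the numerator is the estimated intent-to-treat effect, and the denominator estimates the proportion of compliers.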

References

  1. Abadie, Alberto; Diamond, Alexis; Hainmueller, Jens (2015). "Comparative Politics and the Synthetic Control Method". American Journal of Political Science. 59 (2): 495–510. doi:10.1111/ajps.12116.
  2. Abadie, Alberto; Gardeazabal, Javier (2003). "The Economic Costs of Conflict: A Case Study of the Basque Country". American Economic Review. 93 (1): 113–132. doi:10.1257/000282803321455188. ISSN 0002-8282.
  3. Abadie, Alberto; Diamond, Alexis; Hainmueller, Jens (2010). "Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California's Tobacco Control Program". Journal of the American Statistical Association. 105 (490): 493–505. doi:10.1198/jasa.2009.ap08746. ISSN 0162-1459.
  4. Abadie, Alberto (2021). "Using Synthetic Controls: Feasibility, Data Requirements, and Methodological Aspects". Journal of Economic Literature. 59 (2): 391–425. doi:10.1257/jel.20191450. hdl:1721.1/144417. ISSN 0022-0515.
  5. Kreif, Noémi; Grieve, Richard; Hangartner, Dominik; Turner, Alex James; Nikolova, Silviya; Sutton, Matt (2016). "Examination of the Synthetic Control Method for Evaluating Health Policies with Multiple Treated Units". Health Economics. 25 (12): 1514–1528. doi:10.1002/hec.3258. PMC 5111584. PMID 26443693.
  6. Billmeier, Andreas; Nannicini, Tommaso (2013). "Assessing Economic Liberalization Episodes: A Synthetic Control Approach". Review of Economics and Statistics. 95 (3): 983–1001. doi:10.1162/REST_a_00324. S2CID 57561957.
  7. Saunders, Jessica; Lundberg, Russell; Braga, Anthony A.; Ridgeway, Greg; Miles, Jeremy (2014). "A Synthetic Control Approach to Evaluating Place-Based Crime Interventions". Journal of Quantitative Criminology. 31 (3): 413–434. doi:10.1007/s10940-014-9226-5. S2CID 254702864.
  8. Card, D.; Krueger, A. (1994). "Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania". American Economic Review. 84 (4): 772–793. JSTOR 2118030.
  9. Billy, Alexander (2022). "Crime and the Mariel Boatlift". International Review of Law and Economics. 72: 106094. doi:10.1016/j.irle.2022.106094. S2CID 219390309.
  10. Cavallo, E.; Galiani, S.; Noy, I.; Pantano, J. (2013). "Catastrophic Natural Disasters and Economic Growth". Review of Economics and Statistics. 95 (5): 1549–1561. doi:10.1162/REST_a_00413. S2CID 16038784.
  11. Li, Ang; Toll, Mathew (2020). "Removing conscientious objection: The impact of 'No Jab No Pay' and 'No Jab No Play' vaccine policies in Australia". Preventive Medicine. 145: 106406. doi:10.1016/j.ypmed.2020.106406. ISSN 0091-7435. PMID 33388333. S2CID 230489130.
  12. Gautier, P. A.; Siegmann, A.; Van Vuuren, A. (2009). "Terrorism and Attitudes towards Minorities: The effect of the Theo van Gogh murder on house prices in Amsterdam". Journal of Urban Economics. 65 (2): 113–126. doi:10.1016/j.jue.2008.10.004. S2CID 190624.