Field experiment


Field experiments are experiments carried out outside of laboratory settings.


Field experiments randomly assign subjects (or other sampling units) to treatment or control groups to test claims of causal relationships. Random assignment helps establish the comparability of the treatment and control groups, so that any differences that emerge after the treatment has been administered plausibly reflect the influence of the treatment rather than pre-existing differences between the groups. The distinguishing characteristics of field experiments are that they are conducted in real-world settings, often unobtrusively, and that researchers control not only the subject pool but also selection and overtness, as described by researchers such as John A. List. Laboratory experiments, in contrast, enforce scientific control by testing a hypothesis in the artificial and highly controlled setting of a laboratory. Field experiments also differ from naturally occurring experiments and quasi-experiments. [1] While naturally occurring experiments rely on an external force (e.g. a government or nonprofit) controlling treatment assignment and implementation, field experiments require researchers to retain control over randomization and implementation. Quasi-experiments occur when treatments are administered as-if randomly (e.g. U.S. Congressional districts where candidates win with slim margins, [2] weather patterns, or natural disasters).

Field experiments encompass a broad array of experimental designs, each with varying degrees of generality. Some criteria of generality (e.g. authenticity of treatments, participants, contexts, and outcome measures) refer to the contextual similarities between the subjects in the experimental sample and the rest of the population. Field experiments are increasingly used in the social sciences to study the effects of policy-related interventions in domains such as health, education, crime, social welfare, and politics.

Characteristics

Under random assignment, subjects are allocated to groups with non-deterministic probabilities, so the treatment and control groups are comparable in expectation. [3] Two other core assumptions underlie the researcher's ability to obtain unbiased estimates from potential outcomes: excludability and non-interference. [4] [5] The excludability assumption holds that the only causal channel through which assignment affects outcomes is receipt of the treatment itself. Asymmetries in the assignment, administration, or measurement of treatment and control groups violate this assumption. The non-interference assumption, or Stable Unit Treatment Value Assumption (SUTVA), holds that a subject's outcome depends only on whether that subject is assigned to the treatment and not on whether other subjects are assigned to it. When these three core assumptions (random assignment, excludability, and non-interference) are met, field experiments are more likely to yield unbiased estimates.
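The role of these assumptions can be illustrated with a small sketch. The potential outcomes `y0` and `y1` below are hypothetical numbers invented for the illustration (in a real experiment only one of the two is ever observed per subject), and the function names are not taken from any cited source. The code enumerates every possible assignment of two treated subjects out of four and averages the difference-in-means estimate across them. Because each subject's observed outcome depends only on that subject's own assignment (excludability and non-interference are built in), the average of the estimates matches the true average treatment effect.

```python
from itertools import combinations
from statistics import mean


def difference_in_means(outcomes, assignment):
    """Mean outcome of treated units minus mean outcome of control units."""
    treated = [y for y, z in zip(outcomes, assignment) if z == 1]
    control = [y for y, z in zip(outcomes, assignment) if z == 0]
    return mean(treated) - mean(control)


def average_estimate_over_all_assignments(y0, y1, n_treated):
    """Average the estimator over every equally likely complete random assignment."""
    n = len(y0)
    estimates = []
    for treated_idx in combinations(range(n), n_treated):
        chosen = set(treated_idx)
        assignment = [1 if i in chosen else 0 for i in range(n)]
        # Each unit reveals y1 if treated and y0 otherwise; no unit's outcome
        # depends on another unit's assignment (non-interference).
        observed = [y1[i] if assignment[i] else y0[i] for i in range(n)]
        estimates.append(difference_in_means(observed, assignment))
    return mean(estimates)


# Hypothetical potential outcomes for four subjects (illustration only).
y0 = [1, 2, 3, 4]  # outcome if untreated
y1 = [3, 3, 5, 8]  # outcome if treated

true_ate = mean(y1) - mean(y0)
expected_estimate = average_estimate_over_all_assignments(y0, y1, n_treated=2)
print(true_ate, expected_estimate)
```

Any single assignment can give an estimate far from the true effect; unbiasedness is a statement about the average over the randomization distribution.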

After designing the field experiment and gathering the data, researchers can use statistical inference to estimate the size of the intervention's effect on the subjects and the uncertainty around that estimate. Field experiments allow researchers to collect many types of data. For example, a researcher could design an experiment that collects pre- and post-trial information and analyzes it with an appropriate statistical method to see whether an intervention affects subject-level changes in outcomes.
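One way to analyze pre- and post-trial information is sketched below, assuming a small experiment with complete random assignment. The data and function names are hypothetical. The sketch computes each subject's change score, estimates the effect as a difference in mean changes, and obtains an exact randomization-inference p-value under the sharp null hypothesis that the treatment changes no subject's outcome. Randomization inference is one choice among several; a t-test or regression would be alternatives.

```python
from itertools import combinations
from statistics import mean


def change_scores(pre, post):
    """Subject-level change: post-trial outcome minus pre-trial outcome."""
    return [after - before for before, after in zip(pre, post)]


def diff_in_means(values, treated_idx):
    """Mean of treated units minus mean of control units."""
    treated = [v for i, v in enumerate(values) if i in treated_idx]
    control = [v for i, v in enumerate(values) if i not in treated_idx]
    return mean(treated) - mean(control)


def exact_randomization_p_value(values, treated_idx):
    """Two-sided p-value from enumerating every possible assignment.

    Under the sharp null the observed values would be the same under any
    assignment, so the estimator can be recomputed for each one.
    """
    observed = diff_in_means(values, set(treated_idx))
    n, m = len(values), len(treated_idx)
    draws = [diff_in_means(values, set(c)) for c in combinations(range(n), m)]
    extreme = sum(1 for d in draws if abs(d) >= abs(observed) - 1e-12)
    return observed, extreme / len(draws)


# Hypothetical data: six subjects, the first three assigned to treatment.
pre = [10, 12, 11, 9, 10, 13]
post = [15, 18, 18, 10, 12, 16]
treated_idx = [0, 1, 2]

effect, p_value = exact_randomization_p_value(change_scores(pre, post), treated_idx)
print(effect, p_value)
```

Full enumeration is only feasible for small samples; with larger samples the same idea is usually applied to a random subset of possible assignments.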

Practical uses

Field experiments offer researchers a way to test theories and answer questions with higher external validity because they take place in real-world settings. [6] Some researchers argue that field experiments are a better guard against potential bias and biased estimators. In addition, field experiments can act as benchmarks for comparing observational data to experimental results. Using field experiments as benchmarks can help determine levels of bias in observational studies, and, since researchers often develop a hypothesis from an a priori judgment, benchmarks can help to add credibility to a study. [7] While some argue that covariate adjustment or matching designs might work just as well in eliminating bias, field experiments can increase certainty [8] by reducing omitted-variable bias, because random assignment balances both observed and unobserved factors across groups in expectation. [9]

Researchers can use machine learning methods to simulate, reweight, and generalize experimental data. [10] This can increase the speed and efficiency of gathering experimental results and reduce the costs of implementing the experiment. Another recent technique in field experiments is the multi-armed bandit design, [11] along with similar adaptive designs for experiments whose outcomes and treatments vary over time. [12]
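A multi-armed bandit design can be sketched as follows. This is a minimal illustration using Thompson sampling with Beta priors on binary outcomes, which is one common Bayesian approach and not necessarily the procedure used in the cited work. The success rates passed in are hypothetical and would be unknown in a real experiment. In each round the code draws a plausible success rate for every treatment arm from its posterior, assigns the next subject to the arm with the highest draw, and updates that arm's counts.

```python
import random


def thompson_sampling(true_rates, n_rounds, seed=0):
    """Adaptively assign subjects to arms; return how often each arm was used.

    true_rates: hypothetical success probability of each arm (unknown in practice).
    """
    rng = random.Random(seed)
    successes = [0] * len(true_rates)
    failures = [0] * len(true_rates)
    for _ in range(n_rounds):
        # Draw from each arm's Beta(1 + successes, 1 + failures) posterior.
        draws = [rng.betavariate(1 + s, 1 + f) for s, f in zip(successes, failures)]
        arm = max(range(len(draws)), key=draws.__getitem__)
        # Simulate the subject's binary outcome under the chosen arm.
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return [s + f for s, f in zip(successes, failures)]


pulls = thompson_sampling(true_rates=[0.1, 0.9], n_rounds=2000, seed=1)
print(pulls)
```

Because assignment probabilities shift toward better-performing arms as data accumulate, more subjects tend to receive the more effective treatment than under a fixed equal-probability design. The trade-off is that the standard difference-in-means analysis no longer applies without adjustment.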

Limitations

There are limitations of, and arguments against, using field experiments in place of other research designs (e.g. lab experiments, survey experiments, or observational studies). Given that field experiments necessarily take place in a specific geographic and political setting, there is a concern about extrapolating outcomes to formulate a general theory regarding the population of interest. However, researchers have begun to find strategies to generalize causal effects outside of the sample by comparing the environments of the treated population and the external population, drawing on information from larger samples, and modeling treatment effect heterogeneity within the sample. [13] Others have used covariate blocking techniques to generalize from field experiment populations to external populations. [14]
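A simple version of this reweighting idea is sketched below. It assumes that treatment effects have already been estimated separately within strata and that the strata capture everything that makes effects differ between the sample and the external population, which is a strong assumption. The strata names, effects, and population shares are hypothetical, and this is not the specific method of the cited papers.

```python
def weighted_average_effect(stratum_effects, shares):
    """Average stratum-level effects using the given population shares."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return sum(stratum_effects[name] * share for name, share in shares.items())


# Hypothetical stratum-specific effect estimates from the experiment.
stratum_effects = {"urban": 2.0, "rural": 6.0}

# Hypothetical composition of the experimental sample and of the target population.
sample_shares = {"urban": 0.75, "rural": 0.25}
target_shares = {"urban": 0.25, "rural": 0.75}

sample_effect = weighted_average_effect(stratum_effects, sample_shares)
target_effect = weighted_average_effect(stratum_effects, target_shares)
print(sample_effect, target_effect)
```

The two averages differ only because the populations are composed differently; if effects did not vary across strata, reweighting would leave the estimate unchanged.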

Noncompliance, whether one-sided or two-sided, [15] [16] occurs when subjects do not receive the intervention to which they were assigned, for example when subjects assigned to treatment never receive it. Another data collection problem is attrition (where subjects do not provide outcome data), which, under certain conditions, will bias the collected data. These problems can lead to biased or imprecise analysis; however, researchers who use field experiments can apply statistical methods to recover useful information even when these difficulties occur. [16]
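One such statistical method can be sketched for the case of one-sided noncompliance. The data below are hypothetical. The code computes the intent-to-treat (ITT) effect, which compares subjects by assignment regardless of what they received, and then divides it by the difference in treatment receipt between the groups to estimate the average effect among compliers. This ratio (often called the Wald or instrumental-variable estimator) relies on the excludability assumption and on assignment never discouraging anyone from taking the treatment.

```python
from statistics import mean


def group_mean(values, assigned, group):
    """Mean of values among units whose assignment equals `group`."""
    return mean(v for v, z in zip(values, assigned) if z == group)


def itt_and_complier_effect(outcomes, assigned, received):
    """Return (ITT effect on outcomes, ITT effect on receipt, complier effect)."""
    itt_y = group_mean(outcomes, assigned, 1) - group_mean(outcomes, assigned, 0)
    itt_d = group_mean(received, assigned, 1) - group_mean(received, assigned, 0)
    if itt_d == 0:
        raise ValueError("assignment did not change treatment receipt")
    return itt_y, itt_d, itt_y / itt_d


# Hypothetical experiment: four subjects assigned to treatment, four to control.
assigned = [1, 1, 1, 1, 0, 0, 0, 0]
received = [1, 1, 0, 0, 0, 0, 0, 0]  # only half of those assigned were treated
outcomes = [8, 10, 5, 5, 5, 6, 4, 5]

itt_y, itt_d, complier_effect = itt_and_complier_effect(outcomes, assigned, received)
print(itt_y, itt_d, complier_effect)
```

Comparing subjects by the treatment they actually received would discard the protection of random assignment, since those who take up treatment may differ systematically from those who do not.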

Field experiments can also raise concerns about interference [17] between subjects. When a treated subject or group affects the outcomes of the untreated group (through mechanisms such as displacement, communication, or contagion), the untreated group's observed outcome might not be its true untreated outcome. One form of interference is the spillover effect, which occurs when the treatment of treated groups has an effect on neighboring untreated groups.

Field experiments can be expensive, time-consuming to conduct, difficult to replicate, and fraught with ethical pitfalls. Subjects or populations might undermine the implementation process if they perceive treatment selection as unfair (e.g. in 'negative income tax' experiments, communities may lobby to receive a cash transfer, so that assignment is no longer purely random). Collecting consent from all subjects is not always feasible. Staff administering interventions or collecting data could contaminate the randomization scheme. The resulting data could therefore be noisier, with larger standard deviations and less precision, which leads researchers to use larger sample sizes for field testing. However, others argue that, even though replication is difficult, an experiment with important results has a greater chance of being replicated. In addition, field experiments can adopt a "stepped-wedge" design that eventually gives the entire sample access to the intervention on different timing schedules. [18] Researchers can also design a blinded field experiment to reduce the possibility of manipulation.
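The rollout schedule of a stepped-wedge design can be sketched as follows. This is an illustrative sketch with hypothetical cluster names and a simple balanced allocation, not the procedure from the cited paper. Clusters are placed in a random order and then cross over from control to treatment at successive steps, so every cluster starts untreated and ends treated.

```python
import random


def stepped_wedge_schedule(clusters, n_steps, seed=0):
    """Map each cluster to a list of 0/1 treatment flags, one per period.

    There are n_steps + 1 periods: period 0 is an all-control baseline and
    every cluster is treated by the final period.
    """
    rng = random.Random(seed)
    order = list(clusters)
    rng.shuffle(order)  # the random order determines who crosses over first
    schedule = {}
    for position, cluster in enumerate(order):
        start = 1 + position % n_steps  # period in which this cluster crosses over
        schedule[cluster] = [1 if period >= start else 0 for period in range(n_steps + 1)]
    return schedule


schedule = stepped_wedge_schedule(["A", "B", "C", "D", "E", "F"], n_steps=3, seed=1)
for cluster in sorted(schedule):
    print(cluster, schedule[cluster])
```

Because only the timing of treatment is randomized, no cluster is permanently denied the intervention, which can ease the fairness concerns described above.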

Examples

The history of experiments in the lab and the field has left a lasting impact on the physical, natural, and life sciences. The modern use of field experiments has roots in the 1700s, when James Lind used a controlled field experiment to identify a treatment for scurvy. [19]

Other categorical examples of sciences that use field experiments include:

References

  1. Meyer, B. D. (1995). "Natural and quasi-experiments in economics" (PDF). Journal of Business & Economic Statistics. 13 (2): 151–161. doi:10.2307/1392369. JSTOR 1392369.
  2. Lee, D. S.; Moretti, E.; Butler, M. J. (2004). "Do voters affect or elect policies? Evidence from the US House". The Quarterly Journal of Economics. 119 (3): 807–859. doi:10.1162/0033553041502153. JSTOR 25098703.
  3. Rubin, Donald B. (2005). "Causal Inference Using Potential Outcomes". Journal of the American Statistical Association. 100 (469): 322–331. doi:10.1198/016214504000001880. S2CID 842793.
  4. Nyman, Pär (2017). "Door-to-door canvassing in the European elections: Evidence from a Swedish field experiment". Electoral Studies. 45: 110–118. doi:10.1016/j.electstud.2016.12.002.
  5. Broockman, David E.; Kalla, Joshua L.; Sekhon, Jasjeet S. (2017). "The Design of Field Experiments with Survey Outcomes: A Framework for Selecting More Efficient, Robust, and Ethical Designs". Political Analysis. 25 (4): 435–464. doi:10.1017/pan.2017.27. S2CID 233321039.
  6. Duflo, Esther (2006). Field Experiments in Development Economics (Report). Massachusetts Institute of Technology.
  7. Harrison, G. W.; List, J. A. (2004). "Field experiments". Journal of Economic Literature. 42 (4): 1009–1055. doi:10.1257/0022051043004577. JSTOR 3594915.
  8. LaLonde, R. J. (1986). "Evaluating the econometric evaluations of training programs with experimental data". The American Economic Review. 76 (4): 604–620. JSTOR 1806062.
  9. Gordon, Brett R.; Zettelmeyer, Florian; Bhargava, Neha; Chapsky, Dan (2017). "A Comparison of Approaches to Advertising Measurement: Evidence from Big Field Experiments at Facebook". Marketing Science. doi:10.2139/ssrn.3033144. S2CID 197733986.
  10. Athey, Susan; Imbens, Guido (2016). "Recursive partitioning for heterogeneous causal effects: Table 1". Proceedings of the National Academy of Sciences. 113 (27): 7353–7360. doi:10.1073/pnas.1510489113. PMC 4941430. PMID 27382149.
  11. Scott, Steven L. (2010). "A modern Bayesian look at the multi-armed bandit". Applied Stochastic Models in Business and Industry. 26 (6): 639–658. doi:10.1002/asmb.874.
  12. Raj, V.; Kalyani, S. (2017). "Taming non-stationary bandits: A Bayesian approach". arXiv:1707.09727 [stat.ML].
  13. Dehejia, R.; Pop-Eleches, C.; Samii, C. (2015). From local to global: External validity in a fertility natural experiment (PDF) (Report). National Bureau of Economic Research. w21459.
  14. Egami, Naoki; Hartman, Erin (19 July 2018). "Covariate Selection for Generalizing Experimental Results" (PDF). Princeton.edu. Archived from the original (PDF) on 10 July 2020. Retrieved 31 December 2018.
  15. Blackwell, Matthew (2017). "Instrumental Variable Methods for Conditional Effects and Causal Interaction in Voter Mobilization Experiments". Journal of the American Statistical Association. 112 (518): 590–599. doi:10.1080/01621459.2016.1246363. S2CID 55878137.
  16. Aronow, Peter M.; Carnegie, Allison (2013). "Beyond LATE: Estimation of the Average Treatment Effect with an Instrumental Variable". Political Analysis. 21 (4): 492–506. doi:10.1093/pan/mpt013.
  17. Aronow, P. M.; Samii, C. (2017). "Estimating average causal effects under general interference, with application to a social network experiment". The Annals of Applied Statistics. 11 (4): 1912–1947. arXiv:1305.6156. doi:10.1214/16-AOAS1005. S2CID 26963450.
  18. Woertman, W.; de Hoop, E.; Moerbeek, M.; Zuidema, S. U.; Gerritsen, D. L.; Teerenstra, S. (2013). "Stepped wedge designs could reduce the required sample size in cluster randomized trials". Journal of Clinical Epidemiology. 66 (7): 752–758. doi:10.1016/j.jclinepi.2013.01.009. hdl:2066/117688. PMID 23523551.
  19. Tröhler, U. (2005). "Lind and scurvy: 1747 to 1795". Journal of the Royal Society of Medicine. 98 (11): 519–522. doi:10.1177/014107680509801120. PMC 1276007. PMID 16260808.
  20. Bertrand, Marianne; Mullainathan, Sendhil (2004). "Are Emily and Greg more employable than Lakisha and Jamal? A field experiment on labor market discrimination" (PDF). American Economic Review. 94 (4): 991–1013. doi:10.1257/0002828042002561.
  21. Gneezy, Uri; List, John A (2006). "Putting behavioral economics to work: Testing for gift exchange in labor markets using field experiments" (PDF). Econometrica. 74 (5): 1365–1384. doi:10.1111/j.1468-0262.2006.00707.x.
  22. Ahmed, Ali M; Hammarstedt, Mats (2008). "Discrimination in the rental housing market: A field experiment on the Internet". Journal of Urban Economics. 64 (2): 362–372. doi:10.1016/j.jue.2008.02.004.
  23. Edelman, Benjamin; Luca, Michael; Svirsky, Dan (2017). "Racial discrimination in the sharing economy: Evidence from a field experiment". American Economic Journal: Applied Economics. 9 (2): 1–22. doi:10.1257/app.20160213.
  24. Pager, Devah; Shepherd, Hana (2008). "The sociology of discrimination: Racial discrimination in employment, housing, credit, and consumer markets". Annual Review of Sociology. 34: 181–209. doi:10.1146/annurev.soc.33.040406.131740. PMC 2915460. PMID 20689680.
  25. Nesseler, Cornel; Carlos, Gomez-Gonzalez; Dietl, Helmut (2019). "What's in a name? Measuring access to social activities with a field experiment". Palgrave Communications. 5: 1–7. doi:10.1057/s41599-019-0372-0. hdl:11250/2635691.
  26. Ashraf, Nava; Berry, James; Shapiro, Jesse M (2010). "Can higher prices stimulate product use? Evidence from a field experiment in Zambia" (PDF). American Economic Review. 100 (5): 2383–2413. doi:10.1257/aer.100.5.2383. S2CID 6392533.
  27. Karlan, Dean; List, John A (2007). "Does price matter in charitable giving? Evidence from a large-scale natural field experiment" (PDF). American Economic Review. 97 (5): 1774–1793. doi:10.1257/aer.97.5.1774. S2CID 10041821.
  28. Fryer Jr, Roland G (2014). "Injecting charter school best practices into traditional public schools: Evidence from field experiments". The Quarterly Journal of Economics. 129 (3): 1355–1407. doi:10.1093/qje/qju011.
  29. Field, Erica; Pande, Rohini (2008). "Repayment frequency and default in microfinance: evidence from India". Journal of the European Economic Association. 6 (2–3): 501–509. doi:10.1162/JEEA.2008.6.2-3.501.
  30. Fisher, R.A. (1937). The Design of Experiments (PDF). Oliver and Boyd Ltd.
  31. Gosnell, Harold F. (1926). "An Experiment in the Stimulation of Voting". American Political Science Review. 20 (4): 869–874. doi:10.1017/S0003055400110524.
  32. Grodwohl, Jean-Baptiste; Porto, Franco; El-Hani, Charbel N. (2018-07-31). "The instability of field experiments: building an experimental research tradition on the rocky seashores (1950–1985)". History and Philosophy of the Life Sciences. 40 (3): 45. doi:10.1007/s40656-018-0209-y. ISSN 1742-6316. PMID 30066110. S2CID 51889466.