Field experiments are experiments carried out outside of laboratory settings.
They randomly assign subjects (or other sampling units) to treatment or control groups to test claims of causal relationships. Random assignment helps establish the comparability of the treatment and control groups, so that any differences between them that emerge after the treatment has been administered plausibly reflect the influence of the treatment rather than pre-existing differences between the groups. The distinguishing characteristics of field experiments are that they are conducted in real-world settings, often unobtrusively, and that the researcher controls not only the subject pool but also selection and overtness, as described by John A. List. This is in contrast to laboratory experiments, which enforce scientific control by testing a hypothesis in the artificial and highly controlled setting of a laboratory. Field experiments also differ from naturally occurring experiments and quasi-experiments. [1] While naturally occurring experiments rely on an external force (e.g. a government or nonprofit) to control the randomization of treatment assignment and implementation, field experiments require researchers to retain control over both. Quasi-experiments occur when treatments are administered as-if randomly (e.g. U.S. Congressional districts where candidates win by slim margins, [2] weather patterns, natural disasters, etc.).
Field experiments encompass a broad array of experimental designs, each with varying degrees of generality. Some criteria of generality (e.g. authenticity of treatments, participants, contexts, and outcome measures) refer to the contextual similarities between the subjects in the experimental sample and the rest of the population. Field experiments are increasingly used in the social sciences to study the effects of policy-related interventions in domains such as health, education, crime, social welfare, and politics.
Under random assignment, outcomes of field experiments reflect the real world because subjects are assigned to groups based on non-deterministic probabilities. [3] Two other core assumptions underlie the researcher's ability to collect unbiased potential outcomes: excludability and non-interference. [4] [5] The excludability assumption requires that the only relevant causal pathway runs through receipt of the treatment; asymmetries in the assignment, administration, or measurement of treatment and control groups violate it. The non-interference assumption, also known as the Stable Unit Treatment Value Assumption (SUTVA), requires that each subject's outcome depends only on that subject's own treatment assignment, not on the assignments of other subjects. When these three core assumptions are met, researchers are more likely to obtain unbiased estimates from field experiments.
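As a concrete illustration of why random assignment yields unbiased estimates, the minimal simulation below builds synthetic potential outcomes with a known treatment effect and recovers it by a difference in means; all numbers, including the effect size of 2.0, are invented for the example.

```python
# Minimal potential-outcomes simulation: random assignment reveals one
# potential outcome per subject, and the difference in means recovers the
# true effect (fixed at 2.0 here; all data are synthetic).
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
y0 = rng.normal(10, 2, n)          # outcome each unit would have under control
y1 = y0 + 2.0                      # outcome under treatment
z = rng.binomial(1, 0.5, n)        # non-deterministic assignment probabilities
y = np.where(z == 1, y1, y0)       # only the assigned condition is observed

ate_hat = y[z == 1].mean() - y[z == 0].mean()
print(f"estimated ATE: {ate_hat:.3f} (true: 2.000)")
```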
After designing the field experiment and gathering the data, researchers can use statistical inference tests to determine the size and strength of the intervention's effect on the subjects. Field experiments allow researchers to collect data of diverse types and quantities. For example, a researcher could design an experiment that uses pre- and post-trial measurements in an appropriate statistical inference method to see whether an intervention affects subject-level changes in outcomes.
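A hedged sketch of such a pre/post analysis, using fabricated measurements and a Welch t-test on subject-level change scores (one reasonable choice among many inference methods):

```python
# Hypothetical pre/post analysis: compare subject-level change scores across
# randomly assigned groups with a Welch t-test. Data and the effect of 3.0
# are fabricated; scipy is assumed available.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 200
pre = rng.normal(50, 10, n)                    # pre-trial measurements
z = rng.binomial(1, 0.5, n)                    # random assignment
post = pre + rng.normal(0, 5, n) + 3.0 * z     # intervention adds 3 on average

change = post - pre
t, p = stats.ttest_ind(change[z == 1], change[z == 0], equal_var=False)
print(f"Welch t = {t:.2f}, p = {p:.4f}")
```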
Field experiments offer researchers a way to test theories and answer questions with higher external validity because they simulate real-world occurrences. [6] Some researchers argue that field experiments are a better guard against potential bias and biased estimators. In addition, field experiments can act as benchmarks for comparing observational data to experimental results. Using field experiments as benchmarks can help determine levels of bias in observational studies, and, since researchers often develop a hypothesis from an a priori judgment, benchmarks can add credibility to a study. [7] While some argue that covariate adjustment or matching designs might work just as well in eliminating bias, field experiments can increase certainty [8] by reducing omitted-variable bias, because randomization balances observed and unobserved factors across groups. [9]
Researchers can utilize machine learning methods to simulate, reweight, and generalize experimental data. [10] This increases the speed and efficiency of gathering experimental results and reduces the costs of implementing the experiment. Another cutting-edge technique in field experiments is the multi-armed bandit design, [11] along with similar adaptive designs for experiments with variable outcomes and variable treatments over time. [12]
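The sketch below illustrates one common bandit algorithm, Thompson sampling with Beta priors over Bernoulli outcomes; the conversion rates are invented, and real adaptive designs involve many further considerations:

```python
# Thompson sampling over three Bernoulli arms with Beta(1, 1) priors. The true
# conversion rates are invented; the design adaptively concentrates assignment
# on the best-performing arm.
import numpy as np

rng = np.random.default_rng(2)
true_rates = np.array([0.05, 0.10, 0.15])   # unknown to the experimenter
wins = np.ones(3)                           # Beta prior parameters
losses = np.ones(3)

for _ in range(5_000):
    arm = int(np.argmax(rng.beta(wins, losses)))   # sample a plausible rate per arm
    reward = rng.random() < true_rates[arm]
    wins[arm] += reward
    losses[arm] += 1 - reward

print("pulls per arm:", (wins + losses - 2).astype(int))   # arm 2 dominates
```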
There are limitations of, and arguments against, using field experiments in place of other research designs (e.g. lab experiments, survey experiments, observational studies, etc.). Given that field experiments necessarily take place in a specific geographic and political setting, there is a concern about extrapolating the outcomes to formulate a general theory regarding the population of interest. However, researchers have begun to find strategies to effectively generalize causal effects outside of the sample by comparing the environments of the treated population and the external population, drawing on information from larger sample sizes, and accounting for and modeling treatment-effect heterogeneity within the sample. [13] Others have used covariate blocking techniques to generalize from field experiment populations to external populations. [14]
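One simple version of such generalization is post-stratification: reweight stratum-level effect estimates by the external population's covariate shares rather than the sample's. The strata, shares, and effects below are entirely hypothetical:

```python
# Post-stratification sketch: reweight stratum-specific effect estimates from
# the experimental sample to an external population's covariate shares.
# Strata, shares, and effects are all hypothetical.
import numpy as np

effects = np.array([4.0, 1.0])            # estimated effect in urban, rural strata
sample_share = np.array([0.8, 0.2])       # experimental sample is mostly urban
population_share = np.array([0.5, 0.5])   # target population is balanced

print(f"in-sample ATE:  {effects @ sample_share:.2f}")
print(f"reweighted ATE: {effects @ population_share:.2f}")
```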
Noncompliance issues affecting field experiments (both one-sided and two-sided noncompliance) [15] [16] can occur when subjects who are assigned to a certain group never receive their assigned intervention. Other problems for data collection include attrition, where treated subjects fail to provide outcome data, which under certain conditions will bias the collected data. These problems can lead to imprecise data analysis; however, researchers who use field experiments can still use statistical methods to calculate useful information even when these difficulties occur. [16]
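Under one-sided noncompliance, a standard quantity is the complier average causal effect (CACE): the intent-to-treat effect divided by the compliance rate. A minimal simulation with made-up parameters:

```python
# One-sided noncompliance: some assigned-to-treatment subjects never take the
# treatment. The CACE is the intent-to-treat (ITT) effect scaled by the
# compliance rate. Parameters (60% compliance, effect of 5.0) are invented.
import numpy as np

rng = np.random.default_rng(3)
n = 20_000
z = rng.binomial(1, 0.5, n)              # random assignment
complier = rng.random(n) < 0.6           # 60% comply when assigned
d = z * complier                         # actual treatment receipt
y = rng.normal(size=n) + 5.0 * d         # effect operates only through receipt

itt = y[z == 1].mean() - y[z == 0].mean()
compliance = d[z == 1].mean() - d[z == 0].mean()   # first-stage effect
print(f"ITT: {itt:.2f}, CACE: {itt / compliance:.2f} (close to 5.0)")
```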
Using field experiments can also lead to concerns over interference [17] between subjects. When a treated subject or group affects the outcomes of the nontreated group (through displacement, communication, contagion, etc.), the nontreated group's measured outcome may no longer represent the true untreated outcome. A subset of interference is the spillover effect, which occurs when the treatment of treated groups affects neighboring untreated groups.
Field experiments can be expensive, time-consuming to conduct, difficult to replicate, and plagued with ethical pitfalls. Subjects or populations might undermine the implementation process if there is a perception of unfairness in treatment selection (e.g. in 'negative income tax' experiments, communities may lobby to receive a cash transfer, so that assignment is no longer purely random). There are limits to collecting consent forms from all subjects. Those administering interventions or collecting data could contaminate the randomization scheme. The resulting data could therefore be more varied: larger standard deviations, less precision and accuracy, etc. This leads to the use of larger sample sizes for field testing. However, others argue that, even though replicability is difficult, if the results of the experiment are important, there is a larger chance that the experiment will be replicated. Field experiments can also adopt a "stepped-wedge" design that eventually gives the entire sample access to the intervention on different timing schedules. [18] Researchers can also design a blinded field experiment to remove possibilities of manipulation.
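A stepped-wedge design can be summarized as a cluster-by-period rollout matrix in which every cluster eventually crosses over to treatment. The sketch below randomly staggers invented clusters across a handful of periods:

```python
# Stepped-wedge rollout sketch: rows are clusters, columns are time periods,
# and a 1 means the intervention is active. All clusters start untreated and
# all are treated by the final period. Cluster count and periods are invented.
import numpy as np

rng = np.random.default_rng(4)
n_clusters, n_periods = 6, 4

# Randomly assign each cluster a crossover period (two clusters per step).
crossover = rng.permutation(np.repeat(np.arange(1, n_periods), 2))
schedule = (np.arange(n_periods)[None, :] >= crossover[:, None]).astype(int)
print(schedule)
```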
The history of experiments in the lab and the field has left longstanding impacts in the physical, natural, and life sciences. The modern use of field experiments has roots in the 1700s, when James Lind utilized a controlled field experiment to identify a treatment for scurvy. [19]
The following related concepts and methods situate field experiments within the broader experimental and causal-inference literature:
An experiment is a procedure carried out to support or refute a hypothesis, or determine the efficacy or likelihood of something previously untried. Experiments provide insight into cause-and-effect by demonstrating what outcome occurs when a particular factor is manipulated. Experiments vary greatly in goal and scale but always rely on repeatable procedure and logical analysis of the results. Natural experimental studies also exist.
A randomized controlled trial is a form of scientific experiment used to control factors not under direct experimental control. Examples of RCTs are clinical trials that compare the effects of drugs, surgical techniques, medical devices, diagnostic procedures, diets or other medical treatments.
In a blind or blinded experiment, information which may influence the participants of the experiment is withheld until after the experiment is complete. Good blinding can reduce or eliminate experimental biases that arise from participants' expectations, observer's effect on the participants, observer bias, confirmation bias, and other sources. A blind can be imposed on any participant of an experiment, including subjects, researchers, technicians, data analysts, and evaluators. In some cases, while blinding would be useful, it is impossible or unethical. For example, it is not possible to blind a patient to their treatment in a physical therapy intervention. A good clinical protocol ensures that blinding is as effective as possible within ethical and practical constraints.
Selection bias is the bias introduced by the selection of individuals, groups, or data for analysis in such a way that proper randomization is not achieved, thereby failing to ensure that the sample obtained is representative of the population intended to be analyzed. It is sometimes referred to as the selection effect. The phrase "selection bias" most often refers to the distortion of a statistical analysis, resulting from the method of collecting samples. If the selection bias is not taken into account, then some conclusions of the study may be false.
In statistics, econometrics, epidemiology and related disciplines, the method of instrumental variables (IV) is used to estimate causal relationships when controlled experiments are not feasible or when a treatment is not successfully delivered to every unit in a randomized experiment. Intuitively, IVs are used when an explanatory variable of interest is correlated with the error term (endogenous), in which case ordinary least squares and ANOVA give biased results. A valid instrument induces changes in the explanatory variable but has no independent effect on the dependent variable and is not correlated with the error term, allowing a researcher to uncover the causal effect of the explanatory variable on the dependent variable.
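A minimal numeric sketch of the IV logic, contrasting a biased OLS slope with the Wald/IV estimator; the data-generating process (instrument strength 0.8, true effect 2.0, confounder u) is entirely synthetic:

```python
# Synthetic IV demonstration: z shifts x but affects y only through x, while
# an unobserved confounder u biases OLS. True causal effect of x on y is 2.0.
import numpy as np

rng = np.random.default_rng(5)
n = 50_000
u = rng.normal(size=n)                          # unobserved confounder
z = rng.normal(size=n)                          # instrument
x = 0.8 * z + u + rng.normal(size=n)            # endogenous regressor
y = 2.0 * x + 3.0 * u + rng.normal(size=n)      # outcome

ols = np.polyfit(x, y, 1)[0]                    # biased: x correlates with u
iv = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]    # Wald/IV estimator
print(f"OLS slope: {ols:.2f} (biased), IV estimate: {iv:.2f} (close to 2.0)")
```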
External validity is the validity of applying the conclusions of a scientific study outside the context of that study. In other words, it is the extent to which the results of a study can generalize or transport to other situations, people, stimuli, and times. Generalizability refers to the applicability of a predefined sample to a broader population while transportability refers to the applicability of one sample to another target population. In contrast, internal validity is the validity of conclusions drawn within the context of a particular study.
In causal inference, a confounder is a variable that influences both the dependent variable and independent variable, causing a spurious association. Confounding is a causal concept, and as such, cannot be described in terms of correlations or associations. The existence of confounders is an important quantitative explanation why correlation does not imply causation. Some notations are explicitly designed to identify the existence, possible existence, or non-existence of confounders in causal relationships between elements of a system.
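The simulation below shows a confounder manufacturing a correlation between two variables that have no causal link, and the correlation vanishing once the confounder is adjusted for; all coefficients are invented:

```python
# A confounder u drives both x and y; x has no causal effect on y, yet the two
# correlate. Adjusting for u removes the spurious association. Synthetic data.
import numpy as np

rng = np.random.default_rng(6)
n = 100_000
u = rng.normal(size=n)                 # confounder
x = u + rng.normal(size=n)             # depends on u, does not affect y
y = 2.0 * u + rng.normal(size=n)       # depends on u only

print(f"raw corr(x, y) = {np.corrcoef(x, y)[0, 1]:.2f}")      # clearly nonzero
bx = np.cov(u, x)[0, 1] / np.var(u)    # regress u out of each variable
by = np.cov(u, y)[0, 1] / np.var(u)
print(f"corr after adjusting for u = {np.corrcoef(x - bx * u, y - by * u)[0, 1]:.2f}")
```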
The goals of experimental finance are to understand human and market behavior in settings relevant to finance. Experiments are synthetic economic environments created by researchers specifically to answer research questions. This might involve, for example, establishing different market settings and environments to observe experimentally and analyze agents' behavior and the resulting characteristics of trading flows, information diffusion and aggregation, price setting mechanism and returns processes.
In science, randomized experiments are the experiments that allow the greatest reliability and validity of statistical estimates of treatment effects. Randomization-based inference is especially important in experimental design and in survey sampling.
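Randomization-based inference can be sketched as a permutation test under the sharp null of no effect: re-randomize the assignment labels many times and compare the observed difference in means to the resulting null distribution. Synthetic data throughout:

```python
# Randomization inference under the sharp null of no effect: re-randomize the
# assignment vector and recompute the test statistic. Data are synthetic.
import numpy as np

rng = np.random.default_rng(7)
n = 100
z = rng.permutation(np.repeat([0, 1], n // 2))       # the actual assignment
y = rng.normal(size=n) + 0.5 * z                     # modest true effect

observed = y[z == 1].mean() - y[z == 0].mean()
draws = np.empty(10_000)
for i in range(draws.size):
    z_star = rng.permutation(z)                      # hypothetical re-assignment
    draws[i] = y[z_star == 1].mean() - y[z_star == 0].mean()

p = np.mean(np.abs(draws) >= abs(observed))          # two-sided p-value
print(f"observed diff: {observed:.2f}, randomization p: {p:.4f}")
```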
In fields such as epidemiology, social sciences, psychology and statistics, an observational study draws inferences from a sample to a population where the independent variable is not under the control of the researcher because of ethical concerns or logistical constraints. One common observational study concerns the possible effect of a treatment on subjects, where the assignment of subjects into a treated group versus a control group is outside the control of the investigator. This is in contrast with experiments, such as randomized controlled trials, where each subject is randomly assigned to a treated group or a control group. Because they lack an assignment mechanism, observational studies naturally present difficulties for inferential analysis.
A quasi-experiment is an empirical interventional study used to estimate the causal impact of an intervention on target population without random assignment. Quasi-experimental research shares similarities with the traditional experimental design or randomized controlled trial, but it specifically lacks the element of random assignment to treatment or control. Instead, quasi-experimental designs typically allow the researcher to control the assignment to the treatment condition, but using some criterion other than random assignment.
The average treatment effect (ATE) is a measure used to compare treatments in randomized experiments, evaluation of policy interventions, and medical trials. The ATE measures the difference in mean (average) outcomes between units assigned to the treatment and units assigned to the control. In a randomized trial, the average treatment effect can be estimated from a sample using a comparison in mean outcomes for treated and untreated units. However, the ATE is generally understood as a causal parameter that a researcher desires to know, defined without reference to the study design or estimation procedure. Both observational studies and experimental study designs with random assignment may enable one to estimate an ATE in a variety of ways.
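In potential-outcomes notation, the ATE and its difference-in-means estimator from a randomized trial can be written as follows (here Z_i denotes treatment assignment, and n_T and n_C are the treated and control group sizes):

```latex
% ATE as a causal parameter, and its difference-in-means estimator
% from a randomized trial (Z_i = 1 for treated units).
\mathrm{ATE} = \mathbb{E}\left[\,Y_i(1) - Y_i(0)\,\right],
\qquad
\widehat{\mathrm{ATE}} = \frac{1}{n_T}\sum_{i \,:\, Z_i = 1} Y_i \;-\; \frac{1}{n_C}\sum_{i \,:\, Z_i = 0} Y_i .
```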
Impact evaluation assesses the changes that can be attributed to a particular intervention, such as a project, program or policy, both the intended ones, as well as ideally the unintended ones. In contrast to outcome monitoring, which examines whether targets have been achieved, impact evaluation is structured to answer the question: how would outcomes such as participants' well-being have changed if the intervention had not been undertaken? This involves counterfactual analysis, that is, "a comparison between what actually happened and what would have happened in the absence of the intervention." Impact evaluations seek to answer cause-and-effect questions. In other words, they look for the changes in outcome that are directly attributable to a program.
In statistics, econometrics, political science, epidemiology, and related disciplines, a regression discontinuity design (RDD) is a quasi-experimental pretest–posttest design that estimates the causal effects of interventions by establishing a cutoff or threshold above or below which an intervention is assigned. By comparing observations lying closely on either side of the threshold, it is possible to estimate the average treatment effect in environments in which randomisation is unfeasible. However, it remains impossible to make true causal inference with this method alone, as it does not automatically rule out the causal effect of any potential confounding variable. First applied by Donald Thistlethwaite and Donald Campbell (1960) to the evaluation of scholarship programs, the RDD has become increasingly popular in recent years. Recent comparisons of randomised controlled trials (RCTs) and RDDs have empirically demonstrated the internal validity of the design.
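A minimal sharp-RDD sketch: fit separate linear trends within a bandwidth on each side of the cutoff and take the gap at the threshold. The running variable, bandwidth, and jump of 2.0 are all fabricated, and real applications would use local-polynomial estimators with data-driven bandwidths:

```python
# Sharp RDD sketch: fit linear trends on each side of the cutoff within a
# bandwidth and take the gap at the threshold. All data are synthetic.
import numpy as np

rng = np.random.default_rng(8)
n, cutoff, bw = 5_000, 0.0, 0.5
x = rng.uniform(-1, 1, n)                           # running variable
d = (x >= cutoff).astype(float)                     # treated above the cutoff
y = 1.0 + 0.8 * x + 2.0 * d + rng.normal(size=n)    # jump of 2.0 at the cutoff

mask = np.abs(x - cutoff) <= bw                     # local window at the threshold
left = np.polyfit(x[mask & (d == 0)], y[mask & (d == 0)], 1)
right = np.polyfit(x[mask & (d == 1)], y[mask & (d == 1)], 1)
effect = np.polyval(right, cutoff) - np.polyval(left, cutoff)
print(f"estimated jump at cutoff: {effect:.2f} (true: 2.00)")
```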
Matching is a statistical technique that evaluates the effect of a treatment by comparing the treated and the non-treated units in an observational study or quasi-experiment. The goal of matching is to reduce bias in the estimated treatment effect by finding, for every treated unit, one or more non-treated units with similar observable characteristics, so that the covariates are balanced across groups. By matching treated units to similar non-treated units, matching enables a comparison of outcomes among treated and non-treated units that reduces bias due to confounding. Propensity score matching, an early matching technique, was developed as part of the Rubin causal model, but it has been shown to increase model dependence, bias, and inefficiency, and is no longer recommended relative to other matching methods. Coarsened Exact Matching (CEM) is a simple, easy-to-understand, and statistically powerful alternative.
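A toy one-to-one nearest-neighbor match on a single covariate illustrates the idea; the covariate (age), the selection mechanism, and the true effect of 3.0 are all invented:

```python
# Toy one-to-one nearest-neighbor matching on a single invented covariate.
import numpy as np

rng = np.random.default_rng(9)
n = 2_000
age = rng.uniform(20, 60, n)                                  # observed covariate
treated = rng.random(n) < 1 / (1 + np.exp(-(age - 40) / 5))   # older units opt in more
y = 0.1 * age + 3.0 * treated + rng.normal(size=n)            # true effect = 3.0

naive = y[treated].mean() - y[~treated].mean()                # confounded by age
t_idx, c_idx = np.where(treated)[0], np.where(~treated)[0]
# For every treated unit, pick the control unit with the closest age.
matches = c_idx[np.abs(age[c_idx][None, :] - age[t_idx][:, None]).argmin(axis=1)]
att = (y[t_idx] - y[matches]).mean()
print(f"naive: {naive:.2f}, matched ATT: {att:.2f} (true: 3.00)")
```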
Causal analysis is the field of experimental design and statistics pertaining to establishing cause and effect. Typically it involves establishing four elements: correlation, sequence in time, a plausible physical or information-theoretical mechanism for an observed effect to follow from a possible cause, and eliminating the possibility of common and alternative ("special") causes. Such analysis usually involves one or more artificial or natural experiments.
Experimental political science is the use of experiments, which may be natural or controlled, to implement the scientific method in political science.
Causal inference is the process of determining the independent, actual effect of a particular phenomenon that is a component of a larger system. The main difference between causal inference and inference of association is that causal inference analyzes the response of an effect variable when a cause of the effect variable is changed. The study of why things occur is called etiology, and can be described using the language of scientific causal notation. Causal inference is said to provide the evidence of causality theorized by causal reasoning.
Experimental benchmarking allows researchers to learn about the accuracy of non-experimental research designs. Specifically, one can compare observational results to experimental findings to calibrate bias. Under ordinary conditions, carrying out an experiment gives the researchers an unbiased estimate of their parameter of interest. This estimate can then be compared to the findings of observational research. Note that benchmarking is an attempt to calibrate non-statistical uncertainty. When combined with meta-analysis this method can be used to understand the scope of bias associated with a specific area of research.
In experiments, a spillover is an indirect effect on a subject not directly treated by the experiment. These effects are useful for policy analysis but complicate the statistical analysis of experiments.