Forking paths problem

Last updated

The garden of forking paths is a problem in frequentist hypothesis testing through which researchers can unintentionally produce false positives for a tested hypothesis, through leaving themselves too many degrees of freedom. In contrast to fishing expeditions such as data dredging where only expected or apparently-significant results are published, this allows for a similar effect even when only one experiment is run, through a series of choices about how to implement methods and analyses, which are themselves informed by the data as it is observed and processed. [1]

Contents

History

Exploring a forking decision-tree while analyzing data was at one point grouped with the multiple comparisons problem as an example of poor statistical method. However Gelman and Loken demonstrated [2] that this can happen implicitly by researchers aware of best practices who only make a single comparison and only evaluate their data once.

The fallacy is believing an analysis to be free of multiple comparisons despite having had enough degrees of freedom in choosing the method, after seeing some or all of the data, to produce similarly-grounded false positives. Degrees of freedom can include choosing among main effects or interactions, methods for data exclusion, whether to combine different studies, and method of data analysis.

Multiverse analysis

A multiverse analysis is an approach that acknowledges the multitude of analytical paths available when analyzing data. The concept is inspired by the metaphorical "garden of forking paths," which represents the multitude of potential analyses that could be conducted on a single dataset. In a multiverse analysis, researchers systematically vary their analytical choices to explore a range of possible outcomes from the same raw data [3] [4] [5] . This involves altering variables such as data inclusion/exclusion criteria, variable transformations, outlier handling, statistical models, and hypothesis tests to generate a spectrum of results that could have been obtained given different analytic decisions.

The key benefits of a multiverse analysis include.

This approach is valuable in fields where research findings are sensitive to the methods of data analysis, such as psychology [4] , neuroscience [5] , economics, and social sciences. Multiverse analysis aims to mitigate issues related to reproducibility and replicability by revealing how different analytical choices can lead to different conclusions from the same dataset. Thus, it encourages a more nuanced understanding of data analysis, promoting integrity and credibility in scientific research.

Concepts that are closely related to multiverse analysis are specification-curve analysis [6] and the assessment of vibration of effects [7] .

See also

Related Research Articles

Analysis of variance (ANOVA) is a collection of statistical models and their associated estimation procedures used to analyze the differences among means. ANOVA was developed by the statistician Ronald Fisher. ANOVA is based on the law of total variance, where the observed variance in a particular variable is partitioned into components attributable to different sources of variation. In its simplest form, ANOVA provides a statistical test of whether two or more population means are equal, and therefore generalizes the t-test beyond two means. In other words, the ANOVA is used to test the difference between two or more means.

<span class="mw-page-title-main">Meta-analysis</span> Statistical method that summarizes data from multiple sources

Meta-analysis is the statistical combination of the results of multiple studies addressing a similar research question. An important part of this method involves computing an effect size across all of the studies, this involves extracting effect sizes and variance measures from various studies. Meta-analyses are integral in supporting research grant proposals, shaping treatment guidelines, and influencing health policies. They are also pivotal in summarizing existing research to guide future studies, thereby cementing their role as a fundamental methodology in metascience. Meta-analyses are often, but not always, important components of a systematic review procedure. For instance, a meta-analysis may be conducted on several clinical trials of a medical treatment, in an effort to obtain a better understanding of how well the treatment works.

<span class="mw-page-title-main">Content analysis</span> Research method for studying documents and communication artifacts

Content analysis is the study of documents and communication artifacts, which might be texts of various formats, pictures, audio or video. Social scientists use content analysis to examine patterns in communication in a replicable and systematic manner. One of the key advantages of using content analysis to analyse social phenomena is their non-invasive nature, in contrast to simulating social experiences or collecting survey answers.

<span class="mw-page-title-main">Data dredging</span> Misuse of data analysis

Data dredging is the misuse of data analysis to find patterns in data that can be presented as statistically significant, thus dramatically increasing and understating the risk of false positives. This is done by performing many statistical tests on the data and only reporting those that come back with significant results.

In statistics, multicollinearity or collinearity is a situation where the predictors in a regression model are linearly dependent.

<span class="mw-page-title-main">The Garden of Forking Paths</span> 1941 short story by Jorge Luis Borges

"The Garden of Forking Paths" is a 1941 short story by Argentine writer and poet Jorge Luis Borges. It is the title story in the collection El jardín de senderos que se bifurcan (1941), which was republished in its entirety in Ficciones (Fictions) in 1944. It was the first of Borges's works to be translated into English by Anthony Boucher when it appeared in Ellery Queen's Mystery Magazine in August 1948.

In robust statistics, robust regression seeks to overcome some limitations of traditional regression analysis. A regression analysis models the relationship between one or more independent variables and a dependent variable. Standard types of regression, such as ordinary least squares, have favourable properties if their underlying assumptions are true, but can give misleading results otherwise. Robust regression methods are designed to limit the effect that violations of assumptions by the underlying data-generating process have on regression estimates.

<span class="mw-page-title-main">Data analysis</span> The process of analyzing data to discover useful information and support decision-making

Data analysis is the process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, and is used in different business, science, and social science domains. In today's business world, data analysis plays a role in making decisions more scientific and helping businesses operate more effectively.

In statistics, (between-) study heterogeneity is a phenomenon that commonly occurs when attempting to undertake a meta-analysis. In a simplistic scenario, studies whose results are to be combined in the meta-analysis would all be undertaken in the same way and to the same experimental protocols. Differences between outcomes would only be due to measurement error. Study heterogeneity denotes the variability in outcomes that goes beyond what would be expected due to measurement error alone.

Statistics models the collection, organization, analysis, interpretation, presentation of data, and is used to solve mathematical problems. Conclusions drawn from statistical analysis typically contain uncertainties, certainties or as they represent the probability of an event occurring. Statistics is fundamental to disciplines of science that involve predicting or classifying events based on a large set of data and is an integral part of fields such as machine learning, bioinformatics, genomics, and economics.

<span class="mw-page-title-main">Analytical skill</span> Crucial skill in all different fields of work and life

Analytical skill is the ability to deconstruct information into smaller categories in order to draw conclusions. Analytical skill consists of categories that include logical reasoning, critical thinking, communication, research, data analysis and creativity. Analytical skill is taught in contemporary education with the intention of fostering the appropriate practises for future professions. The professions that adopt analytical skill include educational institutions, public institutions, community organisations and industry.

Clinical trials are medical research studies conducted on human subjects. The human subjects are assigned to one or more interventions, and the investigators evaluate the effects of those interventions. The progress and results of clinical trials are analyzed statistically.

Robust decision-making (RDM) is an iterative decision analytics framework that aims to help identify potential robust strategies, characterize the vulnerabilities of such strategies, and evaluate the tradeoffs among them. RDM focuses on informing decisions under conditions of what is called "deep uncertainty", that is, conditions where the parties to a decision do not know or do not agree on the system models relating actions to consequences or the prior probability distributions for the key input parameters to those models.

<span class="mw-page-title-main">Replication crisis</span> Observed inability to reproduce scientific studies

The replication crisis is an ongoing methodological crisis in which the results of many scientific studies are difficult or impossible to reproduce. Because the reproducibility of empirical results is an essential part of the scientific method, such failures undermine the credibility of theories building on them and potentially call into question substantial parts of scientific knowledge.

<span class="mw-page-title-main">JASP</span> Free and open-source statistical program

JASP is a free and open-source program for statistical analysis supported by the University of Amsterdam. It is designed to be easy to use, and familiar to users of SPSS. It offers standard analysis procedures in both their classical and Bayesian form. JASP generally produces APA style results tables and plots to ease publication. It promotes open science via integration with the Open Science Framework and reproducibility by integrating the analysis settings into the results. The development of JASP is financially supported by several universities and research funds. As the JASP GUI is developed in C++ using Qt framework, some of the team left to make a notable fork which is Jamovi which has its GUI developed in JavaScript and HTML5.

Patent analysis is the process of analyzing patent documents and other information from the patent lifecycle. Patent analysis is used to obtain deeper insights into different technologies and innovation. Other terms are sometimes used as synonyms for patent analytics: patent landscaping, patent mapping, or cartography. However, there is no harmonized terminology in different languages, including in French and Spanish, while in some languages terms are borrowed from other languages. Patent analytics encompasses the analysis of patent data, analysis of the scientific literature, data cleaning, text mining, machine learning, geographic mapping, and data visualisation.

Preregistration is the practice of registering the hypotheses, methods, and/or analyses of a scientific study before it is conducted. Clinical trial registration is similar, although it may not require the registration of a study's analysis protocol. Finally, registered reports include the peer review and in principle acceptance of a study protocol prior to data collection.

Researcher degrees of freedom is a concept referring to the inherent flexibility involved in the process of designing and conducting a scientific experiment, and in analyzing its results. The term reflects the fact that researchers can choose between multiple ways of collecting and analyzing data, and these decisions can be made either arbitrarily or because they, unlike other possible choices, produce a positive and statistically significant result. The researcher degrees of freedom has positives such as affording the ability to look at nature from different angles, allowing new discoveries and hypotheses to be generated. However, researcher degrees of freedom can lead to data dredging and other questionable research practices where the different interpretations and analyses are taken for granted Their widespread use represents an inherent methodological limitation in scientific research, and contributes to an inflated rate of false-positive findings. They can also lead to overestimated effect sizes.

Crowdsourced science refers to collaborative contributions of a large group of people to the different steps of the research process in science. In psychology, the nature and scope of the collaborations can vary in their application and in the benefits it offers.

Multiverse analysis is a scientific method that specifies and then runs a set of plausible alternative models or statistical tests for a single hypothesis. It is a method to address the issue that the "scientific process confronts researchers with a multiplicity of seemingly minor, yet nontrivial, decision points, each of which may introduce variability in research outcomes". A problem also known as Researcher degrees of freedom or as the garden of forking paths. It is a method arising in response to the credibility and replication crisis taking place in science, because it can diagnose the fragility or robustness of a study's findings. Multiverse analyses have been used in the fields of psychology and neuroscience. It is also a form of meta-analysis allowing researchers to provide evidence on how different model specifications impact results for the same hypothesis, and thus can point scientists toward where they might need better theory or causal models.

References

  1. "Garden of forking paths". FORRT - Framework for Open and Reproducible Research Training. Retrieved 2023-07-28.
  2. Gelman, Andrew; Loken, Eric (November 14, 2013). "The garden of forking paths: Why multiple comparisons can be a problem, even when there is no "fishing expedition" or "p-hacking" and the research hypothesis was posited ahead of time" (PDF).
  3. Steegen, Sara; Tuerlinckx, Francis; Gelman, Andrew; Vanpaemel, Wolf (2016). "Increasing Transparency Through a Multiverse Analysis". Perspectives on Psychological Science . 11 (5): 702–712. doi:10.1177/1745691616658637. ISSN   1745-6916.
  4. 1 2 Harder, Jenna A. (2020). "The Multiverse of Methods: Extending the Multiverse Analysis to Address Data-Collection Decisions". Perspectives on Psychological Science. 15 (5): 1158–1177. doi:10.1177/1745691620917678. ISSN   1745-6916.
  5. 1 2 Clayson, Peter E. (2024-03-01). "Beyond single paradigms, pipelines, and outcomes: Embracing multiverse analyses in psychophysiology". International Journal of Psychophysiology. 197: 112311. doi:10.1016/j.ijpsycho.2024.112311. ISSN   0167-8760.
  6. Simonsohn, Uri; Simmons, Joseph P.; Nelson, Leif D. (2020). "Specification curve analysis". Nature Human Behaviour. 4 (11): 1208–1214. doi:10.1038/s41562-020-0912-z. ISSN   2397-3374.
  7. Patel, Chirag J.; Burford, Belinda; Ioannidis, John P.A. (2015). "Assessment of vibration of effects due to model specification can demonstrate the instability of observational associations". Journal of Clinical Epidemiology. 68 (9): 1046–1058. doi:10.1016/j.jclinepi.2015.05.029. ISSN   0895-4356. PMC   4555355 . PMID   26279400.