Combinatorial meta-analysis

Combinatorial meta-analysis (CMA) is the study of the behaviour of statistical properties of combinations of studies in a meta-analytic dataset (typically in social science research). In an article that develops the notion of "gravity" in the context of meta-analysis, Travis Gee [1] proposed that the jackknife methods applied to meta-analysis in that article could be extended to examine all possible combinations of studies (where practical) or random subsets of studies (where the combinatorics of the situation make full enumeration computationally infeasible).

Concept

In the original article, [1] k objects (studies) are combined k − 1 at a time (jackknife estimation), resulting in k estimates. This is a special case of the more general approach of CMA, which computes results for k studies taken 1, 2, 3, ..., k − 1, k at a time.

Where it is computationally feasible to obtain all possible combinations, the resulting distribution of statistics is termed "exact CMA." Where the number of possible combinations is prohibitively large, it is termed "approximate CMA."
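
The distinction can be sketched in a few lines of Python. The following is an illustration only, not Gee's original SAS implementation: the study data are hypothetical, a simple fixed-effect (inverse-variance) pooling rule is assumed, and the max_combos cutoff for switching from exact to approximate CMA is arbitrary.

```python
# Hypothetical data: one standardized mean difference (SMD) and its
# variance per study. These numbers are invented for illustration.
import itertools
import math
import random

effects = [0.31, 0.12, 0.45, -0.05, 0.28, 0.19]
variances = [0.02, 0.05, 0.04, 0.03, 0.06, 0.02]

def pooled_effect(effects, variances):
    # Fixed-effect (inverse-variance weighted) pooled estimate.
    weights = [1.0 / v for v in variances]
    return sum(w * e for w, e in zip(weights, effects)) / sum(weights)

def cma(effects, variances, max_combos=100_000, seed=0):
    # Pooled estimates for k studies taken j = 1, 2, ..., k at a time.
    # All combinations are enumerated when feasible ("exact CMA");
    # otherwise random subsets of each size are drawn ("approximate CMA").
    rng = random.Random(seed)
    k = len(effects)
    results = {}
    for j in range(1, k + 1):
        if math.comb(k, j) <= max_combos:
            subsets = itertools.combinations(range(k), j)
        else:
            subsets = (tuple(rng.sample(range(k), j)) for _ in range(max_combos))
        results[j] = [
            pooled_effect([effects[i] for i in s], [variances[i] for i in s])
            for s in subsets
        ]
    return results

results = cma(effects, variances)
for j, ests in results.items():
    print(j, min(ests), max(ests))  # range of pooled estimates at each j
```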

CMA makes it possible to study the relative behaviour of different statistics under combinatorial conditions. This differs from the standard meta-analytic approach of adopting a single method and computing a single result, and it allows for substantial triangulation: different indices can be computed for each combination and examined to see whether they all tell the same story.
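
As a rough illustration of triangulation, reusing pooled_effect() and the hypothetical data from the sketch above, one could compute two deliberately different indices for every combination and check how often they agree in sign:

```python
import itertools

def unweighted_mean(effects):
    # A second index for triangulation: the plain mean of the effects.
    return sum(effects) / len(effects)

# Compare the sign of the inverse-variance pooled estimate with the
# sign of the unweighted mean across all size-3 combinations.
agree = []
for s in itertools.combinations(range(len(effects)), 3):
    e = [effects[i] for i in s]
    v = [variances[i] for i in s]
    agree.append((pooled_effect(e, v) > 0) == (unweighted_mean(e) > 0))
print(sum(agree) / len(agree))  # proportion of combinations where both agree
```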

Implications

One implication of this is that where multiple random intercepts exist, heterogeneity within certain combinations will be minimized. CMA can thus be used as a data-mining method to identify the number of intercepts that may be present in the dataset, by examining which studies appear in the local heterogeneity minima obtained through recombination.
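
A minimal sketch of this data-mining use follows, reusing the hypothetical data above and assuming Cochran's Q as the heterogeneity statistic (the choice of statistic is not prescribed by the source):

```python
import itertools

def cochran_q(effects, variances):
    # Weighted squared deviations of study effects from the pooled estimate.
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    return sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))

def most_homogeneous(effects, variances, j, top=5):
    # Rank all size-j subsets by Q; studies that recur in the lowest-Q
    # subsets suggest a cluster sharing a common intercept.
    scored = sorted(
        (cochran_q([effects[i] for i in s], [variances[i] for i in s]), s)
        for s in itertools.combinations(range(len(effects)), j)
    )
    return scored[:top]

print(most_homogeneous(effects, variances, j=3))
```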

A further implication of this is that arguments over inclusion or exclusion of studies may be moot when the distribution of all possible results is taken into account. A useful tool developed by Gee (reference to come when published) is the "PPES" plot (standing for "Probability of Positive Effect Size", assuming differences are scaled such that larger in a positive direction is desired). For each subset of combinations, where studies are taken j = 1, 2, ..., k − 1, k at a time, the proportion of results that show a positive effect size (either the weighted mean difference, WMD, or the standardized mean difference, SMD, will work) is taken, and this is plotted against j. This can be adapted to a "PMES" plot (standing for "Probability of Minimal Effect Size"), where the proportion of results exceeding some minimal effect size (e.g., SMD = 0.10) is taken for each value of j = 1, 2, ..., k − 1, k. Where a clear effect is present, this plot should asymptote to near 1.0 fairly rapidly. Disputes over the inclusion or exclusion of two or three studies out of a dozen or more may then be framed in the context of a plot that shows a clear effect for any combination of, say, seven or more studies.
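
A PPES or PMES curve can be sketched directly from the cma() results in the earlier example (data and thresholds remain hypothetical):

```python
# At each subset size j, the proportion of pooled estimates exceeding a
# threshold: 0.0 gives a PPES curve; a minimal effect size such as
# SMD = 0.10 gives a PMES curve.
def ppes_curve(results, threshold=0.0):
    return {
        j: sum(est > threshold for est in ests) / len(ests)
        for j, ests in results.items()
    }

for j, p in ppes_curve(results).items():    # PPES
    print(j, round(p, 3))
pmes = ppes_curve(results, threshold=0.10)  # PMES at SMD = 0.10
```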

It is also possible through CMA to examine the relationship of covariates with effect sizes. For example, if industry funding is suspected as a source of bias, then the proportion of studies in a given subset that were industry funded can be computed and plotted directly against the effect-size estimate. Similarly, if mean participant age varies appreciably across studies, the mean of these study means for each combination can be obtained and plotted in the same way.
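
A sketch of the covariate idea, reusing pooled_effect() and the hypothetical data above; the funding indicators are invented for illustration, and the resulting pairs are ready for a scatter plot:

```python
import itertools

funded = [1, 0, 1, 0, 0, 1]  # hypothetical 0/1 industry-funding flags

points = []  # (proportion industry-funded, pooled estimate) per subset
k = len(effects)
for j in range(1, k + 1):
    for s in itertools.combinations(range(k), j):
        est = pooled_effect([effects[i] for i in s], [variances[i] for i in s])
        points.append((sum(funded[i] for i in s) / j, est))
# `points` can now be scatter-plotted, e.g. with matplotlib.
```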

Implementations

Gee's original software for performing jackknife and combinatorial meta-analysis was based on older meta-analytic macros written in the SAS programming language. It was the basis of one report in the area of arthritis treatment. [2] While this software was shared informally with colleagues, it was not published. A later meta-analysis applied the concept in the context of the treatment of diarrhea. [3]

A jackknife method was applied to meta-analytic data some years later, [4] but it does not appear that specialized software was developed for the task. Other commentators have also called for related methods, [5] apparently unaware of the original work. More recent work by a software team at Brown University [6] has implemented the concept in Stata. [7]

Limitations

CMA does not solve meta-analysis's problem of "garbage in, garbage out". However, when a critic deems a class of studies to be garbage, it does offer a way of examining the extent to which those studies may have changed a result. Similarly, it offers no direct solution to the problem of which method of combination or weighting to choose. What it does offer, as noted above, is triangulation: agreement between methods can be demonstrated, and disagreement understood, across the range of possible combinations of studies.

References

  1. Gee, T. (2005). "Capturing study influence: The concept of 'gravity' in meta-analysis". Counselling, Psychotherapy, and Health, 1(1), 52–75. Archived 2006-08-19 at the Wayback Machine.
  2. Bellamy, N., Campbell, J., & Gee, T. (2005). "Can study selection, variable management and time period influence observed effect sizes in systematic reviews of hyaluronan/hylan products?". In: R. Altman, Poster Presentations, 10th World Congress on Osteoarthritis, Massachusetts, USA, 8–11 December 2005 (S71).
  3. Lukacik, M., Thomas, R. L., & Aranda, J. V. (2008). "A meta-analysis of the effects of oral zinc in the treatment of acute and persistent diarrhea". Pediatrics, 121(2), 326–336. doi:10.1542/peds.2007-0921.
  4. "Statistics Roundtable: The Trusty Jackknife". ASQ.
  5. "Commentary: Heterogeneity in meta-analysis should be expected and appropriately quantified". ije.oxfordjournals.org. Archived from the original on 10 November 2016. Retrieved 17 January 2022.
  6. Olkin, I., Dahabreh, I. J., & Trikalinos, T. A. (2012). "GOSH – a graphical display of study heterogeneity". Research Synthesis Methods, 3(3), 214–223.
  7. "ALLSUBSETS: Stata module to perform all subsets (combinatorial) meta-analysis in a set of studies". 19 October 2012.