Pseudoreplication

Pseudoreplication (sometimes called unit of analysis error [1] ) has many definitions. Pseudoreplication was originally defined in 1984 by Stuart H. Hurlbert [2] as the use of inferential statistics to test for treatment effects with data from experiments where either treatments are not replicated (though samples may be) or replicates are not statistically independent. Subsequently, Millar and Anderson [3] identified it as a special case of inadequate specification of random factors where both random and fixed factors are present. It is sometimes narrowly interpreted as an inflation of the number of samples or replicates that are not statistically independent. [4] This narrow definition omits the confounding of unit and treatment effects in a misspecified F-ratio. In practice, incorrect F-ratios for statistical tests of fixed effects often arise from a default F-ratio that is formed over the error term rather than the mixed term.

Lazic [5] defined pseudoreplication as a problem of correlated samples (e.g. from longitudinal studies) where the correlation is not taken into account when computing the confidence interval for the sample mean. For the effect of serial or temporal correlation, see also the Markov chain central limit theorem.

Figure: Pseudoreplication due to correlation of samples. Without accounting for correlation, the 90% confidence interval for the sample mean is much too small. One way around this problem is the blocking method: correlated samples are first grouped, and then, for each block, the corresponding sample mean is computed. From these two "block sample means" the total sample mean is computed as their average, along with their standard deviation. This gives a better estimate of the confidence interval for the sample mean.
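The blocking idea can be sketched numerically. In this minimal Python sketch, the block sizes, noise levels, and block count are arbitrary assumptions chosen for illustration: the naive standard error treats every measurement as an independent replicate, while the blocked version first averages within blocks and treats each block mean as a single replicate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: two blocks of 10 measurements each; values within
# a block share a random block effect, so they are correlated.
block_effects = rng.normal(0.0, 1.0, size=2)
samples = np.array([rng.normal(mu, 0.2, size=10) for mu in block_effects])

# Naive approach: treat all 20 values as independent replicates.
flat = samples.ravel()
naive_se = flat.std(ddof=1) / np.sqrt(flat.size)

# Blocking: average within each block first, then treat the two block
# means as the independent replicates (n = 2).
block_means = samples.mean(axis=1)
blocked_se = block_means.std(ddof=1) / np.sqrt(block_means.size)

print(naive_se, blocked_se)
```

Because the naive calculation counts 20 observations when only two independent blocks exist, the naive standard error, and hence the resulting confidence interval, is typically far too small.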

The problem of inadequate specification arises when treatments are assigned to units that are subsampled and the treatment F-ratio in an analysis of variance (ANOVA) table is formed with respect to the residual mean square rather than the among-unit mean square. An F-ratio formed relative to the within-unit mean square is vulnerable to the confounding of treatment and unit effects, especially when the number of experimental units is small (e.g. four tank units, two tanks treated, two not treated, several subsamples per tank). The problem is eliminated by forming the F-ratio relative to the correct mean square in the ANOVA table (the among-tank mean square in the example above), where this is possible. More generally, the problem is addressed by the use of mixed models. [3]
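The tank example can be sketched as follows, assuming SciPy is available; the tank counts, subsample sizes, and variance components below are invented for illustration. The pseudoreplicated analysis pools all subsamples as if they were independent, while the correct analysis treats the tank as the experimental unit and tests the tank means.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical layout: 4 tanks, 2 per treatment, 5 fish subsampled per tank.
tank_effects = rng.normal(0.0, 1.0, size=4)   # tank-to-tank variation
treatment = np.array([0, 0, 1, 1])            # 0 = control, 1 = treated
data = np.array([rng.normal(mu, 0.3, size=5) for mu in tank_effects])

# Pseudoreplicated test: pools all 20 fish as if independent (inflated df).
wrong = stats.ttest_ind(data[treatment == 0].ravel(),
                        data[treatment == 1].ravel())

# Correct test: the tank is the experimental unit, so compare the
# tank means (2 per group, giving only 2 degrees of freedom).
tank_means = data.mean(axis=1)
right = stats.ttest_ind(tank_means[treatment == 0],
                        tank_means[treatment == 1])

print(wrong.pvalue, right.pvalue)
```

The pooled test claims 18 degrees of freedom where only 2 are justified, so when tank effects are present its p-value is confounded by unit effects rather than reflecting the treatment alone.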

Hurlbert reported "pseudoreplication" in 48% of the studies he examined that used inferential statistics. [2] Several studies examining scientific papers published up to 2016 similarly found that about half of the papers were suspected of pseudoreplication. [4] When time and resources limit the number of experimental units, and unit effects cannot be eliminated statistically by testing over the unit variance, it is important to use other sources of information to evaluate the degree to which an F-ratio is confounded by unit effects.

Replication

Replication increases the precision of an estimate, while randomization addresses the broader applicability of a sample to a population. Replication must be appropriate: replication at the experimental unit level must be considered, in addition to replication within units.

Hypothesis testing

Statistical tests (e.g. t-test and the related ANOVA family of tests) rely on appropriate replication to estimate statistical significance. Tests based on the t and F distributions assume homogeneous, normal, and independent errors. Correlated errors can lead to false precision and p-values that are too small. [6]
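The inflation of false positives by correlated errors can be demonstrated with a small simulation; this is a sketch assuming SciPy, with the AR(1) correlation (rho = 0.8), series length, and simulation count chosen arbitrarily. Under the null hypothesis, a valid test should reject about 5% of the time at alpha = 0.05, but serial correlation pushes the rejection rate well above that.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

def ar1(n, rho, rng):
    """AR(1) noise with mean 0: successive errors are correlated."""
    x = np.empty(n)
    x[0] = rng.normal()
    for i in range(1, n):
        x[i] = rho * x[i - 1] + rng.normal() * np.sqrt(1 - rho**2)
    return x

# Under the null (true mean 0), a one-sample t-test that assumes
# independent errors should reject ~5% of the time at alpha = 0.05.
n_sims, n, rho = 2000, 30, 0.8
rejections = sum(
    stats.ttest_1samp(ar1(n, rho, rng), 0.0).pvalue < 0.05
    for _ in range(n_sims)
)
print(rejections / n_sims)  # typically well above the nominal 0.05
```

The positively correlated errors make the sample look less variable than the process really is, so the t-statistic is too large and the p-values are systematically too small.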

Types

Hurlbert (1984) defined four types of pseudoreplication.


References

  1. Hurlbert, Stuart H. (2009). "The ancient black art and transdisciplinary extent of pseudoreplication". Journal of Comparative Psychology. 123 (4): 434–443. doi:10.1037/a0016221. PMID 19929111.
  2. Hurlbert, Stuart H. (1984). "Pseudoreplication and the design of ecological field experiments". Ecological Monographs. 54 (2): 187–211. doi:10.2307/1942661. JSTOR 1942661.
  3. Millar, R.B.; Anderson, M.R. (2004). "Remedies for pseudoreplication". Fisheries Research. 70 (2–3): 397–407. doi:10.1016/j.fishres.2004.08.016.
  4. Gholipour, Bahar (2018-03-15). "Statistical errors may taint as many as half of mouse studies". Spectrum | Autism Research News. Retrieved 2018-03-24.
  5. Lazic, Stanley E. (2010). "The problem of pseudoreplication in neuroscientific studies: is it affecting your analysis?". BMC Neuroscience. 11: 5. doi:10.1186/1471-2202-11-5. PMC 2817684. PMID 20074371.
  6. Lazic, S.E. (2010). "The problem of pseudoreplication in neuroscientific studies: is it affecting your analysis?". BMC Neuroscience. 11 (5): 5. doi:10.1186/1471-2202-11-5. PMC 2817684. PMID 20074371.