Mixed-design analysis of variance

In statistics, a mixed-design analysis of variance model, also known as a split-plot ANOVA, is used to test for differences between two or more independent groups whilst subjecting participants to repeated measures. In a mixed-design ANOVA model, one factor (a fixed-effects factor) is a between-subjects variable and the other (a random-effects factor) is a within-subjects variable; the model is therefore a type of mixed-effects model.

A repeated measures design is used when multiple independent variables or measures exist in a data set, but all participants have been measured on each variable.[1]:506
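As a minimal sketch of what such a data set looks like and how the model can be fitted in practice, the following example uses the third-party Python library pingouin, whose mixed_anova function handles one between-subjects and one within-subjects factor; all column names and scores below are hypothetical placeholders, not data from any source cited here:

```python
import pandas as pd
import pingouin as pg

# Hypothetical long-format data: each subject belongs to one
# between-subjects group, but is measured at all three time points.
df = pd.DataFrame({
    "subject": [s for s in range(1, 7) for _ in range(3)],
    "group":   ["control"] * 9 + ["treatment"] * 9,
    "time":    ["t1", "t2", "t3"] * 6,
    "score":   [5, 6, 7, 4, 5, 6, 6, 7, 8,
                5, 7, 9, 4, 6, 9, 5, 8, 10],
})

# One F test per effect: group (between), time (within), group x time.
aov = pg.mixed_anova(data=df, dv="score", within="time",
                     subject="subject", between="group")
print(aov)
```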

An example

[Figure: Variables in Andy Field's (2009) mixed-design ANOVA example. Participants would experience each level of the repeated variables but only one level of the between-subjects variable.]

Andy Field (2009)[1] provided an example of a mixed-design ANOVA in which he wants to investigate whether personality or attractiveness is the more important quality for individuals seeking a partner. In his example, a speed dating event is set up in which there are two sets of what he terms "stooge dates": a set of males and a set of females. The experimenter selects 18 individuals, 9 males and 9 females, to play stooge dates. Stooge dates are individuals chosen by the experimenter who vary in attractiveness and personality. Among both the males and the females there are three highly attractive individuals, three moderately attractive individuals, and three highly unattractive individuals. Within each set of three, one individual has a highly charismatic personality, one is moderately charismatic, and the third is extremely dull.

The participants are the individuals who sign up for the speed dating event and interact with each of the 9 individuals of the opposite sex. There are 10 male and 10 female participants. After each date, they rate on a scale of 0 to 100 how much they would like to have a date with that person, with 0 indicating "not at all" and 100 indicating "very much".

The random factors, or so-called repeated measures, are looks, which consists of three levels (highly attractive, moderately attractive, and highly unattractive), and personality, which again has three levels (highly charismatic, moderately charismatic, and extremely dull). Looks and personality have an overall random character because the precise level of each cannot be controlled by the experimenter (and indeed may be difficult to quantify[2]); the 'blocking' into discrete categories is for convenience, and does not guarantee precisely the same level of looks or personality within a given block;[3] and the experimenter is interested in making inferences on the general population of daters, not just the 18 'stooges'.[4] The fixed-effects factor, or so-called between-subjects measure, is gender, because the participants making the ratings were either female or male, and precisely these categories were fixed by the experimenter's design.
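In code, the layout of Field's design (one between-subjects factor crossed with two within-subjects factors) might be sketched as follows; the ratings are randomly generated placeholders, not Field's data, and the column names are illustrative:

```python
import itertools
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

looks = ["attractive", "average", "unattractive"]        # within-subjects
personality = ["charismatic", "average", "dull"]         # within-subjects
genders = ["male"] * 10 + ["female"] * 10                # between-subjects

rows = []
for subject, gender in enumerate(genders, start=1):
    # Every participant experiences all 9 looks x personality combinations,
    # but only one level of gender.
    for look, pers in itertools.product(looks, personality):
        rows.append({
            "subject": subject,
            "gender": gender,
            "looks": look,
            "personality": pers,
            "rating": rng.integers(0, 101),  # hypothetical 0-100 rating
        })

df = pd.DataFrame(rows)  # 20 participants x 9 dates = 180 rows
```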

ANOVA assumptions

When running an analysis of variance to analyse a data set, the data set should meet the following criteria:

  1. Normality: scores for each condition should be sampled from a normally distributed population.
  2. Homogeneity of variance: each population should have the same error variance.
  3. Sphericity of the covariance matrix: ensures that the F ratios follow the F distribution.

For the between-subjects effects to meet the assumptions of the analysis of variance, the variance for any level of a factor must be the same as the variance for each of the other levels of that factor. When there is homogeneity of variance, sphericity of the covariance matrix holds for the between-subjects effects, because independence between subjects has been maintained.[5][page needed]

For the within-subjects effects, it is important to ensure that normality and homogeneity of variance are not violated.[5][page needed]

If the assumptions are violated, a possible solution is to apply the Greenhouse–Geisser correction[6] or the Huynh–Feldt adjustment[7] to the degrees of freedom, as these corrections compensate for violations of the sphericity assumption on the covariance matrix.[5][page needed]
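As a sketch of how the Greenhouse–Geisser correction can be computed, the following uses the standard epsilon estimate based on the double-centred covariance matrix of the repeated measures; the data here are randomly generated placeholders:

```python
import numpy as np

def greenhouse_geisser_epsilon(data: np.ndarray) -> float:
    """Greenhouse-Geisser epsilon for an (n_subjects, k_conditions) array."""
    k = data.shape[1]
    S = np.cov(data, rowvar=False)        # k x k covariance of the conditions
    C = np.eye(k) - np.ones((k, k)) / k   # centring matrix
    D = C @ S @ C                         # double-centred covariance
    # epsilon = (sum of eigenvalues)^2 / ((k-1) * sum of squared eigenvalues)
    return np.trace(D) ** 2 / ((k - 1) * np.trace(D @ D))

# Hypothetical scores: 12 subjects measured under 3 conditions.
scores = np.random.default_rng(1).normal(size=(12, 3))
eps = greenhouse_geisser_epsilon(scores)

# Both degrees of freedom of the within-subjects F test are multiplied
# by epsilon (shown here for a single-group repeated measures effect).
df1, df2 = eps * (3 - 1), eps * (12 - 1) * (3 - 1)
```

Under perfect sphericity epsilon equals 1 (no correction); the smaller it gets, the more strongly the degrees of freedom are shrunk.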

Partitioning the sums of squares and the logic of ANOVA

Because the mixed-design ANOVA uses both between-subjects variables and within-subjects variables (a.k.a. repeated measures), it is necessary to partition out (or separate) the between-subjects effects from the within-subjects effects.[5] It is as if two separate ANOVAs were run on the same data set, except that in a mixed design the interaction of the two kinds of effects can also be examined. As can be seen in the source table provided below, the between-subjects variability is partitioned into the main effect of the first factor and an error term. The within-subjects variability is partitioned into three terms: the second (within-subjects) factor, the interaction of the first and second factors, and an error term.[5][page needed] The main difference between the two partitions is that the within-subjects partition contains the interaction term.

More specifically, in a regular one-way ANOVA the total sum of squares consists of two parts: variance due to treatment or condition (SSbetween-groups) and variance due to error (SSwithin-groups). In a mixed design, repeated measures are taken from the same participants, and so the sum of squares can be broken down further into three within-subjects components: SSwithin-subjects (variance due to being in different repeated-measures conditions), SSBS×WS (variance due to the interaction of the between-subjects and within-subjects conditions), and SSerror (the remaining variance).[5]
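Putting the two partitions together, and using the notation of the table below, the total sum of squares decomposes as

SST = (SSBS + SSBS/E) + (SSWS + SSBS×WS + SSWS/E),

where the first bracketed term is the between-subjects variability and the second is the within-subjects variability.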

Each effect has its own F value; the between-subjects and within-subjects factors each have their own MSerror term, which is used to calculate separate F values:

Between-subjects: FBS = MSBS / MSBS/E

Within-subjects: FWS = MSWS / MSWS/E and FBS×WS = MSBS×WS / MSWS/E

Analysis of variance table

Results are often presented in a table of the following form.[5][page needed]

Source            SS         df         MS         F
Between-subjects
  FactorBS        SSBS       dfBS       MSBS       FBS
  Error           SSBS/E     dfBS/E     MSBS/E
Within-subjects
  FactorWS        SSWS       dfWS       MSWS       FWS
  FactorBS×WS     SSBS×WS    dfBS×WS    MSBS×WS    FBS×WS
  Error           SSWS/E     dfWS/E     MSWS/E
Total             SST        dfT
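As an illustration, the entries of this table can be computed directly for the one-between, one-within case. The following is a minimal sketch, assuming a balanced design in a long-format pandas DataFrame; the function and column names are illustrative, not a standard API:

```python
import pandas as pd

def mixed_anova_table(df, dv, subject, between, within):
    """SS, df, MS and F for a balanced one-between, one-within mixed design."""
    grand = df[dv].mean()
    R = df[between].nunique()   # levels of the between-subjects factor
    C = df[within].nunique()    # levels of the within-subjects factor
    N = df[subject].nunique()   # total number of subjects
    n = N / R                   # subjects per group (assumes balance)

    ss_total = ((df[dv] - grand) ** 2).sum()

    # Split total variability into between- and within-subjects parts.
    subj_means = df.groupby(subject)[dv].mean()
    ss_between_subj = C * ((subj_means - grand) ** 2).sum()
    ss_within_subj = ss_total - ss_between_subj

    # Between-subjects part: factor BS plus its error term.
    group_means = df.groupby(between)[dv].mean()
    ss_bs = n * C * ((group_means - grand) ** 2).sum()
    ss_bs_err = ss_between_subj - ss_bs

    # Within-subjects part: factor WS, BSxWS interaction, and its error term.
    cond_means = df.groupby(within)[dv].mean()
    ss_ws = N * ((cond_means - grand) ** 2).sum()
    cell_means = df.groupby([between, within])[dv].mean()
    ss_cells = n * ((cell_means - grand) ** 2).sum()
    ss_int = ss_cells - ss_bs - ss_ws
    ss_ws_err = ss_within_subj - ss_ws - ss_int

    dof = {"BS": R - 1, "BS/E": N - R, "WS": C - 1,
           "BSxWS": (R - 1) * (C - 1), "WS/E": (N - R) * (C - 1)}
    ms = {k: s / dof[k] for k, s in
          {"BS": ss_bs, "BS/E": ss_bs_err, "WS": ss_ws,
           "BSxWS": ss_int, "WS/E": ss_ws_err}.items()}
    f = {"BS": ms["BS"] / ms["BS/E"],         # between-subjects F
         "WS": ms["WS"] / ms["WS/E"],         # within-subjects F
         "BSxWS": ms["BSxWS"] / ms["WS/E"]}   # interaction F
    return dof, ms, f
```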

Degrees of freedom

To calculate the degrees of freedom for the between-subjects effect, dfBS = R − 1, where R refers to the number of levels (groups) of the between-subjects factor.[5][page needed]

For the between-subjects error, dfBS(Error) = Nk − R, where Nk is the number of participants and R is again the number of between-subjects levels.

To calculate the degrees of freedom for the within-subjects effect, dfWS = C − 1, where C is the number of within-subjects conditions. For example, if participants completed a specific measure at three time points, C = 3 and dfWS = 2.

The degrees of freedom for the interaction of the between-subjects and within-subjects factors, dfBS×WS = (R − 1)(C − 1), where again R refers to the number of between-subjects levels and C is the number of within-subjects conditions.

Finally, the within-subjects error is calculated by dfWS(Error) = (Nk − R)(C − 1), in which Nk is the number of participants and R and C remain the same.
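These formulas can be checked mechanically; a small sketch (the function name is illustrative) that also verifies that the five terms sum to the total degrees of freedom, dfT = Nk·C − 1:

```python
def mixed_df(n_subjects, r_between_levels, c_within_levels):
    """Degrees of freedom for a one-between, one-within mixed design."""
    N, R, C = n_subjects, r_between_levels, c_within_levels
    df = {
        "BS": R - 1,
        "BS_error": N - R,
        "WS": C - 1,
        "BSxWS": (R - 1) * (C - 1),
        "WS_error": (N - R) * (C - 1),
    }
    assert sum(df.values()) == N * C - 1  # the five terms sum to df_T
    return df

# e.g. 20 participants, 2 genders, 3 time points:
print(mixed_df(20, 2, 3))
# {'BS': 1, 'BS_error': 18, 'WS': 2, 'BSxWS': 2, 'WS_error': 36}
```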

Follow-up tests

When there is a significant interaction between a between-subjects factor and a within-subjects factor, statisticians often recommend pooling the between-subjects and within-subjects MSerror terms.[5][page needed][citation needed] This can be calculated in the following way:

MSWCELL = (SSBSError + SSWSError) / (dfBSError + dfWSError)

This pooled error is used when testing the effect of the between-subjects variable within a level of the within-subjects variable. When testing the within-subjects variable at different levels of the between-subjects variable, the MSWS/E error term that tested the interaction is the correct error term to use. More generally, as described by Howell (1987, Statistical Methods for Psychology, 2nd ed., p. 434), when testing simple effects based on an interaction, one should use the pooled error when the factor being tested and the interaction were tested with different error terms. When the factor being tested and the interaction were tested with the same error term, that term is sufficient.
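A sketch of the pooled error computation described above (the numeric values are placeholders, not from any worked example cited here):

```python
def pooled_ms_error(ss_bs_err, df_bs_err, ss_ws_err, df_ws_err):
    """Pooled within-cell mean square, MS_WCELL, for simple-effects tests."""
    return (ss_bs_err + ss_ws_err) / (df_bs_err + df_ws_err)

# Hypothetical sums of squares and degrees of freedom from an ANOVA table:
ms_wcell = pooled_ms_error(ss_bs_err=120.0, df_bs_err=18,
                           ss_ws_err=90.0, df_ws_err=36)
print(ms_wcell)  # (120 + 90) / (18 + 36) = 3.888...
```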

When following up interactions of terms that are both between-subjects or both within-subjects variables, the method is identical to follow-up tests in ANOVA: the MSerror term that applies to the follow-up in question is the appropriate one to use; e.g., when following up a significant interaction of two between-subjects effects, use the between-subjects MSerror term.[5][page needed] See ANOVA.

See also

  Analysis of variance
  Analysis of covariance
  ANOVA gauge repeatability and reproducibility
  ANOVA on ranks
  ANOVA–simultaneous component analysis
  Effect size
  Factorial experiment
  Interaction (statistics)
  Linear trend estimation
  Mauchly's sphericity test
  Multilevel model
  Multilevel modeling for repeated measures
  Multivariate analysis of covariance
  Multivariate analysis of variance
  One-way analysis of variance
  Pseudoreplication
  Repeated measures design
  Restricted randomization
  Two-way analysis of variance

References

  1. Field, A. (2009). Discovering Statistics Using SPSS (3rd ed.). Los Angeles: Sage.
  2. Montgomery, D. C., Peck, E. A., & Vining, G. G. (2001). Introduction to Linear Regression Analysis. New York: John Wiley & Sons. p. 280.
  3. Müller, M. (ETH Zurich). Applied Analysis of Variance and Experimental Design, lecture slides for week 4 (compiled 2011-10-25, delivered circa late 2013). Accessed 2019-01-23.
  4. Oehlert, G. W. (2010). A First Course in Design and Analysis of Experiments. University of Minnesota, self-published. p. 289.
  5. Howell, D. (2010). Statistical Methods for Psychology (7th ed.). Australia: Wadsworth.
  6. Geisser, S., & Greenhouse, S. W. (1958). An extension of Box's result on the use of the F distribution in multivariate analysis. Annals of Mathematical Statistics, 29, 885–891.
  7. Huynh, H., & Feldt, L. S. (1970). Conditions under which mean square ratios in repeated measurements designs have exact F-distributions. Journal of the American Statistical Association, 65, 1582–1589.
