Mauchly's sphericity test or Mauchly's W is a statistical test used to validate a repeated measures analysis of variance (ANOVA). It was developed in 1940 by John Mauchly.
Sphericity is an important assumption of a repeated-measures ANOVA. It is the condition in which the variances of the differences between all possible pairs of within-subject conditions (i.e., levels of the independent variable) are equal. If sphericity is violated (i.e., if the variances of the differences between all combinations of the conditions are not equal), then the variance calculations may be distorted, resulting in an inflated F-ratio. [1] Sphericity can be evaluated when there are three or more levels of a repeated measures factor and, with each additional repeated measures factor, the risk of violating sphericity increases. If sphericity is violated, a decision must be made as to whether to use a univariate or a multivariate analysis. If a univariate method is selected, the repeated-measures ANOVA must be appropriately corrected according to the degree to which sphericity has been violated. [2]
Patient | Tx A | Tx B | Tx C | Tx A − Tx B | Tx A − Tx C | Tx B − Tx C |
---|---|---|---|---|---|---|
1 | 30 | 27 | 20 | 3 | 10 | 7 |
2 | 35 | 30 | 28 | 5 | 7 | 2 |
3 | 25 | 30 | 20 | −5 | 5 | 10 |
4 | 15 | 15 | 12 | 0 | 3 | 3 |
5 | 9 | 12 | 7 | −3 | 2 | 5 |
Variance: | | | | 17 | 10.3 | 10.3 |
To further illustrate the concept of sphericity, consider the matrix in the table above, representing data from patients who receive three different types of drug treatment. Their outcomes are shown on the left-hand side of the matrix, while the differences between the outcomes for each pair of treatments are shown on the right-hand side. After obtaining the difference scores for all possible pairs of groups, the variances of these difference scores can be compared. In this example, the variance of the differences between Treatments A and B (17) appears to be much greater than the variance of the differences between Treatments A and C (10.3) and between Treatments B and C (10.3). This suggests that the data may violate the assumption of sphericity. To determine whether statistically significant differences exist between the variances of the differences, Mauchly's test of sphericity can be performed.
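The variances in the bottom row of the table can be reproduced directly from the outcome scores. The following sketch (not part of the source; NumPy and the variable names are our own choices) computes the three difference-score variances:

```python
# Minimal sketch reproducing the difference-score variances from the table above.
import numpy as np

# Outcomes for the five patients under Treatments A, B, and C (from the table).
tx_a = np.array([30, 35, 25, 15, 9])
tx_b = np.array([27, 30, 30, 15, 12])
tx_c = np.array([20, 28, 20, 12, 7])

# Difference scores for every pair of conditions.
d_ab, d_ac, d_bc = tx_a - tx_b, tx_a - tx_c, tx_b - tx_c

# Sample variances (ddof=1); sphericity requires these to be (roughly) equal.
for label, d in [("A-B", d_ab), ("A-C", d_ac), ("B-C", d_bc)]:
    print(label, round(d.var(ddof=1), 1))   # prints 17.0, 10.3, 10.3
```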
Developed in 1940 by John W. Mauchly, [3] Mauchly's test of sphericity is a popular test to evaluate whether the sphericity assumption has been violated. The null hypothesis of sphericity and alternative hypothesis of non-sphericity in the above example can be mathematically written in terms of difference scores.
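Using the treatment labels from the example above (the notation is ours; this is the standard formulation rather than a quotation from the source), the hypotheses are:

$$ H_0:\ \sigma^2_{A-B} = \sigma^2_{A-C} = \sigma^2_{B-C} \qquad\text{versus}\qquad H_1:\ \text{the difference-score variances are not all equal.} $$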
Interpreting Mauchly's test is fairly straightforward. When the probability of Mauchly's test statistic is greater than the chosen significance level α (i.e., p > α, with α commonly set to .05), we fail to reject the null hypothesis that the variances of the differences are equal. Therefore, we could conclude that the assumption has not been violated. However, when the probability of Mauchly's test statistic is less than or equal to α (i.e., p ≤ α), sphericity cannot be assumed and we would therefore conclude that there are significant differences between the variances of the differences. [4] Sphericity is always met for two levels of a repeated measures factor and is, therefore, unnecessary to evaluate. [1]
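Returning to the three-treatment example above, the sketch below (our own construction, not from the source) computes Mauchly's W and the usual chi-square approximation to its null distribution; the contrast-matrix construction and variable names are our choices.

```python
# Hedged sketch: Mauchly's W and its chi-square approximation, by hand.
import numpy as np
from scipy.stats import chi2

# n subjects x k conditions (the example data from the table above).
X = np.array([[30, 27, 20],
              [35, 30, 28],
              [25, 30, 20],
              [15, 15, 12],
              [ 9, 12,  7]], dtype=float)
n, k = X.shape

# Orthonormal contrasts: an orthonormal basis orthogonal to the constant
# vector (any such basis yields the same value of W).
Q, _ = np.linalg.qr(np.column_stack([np.ones(k), np.eye(k)[:, :k - 1]]))
C = Q[:, 1:].T                                  # (k-1) x k contrast matrix

S = np.cov(X, rowvar=False)                     # k x k sample covariance
T = C @ S @ C.T                                 # contrast covariance

# Mauchly's W: det(T) divided by the determinant T would have if all of its
# eigenvalues equalled their mean (i.e., under exact sphericity).
W = np.linalg.det(T) / (np.trace(T) / (k - 1)) ** (k - 1)

# Chi-square approximation to the null distribution of -(n-1) * f * ln(W).
f = 1 - (2 * (k - 1) ** 2 + (k - 1) + 2) / (6 * (k - 1) * (n - 1))
df = k * (k - 1) // 2 - 1
p = chi2.sf(-(n - 1) * f * np.log(W), df)
print(W, p)          # p > .05 here would mean sphericity is not rejected
```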
Statistical software should not provide output for a test of sphericity when a repeated measures factor has only two levels; however, some versions of SPSS produce an output table with degrees of freedom equal to 0 and a period in place of a numeric p value.
When sphericity has been established, the F-ratio is valid and therefore interpretable. However, if Mauchly's test is significant, then the F-ratios produced must be interpreted with caution, as violations of this assumption can inflate the Type I error rate and influence the conclusions drawn from the analysis. [4] In instances where Mauchly's test is significant, modifications need to be made to the degrees of freedom so that a valid F-ratio can be obtained.
In SPSS, three corrections are generated: the Greenhouse–Geisser correction (1959), the Huynh–Feldt correction (1976), and the lower-bound correction. Each of these corrections has been developed to alter the degrees of freedom and produce an F-test for which the Type I error rate is reduced. The actual F-ratio does not change as a result of applying the corrections; only the degrees of freedom change. [4]
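Concretely, each correction multiplies both the effect and error degrees of freedom of the repeated measures F-test by its estimate of epsilon. For a one-way repeated measures design with k levels and n subjects this gives (a standard formulation, not quoted from the source):

$$ df_1 = \varepsilon\,(k - 1), \qquad df_2 = \varepsilon\,(k - 1)(n - 1). $$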
These corrections are based on an estimate denoted by epsilon (ε), which can be found in Mauchly's test output in SPSS. Epsilon provides a measure of departure from sphericity; by evaluating it, we can determine the degree to which sphericity has been violated. If the variances of the differences between all possible pairs of groups are equal and sphericity is exactly met, then epsilon will be exactly 1, indicating no departure from sphericity. If the variances of the differences are unequal and sphericity is violated, epsilon will be below 1, and the further epsilon is from 1, the worse the violation. [5]
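The Greenhouse–Geisser estimate of epsilon can be computed from the covariance matrix of the repeated measures. The sketch below is our own illustration under that standard formulation; the function name and the subjects-by-conditions data layout are assumptions, not part of the source.

```python
# Hedged sketch: Greenhouse-Geisser epsilon for an n-subjects x k-conditions array.
import numpy as np

def greenhouse_geisser_epsilon(X):
    """Box/Greenhouse-Geisser epsilon from the contrast covariance matrix."""
    n, k = X.shape
    # Orthonormal contrasts orthogonal to the constant vector.
    Q, _ = np.linalg.qr(np.column_stack([np.ones(k), np.eye(k)[:, :k - 1]]))
    T = Q[:, 1:].T @ np.cov(X, rowvar=False) @ Q[:, 1:]
    lam = np.linalg.eigvalsh(T)                 # eigenvalues of contrast covariance
    return lam.sum() ** 2 / ((k - 1) * (lam ** 2).sum())

# Perfect sphericity gives epsilon = 1; the lower-bound correction is 1 / (k - 1).
# The Huynh-Feldt correction adjusts this estimate upward (less conservative).
```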
Of the three corrections, the Huynh–Feldt correction is considered the least conservative, the Greenhouse–Geisser correction more conservative, and the lower-bound correction the most conservative. When epsilon is > .75, the Greenhouse–Geisser correction is believed to be too conservative, treating sphericity as more severely violated than it is and thereby over-correcting the degrees of freedom. Collier and colleagues [6] showed this was true even when epsilon was as high as .90. The Huynh–Feldt correction, however, is believed to be too liberal: it overestimates sphericity and therefore under-corrects when sphericity does not in fact hold. [7] Girden [8] recommended a solution to this problem: when epsilon is > .75, the Huynh–Feldt correction should be applied, and when epsilon is < .75 or nothing is known about sphericity, the Greenhouse–Geisser correction should be applied.
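As a small illustration (our own, not taken from Girden), the recommendation amounts to a one-line decision rule:

```python
# Choose a sphericity correction following the recommendation quoted above:
# Huynh-Feldt when the Greenhouse-Geisser epsilon exceeds .75, otherwise
# Greenhouse-Geisser (also the default when nothing is known about sphericity).
def choose_correction(eps_gg: float) -> str:
    return "Huynh-Feldt" if eps_gg > 0.75 else "Greenhouse-Geisser"
```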
Another alternative is to use multivariate test statistics (MANOVA), since they do not require the assumption of sphericity. [9] However, this procedure can be less powerful than a repeated measures ANOVA, especially when the violation of sphericity is not large or the sample size is small. [10] O'Brien and Kaiser [11] suggested that when there is a large violation of sphericity (i.e., epsilon < .70) and the sample size is greater than k + 10 (i.e., the number of levels of the repeated measures factor plus 10), a MANOVA is more powerful; in other cases, the repeated measures design should be selected. [5] Additionally, the power of MANOVA is contingent upon the correlations between the dependent variables, so the relationship between the different conditions must also be considered. [2]
SPSS provides an F-ratio from four different methods: Pillai's trace, Wilks’ lambda, Hotelling's trace, and Roy's largest root. In general, Wilks’ lambda has been recommended as the most appropriate multivariate test statistic to use.
While Mauchly's test is one of the most commonly used tests for evaluating sphericity, it fails to detect departures from sphericity in small samples and over-detects them in large samples. Consequently, the sample size has an influence on the interpretation of the results. [4] In practice, the assumption of sphericity is extremely unlikely to be met exactly, so it is prudent to correct for a possible violation without actually testing for one.
Analysis of variance (ANOVA) is a collection of statistical models and their associated estimation procedures used to analyze the differences among means. ANOVA was developed by the statistician Ronald Fisher. ANOVA is based on the law of total variance, where the observed variance in a particular variable is partitioned into components attributable to different sources of variation. In its simplest form, ANOVA provides a statistical test of whether two or more population means are equal, and therefore generalizes the t-test beyond two means. In other words, the ANOVA is used to test the difference between two or more means.
An F-test is any statistical test used to compare the variances of two samples or the ratio of variances between multiple samples. Its test statistic, the random variable F, follows an F-distribution under the null hypothesis when the customary assumptions about the error term (ε) hold. It is most often used when comparing statistical models that have been fitted to a data set, in order to identify the model that best fits the population from which the data were sampled. Exact "F-tests" mainly arise when the models have been fitted to the data using least squares. The name was coined by George W. Snedecor in honour of Ronald Fisher, who initially developed the statistic as the variance ratio in the 1920s.
Analysis of covariance (ANCOVA) is a general linear model that blends ANOVA and regression. ANCOVA evaluates whether the means of a dependent variable (DV) are equal across levels of one or more categorical independent variables (IVs), while controlling for one or more continuous variables. For example, the categorical variable(s) might describe treatment and the continuous variable(s) might be covariates (CVs), typically nuisance variables; or vice versa. Mathematically, ANCOVA decomposes the variance in the DV into variance explained by the CV(s), variance explained by the categorical IV, and residual variance. Intuitively, ANCOVA can be thought of as 'adjusting' the DV by the group means of the CV(s).
Student's t-test is a statistical test used to test whether the difference between the response of two groups is statistically significant or not. It is any statistical hypothesis test in which the test statistic follows a Student's t-distribution under the null hypothesis. It is most commonly applied when the test statistic would follow a normal distribution if the value of a scaling term in the test statistic were known. When the scaling term is estimated based on the data, the test statistic—under certain conditions—follows a Student's t distribution. The t-test's most common application is to test whether the means of two populations are significantly different. In many cases, a Z-test will yield very similar results to a t-test because the latter converges to the former as the size of the dataset increases.
In statistics, multivariate analysis of variance (MANOVA) is a procedure for comparing multivariate sample means. As a multivariate procedure, it is used when there are two or more dependent variables, and is often followed by significance tests involving individual dependent variables separately.
The Kruskal–Wallis test by ranks, Kruskal–Wallis test, or one-way ANOVA on ranks is a non-parametric statistical test for testing whether samples originate from the same distribution. It is used for comparing two or more independent samples of equal or different sample sizes. It extends the Mann–Whitney U test, which is used for comparing only two groups. The parametric equivalent of the Kruskal–Wallis test is the one-way analysis of variance (ANOVA).
Linear discriminant analysis (LDA), normal discriminant analysis (NDA), or discriminant function analysis is a generalization of Fisher's linear discriminant, a method used in statistics and other fields, to find a linear combination of features that characterizes or separates two or more classes of objects or events. The resulting combination may be used as a linear classifier, or, more commonly, for dimensionality reduction before later classification.
In statistics, Levene's test is an inferential statistic used to assess the equality of variances for a variable calculated for two or more groups. This test is used because some common statistical procedures assume that variances of the populations from which different samples are drawn are equal. Levene's test assesses this assumption. It tests the null hypothesis that the population variances are equal. If the resulting p-value of Levene's test is less than some significance level (typically 0.05), the obtained differences in sample variances are unlikely to have occurred based on random sampling from a population with equal variances. Thus, the null hypothesis of equal variances is rejected and it is concluded that there is a difference between the variances in the population.
Omnibus tests are a kind of statistical test. They test whether the explained variance in a set of data is significantly greater than the unexplained variance, overall. One example is the F-test in the analysis of variance. There can be legitimate significant effects within a model even if the omnibus test is not significant. For instance, in a model with two independent variables, if only one variable exerts a significant effect on the dependent variable and the other does not, then the omnibus test may be non-significant. This fact does not affect the conclusions that may be drawn from the one significant variable. In order to test effects within an omnibus test, researchers often use contrasts.
In statistics, one-way analysis of variance is a technique to compare whether two or more samples' means are significantly different. This analysis of variance technique requires a numeric response variable "Y" and a single explanatory variable "X", hence "one-way".
Tukey's range test, also known as Tukey's test, Tukey method, Tukey's honest significance test, or Tukey's HSD test, is a single-step multiple comparison procedure and statistical test. It can be used to correctly interpret the statistical significance of the difference between means that have been selected for comparison because of their extreme values.
Named after the Dutch mathematician Bartel Leendert van der Waerden, the Van der Waerden test is a statistical test of the hypothesis that k population distribution functions are equal. The Van der Waerden test converts the ranks from a standard Kruskal–Wallis test to quantiles of the standard normal distribution. These are called normal scores, and the test is computed from these normal scores.
Repeated measures design is a research design that involves multiple measures of the same variable taken on the same or matched subjects either under different conditions or over two or more time periods. For instance, repeated measurements are collected in a longitudinal study in which change over time is assessed.
Multivariate analysis of covariance (MANCOVA) is an extension of analysis of covariance (ANCOVA) methods to cover cases where there is more than one dependent variable and where the control of concomitant continuous independent variables – covariates – is required. The most prominent benefit of the MANCOVA design over the simple MANOVA is the 'factoring out' of noise or error that has been introduced by the covariant. A commonly used multivariate version of the ANOVA F-statistic is Wilks' Lambda (Λ), which represents the ratio between the error variance and the effect variance.
In statistics, a mixed-design analysis of variance model, also known as a split-plot ANOVA, is used to test for differences between two or more independent groups whilst subjecting participants to repeated measures. Thus, in a mixed-design ANOVA model, one factor is a between-subjects variable and the other is a within-subjects variable. Thus, overall, the model is a type of mixed-effects model.
Exact statistics, such as that described in exact test, is a branch of statistics that was developed to provide more accurate results pertaining to statistical testing and interval estimation by eliminating procedures based on asymptotic and approximate statistical methods. The main characteristic of exact methods is that statistical tests and confidence intervals are based on exact probability statements that are valid for any sample size. Exact statistical methods help avoid some of the unreasonable assumptions of traditional statistical methods, such as the assumption of equal variances in classical ANOVA. They also allow exact inference on variance components of mixed models.
In statistics, one purpose for the analysis of variance (ANOVA) is to analyze differences in means between groups. The test statistic, F, assumes independence of observations, homogeneous variances, and population normality. ANOVA on ranks is a statistic designed for situations when the normality assumption has been violated.
In statistics, the two-way analysis of variance (ANOVA) is an extension of the one-way ANOVA that examines the influence of two different categorical independent variables on one continuous dependent variable. The two-way ANOVA not only aims at assessing the main effect of each independent variable but also if there is any interaction between them.
The Greenhouse–Geisser correction is a statistical method of adjusting for lack of sphericity in a repeated measures ANOVA. The correction functions as both an estimate of epsilon (sphericity) and a correction for lack of sphericity. The correction was proposed by Samuel Greenhouse and Seymour Geisser in 1959.
In statistics, a sequence of random variables is homoscedastic if all its random variables have the same finite variance; this is also known as homogeneity of variance. The complementary notion is called heteroscedasticity, also known as heterogeneity of variance. The spellings homoskedasticity and heteroskedasticity are also frequently used. “Skedasticity” comes from the Ancient Greek word “skedánnymi”, meaning “to scatter”. Assuming a variable is homoscedastic when in reality it is heteroscedastic results in unbiased but inefficient point estimates and in biased estimates of standard errors, and may result in overestimating the goodness of fit as measured by the Pearson coefficient.