One-way analysis of variance

Last updated

In statistics, one-way analysis of variance (or one-way ANOVA) is a technique to compare whether two or more samples' means are significantly different (using the F distribution). This analysis of variance technique requires a numeric response variable "Y" and a single explanatory variable "X", hence "one-way". [1]

Contents

The ANOVA tests the null hypothesis, which states that samples in all groups are drawn from populations with the same mean values. To do this, two estimates are made of the population variance. These estimates rely on various assumptions (see below). The ANOVA produces an F-statistic, the ratio of the variance calculated among the means to the variance within the samples. If the group means are drawn from populations with the same mean values, the variance between the group means should be lower than the variance of the samples, following the central limit theorem. A higher ratio therefore implies that the samples were drawn from populations with different mean values. [1]

Typically, however, the one-way ANOVA is used to test for differences among at least three groups, since the two-group case can be covered by a t-test (Gosset, 1908). When there are only two means to compare, the t-test and the F-test are equivalent; the relation between ANOVA and t is given by F = t2. An extension of one-way ANOVA is two-way analysis of variance that examines the influence of two different categorical independent variables on one dependent variable.

Assumptions

The results of a one-way ANOVA can be considered reliable as long as the following assumptions are met:

If data are ordinal, a non-parametric alternative to this test should be used such as Kruskal–Wallis one-way analysis of variance. If the variances are not known to be equal, a generalization of 2-sample Welch's t-test can be used. [2]

Departures from population normality

ANOVA is a relatively robust procedure with respect to violations of the normality assumption. [3]

The one-way ANOVA can be generalized to the factorial and multivariate layouts, as well as to the analysis of covariance.[ clarification needed ]

It is often stated in popular literature that none of these F-tests are robust when there are severe violations of the assumption that each population follows the normal distribution, particularly for small alpha levels and unbalanced layouts. [4] Furthermore, it is also claimed that if the underlying assumption of homoscedasticity is violated, the Type I error properties degenerate much more severely. [5]

However, this is a misconception, based on work done in the 1950s and earlier. The first comprehensive investigation of the issue by Monte Carlo simulation was Donaldson (1966). [6] He showed that under the usual departures (positive skew, unequal variances) "the F-test is conservative", and so it is less likely than it should be to find that a variable is significant. However, as either the sample size or the number of cells increases, "the power curves seem to converge to that based on the normal distribution". Tiku (1971) found that "the non-normal theory power of F is found to differ from the normal theory power by a correction term which decreases sharply with increasing sample size." [7] The problem of non-normality, especially in large samples, is far less serious than popular articles would suggest.

The current view is that "Monte-Carlo studies were used extensively with normal distribution-based tests to determine how sensitive they are to violations of the assumption of normal distribution of the analyzed variables in the population. The general conclusion from these studies is that the consequences of such violations are less severe than previously thought. Although these conclusions should not entirely discourage anyone from being concerned about the normality assumption, they have increased the overall popularity of the distribution-dependent statistical tests in all areas of research." [8]

For nonparametric alternatives in the factorial layout, see Sawilowsky. [9] For more discussion see ANOVA on ranks.

The case of fixed effects, fully randomized experiment, unbalanced data

The model

The normal linear model describes treatment groups with probability distributions which are identically bell-shaped (normal) curves with different means. Thus fitting the models requires only the means of each treatment group and a variance calculation (an average variance within the treatment groups is used). Calculations of the means and the variance are performed as part of the hypothesis test.

The commonly used normal linear models for a completely randomized experiment are: [10]

(the means model)

or

(the effects model)

where

is an index over experimental units
is an index over treatment groups
is the number of experimental units in the jth treatment group
is the total number of experimental units
are observations
is the mean of the observations for the jth treatment group
is the grand mean of the observations
is the jth treatment effect, a deviation from the grand mean
, are normally distributed zero-mean random errors.

The index over the experimental units can be interpreted several ways. In some experiments, the same experimental unit is subject to a range of treatments; may point to a particular unit. In others, each treatment group has a distinct set of experimental units; may simply be an index into the -th list.

The data and statistical summaries of the data

One form of organizing experimental observations is with groups in columns:

ANOVA data organization, Unbalanced, Single factor
Lists of Group Observations
1
2
3
Group Summary StatisticsGrand Summary Statistics
# Observed# Observed
SumSum
Sum SqSum Sq
MeanMean
VarianceVariance

Comparing model to summaries: and . The grand mean and grand variance are computed from the grand sums, not from group means and variances.

The hypothesis test

Given the summary statistics, the calculations of the hypothesis test are shown in tabular form. While two columns of SS are shown for their explanatory value, only one column is required to display results.

ANOVA table for fixed model, single factor, fully randomized experiment
Source of variationSums of squaresSums of squaresDegrees of freedomMean squareF
Explanatory SS [11] Computational SS [12] DFMS
Treatments
Error
Total

is the estimate of variance corresponding to of the model.

Analysis summary

The core ANOVA analysis consists of a series of calculations. The data is collected in tabular form. Then

If the experiment is balanced, all of the terms are equal so the SS equations simplify.

In a more complex experiment, where the experimental units (or environmental effects) are not homogeneous, row statistics are also used in the analysis. The model includes terms dependent on . Determining the extra terms reduces the number of degrees of freedom available.

Example

Consider an experiment to study the effect of three different levels of a factor on a response (e.g. three levels of a fertilizer on plant growth). If we had 6 observations for each level, we could write the outcome of the experiment in a table like this, where a1, a2, and a3 are the three levels of the factor being studied.

a1a2a3
6813
8129
4911
5118
367
4812

The null hypothesis, denoted H0, for the overall F-test for this experiment would be that all three levels of the factor produce the same response, on average. To calculate the F-ratio:

Step 1: Calculate the mean within each group:

Step 2: Calculate the overall mean:

where a is the number of groups.

Step 3: Calculate the "between-group" sum of squared differences:

where n is the number of data values per group.

The between-group degrees of freedom is one less than the number of groups

so the between-group mean square value is

Step 4: Calculate the "within-group" sum of squares. Begin by centering the data in each group

a1a2a3
6−5=18−9=−113−10=3
8−5=312−9=39−10=−1
4−5=−19−9=011−10=1
5−5=011−9=28−10=−2
3−5=−26−9=−37−10=−3
4−5=−18−9=−112−10=2

The within-group sum of squares is the sum of squares of all 18 values in this table

The within-group degrees of freedom is

Thus the within-group mean square value is

Step 5: The F-ratio is

The critical value is the number that the test statistic must exceed to reject the test. In this case, Fcrit(2,15) = 3.68 at α = 0.05. Since F=9.3 > 3.68, the results are significant at the 5% significance level. One would not accept the null hypothesis, concluding that there is strong evidence that the expected values in the three groups differ. The p-value for this test is 0.002.

After performing the F-test, it is common to carry out some "post-hoc" analysis of the group means. In this case, the first two group means differ by 4 units, the first and third group means differ by 5 units, and the second and third group means differ by only 1 unit. The standard error of each of these differences is . Thus the first group is strongly different from the other groups, as the mean difference is more than 3 times the standard error, so we can be highly confident that the population mean of the first group differs from the population means of the other groups. However, there is no evidence that the second and third groups have different population means from each other, as their mean difference of one unit is comparable to the standard error.

Note F(x, y) denotes an F-distribution cumulative distribution function with x degrees of freedom in the numerator and y degrees of freedom in the denominator.

See also

Notes

  1. 1 2 Howell, David (2002). Statistical Methods for Psychology. Duxbury. pp.  324–325. ISBN   0-534-37770-X.
  2. Welch, B. L. (1951). "On the Comparison of Several Mean Values: An Alternative Approach". Biometrika. 38 (3/4): 330–336. doi:10.2307/2332579. JSTOR   2332579.
  3. Kirk, RE (1995). Experimental Design: Procedures For The Behavioral Sciences (3 ed.). Pacific Grove, CA, USA: Brooks/Cole.
  4. Blair, R. C. (1981). "A reaction to 'Consequences of failure to meet assumptions underlying the fixed effects analysis of variance and covariance.'". Review of Educational Research. 51 (4): 499–507. doi:10.3102/00346543051004499.
  5. Randolf, E. A.; Barcikowski, R. S. (1989). "Type I error rate when real study values are used as population parameters in a Monte Carlo study". Paper Presented at the 11th Annual Meeting of the Mid-Western Educational Research Association, Chicago.
  6. Donaldson, Theodore S. (1966). "Power of the F-Test for Nonnormal Distributions and Unequal Error Variances". Paper Prepared for United States Air Force Project RAND.
  7. Tiku, M. L. (1971). "Power Function of the F-Test Under Non-Normal Situations". Journal of the American Statistical Association . 66 (336): 913–916. doi:10.1080/01621459.1971.10482371.
  8. "Getting Started with Statistics Concepts". Archived from the original on 2018-12-04. Retrieved 2016-09-22.
  9. Sawilowsky, S. (1990). "Nonparametric tests of interaction in experimental design". Review of Educational Research. 60 (1): 91–126. doi:10.3102/00346543060001091.
  10. Montgomery, Douglas C. (2001). Design and Analysis of Experiments (5th ed.). New York: Wiley. p. Section 3–2. ISBN   9780471316497.
  11. Moore, David S.; McCabe, George P. (2003). Introduction to the Practice of Statistics (4th ed.). W H Freeman & Co. p. 764. ISBN   0716796570.
  12. Winkler, Robert L.; Hays, William L. (1975). Statistics: Probability, Inference, and Decision (2nd ed.). New York: Holt, Rinehart and Winston. p.  761.

Further reading

Related Research Articles

Analysis of variance (ANOVA) is a collection of statistical models and their associated estimation procedures used to analyze the differences among means. ANOVA was developed by the statistician Ronald Fisher. ANOVA is based on the law of total variance, where the observed variance in a particular variable is partitioned into components attributable to different sources of variation. In its simplest form, ANOVA provides a statistical test of whether two or more population means are equal, and therefore generalizes the t-test beyond two means. In other words, the ANOVA is used to test the difference between two or more means.

In probability theory and statistics, kurtosis is a measure of the "tailedness" of the probability distribution of a real-valued random variable. Like skewness, kurtosis describes a particular aspect of a probability distribution. There are different ways to quantify kurtosis for a theoretical distribution, and there are corresponding ways of estimating it using a sample from a population. Different measures of kurtosis may have different interpretations.

<span class="mw-page-title-main">Variance</span> Statistical measure of how far values spread from their average

In probability theory and statistics, variance is the expected value of the squared deviation from the mean of a random variable. The standard deviation (SD) is obtained as the square root of the variance. Variance is a measure of dispersion, meaning it is a measure of how far a set of numbers is spread out from their average value. It is the second central moment of a distribution, and the covariance of the random variable with itself, and it is often represented by , , , , or .

The weighted arithmetic mean is similar to an ordinary arithmetic mean, except that instead of each of the data points contributing equally to the final average, some data points contribute more than others. The notion of weighted mean plays a role in descriptive statistics and also occurs in a more general form in several other areas of mathematics.

<span class="mw-page-title-main">Multivariate normal distribution</span> Generalization of the one-dimensional normal distribution to higher dimensions

In probability theory and statistics, the multivariate normal distribution, multivariate Gaussian distribution, or joint normal distribution is a generalization of the one-dimensional (univariate) normal distribution to higher dimensions. One definition is that a random vector is said to be k-variate normally distributed if every linear combination of its k components has a univariate normal distribution. Its importance derives mainly from the multivariate central limit theorem. The multivariate normal distribution is often used to describe, at least approximately, any set of (possibly) correlated real-valued random variables each of which clusters around a mean value.

<span class="mw-page-title-main">Student's t-distribution</span> Probability distribution

In probability and statistics, Student's t distribution is a continuous probability distribution that generalizes the standard normal distribution. Like the latter, it is symmetric around zero and bell-shaped.

<span class="mw-page-title-main">Chi-squared distribution</span> Probability distribution and special case of gamma distribution

In probability theory and statistics, the chi-squared distribution with degrees of freedom is the distribution of a sum of the squares of independent standard normal random variables. The chi-squared distribution is a special case of the gamma distribution and is one of the most widely used probability distributions in inferential statistics, notably in hypothesis testing and in construction of confidence intervals. This distribution is sometimes called the central chi-squared distribution, a special case of the more general noncentral chi-squared distribution.

In statistics, the mean squared error (MSE) or mean squared deviation (MSD) of an estimator measures the average of the squares of the errors—that is, the average squared difference between the estimated values and the actual value. MSE is a risk function, corresponding to the expected value of the squared error loss. The fact that MSE is almost always strictly positive is because of randomness or because the estimator does not account for information that could produce a more accurate estimate. In machine learning, specifically empirical risk minimization, MSE may refer to the empirical risk, as an estimate of the true MSE.

<i>F</i>-test Statistical hypothesis test, mostly using multiple restrictions

An F-test is any statistical test used to compare the variances of two samples or the ratio of variances between multiple samples. The test statistic, random variable F, is used to determine if the tested data has an F-distribution under the true null hypothesis, and true customary assumptions about the error term (ε). It is most often used when comparing statistical models that have been fitted to a data set, in order to identify the model that best fits the population from which the data were sampled. Exact "F-tests" mainly arise when the models have been fitted to the data using least squares. The name was coined by George W. Snedecor, in honour of Ronald Fisher. Fisher initially developed the statistic as the variance ratio in the 1920s.

Analysis of covariance (ANCOVA) is a general linear model that blends ANOVA and regression. ANCOVA evaluates whether the means of a dependent variable (DV) are equal across levels of one or more categorical independent variables (IV) and across one or more continuous variables. For example, the categorical variable(s) might describe treatment and the continuous variable(s) might be covariates or nuisance variables; or vice versa. Mathematically, ANCOVA decomposes the variance in the DV into variance explained by the CV(s), variance explained by the categorical IV, and residual variance. Intuitively, ANCOVA can be thought of as 'adjusting' the DV by the group means of the CV(s).

Student's t-test is a statistical test used to test whether the difference between the response of two groups is statistically significant or not. It is any statistical hypothesis test in which the test statistic follows a Student's t-distribution under the null hypothesis. It is most commonly applied when the test statistic would follow a normal distribution if the value of a scaling term in the test statistic were known. When the scaling term is estimated based on the data, the test statistic—under certain conditions—follows a Student's t distribution. The t-test's most common application is to test whether the means of two populations are significantly different. In many cases, a Z-test will yield very similar results to a t-test since the latter converges to the former as the size of the dataset increases.

Hotellings <i>T</i>-squared distribution Type of probability distribution

In statistics, particularly in hypothesis testing, the Hotelling's T-squared distribution (T2), proposed by Harold Hotelling, is a multivariate probability distribution that is tightly related to the F-distribution and is most notable for arising as the distribution of a set of sample statistics that are natural generalizations of the statistics underlying the Student's t-distribution. The Hotelling's t-squared statistic (t2) is a generalization of Student's t-statistic that is used in multivariate hypothesis testing.

The Anderson–Darling test is a statistical test of whether a given sample of data is drawn from a given probability distribution. In its basic form, the test assumes that there are no parameters to be estimated in the distribution being tested, in which case the test and its set of critical values is distribution-free. However, the test is most often used in contexts where a family of distributions is being tested, in which case the parameters of that family need to be estimated and account must be taken of this in adjusting either the test-statistic or its critical values. When applied to testing whether a normal distribution adequately describes a set of data, it is one of the most powerful statistical tools for detecting most departures from normality. K-sample Anderson–Darling tests are available for testing whether several collections of observations can be modelled as coming from a single population, where the distribution function does not have to be specified.

In statistics, the bias of an estimator is the difference between this estimator's expected value and the true value of the parameter being estimated. An estimator or decision rule with zero bias is called unbiased. In statistics, "bias" is an objective property of an estimator. Bias is a distinct concept from consistency: consistent estimators converge in probability to the true value of the parameter, but may be biased or unbiased; see bias versus consistency for more.

Squared deviations from the mean (SDM) result from squaring deviations. In probability theory and statistics, the definition of variance is either the expected value of the SDM or its average value. Computations for analysis of variance involve the partitioning of a sum of SDM.

In statistics, pooled variance is a method for estimating variance of several different populations when the mean of each population may be different, but one may assume that the variance of each population is the same. The numerical estimate resulting from the use of this method is also called the pooled variance.

Tukey's range test, also known as Tukey's test, Tukey method, Tukey's honest significance test, or Tukey's HSDtest, is a single-step multiple comparison procedure and statistical test. It can be used to find means that are significantly different from each other.

In statistics, a generalized p-value is an extended version of the classical p-value, which except in a limited number of applications, provides only approximate solutions.

In statistics, the two-way analysis of variance (ANOVA) is an extension of the one-way ANOVA that examines the influence of two different categorical independent variables on one continuous dependent variable. The two-way ANOVA not only aims at assessing the main effect of each independent variable but also if there is any interaction between them.

In statistics, expected mean squares (EMS) are the expected values of certain statistics arising in partitions of sums of squares in the analysis of variance (ANOVA). They can be used for ascertaining which statistic should appear in the denominator in an F-test for testing a null hypothesis that a particular effect is absent.