Tukey's test of additivity

Last updated

In statistics, Tukey's test of additivity, [1] named for John Tukey, is an approach used in two-way ANOVA (regression analysis involving two qualitative factors) to assess whether the factor variables (categorical variables) are additively related to the expected value of the response variable. It can be applied when there are no replicated values in the data set, a situation in which it is impossible to directly estimate a fully general non-additive regression structure and still have information left to estimate the error variance. The test statistic proposed by Tukey has one degree of freedom under the null hypothesis, hence this is often called "Tukey's one-degree-of-freedom test."

Contents

Introduction

The most common setting for Tukey's test of additivity is a two-way factorial analysis of variance (ANOVA) with one observation per cell. The response variable Yij is observed in a table of cells with the rows indexed by i = 1,..., m and the columns indexed by j = 1,..., n. The rows and columns typically correspond to various types and levels of treatment that are applied in combination.

The additive model states that the expected response can be expressed EYij = μ + αi + βj, where the αi and βj are unknown constant values. The unknown model parameters are usually estimated as

where Yi is the mean of the ith row of the data table, Yj is the mean of the jth column of the data table, and Y•• is the overall mean of the data table.

The additive model can be generalized to allow for arbitrary interaction effects by setting EYij = μ + αi + βj + γij. However, after fitting the natural estimator of γij,

the fitted values

fit the data exactly. Thus there are no remaining degrees of freedom to estimate the variance σ2, and no hypothesis tests about the γij can performed.

Tukey therefore proposed a more constrained interaction model of the form

By testing the null hypothesis that λ = 0, we are able to detect some departures from additivity based only on the single parameter λ.

Method

To carry out Tukey's test, set

Then use the following test statistic [2]

Under the null hypothesis, the test statistic has an F distribution with 1, q degrees of freedom, where q = mn  (m + n) is the degrees of freedom for estimating the error variance.

See also

Related Research Articles

In statistics, the Gauss–Markov theorem states that the ordinary least squares (OLS) estimator has the lowest sampling variance within the class of linear unbiased estimators, if the errors in the linear regression model are uncorrelated, have equal variances and expectation value of zero. The errors do not need to be normal, nor do they need to be independent and identically distributed. The requirement that the estimator be unbiased cannot be dropped, since biased estimators exist with lower variance. See, for example, the James–Stein estimator, ridge regression, or simply any degenerate estimator.

Quantum group Algebraic construct of interest in theoretical physics

In mathematics and theoretical physics, the term quantum group denotes one of a few different kinds of noncommutative algebras with additional structure. These include Drinfeld–Jimbo type quantum groups, compact matrix quantum groups, and bicrossproduct quantum groups.

The t-test is any statistical hypothesis test in which the test statistic follows a Student's t-distribution under the null hypothesis.

Coefficient of determination

In statistics, the coefficient of determination, denoted R2 or r2 and pronounced "R squared", is the proportion of the variance in the dependent variable that is predictable from the independent variable(s).

Electromagnetic tensor

In electromagnetism, the electromagnetic tensor or electromagnetic field tensor is a mathematical object that describes the electromagnetic field in spacetime. The field tensor was first used after the four-dimensional tensor formulation of special relativity was introduced by Hermann Minkowski. The tensor allows related physical laws to be written very concisely.

Simple linear regression

In statistics, simple linear regression is a linear regression model with a single explanatory variable. That is, it concerns two-dimensional sample points with one independent variable and one dependent variable and finds a linear function that, as accurately as possible, predicts the dependent variable values as a function of the independent variable. The adjective simple refers to the fact that the outcome variable is related to a single predictor.

Theoretical motivation for general relativity

A theoretical motivation for general relativity, including the motivation for the geodesic equation and the Einstein field equation, can be obtained from special relativity by examining the dynamics of particles in circular orbits about the earth. A key advantage in examining circular orbits is that it is possible to know the solution of the Einstein Field Equation a priori. This provides a means to inform and verify the formalism.

In statistics, Levene's test is an inferential statistic used to assess the equality of variances for a variable calculated for two or more groups. Some common statistical procedures assume that variances of the populations from which different samples are drawn are equal. Levene's test assesses this assumption. It tests the null hypothesis that the population variances are equal. If the resulting p-value of Levene's test is less than some significance level (typically 0.05), the obtained differences in sample variances are unlikely to have occurred based on random sampling from a population with equal variances. Thus, the null hypothesis of equal variances is rejected and it is concluded that there is a difference between the variances in the population.

Omnibus tests are a kind of statistical test. They test whether the explained variance in a set of data is significantly greater than the unexplained variance, overall. One example is the F-test in the analysis of variance. There can be legitimate significant effects within a model even if the omnibus test is not significant. For instance, in a model with two independent variables, if only one variable exerts a significant effect on the dependent variable and the other does not, then the omnibus test may be non-significant. This fact does not affect the conclusions that may be drawn from the one significant variable. In order to test effects within an omnibus test, researchers often use contrasts.

Beta-binomial distribution

In probability theory and statistics, the beta-binomial distribution is a family of discrete probability distributions on a finite support of non-negative integers arising when the probability of success in each of a fixed or known number of Bernoulli trials is either unknown or random. The beta-binomial distribution is the binomial distribution in which the probability of success at each of n trials is not fixed but randomly drawn from a beta distribution. It is frequently used in Bayesian statistics, empirical Bayes methods and classical statistics to capture overdispersion in binomial type distributed data.

In statistics, the variance inflation factor (VIF) is the ratio (quotient) of the variance of estimating some parameter in a model that includes multiple other terms (parameters) by the variance of a model constructed using only one term. It quantifies the severity of multicollinearity in an ordinary least squares regression analysis. It provides an index that measures how much the variance of an estimated regression coefficient is increased because of collinearity. Cuthbert Daniel claims to have invented the concept behind the variance inflation factor, but did not come up with the name.

Tukey's range test, also known as Tukey's test, Tukey method, Tukey's honest significance test, or Tukey's HSDtest, is a single-step multiple comparison procedure and statistical test. It can be used to find means that are significantly different from each other.

In probability and statistics, the Tweedie distributions are a family of probability distributions which include the purely continuous normal, gamma and Inverse Gaussian distributions, the purely discrete scaled Poisson distribution, and the class of compound Poisson–gamma distributions which have positive mass at zero, but are otherwise continuous. Tweedie distributions are a special case of exponential dispersion models and are often used as distributions for generalized linear models.

In linear regression, mean response and predicted response are values of the dependent variable calculated from the regression parameters and a given value of the independent variable. The values of these two responses are the same, but their calculated variances are different.

In statistics, a sum of squares due to lack of fit, or more tersely a lack-of-fit sum of squares, is one of the components of a partition of the sum of squares of residuals in an analysis of variance, used in the numerator in an F-test of the null hypothesis that says that a proposed model fits well. The other component is the pure-error sum of squares.

Structure constants coefficients specifying products of basis vectors as linear combinations of other basis vectors, sufficient to construct a (non-unital) algebra over a field from any vector space

In mathematics, the structure constants or structure coefficients of an algebra over a field are used to explicitly specify the product of two basis vectors in the algebra as a linear combination. Given the structure constants, the resulting product is bilinear and can be uniquely extended to all vectors in the vector space, thus uniquely determining the product for the algebra.

The purpose of this page is to provide supplementary materials for the ordinary least squares article, reducing the load of the main article with mathematics and improving its accessibility, while at the same time retaining the completeness of exposition.

In statistics, the two-way analysis of variance (ANOVA) is an extension of the one-way ANOVA that examines the influence of two different categorical independent variables on one continuous dependent variable. The two-way ANOVA not only aims at assessing the main effect of each independent variable but also if there is any interaction between them.

In mathematics, Ricci calculus constitutes the rules of index notation and manipulation for tensors and tensor fields on a differentiable manifold, with or without a metric tensor or connection. It is also the modern name for what used to be called the absolute differential calculus, developed by Gregorio Ricci-Curbastro in 1887–1896, and subsequently popularized in a paper written with his pupil Tullio Levi-Civita in 1900. Jan Arnoldus Schouten developed the modern notation and formalism for this mathematical framework, and made contributions to the theory, during its applications to general relativity and differential geometry in the early twentieth century.

In machine learning, the kernel embedding of distributions comprises a class of nonparametric methods in which a probability distribution is represented as an element of a reproducing kernel Hilbert space (RKHS). A generalization of the individual data-point feature mapping done in classical kernel methods, the embedding of distributions into infinite-dimensional feature spaces can preserve all of the statistical features of arbitrary distributions, while allowing one to compare and manipulate distributions using Hilbert space operations such as inner products, distances, projections, linear transformations, and spectral analysis. This learning framework is very general and can be applied to distributions over any space on which a sensible kernel function may be defined. For example, various kernels have been proposed for learning from data which are: vectors in , discrete classes/categories, strings, graphs/networks, images, time series, manifolds, dynamical systems, and other structured objects. The theory behind kernel embeddings of distributions has been primarily developed by Alex Smola, Le Song , Arthur Gretton, and Bernhard Schölkopf. A review of recent works on kernel embedding of distributions can be found in.

References

  1. Tukey, John (1949). "One degree of freedom for non-additivity". Biometrics. 5 (3): 232–242. doi:10.2307/3001938. JSTOR   3001938.
  2. Alin, A. and Kurt, S. (2006). “Testing non-additivity (interaction) in two-way ANOVA tables with no replication”. Statistical Methods in Medical Research15, 6385.