Duncan's new multiple range test

In statistics, Duncan's new multiple range test (MRT) is a multiple comparison procedure developed by David B. Duncan in 1955. Duncan's MRT belongs to the general class of multiple comparison procedures that use the studentized range statistic \(q_r\) to compare sets of means.

David B. Duncan developed this test as a modification of the Student–Newman–Keuls method that would have greater power. Duncan's MRT is especially protective against false negative (Type II) errors at the expense of a greater risk of making false positive (Type I) errors. Duncan's test is commonly used in agronomy and other agricultural research.

The result of the test is a set of subsets of means, where in each subset means have been found not to be significantly different from one another.

This test is often followed by the Compact Letter Display (CLD) methodology, which renders the output of such tests much more accessible to non-statistician audiences.

Definition

Assumptions:
1. A sample of observed means \(\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n\), which have been drawn independently from n normal populations with "true" means \(\mu_1, \mu_2, \ldots, \mu_n\), respectively.
2. A common standard error \(\sigma_{\bar{x}}\). This standard error is unknown, but there is available the usual estimate \(s_{\bar{x}}\), which is independent of the observed means and is based on a number of degrees of freedom, denoted by \(\nu\). (More precisely, \(s_{\bar{x}}\) has the property that \(\nu s_{\bar{x}}^{2}/\sigma_{\bar{x}}^{2}\) is distributed as \(\chi^{2}\) with \(\nu\) degrees of freedom, independently of the sample means.)

The exact definition of the test is:

The difference between any two means in a set of n means is significant provided the range of each and every subset which contains the given means is significant according to an \(\alpha_p\)-level range test, where \(\alpha_p = 1-(1-\alpha)^{p-1}\) and p is the number of means in the subset concerned.

Exception: The sole exception to this rule is that no difference between two means can be declared significant if the two means concerned are both contained in a subset of the means which has a non-significant range.

Procedure

The procedure consists of a series of pairwise comparisons between means. Each comparison is performed at a significance level \(\alpha_p = 1-(1-\alpha)^{p-1}\), defined by the number of means separating the two means compared (p − 2 means separate them when the subset spans p means). The tests are performed sequentially, where the result of a test determines which test is performed next.

The tests are performed in the following order: the largest minus the smallest, the largest minus the second smallest, up to the largest minus the second largest; then the second largest minus the smallest, the second largest minus the second smallest, and so on, finishing with the second smallest minus the smallest.

With only one exception, given below, each difference is significant if it exceeds the corresponding shortest significant range; otherwise it is not significant. The shortest significant range is the significant studentized range multiplied by the standard error; it will be designated as \(R_p\), where p is the number of means in the subset concerned. The sole exception to this rule is that no difference between two means can be declared significant if the two means concerned are both contained in a subset of the means which has a non-significant range.

An algorithm for performing the test is as follows:

1. Rank the sample means, largest to smallest.
2. For each sample mean \(\bar{x}_i\), taken from largest to smallest, do the following:
   2.1 For each sample mean \(\bar{x}_j\), taken from the smallest up to the mean immediately below \(\bar{x}_i\):
      2.1.1 Compare \(\bar{x}_i - \bar{x}_j\) to the critical value \(R_p\), where p is the number of means spanned.
      2.1.2 If \(\bar{x}_i - \bar{x}_j\) does not exceed the critical value, the subset spanned by \(\bar{x}_i\) and \(\bar{x}_j\) is declared not significantly different:
         2.1.2.1 Go to the next iteration of loop 2.
      2.1.3 Otherwise, continue with loop 2.1.
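
The loop above can be expressed as a short, illustrative Python sketch. The function name and data structures below are purely illustrative (not from any standard library), and the sketch assumes the shortest significant ranges \(R_p\) have already been computed, as described in the Critical values section.

```python
def duncan_groups(means, R):
    """Illustrative sketch of Duncan's stepwise comparison loop.

    means : list of sample means
    R     : dict mapping subset size p (2..len(means)) to the shortest
            significant range R_p (assumed to be precomputed)
    Returns the list of index pairs declared significantly different.
    """
    # Rank indices of the means, largest first (step 1).
    order = sorted(range(len(means)), key=lambda i: means[i], reverse=True)
    homogeneous = []   # rank spans (a, b) already declared not significant
    significant = []
    for a in range(len(order)):                 # loop 2: upper mean, largest down
        for b in range(len(order) - 1, a, -1):  # loop 2.1: lower mean, smallest up
            # Exception rule: a pair inside an already homogeneous subset
            # can never be declared significant.
            if any(lo <= a and b <= hi for lo, hi in homogeneous):
                break
            p = b - a + 1                       # number of means spanned
            if means[order[a]] - means[order[b]] > R[p]:
                significant.append((order[a], order[b]))   # step 2.1.3
            else:
                homogeneous.append((a, b))                  # step 2.1.2
                break                                       # step 2.1.2.1
    return significant
```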

Critical values

Duncan's multiple range test makes use of the studentized range distribution in order to determine critical values for comparisons between means. Note that different comparisons between means may differ in their significance levels, since the significance level depends on the size of the subset of means in question.

Let us denote by \(Q(\alpha_p, p, \nu)\) the \(\alpha_p\)-level quantile of the studentized range distribution, with p observations and \(\nu\) degrees of freedom for the second sample (see studentized range for more information). Let us denote by \(r(\alpha, p, \nu)\) the standardized critical value, given by the rule:

If p = 2:
\(r(\alpha, 2, \nu) = Q(\alpha_2, 2, \nu)\)

Else:
\(r(\alpha, p, \nu) = \max\{\,Q(\alpha_p, p, \nu),\; r(\alpha, p-1, \nu)\,\}\)
The shortest critical range \(R_p\) (the actual critical value of the test) is computed as \(R_p = s_{\bar{x}} \cdot r(\alpha, p, \nu)\). For \(\nu \to \infty\), a tabulation exists for an exact value of Q (see link). A word of caution is needed here: notations for Q and R are not the same throughout the literature, where Q is sometimes denoted as the shortest significant interval and R as the significant quantile for the studentized range distribution (Duncan's 1955 paper uses both notations in different parts).
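
The critical values can also be obtained numerically. The following sketch assumes SciPy's studentized_range distribution is used for the quantile Q; values computed this way may differ slightly from Duncan's published tables.

```python
from scipy.stats import studentized_range

def duncan_ranges(alpha, nu, s_mean, n_means):
    """Illustrative computation of the shortest significant ranges R_p.

    alpha   : basic (two-mean) significance level, e.g. 0.05
    nu      : degrees of freedom of the standard-error estimate
    s_mean  : estimated standard error of a treatment mean
    n_means : total number of means
    Returns a dict {p: R_p} for p = 2..n_means.
    """
    R, r_prev = {}, 0.0
    for p in range(2, n_means + 1):
        alpha_p = 1 - (1 - alpha) ** (p - 1)            # level for a p-mean subset
        q = studentized_range.ppf(1 - alpha_p, p, nu)   # upper alpha_p quantile
        r = max(q, r_prev)                              # keep r non-decreasing in p
        R[p] = s_mean * r                               # shortest significant range
        r_prev = r
    return R
```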

Numeric example

Let us look at the example of 5 treatment means:

Treatment        T1    T2    T3    T4    T5
Treatment mean   9.8   15.4  17.6  21.6  10.8
Rank             5     3     2     1     4


With a standard error \(s_{\bar{x}}\) and \(\nu\) degrees of freedom for estimating that standard error, and using a known tabulation for Q, one obtains the values of \(r(\alpha, p, \nu)\) for p = 2, 3, 4, 5. The values of the shortest significant ranges then follow from the formula \(R_p = s_{\bar{x}} \cdot r(\alpha, p, \nu)\).
Then, the observed differences between means are tested, beginning with the largest versus the smallest, which is compared with the shortest significant range \(R_5\). Next, the difference between the largest and the second smallest is computed and compared with the shortest significant range \(R_4\).

If an observed difference is greater than the corresponding shortest significant range, we conclude that the pair of means in question is significantly different. If an observed difference is smaller than the corresponding shortest significant range, all differences sharing the same upper mean are considered not significant, in order to prevent contradictions (differences sharing the same upper mean are smaller by construction).

Carrying out the comparisons for our case, we find significant differences between all pairs of treatments except (T3, T2) and (T5, T1). A diagram underlining those means that are not significantly different is shown below:
T1 T5   T2 T3   T4
_____   _____
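
The example can be re-run mechanically with the two sketches given earlier. Because the standard error and degrees of freedom used above are not reproduced here, the values s_mean = 1.0 and nu = 20 in the snippet below are assumptions chosen purely for illustration.

```python
# Illustrative run for the five treatment means, reusing duncan_ranges and
# duncan_groups from the sketches above. The standard error (1.0) and the
# degrees of freedom (20) are assumed values, not those of the original example.
means = [9.8, 15.4, 17.6, 21.6, 10.8]   # T1..T5
R = duncan_ranges(alpha=0.05, nu=20, s_mean=1.0, n_means=len(means))
for i, j in duncan_groups(means, R):
    print(f"T{i + 1} vs T{j + 1}: difference {means[i] - means[j]:.1f} is significant")
```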

Protection and significance levels based on degrees of freedom

The new multiple range test proposed by Duncan makes use of special protection levels based upon degrees of freedom. Let \(\gamma = 1 - \alpha\) be the protection level for testing the significance of a difference between two means; that is, the probability that a significant difference between two means will not be found if the population means are equal. Duncan reasons that one has p − 1 degrees of freedom for testing p ranked means, and hence one may conduct p − 1 independent tests, each with protection level \(\gamma\). Hence, the joint protection level is

\(\gamma_p = \gamma^{\,p-1} = (1-\alpha)^{p-1},\)

where \(\alpha_p = 1 - \gamma_p\); that is, the probability that one finds no significant differences in making p − 1 independent tests, each at protection level \(\gamma\), is \(\gamma_p\), under the hypothesis that all p population means are equal. In general: the difference between any two means in a set of n means is significant provided the range of each and every subset which contains the given means is significant according to an \(\alpha_p\)-level range test, where p is the number of means in the subset concerned.

For \(\gamma = 0.95\) (that is, \(\alpha = 0.05\)), the protection level can be tabulated for various values of p as follows:

p     Protection level   Probability of falsely rejecting
2     0.95               0.05
3     0.903              0.097
4     0.857              0.143
5     0.815              0.185
6     0.774              0.226
7     0.735              0.265

Note that although this procedure makes use of the studentized range, its error rate is neither on an experiment-wise basis (as with Tukey's) nor on a per-comparison basis. Duncan's multiple range test does not control the family-wise error rate. See the Criticism section for further details.
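
The table above follows directly from the relation \(\gamma_p = (1-\alpha)^{p-1}\); a few lines of Python reproduce it:

```python
# Protection levels and per-subset Type I error rates for gamma = 0.95 (alpha = 0.05).
alpha = 0.05
for p in range(2, 8):
    gamma_p = (1 - alpha) ** (p - 1)     # joint protection level for p means
    print(f"p={p}: protection {gamma_p:.3f}, "
          f"probability of falsely rejecting {1 - gamma_p:.3f}")
```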

Duncan Bayesian multiple comparison procedure

Duncan (1965) also gave the first Bayesian multiple comparison procedure, for the pairwise comparisons among the means in a one-way layout. This multiple comparison procedure is different from the one discussed above.

Duncan's Bayesian MCP considers the differences between ordered group means, where the statistics in question are pairwise comparisons (no equivalent is defined for the property of a subset of means being 'significantly different').

Duncan modeled the consequences of two or more means being equal using additive loss functions within and across the pairwise comparisons. If one assumes the same loss function across the pairwise comparisons, one needs to specify only one constant K, which indicates the relative seriousness of Type I to Type II errors in each pairwise comparison.

A study by Juliet Popper Shaffer (1998) has shown that the method proposed by Duncan, modified to provide weak control of the FWE and using an empirical estimate of the variance of the population means, has good properties both from the Bayesian point of view, as a minimum-risk method, and from the frequentist point of view, with good average power.

In addition, the results indicate considerable similarity in both risk and average power between Duncan's modified procedure and the Benjamini and Hochberg (1995) false discovery rate controlling procedure, which provides the same weak family-wise error control.

Criticism

Duncan's test has been criticised as being too liberal by many statisticians, including Henry Scheffé and John W. Tukey. Duncan argued that a more liberal procedure was appropriate because in real-world practice the global null hypothesis H0 = "all means are equal" is often false, so traditional statisticians overprotect a probably false null hypothesis against Type I errors. According to Duncan, one should adjust the protection levels for different p-mean comparisons according to the problem discussed. The example discussed by Duncan in his 1955 paper is a comparison of many means (say, 100), when one is interested only in two-mean and three-mean comparisons, and general p-mean comparisons (deciding whether there is some difference between p means) are of no special interest (if p is 15 or more, for example). Duncan's multiple range test is very "liberal" in terms of Type I errors. The following example illustrates why:

Let us assume one is truly interested, as Duncan suggested, only in the correct ranking of subsets of size 4 or below. Let us also assume that one performs the simple pairwise comparisons with a protection level \(\gamma = 0.95\) (\(\alpha = 0.05\)). Given an overall set of 100 means, let us look at the null hypotheses of the test:

There are \(\binom{100}{2} = 4950\) null hypotheses for the correct ranking of each set of 2 means. The significance level of each hypothesis is \(\alpha_2 = 1 - 0.95 = 0.05\).

There are \(\binom{100}{3} = 161{,}700\) null hypotheses for the correct ranking of each set of 3 means. The significance level of each hypothesis is \(\alpha_3 = 1 - 0.95^{2} = 0.0975\).

There are \(\binom{100}{4} = 3{,}921{,}225\) null hypotheses for the correct ranking of each set of 4 means. The significance level of each hypothesis is \(\alpha_4 = 1 - 0.95^{3} \approx 0.1426\).
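
These counts and levels follow from the binomial coefficient \(\binom{100}{p}\) and the relation \(\alpha_p = 1-(1-\alpha)^{p-1}\), and can be checked in a few lines:

```python
# Number of null hypotheses and per-hypothesis significance level in the
# 100-mean illustration, assuming alpha = 0.05 as elsewhere in the article.
from math import comb

alpha = 0.05
for p in (2, 3, 4):
    n_hyp = comb(100, p)                      # subsets of size p out of 100 means
    alpha_p = 1 - (1 - alpha) ** (p - 1)      # level used for a p-mean subset
    print(f"p={p}: {n_hyp} hypotheses, each tested at level {alpha_p:.4f}")
```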

As we can see, the test has two main problems regarding Type I errors:

  1. Duncan's test is based on the Newman–Keuls procedure, which does not protect the family-wise error rate (though it protects the per-comparison alpha level).
  2. Duncan's test intentionally raises the alpha levels (the Type I error rate) in each step of the Newman–Keuls procedure (significance levels of \(\alpha_p = 1-(1-\alpha)^{p-1}\)).

Therefore, it is advised not to use the procedure discussed.

Duncan later developed the Duncan–Waller test, which is based on Bayesian principles. It uses the obtained value of F to estimate the prior probability of the null hypothesis being true.

Different approaches to the problem

If one still wishes to address the problem of finding similar subsets of group means, other solutions are found in the literature.

Tukey's range test is commonly used to compare pairs of means; this procedure controls the family-wise error rate in the strong sense.

Another solution is to perform Student's t-test on all pairs of means and then to use an FDR-controlling procedure (to control the expected proportion of incorrectly rejected null hypotheses).
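
A minimal sketch of this alternative, assuming the raw observations of each group are available (the variable groups below, a list of one-dimensional arrays, and the function name are illustrative):

```python
from itertools import combinations
from scipy.stats import ttest_ind
from statsmodels.stats.multitest import multipletests

def pairwise_ttests_fdr(groups, alpha=0.05):
    """All pairwise two-sample t-tests with Benjamini-Hochberg FDR control."""
    pairs = list(combinations(range(len(groups)), 2))
    pvals = [ttest_ind(groups[i], groups[j]).pvalue for i, j in pairs]
    reject, pvals_adj, _, _ = multipletests(pvals, alpha=alpha, method="fdr_bh")
    # Map each pair of group indices to its adjusted p-value and decision.
    return {pair: (p_adj, rej) for pair, p_adj, rej in zip(pairs, pvals_adj, reject)}
```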

Other possible solutions, which do not involve hypothesis testing but still result in a partition into subsets, include clustering and hierarchical clustering. These solutions differ from the approach presented in this article.

References

    Tables for the Use of Range and Studentized Range in Tests of Hypotheses