Dunnett's test

In statistics, Dunnett's test is a multiple comparison procedure [1] developed by Canadian statistician Charles Dunnett [2] to compare each of a number of treatments with a single control. [3] [4] Multiple comparisons to a control are also referred to as many-to-one comparisons.

History

Dunnett's test was developed in 1955; [5] an updated table of critical values was published in 1964. [6]

Multiple comparisons problem

The multiple comparisons, multiplicity or multiple testing problem occurs when one considers a set of statistical inferences simultaneously or infers a subset of parameters selected based on the observed values. The major issue in any discussion of multiple-comparison procedures is the probability of Type I errors. Most differences among alternative techniques result from different approaches to controlling these errors. The problem is in part technical, but it is much more a subjective question of how the error rate is defined and how large the maximum possible error rate is allowed to be. [7] Dunnett's test is a well-known and widely used multiple comparison procedure for simultaneously comparing, by interval estimation or hypothesis testing, all active treatments with a control when sampling from a distribution where the normality assumption is reasonable. Dunnett's test is designed to hold the family-wise error rate at or below $\alpha$ when performing multiple comparisons of treatment groups with a control. [7]

Uses of Dunnett’s test

The original work on the multiple comparisons problem was done by Tukey and Scheffé. Their methods were general ones, which considered all kinds of pairwise comparisons. [7] Tukey's and Scheffé's methods allow any number of comparisons among a set of sample means. Dunnett's test, on the other hand, only compares one group with the others, addressing a special case of the multiple comparisons problem: pairwise comparisons of multiple treatment groups with a single control group. In the general case, where each pair of groups is compared, we make $k(k-1)/2$ comparisons (where $k$ is the number of groups), but in the treatment vs. control case we make only $k-1$ comparisons. If the more general Tukey's and Scheffé's methods are used for comparing treatment groups with a control group, they can result in unnecessarily wide confidence intervals. Dunnett's test takes into consideration the special structure of comparing treatments against a control, yielding narrower confidence intervals. [5]
It is very common to use Dunnett's test in medical experiments, for example comparing blood count measurements on three groups of animals, one of which served as a control while the other two were treated with two different drugs. Another common use of this method is among agronomists: agronomists may want to study the effect of certain chemicals added to the soil on crop yield, so they will leave some plots untreated (control plots) and compare them to the plots where chemicals were added to the soil (treatment plots).
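
The difference between the two comparison counts can be checked with a short sketch. The function names below are illustrative only and do not come from any library.

```python
def all_pairwise_comparisons(k: int) -> int:
    """Number of comparisons when every pair of k groups is compared."""
    return k * (k - 1) // 2


def comparisons_with_control(k: int) -> int:
    """Number of comparisons when k - 1 treatments are each compared with one control."""
    return k - 1


if __name__ == "__main__":
    # The gap between the two counts widens quickly as groups are added.
    for k in (3, 4, 6, 10):
        print(k, all_pairwise_comparisons(k), comparisons_with_control(k))
```

With four groups, for instance, the general case needs six comparisons while the comparison with a control needs only three, which is one reason the many-to-one intervals can be narrower.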

Formal description of Dunnett's test

Dunnett's test is performed by computing a Student's t-statistic for each experimental, or treatment, group where the statistic compares the treatment group to a single control group. [8] [9] Since each comparison has the same control in common, the procedure incorporates the dependencies between these comparisons. In particular, the t-statistics are all derived from the same estimate of the error variance which is obtained by pooling the sums of squares for error across all (treatment and control) groups. The formal test statistic for Dunnett's test is either the largest in absolute value of these t-statistics (if a two-tailed test is required), or the most negative or most positive of the t-statistics (if a one-tailed test is required).

In Dunnett's test we can use a common table of critical values, but more flexible options are nowadays readily available in many statistics packages. The critical values for any given percentage point depend on: whether a one- or two-tailed test is performed; the number of groups being compared; the overall number of trials.
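
The computation described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `dunnett_t_statistics` and the measurements are hypothetical, and the sketch stops at the test statistic, which would still have to be compared with a critical value from Dunnett's tables or a statistics package.

```python
from math import sqrt
from statistics import mean


def dunnett_t_statistics(control, treatments):
    """Return one t-statistic per treatment group, each against the control.

    All statistics share one pooled error variance, estimated from the
    within-group sums of squares of every group (control included).
    """
    groups = [control] + list(treatments)
    ss_error = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    df_error = sum(len(g) for g in groups) - len(groups)
    s = sqrt(ss_error / df_error)  # pooled standard deviation
    return [
        (mean(t) - mean(control)) / (s * sqrt(1 / len(t) + 1 / len(control)))
        for t in treatments
    ]


# Hypothetical measurements, chosen only to illustrate the mechanics.
control = [10.0, 12.0, 11.0, 13.0]
treatments = [[14.0, 15.0, 13.0, 16.0], [11.0, 12.0, 10.0, 13.0]]

t_stats = dunnett_t_statistics(control, treatments)
two_tailed_statistic = max(abs(t) for t in t_stats)  # largest in absolute value
upper_tailed_statistic = max(t_stats)                # most positive
lower_tailed_statistic = min(t_stats)                # most negative
print(t_stats, two_tailed_statistic)
```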

Assumptions

The analysis considers the case where the results of the experiment are numerical, and the experiment is performed to compare $p$ treatments with a control group. The results can be summarized as a set of $p+1$ calculated means of the sets of observations, $(\bar{X}_0, \ldots, \bar{X}_p)$, where $(\bar{X}_1, \ldots, \bar{X}_p)$ refer to the treatments and $\bar{X}_0$ refers to the control set of observations, and $s$ is an independent estimate of the common standard deviation of all $p+1$ sets of observations. All $\bar{X}_i$ of the $p+1$ sets of observations are assumed to be independently and normally distributed with a common variance $\sigma^2$ and means $\mu_i$. There is also an assumption that there is an available estimate $s^2$ for $\sigma^2$.

Calculation

Dunnett's test is based on calculating confidence statements about the true or expected values of the $p$ differences $\bar{X}_i - \bar{X}_0$, that is, the differences between each treatment group's mean and the control group's mean. This procedure ensures that the probability of all $p$ statements being simultaneously correct is equal to a specified value, $P$. When calculating a one-sided upper (or lower) confidence interval for the true value of the difference between the mean of a treatment group and the control group, $P$ is the probability that this true value will be less than the upper (or greater than the lower) limit of that interval. When calculating a two-sided confidence interval, $P$ is the probability that the true value will be between the upper and the lower limits.

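First, we will denote the available $N$ observations by $X_{ij}$, where $i = 0, 1, \ldots, p$ and $j = 1, \ldots, N_i$, and estimate the common variance $\sigma^2$ by, for example:

$$s^2 = \frac{\sum_{i=0}^{p} \sum_{j=1}^{N_i} (X_{ij} - \bar{X}_i)^2}{n}$$

where $\bar{X}_i$ is the mean of group $i$, $N_i$ is the number of observations in group $i$, and $n = \sum_{i=0}^{p} N_i - (p+1)$ is the number of degrees of freedom. As mentioned before, we would like to obtain separate confidence limits for each of the differences $\mu_i - \mu_0$ $(i = 1, \ldots, p)$ such that the probability that all $p$ confidence intervals will contain the corresponding $\mu_i - \mu_0$ is equal to $P$.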

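We will consider the general case where there are $p$ treatment groups and one control group. We will write:

$$z_i = \frac{\bar{X}_i - \bar{X}_0 - (\mu_i - \mu_0)}{\sqrt{\frac{1}{N_i} + \frac{1}{N_0}}}, \qquad i = 1, \ldots, p$$

We will also write $t_i = z_i / s$, which follows the Student's t-distribution with $n$ degrees of freedom. The lower confidence limits with joint confidence coefficient $P$ for the $p$ treatment effects $\mu_i - \mu_0$ $(i = 1, \ldots, p)$ will be given by:

$$\bar{X}_i - \bar{X}_0 - d_i' \, s \sqrt{\frac{1}{N_i} + \frac{1}{N_0}}, \qquad i = 1, \ldots, p$$

and the $p$ constants $d_i'$ are chosen so that $\Pr(t_1 < d_1', \ldots, t_p < d_p') = P$. Similarly, the upper limits will be given by:

$$\bar{X}_i - \bar{X}_0 + d_i' \, s \sqrt{\frac{1}{N_i} + \frac{1}{N_0}}, \qquad i = 1, \ldots, p$$

For bounding $\mu_i - \mu_0$ in both directions, the following interval might be taken:

$$\bar{X}_i - \bar{X}_0 \pm d_i'' \, s \sqrt{\frac{1}{N_i} + \frac{1}{N_0}}, \qquad i = 1, \ldots, p$$

where the $d_i''$ are chosen to satisfy $\Pr(|t_1| < d_1'', \ldots, |t_p| < d_p'') = P$. The values of $d''$ for the two-sided case and $d'$ for the one-sided case are given in the tables. [5] An updated table of critical values was published in 1964. [6]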
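
The defining condition for the two-sided constant can also be explored numerically. The sketch below is a Monte Carlo approximation rather than the method used to build the published tables; it assumes equal group sizes, a single common constant for all comparisons, and a hypothetical function name. Under those assumptions each statistic can be simulated as the difference of two standard normal variables divided by $\sqrt{2}$ times an independent estimate of $s/\sigma$.

```python
import random
from math import sqrt


def simulate_two_sided_critical_value(p, n, P=0.95, draws=100_000, seed=0):
    """Estimate d'' with Prob(|t_1| < d'', ..., |t_p| < d'') = P by simulation.

    Assumes equal group sizes and a common d'' for all p comparisons.
    """
    rng = random.Random(seed)
    maxima = []
    for _ in range(draws):
        z0 = rng.gauss(0.0, 1.0)                    # standardized control mean
        s = sqrt(rng.gammavariate(n / 2, 2.0) / n)  # s / sigma from a chi-square with n d.f.
        t_values = [(rng.gauss(0.0, 1.0) - z0) / (sqrt(2.0) * s) for _ in range(p)]
        maxima.append(max(abs(t) for t in t_values))
    maxima.sort()
    # Empirical P-quantile of the largest absolute t-statistic.
    return maxima[int(P * draws)]


d_two_sided = simulate_two_sided_critical_value(p=3, n=8)
print(round(d_two_sided, 2))
```

The simulated value is only approximate and varies with the seed and number of draws, so it is best treated as a cross-check on a tabulated value rather than a replacement for it.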

Example: Breaking Strength of Fabric

The following example was adapted from one given by Villars and was presented in Dunnett's original paper. [5] The data represent measurements on the breaking strength of fabric treated by three different chemical processes compared with a standard method of manufacture. [10]

Breaking Strength (lbs)

            Standard   Process 1   Process 2   Process 3
1              55          55          55          50
2              47          64          49          44
3              48          64          52          41
Means          50          61          52          45
Variance       19          27           9          21

Dunnett's Test can be calculated by applying the following steps:

1. Input Data with Means and Variances:

  • Collect measurements for each group (standard and treatment processes). See the data in the above table for each group's raw numbers, means, and variances.

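2. Calculate Pooled Variance $s^2$:

  • Compute the pooled variance across all groups. Because every group has the same number of observations, this is the average of the four group variances. E.g.,
$s^2 = \frac{19 + 27 + 9 + 21}{4} = 19$.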

3. Calculate Standard Deviation $s$:

  • Take the square root of the pooled variance. E.g.,
$s = \sqrt{19} \approx 4.359$.

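4. Calculate Standard Error:

  • The following formula gives the standard error for the difference of two means, each based on $N = 3$ observations. E.g.,
$s\sqrt{\frac{1}{N} + \frac{1}{N}} = s\sqrt{\frac{2}{3}} \approx 4.359 \times 0.8165 \approx 3.559$.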

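5. Determine Critical Value $d'$ or $d''$:

  • Use Dunnett's tables to find the critical value ($d'$ for one-sided, $d''$ for two-sided limits) for the given degrees of freedom and confidence level. E.g.,
For $n = 8$ degrees of freedom, $p = 3$ treatments and $P = 95\%$:
One-sided: $d' = 2.42$
Two-sided: $d'' = 2.88$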

6. Calculate the Allowance $A$: the quantity which must be added to and/or subtracted from the observed differences between the means to give their confidence limits is denoted as $A$ (this was termed "allowance" by Tukey), and can be calculated as follows:

  • Multiply the critical value by the standard error for the difference of two means. E.g.,
One-sided: $A = 2.42 \times 3.559 \approx 8.61$
Two-sided: $A = 2.88 \times 3.559 \approx 10.25$

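7. Compute Confidence Limits:

  • Calculate the confidence limits for each process compared to the standard. E.g.,
One-sided Limits (lower):
Process 1: $61 - 50 - 8.61 = 2.39$
Process 2: $52 - 50 - 8.61 = -6.61$
Process 3: $45 - 50 - 8.61 = -13.61$
Two-sided Limits:
Process 1: $61 - 50 \pm 10.25$, i.e. $0.75$ to $21.25$
Process 2: $52 - 50 \pm 10.25$, i.e. $-8.25$ to $12.25$
Process 3: $45 - 50 \pm 10.25$, i.e. $-15.25$ to $5.25$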

8. Draw Conclusions:

  • Based on the computed confidence limits, make conclusions about each process compared to the standard. E.g.,
One-sided:
Process 1: Breaking strength exceeds the standard by at least 2.39 lbs.
Process 2: Breaking strength cannot be shown to exceed the standard (the lower limit is negative).
Process 3: Breaking strength cannot be shown to exceed the standard (the lower limit is negative).
Two-sided:
Process 1: Breaking strength exceeds the standard by between 0.75 lbs and 21.25 lbs.
Process 2: The difference from the standard is between -8.25 lbs and 12.25 lbs (may or may not exceed the standard).
Process 3: The difference from the standard is between -15.25 lbs and 5.25 lbs (may or may not exceed the standard).
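
The steps above can be reproduced with a short script. This is a sketch rather than a general implementation: it relies on all groups having the same size, and the two critical values are hard-coded assumptions (2.42 one-sided and 2.88 two-sided for three treatments, 8 degrees of freedom and 95% confidence), chosen because they reproduce the limits quoted in the conclusions. All variable names are illustrative.

```python
from math import sqrt
from statistics import mean, variance

# Breaking strength (lbs) from the table above.
standard = [55, 47, 48]
processes = {
    "Process 1": [55, 64, 64],
    "Process 2": [55, 49, 52],
    "Process 3": [50, 44, 41],
}

groups = [standard] + list(processes.values())
# Averaging the group variances equals the pooled variance only for equal group sizes.
pooled_variance = mean([variance(g) for g in groups])
s = sqrt(pooled_variance)
standard_error = s * sqrt(1 / len(standard) + 1 / len(standard))

# Assumed critical values (3 treatments, 8 degrees of freedom, 95% confidence).
D_ONE_SIDED = 2.42
D_TWO_SIDED = 2.88

allowance_one = D_ONE_SIDED * standard_error
allowance_two = D_TWO_SIDED * standard_error

limits = {}
for name, values in processes.items():
    diff = mean(values) - mean(standard)
    limits[name] = {
        "lower_one_sided": round(diff - allowance_one, 2),
        "two_sided": (round(diff - allowance_two, 2), round(diff + allowance_two, 2)),
    }
    print(name, limits[name])
```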

References

  1. Upton G. & Cook I. (2006). A Dictionary of Statistics, 2e, Oxford University Press, Oxford, United Kingdom.
  2. Rumsey, Deborah (2009-08-19). Statistics II for Dummies. Wiley. p. 186. Retrieved 2012-08-22.
  3. Everitt B. S. & Skrondal A. (2010). The Cambridge Dictionary of Statistics, 4e, Cambridge University Press, Cambridge, United Kingdom.
  4. "Statistical Software | University of Kentucky Information Technology". Uky.edu. Archived from the original on 2012-07-31. Retrieved 2012-08-22.
  5. Dunnett C. W. (1955). "A multiple comparison procedure for comparing several treatments with a control". Journal of the American Statistical Association. 50: 1096–1121. doi:10.1080/01621459.1955.10501294.
  6. Dunnett C. W. (1964). "New tables for multiple comparisons with a control". Biometrics. 20: 482–491.
  7. Howell, David C. Statistical Methods for Psychology (8th ed.).
  8. Dunnett's test, HyperStat Online: An Introductory Statistics Textbook and Online Tutorial for Help in Statistics Courses.
  9. Mechanics of Different Tests - Biostatistics BI 345, Saint Anselm College. Archived 2010-06-01 at the Wayback Machine.
  10. Villars, Donald Statler (1951). Statistical Design and Analysis of Experiments for Development Research. Dubuque, Iowa: Wm. C. Brown Co.