Welch's t-test

In statistics, Welch's t-test, or unequal variances t-test, is a two-sample location test used to test the (null) hypothesis that two populations have equal means. Named for its creator, Bernard Lewis Welch, it is an adaptation of Student's t-test [1] and is more reliable when the two samples have unequal variances and possibly unequal sample sizes. [2] [3] These tests are often referred to as "unpaired" or "independent samples" t-tests, as they are typically applied when the statistical units underlying the two samples being compared are non-overlapping. Given that Welch's t-test has been less popular than Student's t-test [2] and may be less familiar to readers, a more informative name is "Welch's unequal variances t-test", or "unequal variances t-test" for brevity. [3]

Assumptions

Student's t-test assumes that the sample means being compared for two populations are normally distributed, and that the populations have equal variances. Welch's t-test is designed for unequal population variances, but the assumption of normality is maintained. [1] Welch's t-test is an approximate solution to the Behrens–Fisher problem.

Calculations

Welch's t-test defines the statistic t by the following formula:

$$ t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_{\bar{X}_1}^2 + s_{\bar{X}_2}^2}} $$

where $\bar{X}_i$ and $s_{\bar{X}_i} = s_i / \sqrt{N_i}$ are the $i$-th sample mean and its standard error, with $s_i$ denoting the corrected sample standard deviation and $N_i$ the size of the $i$-th sample. Unlike in Student's t-test, the denominator is not based on a pooled variance estimate.
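For concreteness, this statistic can be computed directly in R from two numeric vectors. A minimal sketch follows; the helper name welch_t and the sample values are illustrative, not taken from this article:

```r
# Welch's t statistic: difference in sample means over the unpooled
# standard error sqrt(s1^2/N1 + s2^2/N2).
welch_t <- function(x1, x2) {
  se2 <- var(x1) / length(x1) + var(x2) / length(x2)  # var() is the corrected sample variance
  (mean(x1) - mean(x2)) / sqrt(se2)
}

x1 <- c(19.8, 21.2, 20.5, 22.0, 18.9)  # hypothetical sample 1
x2 <- c(23.1, 21.7, 24.0, 22.6, 25.2)  # hypothetical sample 2
welch_t(x1, x2)
```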

The degrees of freedom $\nu$ associated with this variance estimate is approximated using the Welch–Satterthwaite equation: [4]

$$ \nu \approx \frac{\left( s_{\bar{X}_1}^2 + s_{\bar{X}_2}^2 \right)^2}{\dfrac{s_{\bar{X}_1}^4}{\nu_1} + \dfrac{s_{\bar{X}_2}^4}{\nu_2}} = \frac{\left( \dfrac{s_1^2}{N_1} + \dfrac{s_2^2}{N_2} \right)^2}{\dfrac{s_1^4}{N_1^2 \nu_1} + \dfrac{s_2^4}{N_2^2 \nu_2}} $$

This expression can be simplified when $N_1 = N_2 = N$:

$$ \nu \approx \frac{(N - 1)\left( s_1^2 + s_2^2 \right)^2}{s_1^4 + s_2^4} $$

Here, $\nu_i = N_i - 1$ is the degrees of freedom associated with the $i$-th variance estimate.
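Continuing the sketch above, the approximate degrees of freedom can be computed the same way (welch_df is again an illustrative name, not a standard function):

```r
# Welch–Satterthwaite approximation to the degrees of freedom.
welch_df <- function(x1, x2) {
  v1 <- var(x1) / length(x1)  # s1^2 / N1
  v2 <- var(x2) / length(x2)  # s2^2 / N2
  (v1 + v2)^2 / (v1^2 / (length(x1) - 1) + v2^2 / (length(x2) - 1))
}

welch_df(x1, x2)  # agrees with t.test(x1, x2)$parameter in base R
```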

The statistic $t$ is then approximately distributed as Student's t with $\nu$ degrees of freedom, since the scaled variance estimate in the denominator is approximated by a chi-squared distribution. The approximation improves when both $N_1$ and $N_2$ are larger than 5. [5] [6]

Statistical test

Once $t$ and $\nu$ have been computed, these statistics can be used with the t-distribution to test one of two possible null hypotheses: that the two population means are equal, in which case a two-tailed test is applied; or that one of the population means is greater than or equal to the other, in which case a one-tailed test is applied.

The approximate degrees of freedom $\nu$ are real numbers and are used as such in statistics-oriented software, whereas they are rounded down to the nearest integer in spreadsheets.
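As an illustration, in R the p-values can be obtained from the t-distribution function pt(), building on the welch_t and welch_df sketches above:

```r
t_stat <- welch_t(x1, x2)
nu     <- welch_df(x1, x2)               # real-valued; not rounded
p_two  <- 2 * pt(-abs(t_stat), df = nu)  # H0: the two population means are equal
p_one  <- pt(t_stat, df = nu)            # H0: mu1 >= mu2 (one-tailed alternative mu1 < mu2)
```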

Advantages and limitations

Welch's t-test is more robust than Student's t-test and maintains type I error rates close to nominal for unequal variances and for unequal sample sizes under normality. Furthermore, the power of Welch's t-test comes close to that of Student's t-test, even when the population variances are equal and sample sizes are balanced. [2] Welch's t-test can be generalized to more than two samples, [7] a generalization that is more robust than one-way analysis of variance (ANOVA).
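In base R, for instance, this k-sample generalization (often called Welch's ANOVA) is available as oneway.test() with var.equal = FALSE; the three groups below are made-up data for illustration:

```r
# Welch's heteroscedastic k-sample test (Welch's ANOVA) on three groups.
g1 <- c(20.1, 19.5, 21.2, 20.8)
g2 <- c(22.3, 23.0, 21.8, 22.9)
g3 <- c(24.5, 23.9, 25.1, 24.2)
y   <- c(g1, g2, g3)
grp <- factor(rep(c("g1", "g2", "g3"), times = c(length(g1), length(g2), length(g3))))
oneway.test(y ~ grp, var.equal = FALSE)
```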

It is not recommended to pre-test for equal variances and then choose between Student's t-test and Welch's t-test. [8] Rather, Welch's t-test can be applied directly, without any substantial disadvantage relative to Student's t-test, as noted above. Welch's t-test remains robust for skewed distributions and large sample sizes. [9] Its reliability decreases for skewed distributions and smaller samples, although it can still be applied in such cases. [10]

Examples

The following three examples compare Welch's t-test and Student's t-test. The samples were drawn from random normal distributions using the R programming language.

For all three examples, the population means were $\mu_1 = 20$ and $\mu_2 = 22$.

The first example is for unequal but near variances ($s_1^2 = 7.9$, $s_2^2 = 3.8$) and equal sample sizes ($N_1 = N_2 = 15$). Let A1 and A2 denote the two random samples.
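A sketch of how such samples can be generated and tested in R follows; the seed and the population standard deviations are assumptions for illustration, since the article's actual draws are not reproduced here:

```r
# Example 1 setup, approximately: equal sample sizes, unequal variances.
set.seed(1)                               # arbitrary seed, not the article's
A1 <- rnorm(15, mean = 20, sd = sqrt(8))  # assumed population variance near 8
A2 <- rnorm(15, mean = 22, sd = sqrt(4))  # assumed population variance near 4

t.test(A1, A2, var.equal = TRUE)   # Student's t-test
t.test(A1, A2)                     # Welch's t-test (R's default)
```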

The second example is for unequal variances ($s_1^2 = 9.0$, $s_2^2 = 0.9$) and unequal sample sizes ($N_1 = 10$, $N_2 = 20$). The smaller sample has the larger variance.

The third example is for unequal variances ($s_1^2 = 1.4$, $s_2^2 = 17.1$) and unequal sample sizes ($N_1 = 10$, $N_2 = 20$). The larger sample has the larger variance.

Reference p-values were obtained by simulating the distributions of the t statistics under the null hypothesis of equal population means ($\mu_1 = \mu_2$). Results are summarised in the table below, with two-tailed p-values:

Example   Sample A1              Sample A2              Student's t-test                  Welch's t-test
          N1    mean    s^2      N2    mean    s^2      t       ν     P       P(sim)      t       ν      P       P(sim)
1         15    20.8    7.9      15    23.0    3.8      −2.46   28    0.021   0.021       −2.46   24.9   0.021   0.017
2         10    20.6    9.0      20    22.1    0.9      −2.10   28    0.045   0.150       −1.57    9.9   0.149   0.144
3         10    19.4    1.4      20    21.6    17.1     −1.64   28    0.110   0.036       −2.22   24.5   0.036   0.042

Welch's t-test and Student's t-test gave identical results when the two samples had similar variances and sample sizes (Example 1). But note that even if data are sampled from populations with identical variances, the sample variances will differ, as will the results of the two t-tests. So with actual data, the two tests will almost always give somewhat different results.

For unequal variances, Student's t-test gave a low p-value when the smaller sample had the larger variance (Example 2) and a high p-value when the larger sample had the larger variance (Example 3). In both cases, Welch's t-test gave p-values close to the simulated p-values.
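The simulated reference p-values can be approximated along the following lines (a sketch for the Example 2 design; the population standard deviations and the number of replications are assumptions, not the article's settings):

```r
# Null distribution of the Welch t statistic for N1 = 10, N2 = 20 with
# unequal variances and equal population means (H0).
set.seed(1)
sim_t <- replicate(1e5, {
  x <- rnorm(10, mean = 21, sd = 3)  # assumed SDs; under H0 only equality of means matters
  y <- rnorm(20, mean = 21, sd = 1)
  unname(t.test(x, y)$statistic)
})

t_obs <- -1.57                      # observed Welch t from Example 2
mean(abs(sim_t) >= abs(t_obs))      # two-tailed simulated p-value; compare 0.144 above
```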

Software implementations

Language/Program                                    Function / access                                                Documentation
LibreOffice                                         TTEST(Data1; Data2; Mode; Type)                                  [11]
MATLAB                                              ttest2(data1, data2, 'Vartype', 'unequal')                       [12]
Microsoft Excel pre 2010 (Student's t-test)         TTEST(array1, array2, tails, type)                               [13]
Microsoft Excel 2010 and later (Student's t-test)   T.TEST(array1, array2, tails, type)                              [14]
Minitab                                             Accessed through menu                                            [15]
SAS (Software)                                      Default output from proc ttest (labeled "Satterthwaite")
Python (through 3rd-party library SciPy)            scipy.stats.ttest_ind(a, b, equal_var=False)                     [16]
R                                                   t.test(data1, data2)                                             [17]
Haskell                                             Statistics.Test.StudentT.welchTTest SamplesDiffer data1 data2    [18]
JMP                                                 Oneway( Y( YColumn), X( XColumn), Unequal Variances( 1 ) );      [19]
Julia                                               UnequalVarianceTTest(data1, data2)                               [20]
Stata                                               ttest varname1 == varname2, welch                                [21]
Google Sheets                                       TTEST(range1, range2, tails, type)                               [22]
GraphPad Prism                                      A choice on the t test dialog
IBM SPSS Statistics                                 An option in the menu                                            [23] [24]
GNU Octave                                          welch_test(x, y)                                                 [25]


References

  1. Welch, B. L. (1947). "The generalization of "Student's" problem when several different population variances are involved". Biometrika. 34 (1–2): 28–35. doi:10.1093/biomet/34.1-2.28. MR 0019277. PMID 20287819.
  2. Ruxton, G. D. (2006). "The unequal variance t-test is an underused alternative to Student's t-test and the Mann–Whitney U test". Behavioral Ecology. 17 (4): 688–690. doi:10.1093/beheco/ark016.
  3. Derrick, B.; Toher, D.; White, P. (2016). "Why Welch's test is Type I error robust" (PDF). The Quantitative Methods for Psychology. 12 (1): 30–38. doi:10.20982/tqmp.12.1.p030.
  4. 7.3.1. Do two processes have the same mean?, Engineering Statistics Handbook, NIST. (Online source accessed 2021-07-30.)
  5. Allwood, Michael (2008). "The Satterthwaite Formula for Degrees of Freedom in the Two-Sample t-Test" (PDF). p. 6.
  6. Yates; Moore; Starnes (2008). The Practice of Statistics (3rd ed.). New York: W.H. Freeman and Company. p. 792. ISBN   9780716773092.
  7. Welch, B. L. (1951). "On the Comparison of Several Mean Values: An Alternative Approach". Biometrika. 38 (3/4): 330–336. doi:10.2307/2332579. JSTOR   2332579.
  8. Zimmerman, D. W. (2004). "A note on preliminary tests of equality of variances". British Journal of Mathematical and Statistical Psychology . 57 (Pt 1): 173–181. doi:10.1348/000711004849222. PMID   15171807.
  9. Fagerland, M. W. (2012). "t-tests, non-parametric tests, and large studies—a paradox of statistical practice?". BMC Medical Research Methodology. 12: 78. doi: 10.1186/1471-2288-12-78 . PMC   3445820 . PMID   22697476.
  10. Fagerland, M. W.; Sandvik, L. (2009). "Performance of five two-sample location tests for skewed distributions with unequal variances". Contemporary Clinical Trials . 30 (5): 490–496. doi:10.1016/j.cct.2009.06.007. PMID   19577012.
  11. "Statistical Functions Part Five - LibreOffice Help".
  12. "Two-sample t-test - MATLAB ttest2 - MathWorks United Kingdom".
  13. "TTEST - Excel - Microsoft Office". office.microsoft.com. Archived from the original on 2010-06-13.
  14. "T.TEST function".
  15. Overview for 2-Sample t - Minitab: official documentation for Minitab version 18. Accessed 2020-09-19.
  16. "Scipy.stats.ttest_ind — SciPy v1.7.1 Manual".
  17. "R: Student's t-Test".
  18. "Statistics.Test.StudentT".
  19. "Index of /Support/Help".
  20. "Welcome to Read the Docs — HypothesisTests.jl latest documentation".
  21. "Stata 17 help for ttest".
  22. "T.TEST - Docs Editors Help".
  23. Jeremy Miles: Unequal variances t-test or U Mann-Whitney test?, Accessed 2014-04-11
  24. One-Sample Test — Official documentation for SPSS Statistics version 24. Accessed 2019-01-22.
  25. "Function Reference: Welch_test".