Statistical test of whether two populations have equal means
Welch's t-test, or unequal variances t-test, is a two-sample location test in statistics which is used to test the (null) hypothesis that two populations have equal means. It is named for its creator, Bernard Lewis Welch, and is an adaptation of Student's t-test[1] that is more reliable when the two samples have unequal variances and possibly unequal sample sizes.[2][3] These tests are often referred to as "unpaired" or "independent samples" t-tests, as they are typically applied when the statistical units underlying the two samples being compared are non-overlapping. Given that Welch's t-test has been less popular than Student's t-test[2] and may be less familiar to readers, a more informative name is "Welch's unequal variances t-test", or "unequal variances t-test" for brevity.[3] It is sometimes referred to as the Satterthwaite or Welch–Satterthwaite test.
Student's t-test assumes that the sample means being compared for two populations are normally distributed, and that the populations have equal variances. Welch's t-test is designed for unequal population variances, but the assumption of normality is maintained.[1] Welch's t-test is an approximate solution to the Behrens–Fisher problem.
Calculations
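Welch's t-test defines the statistic t by the following formula:

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_{\bar{X}_1}^2 + s_{\bar{X}_2}^2}}, \qquad s_{\bar{X}_i} = \frac{s_i}{\sqrt{N_i}},$$

where $\bar{X}_i$ is the i-th sample mean, $s_{\bar{X}_i}$ is its standard error, $s_i$ is the corrected sample standard deviation and $N_i$ is the sample size. Unlike in Student's t-test, the denominator is not based on a pooled variance estimate.

The degrees of freedom $\nu$ associated with this variance estimate are approximated using the Welch–Satterthwaite equation:

$$\nu \approx \frac{\left(\dfrac{s_1^2}{N_1} + \dfrac{s_2^2}{N_2}\right)^2}{\dfrac{s_1^4}{N_1^2 \nu_1} + \dfrac{s_2^4}{N_2^2 \nu_2}},$$

where $\nu_i = N_i - 1$ is the degrees of freedom associated with the i-th variance estimate.

This expression can be simplified when $N_1 = N_2 = N$:

$$\nu \approx \frac{(N - 1)\left(s_1^2 + s_2^2\right)^2}{s_1^4 + s_2^4},$$

where $N - 1$ is the degrees of freedom associated with each variance estimate.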
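The statistic approximately follows a t-distribution with the approximate degrees of freedom $\nu$, because the distribution of the variance estimate is itself approximated by a chi-square distribution. The approximation works better when both sample sizes $N_1$ and $N_2$ are larger than 5.[5][6]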
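As an illustration of the calculation described above, the following is a minimal sketch in Python using only the standard library. The function name `welch_t` and the two small data sets are hypothetical and chosen only for demonstration; the sketch assumes each sample has at least two observations and non-zero variance.

```python
import math
from statistics import mean, variance


def welch_t(sample1, sample2):
    """Return (t, nu): Welch's statistic and the Welch-Satterthwaite
    approximate degrees of freedom. Hypothetical helper for illustration."""
    n1, n2 = len(sample1), len(sample2)
    # Squared standard errors of the sample means: corrected variance / size.
    se1_sq = variance(sample1) / n1
    se2_sq = variance(sample2) / n2
    # The denominator uses the two separate variance estimates, not a pooled one.
    t = (mean(sample1) - mean(sample2)) / math.sqrt(se1_sq + se2_sq)
    # Welch-Satterthwaite equation with nu_i = n_i - 1.
    nu = (se1_sq + se2_sq) ** 2 / (
        se1_sq ** 2 / (n1 - 1) + se2_sq ** 2 / (n2 - 1)
    )
    return t, nu


# Made-up data with unequal sizes and visibly unequal spread.
a = [1, 2, 3, 4, 5]
b = [2, 4, 6, 8, 10, 12]
t, nu = welch_t(a, b)
print(f"t = {t:.4f}, nu = {nu:.4f}")
```

Note that `nu` is generally not an integer; it always lies between the smaller of the two per-sample degrees of freedom and their sum.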
Statistical test
Once t and $\nu$ have been computed, these statistics can be used with the t-distribution to test one of two possible null hypotheses:
that the two population means are equal, in which case a two-tailed test is applied; or
that one of the population means is greater than or equal to the other, in which case a one-tailed test is applied.
The approximate degrees of freedom are real numbers and are used as such in statistics-oriented software, whereas they are rounded down to the nearest integer in spreadsheets.
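The effect of the two kinds of test, and of rounding the degrees of freedom down, can be sketched as follows. This is an illustrative stand-in and not how any particular package works: the t-distribution CDF is obtained here by Simpson integration of the density so that the example needs only the standard library, and the names `t_pdf`, `t_cdf` and `welch_p_values`, as well as the input values of t and the degrees of freedom, are hypothetical.

```python
import math


def t_pdf(x, nu):
    """Density of Student's t-distribution; nu may be non-integer."""
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(nu * math.pi))
    return math.exp(log_norm - (nu + 1) / 2 * math.log1p(x * x / nu))


def t_cdf(x, nu, steps=2000):
    """P(T <= x), by composite Simpson integration from 0 to |x|.
    `steps` must be even."""
    upper = abs(x)
    if upper == 0.0:
        return 0.5
    h = upper / steps
    total = t_pdf(0.0, nu) + t_pdf(upper, nu)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h, nu)
    half = total * h / 3  # P(0 <= T <= |x|)
    return 0.5 + half if x > 0 else 0.5 - half


def welch_p_values(t, nu):
    """Return (two_tailed, upper_tail, lower_tail) p-values."""
    lower_tail = t_cdf(t, nu)        # small when mean 1 is well below mean 2
    upper_tail = 1.0 - lower_tail    # small when mean 1 is well above mean 2
    two_tailed = 2.0 * min(lower_tail, upper_tail)
    return two_tailed, upper_tail, lower_tail


# Hypothetical statistic and real-valued degrees of freedom.
t, nu = -2.376, 6.972
p_real = welch_p_values(t, nu)[0]               # real-valued nu
p_floor = welch_p_values(t, math.floor(nu))[0]  # nu rounded down
print(f"two-tailed p: {p_real:.4f} (real nu), {p_floor:.4f} (rounded down)")
```

Rounding the degrees of freedom down gives a heavier-tailed reference distribution, so the resulting p-value is slightly larger (more conservative) than with the real-valued degrees of freedom.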
Confidence intervals
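Based on Welch's t-test, it is also possible to construct a two-sided confidence interval for the difference in means without assuming equal variances. The interval at confidence level $1 - \alpha$ is given by:

$$\bar{X}_1 - \bar{X}_2 \pm t_{1-\alpha/2,\,\nu} \sqrt{\frac{s_1^2}{N_1} + \frac{s_2^2}{N_2}},$$

where $\bar{X}_i$, $s_i^2$ and $N_i$ are the mean, corrected variance and size of the i-th sample, $\nu$ is the approximate degrees of freedom from the Welch–Satterthwaite equation, and $t_{1-\alpha/2,\,\nu}$ is the $1 - \alpha/2$ quantile of the t-distribution with $\nu$ degrees of freedom.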
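A confidence interval of this kind can be sketched in Python as follows. This is only an illustration under stated assumptions: the t quantile is found by bisection on a numerically integrated CDF so that the example is self-contained, whereas statistical software would normally call a dedicated quantile routine. The names `t_cdf`, `t_quantile` and `welch_ci` and the sample data are hypothetical.

```python
import math
from statistics import mean, variance


def t_cdf(x, nu, steps=2000):
    """P(T <= x) for Student's t with nu degrees of freedom (Simpson's rule)."""
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(nu * math.pi))

    def pdf(u):
        return math.exp(log_norm - (nu + 1) / 2 * math.log1p(u * u / nu))

    h = abs(x) / steps
    total = pdf(0.0) + pdf(abs(x))
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    return 0.5 + math.copysign(total * h / 3, x)


def t_quantile(p, nu):
    """Quantile of the t-distribution for 0.5 < p < 1, by bisection."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, nu) < p:  # widen the bracket until it contains the quantile
        hi *= 2.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, nu) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def welch_ci(sample1, sample2, confidence=0.95):
    """Two-sided confidence interval for mean(sample1) - mean(sample2)."""
    n1, n2 = len(sample1), len(sample2)
    se1_sq = variance(sample1) / n1
    se2_sq = variance(sample2) / n2
    nu = (se1_sq + se2_sq) ** 2 / (
        se1_sq ** 2 / (n1 - 1) + se2_sq ** 2 / (n2 - 1)
    )
    t_crit = t_quantile(1 - (1 - confidence) / 2, nu)
    diff = mean(sample1) - mean(sample2)
    margin = t_crit * math.sqrt(se1_sq + se2_sq)
    return diff - margin, diff + margin


# Made-up data for demonstration.
low, high = welch_ci([1, 2, 3, 4, 5], [2, 4, 6, 8, 10, 12])
print(f"95% CI for the difference in means: ({low:.3f}, {high:.3f})")
```

The interval is symmetric about the observed difference in sample means, and it excludes zero exactly when the two-tailed test rejects equality of means at the corresponding significance level.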
Advantages and limitations
Welch's t-test is more robust than Student's t-test and maintains type I error rates close to nominal for unequal variances and for unequal sample sizes under normality. Furthermore, the power of Welch's t-test comes close to that of Student's t-test, even when the population variances are equal and sample sizes are balanced.[2] Welch's t-test can be generalized to more than two samples,[7] and this generalization is more robust than one-way analysis of variance (ANOVA).
It is not recommended to pre-test for equal variances and then choose between Student's t-test and Welch's t-test.[8] Rather, Welch's t-test can be applied directly and without any substantial disadvantages relative to Student's t-test, as noted above. Welch's t-test remains robust for skewed distributions when sample sizes are large.[9] Reliability decreases for skewed distributions and smaller samples, where one could possibly perform Welch's t-test.[10]
Resampling-based versions
Permutation and bootstrap versions of the Welch t-test have also been developed to address distributional requirements without relying on large sample sizes. The Welch t statistic additionally satisfies the requirement for exchangeable samples under permutation testing in the presence of unequal variances, an instance of the Behrens–Fisher problem. This approach has been extensively discussed in the statistical literature.[11][12][13][14][15][16][17]
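The idea of a permutation test built on the Welch statistic can be sketched as below. This is a simple Monte Carlo illustration and not the procedure of any specific paper or package: group labels are reshuffled at random, the Welch t statistic is recomputed each time, and the two-sided p-value is the share of reshuffled statistics at least as extreme as the observed one. The function names, the number of permutations, the seed and the data are all assumptions made for the example.

```python
import math
import random
from statistics import mean, variance


def welch_statistic(x, y):
    """Welch t statistic (studentized difference in means)."""
    return (mean(x) - mean(y)) / math.sqrt(
        variance(x) / len(x) + variance(y) / len(y)
    )


def permutation_welch_test(x, y, n_perm=5000, seed=0):
    """Two-sided Monte Carlo permutation p-value using the Welch statistic.

    Assumes no reshuffled pair of groups is constant in both groups at once
    (otherwise the denominator of the statistic would be zero)."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    observed = abs(welch_statistic(x, y))
    pooled = list(x) + list(y)
    n_x = len(x)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of group labels
        stat = abs(welch_statistic(pooled[:n_x], pooled[n_x:]))
        if stat >= observed:
            exceed += 1
    # The add-one correction keeps the estimated p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)


# Made-up data for demonstration.
p = permutation_welch_test([1, 2, 3, 4, 5], [2, 4, 6, 8, 10, 12])
print(f"permutation p-value: {p:.4f}")
```

Using the studentized (Welch) statistic rather than the raw difference in means is what the cited literature relies on to keep the permutation test valid asymptotically when variances differ.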
Results of the Welch t-test are automatically output in the result sheet when conducting a two-sample t-test (Statistics: Hypothesis Testing: Two-Sample t-test).
Implementations of resampling-based variants of the Welch t-test are available in R, including permutation procedures in the MKinfer (perm.t.test) and nptest (np.loc.test) packages, as well as a bootstrap version in MKinfer (boot.t.test).[35][36]
Yates; Moore; Starnes (2008). The Practice of Statistics (3rd ed.). New York: W. H. Freeman and Company. p. 792. ISBN 9780716773092.
Welch, B. L. (1951). "On the Comparison of Several Mean Values: An Alternative Approach". Biometrika. 38 (3/4): 330–336. doi:10.2307/2332579. JSTOR 2332579.
Janssen, Arnold (1997). "Studentized permutation tests for non-i.i.d. hypotheses and the generalized Behrens–Fisher problem". Statistics & Probability Letters. 36: 9–21. doi:10.1016/S0167-7152(97)00043-6. MR 1491070.
Janssen, Arnold (2005). "Resampling Student's t-type statistics". Annals of the Institute of Statistical Mathematics. 57: 507–529. doi:10.1007/BF02509237.
Janssen, Arnold; Pauls, Thorsten (2003). "How do bootstrap and permutation tests work?". The Annals of Statistics. 31 (3): 768–806. doi:10.1214/aos/1056562462.
Chung, EunYi; Romano, Joseph P. (2013). "Exact and asymptotically robust permutation tests". The Annals of Statistics. 41 (2): 484–507. doi:10.1214/13-AOS1090.
Noguchi, Koji; Konietschke, Frank; Marmolejo-Ramos, Fernando (2021). "Permutation tests are robust and powerful at 0.5% and 5% significance levels". Communications in Statistics – Simulation and Computation. doi:10.3758/s13428-021-01595-5.
Janssen, Arnold; Pauls, Thorsten (2005). "A Monte Carlo comparison of studentized bootstrap and permutation tests for heteroscedastic two-sample problems". Statistics & Decisions. doi:10.1007/BF02741303.