In statistics, Welch's t-test, or unequal variances t-test, is a two-sample location test used to test the (null) hypothesis that two populations have equal means. It is named for its creator, Bernard Lewis Welch, and is an adaptation of Student's t-test [1] that is more reliable when the two samples have unequal variances and possibly unequal sample sizes. [2] [3] These tests are often referred to as "unpaired" or "independent samples" t-tests, as they are typically applied when the statistical units underlying the two samples being compared are non-overlapping. Given that Welch's t-test has been less popular than Student's t-test [2] and may be less familiar to readers, a more informative name is "Welch's unequal variances t-test" — or "unequal variances t-test" for brevity. [3] It is sometimes referred to as the Satterthwaite or Welch–Satterthwaite test.
Student's t-test assumes that the sample means being compared for two populations are normally distributed, and that the populations have equal variances. Welch's t-test is designed for unequal population variances, but the assumption of normality is maintained. [1] Welch's t-test is an approximate solution to the Behrens–Fisher problem.
Welch's t-test defines the statistic t by the following formula:

$$ t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_{\bar{X}_1}^2 + s_{\bar{X}_2}^2}} $$

where $\bar{X}_i$ and $s_{\bar{X}_i} = \frac{s_i}{\sqrt{N_i}}$ are the $i$-th sample mean and its standard error, with $s_i$ denoting the corrected sample standard deviation and $N_i$ the sample size. Unlike in Student's t-test, the denominator is not based on a pooled variance estimate.
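As a concrete illustration, the following sketch computes the statistic from two made-up samples (the data and the function name are hypothetical, not taken from the references):

```python
import math

# Hypothetical example data; any two lists of floats work.
sample1 = [27.5, 21.0, 19.0, 23.6, 17.0, 17.9, 16.9, 20.1, 21.9, 22.6]
sample2 = [27.1, 22.0, 20.8, 23.4, 23.4, 23.5, 25.8, 22.0, 24.8, 20.2]

def welch_t(x, y):
    """Welch's t statistic: difference of the sample means divided by
    the square root of the sum of the squared standard errors."""
    n1, n2 = len(x), len(y)
    mean1, mean2 = sum(x) / n1, sum(y) / n2
    # Corrected (Bessel's) sample variances s_i^2.
    var1 = sum((v - mean1) ** 2 for v in x) / (n1 - 1)
    var2 = sum((v - mean2) ** 2 for v in y) / (n2 - 1)
    return (mean1 - mean2) / math.sqrt(var1 / n1 + var2 / n2)

print(welch_t(sample1, sample2))
```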
The degrees of freedom $\nu$ associated with this variance estimate is approximated using the Welch–Satterthwaite equation: [4]

$$ \nu \approx \frac{\left( \frac{s_1^2}{N_1} + \frac{s_2^2}{N_2} \right)^2}{\frac{s_1^4}{N_1^2 \nu_1} + \frac{s_2^4}{N_2^2 \nu_2}} $$

This expression can be simplified when $N_1 = N_2 = N$:

$$ \nu \approx \frac{(N - 1)\left(s_1^2 + s_2^2\right)^2}{s_1^4 + s_2^4} $$

Here, $\nu_i = N_i - 1$ is the degrees of freedom associated with the $i$-th variance estimate.
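The same sample quantities yield the approximate degrees of freedom; a minimal sketch (the helper name is mine, not a library function):

```python
def welch_satterthwaite_df(x, y):
    """Approximate degrees of freedom nu from the Welch-Satterthwaite equation."""
    n1, n2 = len(x), len(y)
    mean1, mean2 = sum(x) / n1, sum(y) / n2
    var1 = sum((v - mean1) ** 2 for v in x) / (n1 - 1)
    var2 = sum((v - mean2) ** 2 for v in y) / (n2 - 1)
    se1_sq, se2_sq = var1 / n1, var2 / n2  # squared standard errors s_i^2 / N_i
    # nu_i = N_i - 1 degrees of freedom for each variance estimate.
    return (se1_sq + se2_sq) ** 2 / (se1_sq ** 2 / (n1 - 1) + se2_sq ** 2 / (n2 - 1))
```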
The statistic $t$ approximately follows a t-distribution with $\nu$ degrees of freedom, since the variance estimate in its denominator is approximated by a scaled chi-squared distribution. The approximation is better when both $N_1$ and $N_2$ are larger than 5. [5] [6]
Once $t$ and $\nu$ have been computed, these statistics can be used with the t-distribution to test one of two possible null hypotheses:

- that the two population means are equal, in which case a two-tailed test is applied; or
- that one of the population means is greater than or equal to the other, in which case a one-tailed test is applied.
The approximate degrees of freedom are real numbers and used as such in statistics-oriented software, whereas they are rounded down to the nearest integer in spreadsheets.
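For instance, SciPy's t-distribution accepts fractional degrees of freedom, so both null hypotheses can be tested from $t$ and $\nu$ directly; the numbers below are illustrative placeholders, not results from the references:

```python
from scipy import stats

# Illustrative values of t and nu, e.g. from the sketches above.
t, nu = -2.22, 15.6

# Two-tailed test of H0: the two population means are equal.
p_two_sided = 2 * stats.t.sf(abs(t), df=nu)

# One-tailed test of H0: mean of population 1 >= mean of population 2.
p_one_sided = stats.t.cdf(t, df=nu)

print(p_two_sided, p_one_sided)
```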
Based on Welch's t-test, it is also possible to construct a two-sided confidence interval for the difference in means, without having to assume equal variances. With $\nu$, $s_1$, and $s_2$ as defined above, the interval is

$$ \bar{X}_1 - \bar{X}_2 \pm t_{\nu,\, 1 - \alpha/2} \sqrt{\frac{s_1^2}{N_1} + \frac{s_2^2}{N_2}} $$

where $t_{\nu,\, 1 - \alpha/2}$ is the $1 - \alpha/2$ quantile of the t-distribution with $\nu$ degrees of freedom.
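A sketch of that interval in Python, assuming SciPy for the t quantile (the function name is mine):

```python
from scipy import stats

def welch_ci(x, y, alpha=0.05):
    """Two-sided (1 - alpha) confidence interval for mean(x) - mean(y)
    without assuming equal variances."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    v1 = sum((v - m1) ** 2 for v in x) / (n1 - 1)
    v2 = sum((v - m2) ** 2 for v in y) / (n2 - 1)
    se = (v1 / n1 + v2 / n2) ** 0.5
    nu = se ** 4 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    crit = stats.t.ppf(1 - alpha / 2, df=nu)  # t_{nu, 1 - alpha/2}
    diff = m1 - m2
    return diff - crit * se, diff + crit * se
```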
Welch's t-test is more robust than Student's t-test and maintains type I error rates close to nominal for unequal variances and for unequal sample sizes under normality. Furthermore, the power of Welch's t-test comes close to that of Student's t-test, even when the population variances are equal and sample sizes are balanced. [2] Welch's t-test can be generalized to more than two samples, [7] yielding a test that is more robust than one-way analysis of variance (ANOVA).
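SciPy has no built-in k-sample Welch test, so the sketch below implements the usual Welch (1951) one-way formulas (as implemented, for example, by R's oneway.test with var.equal = FALSE); treat it as an illustration rather than a vetted routine:

```python
from scipy import stats

def welch_anova(*groups):
    """Welch's k-sample one-way test for unequal variances.
    Returns (F, df1, df2, p). Sketch of the Welch (1951) formulas."""
    k = len(groups)
    ns = [len(g) for g in groups]
    means = [sum(g) / n for g, n in zip(groups, ns)]
    variances = [sum((v - m) ** 2 for v in g) / (n - 1)
                 for g, m, n in zip(groups, means, ns)]
    w = [n / v for n, v in zip(ns, variances)]  # precision weights n_i / s_i^2
    sw = sum(w)
    grand = sum(wi * m for wi, m in zip(w, means)) / sw
    lam = sum((1 - wi / sw) ** 2 / (n - 1) for wi, n in zip(w, ns)) / (k ** 2 - 1)
    f = sum(wi * (m - grand) ** 2 for wi, m in zip(w, means)) / (
        (k - 1) * (1 + 2 * (k - 2) * lam))
    df1, df2 = k - 1, 1 / (3 * lam)
    return f, df1, df2, stats.f.sf(f, df1, df2)
```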
It is not recommended to pre-test for equal variances and then choose between Student's t-test and Welch's t-test. [8] Rather, Welch's t-test can be applied directly, without any substantial disadvantage relative to Student's t-test, as noted above. Welch's t-test remains robust for skewed distributions and large sample sizes. [9] Reliability decreases for skewed distributions and smaller samples, where one could possibly perform Welch's t-test on ranked data. [10]
Language/Program | Function | Documentation |
---|---|---|
LibreOffice | TTEST(Data1; Data2; Mode; Type) | [11] |
MATLAB | ttest2(data1, data2, 'Vartype', 'unequal') | [12] |
Microsoft Excel pre 2010 (Student's T Test) | TTEST(array1, array2, tails, type) | [13] |
Microsoft Excel 2010 and later (Student's T Test) | T.TEST(array1, array2, tails, type) | [14] |
Minitab | Accessed through menu | [15] |
Origin software | Results of the Welch t-test are automatically output in the result sheet when conducting a two-sample t-test (Statistics: Hypothesis Testing: Two-Sample t-test) | [16] |
SAS (Software) | Default output from proc ttest (labeled "Satterthwaite") | |
Python (through 3rd-party library SciPy) | scipy.stats.ttest_ind(a, b, equal_var=False) | [17] |
R | t.test(data1, data2, var.equal = FALSE) | [18] |
JavaScript | ttest2(data1, data2) | [19] |
Haskell | Statistics.Test.StudentT.welchTTest SamplesDiffer data1 data2 | [20] |
JMP | Oneway( Y( YColumn), X( XColumn), Unequal Variances( 1 ) ); | [21] |
Julia | UnequalVarianceTTest(data1, data2) | [22] |
Stata | ttest varname1 == varname2, welch | [23] |
Google Sheets | TTEST(range1, range2, tails, type) | [24] |
GraphPad Prism | It is a choice on the t test dialog. | |
IBM SPSS Statistics | An option in the menu | [25] [26] |
GNU Octave | welch_test(x, y) | [27] |
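For example, the SciPy row above is a one-liner (the sample data here is hypothetical):

```python
from scipy import stats

a = [27.5, 21.0, 19.0, 23.6, 17.0, 17.9, 16.9, 20.1, 21.9, 22.6]
b = [27.1, 22.0, 20.8, 23.4, 23.4, 23.5, 25.8, 22.0, 24.8, 20.2]

result = stats.ttest_ind(a, b, equal_var=False)  # Welch's t-test
print(result.statistic, result.pvalue)
```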
In probability theory and statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable. The general form of its probability density function is

$$ f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2} $$

The parameter $\mu$ is the mean or expectation of the distribution, while the parameter $\sigma^2$ is the variance. The standard deviation of the distribution is $\sigma$ (sigma). A random variable with a Gaussian distribution is said to be normally distributed, and is called a normal deviate.
In probability theory and statistics, skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. The skewness value can be positive, zero, negative, or undefined.
In probability theory and statistics, Student's t distribution is a continuous probability distribution that generalizes the standard normal distribution. Like the latter, it is symmetric around zero and bell-shaped.
In probability theory and statistics, the chi-squared distribution with $k$ degrees of freedom is the distribution of a sum of the squares of $k$ independent standard normal random variables.
In probability theory and statistics, the beta distribution is a family of continuous probability distributions defined on the interval [0, 1] or (0, 1) in terms of two positive parameters, denoted by alpha (α) and beta (β), that appear as exponents of the variable and its complement to 1, respectively, and control the shape of the distribution.
In probability theory and statistics, the gamma distribution is a versatile two-parameter family of continuous probability distributions. The exponential distribution, Erlang distribution, and chi-squared distribution are special cases of the gamma distribution. There are two equivalent parameterizations in common use: with a shape parameter $k$ and a scale parameter $\theta$, or with a shape parameter $\alpha = k$ and a rate parameter $\beta = 1/\theta$.
In statistical inference, specifically predictive inference, a prediction interval is an estimate of an interval in which a future observation will fall, with a certain probability, given what has already been observed. Prediction intervals are often used in regression analysis.
Student's t-test is a statistical test used to test whether the difference between the response of two groups is statistically significant or not. It is any statistical hypothesis test in which the test statistic follows a Student's t-distribution under the null hypothesis. It is most commonly applied when the test statistic would follow a normal distribution if the value of a scaling term in the test statistic were known. When the scaling term is estimated based on the data, the test statistic—under certain conditions—follows a Student's t distribution. The t-test's most common application is to test whether the means of two populations are significantly different. In many cases, a Z-test will yield very similar results to a t-test because the latter converges to the former as the size of the dataset increases.
In statistics, a studentized residual is the dimensionless ratio resulting from the division of a residual by an estimate of its standard deviation, both expressed in the same units. It is a form of a Student's t-statistic, with the estimate of error varying between points.
In statistics, the number of degrees of freedom is the number of values in the final calculation of a statistic that are free to vary.
In statistics, the Behrens–Fisher problem, named after Walter-Ulrich Behrens and Ronald Fisher, is the problem of interval estimation and hypothesis testing concerning the difference between the means of two normally distributed populations when the variances of the two populations are not assumed to be equal, based on two independent samples.
The noncentral t-distribution generalizes Student's t-distribution using a noncentrality parameter. Whereas the central probability distribution describes how a test statistic t is distributed when the difference tested is null, the noncentral distribution describes how t is distributed when the null is false. This leads to its use in statistics, especially in calculating statistical power. The noncentral t-distribution is also known as the singly noncentral t-distribution, and in addition to its primary use in statistical inference, is also used in robust modeling of data.
In statistics and uncertainty analysis, the Welch–Satterthwaite equation is used to calculate an approximation to the effective degrees of freedom of a linear combination of independent sample variances, also known as the pooled degrees of freedom, corresponding to the pooled variance.
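Concretely, for a combination $s^2 = \sum_i c_i s_i^2$ of $k$ independent sample variances with $\nu_i$ degrees of freedom each, the equation reads:

$$ \nu_{\text{eff}} \approx \frac{\left( \sum_{i=1}^{k} c_i s_i^2 \right)^2}{\sum_{i=1}^{k} \frac{\left( c_i s_i^2 \right)^2}{\nu_i}} $$

The two-sample formula used earlier in this article is the special case $k = 2$ with $c_i = 1/N_i$ and $\nu_i = N_i - 1$.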
In statistics, a pivotal quantity or pivot is a function of observations and unobservable parameters such that the function's probability distribution does not depend on the unknown parameters. A pivot need not be a statistic — the function and its value can depend on the parameters of the model, but its distribution must not. If it is a statistic, then it is known as an ancillary statistic.
In probability and statistics, the studentized range distribution is the continuous probability distribution of the studentized range of an i.i.d. sample from a normally distributed population.
In statistics, one-way analysis of variance is a technique to compare whether two or more samples' means are significantly different. This analysis of variance technique requires a numeric response variable "Y" and a single explanatory variable "X", hence "one-way".
Tukey's range test, also known as Tukey's test, Tukey method, Tukey's honest significance test, or Tukey's HSD test, is a single-step multiple comparison procedure and statistical test. It can be used to correctly interpret the statistical significance of the difference between means that have been selected for comparison because of their extreme values.
In statistics, the reduced chi-square statistic is used extensively in goodness of fit testing. It is also known as mean squared weighted deviation (MSWD) in isotopic dating and variance of unit weight in the context of weighted least squares.
In the theory of stochastic processes, a part of the mathematical theory of probability, the variance gamma (VG) process, also known as Laplace motion, is a Lévy process determined by a random time change. The process has finite moments, distinguishing it from many Lévy processes. There is no diffusion component in the VG process and it is thus a pure jump process. The increments are independent and follow a variance-gamma distribution, which is a generalization of the Laplace distribution.
In statistics, the multivariate Behrens–Fisher problem is the problem of testing for the equality of means from two multivariate normal distributions when the covariance matrices are unknown and possibly not equal. Since this is a generalization of the univariate Behrens–Fisher problem, it inherits all of the difficulties that arise in the univariate problem.