Normality test

In statistics, normality tests are used to determine if a data set is well-modeled by a normal distribution and to compute how likely it is for a random variable underlying the data set to be normally distributed.

More precisely, the tests are a form of model selection and can be interpreted in several ways, depending on one's interpretation of probability; the frequentist and Bayesian approaches are discussed below.

A normality test is used to determine whether sample data has been drawn from a normally distributed population (within some tolerance). A number of statistical tests, such as the Student's t-test and the one-way and two-way ANOVA, require a normally distributed sample population.

Graphical methods

An informal approach to testing normality is to compare a histogram of the sample data to a normal probability curve. The empirical distribution of the data (the histogram) should be bell-shaped and resemble the normal distribution. This might be difficult to see if the sample is small. In this case one might proceed by regressing the data against the quantiles of a normal distribution with the same mean and variance as the sample. Lack of fit to the regression line suggests a departure from normality (see the Anderson–Darling test and Minitab).
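As a minimal sketch of the histogram comparison (assuming NumPy, SciPy, and Matplotlib are available; the data and variable names are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=10.0, scale=2.0, size=200)  # illustrative data

# Histogram of the sample, normalized to a density
plt.hist(sample, bins=20, density=True, alpha=0.5, label="sample")

# Normal curve with the same mean and standard deviation as the sample
grid = np.linspace(sample.min(), sample.max(), 200)
plt.plot(grid, stats.norm.pdf(grid, loc=sample.mean(), scale=sample.std(ddof=1)),
         label="fitted normal")
plt.legend()
plt.show()
```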

A graphical tool for assessing normality is the normal probability plot, a quantile–quantile plot (Q–Q plot) of the standardized data against the standard normal distribution. Here the correlation between the sample data and the normal quantiles serves as a goodness-of-fit measure of how well the data are modeled by a normal distribution. For normal data the points plotted in the Q–Q plot should fall approximately on a straight line, indicating high positive correlation. These plots are easy to interpret and also have the benefit that outliers are easily identified.
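A normal probability plot and the associated correlation can be produced, for example, with SciPy's `probplot`; the following is a sketch on illustrative data:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(1)
data = rng.normal(size=100)  # illustrative data

# Q-Q plot of the data against the standard normal distribution;
# with fit=True, probplot also returns the correlation r of the fitted line.
(osm, osr), (slope, intercept, r) = stats.probplot(data, dist="norm", plot=plt)
print(f"correlation with normal quantiles: r = {r:.4f}")
plt.show()
```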

Back-of-the-envelope test

A simple back-of-the-envelope test takes the sample maximum and minimum and computes their z-scores, or more properly t-statistics (the number of sample standard deviations by which a sample value lies above or below the sample mean), and compares them to the 68–95–99.7 rule: if one has a 3σ event (properly, a 3s event) and substantially fewer than 300 samples, or a 4s event and substantially fewer than 15,000 samples, then a normal distribution will understate the maximum magnitude of deviations in the sample data.
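A sketch of this check follows (the thresholds of roughly 300 and 15,000 observations come from the rule of thumb above; the data and variable names are illustrative):

```python
import numpy as np

def extreme_t_statistics(sample):
    """Number of sample standard deviations by which the sample
    maximum and minimum lie above/below the sample mean."""
    x = np.asarray(sample, dtype=float)
    mean, s = x.mean(), x.std(ddof=1)
    return (x.max() - mean) / s, (mean - x.min()) / s

rng = np.random.default_rng(2)
sample = rng.standard_t(df=3, size=100)  # heavy-tailed illustrative data
t_max, t_min = extreme_t_statistics(sample)

# Rough reading via the 68-95-99.7 rule: a ~3s extreme in far fewer than
# ~300 observations (or a ~4s extreme in far fewer than ~15,000) suggests
# tails heavier than a normal distribution would imply.
print(f"max is {t_max:.1f}s above the mean, min is {t_min:.1f}s below, n = {sample.size}")
```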

This test is useful in cases where one faces kurtosis risk – where large deviations matter – and has the benefits that it is very easy to compute and to communicate: non-statisticians can easily grasp that "6σ events are very rare in normal distributions".

Frequentist tests

Tests of univariate normality include the Shapiro–Wilk, Kolmogorov–Smirnov, Lilliefors, Anderson–Darling, and Jarque–Bera tests, as well as the moment-based, energy, empirical-characteristic-function, and entropy-based tests discussed below.

Comparing the Shapiro–Wilk, Kolmogorov–Smirnov, Lilliefors, and Anderson–Darling tests, a 2011 study concluded that Shapiro–Wilk has the best power for a given significance level, followed closely by Anderson–Darling. [1]
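Several of these tests are available in standard libraries; the following sketch uses SciPy on illustrative data (note that the Kolmogorov–Smirnov call below tests against a fully specified standard normal, unlike the Lilliefors variant):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
data = rng.normal(size=200)  # illustrative data

# Shapiro-Wilk: small p-values indicate departure from normality
sw_stat, sw_p = stats.shapiro(data)

# Kolmogorov-Smirnov against a fully specified N(0, 1) reference
ks_stat, ks_p = stats.kstest(data, "norm")

# Anderson-Darling: compare the statistic to the tabulated critical values
ad = stats.anderson(data, dist="norm")

print(f"Shapiro-Wilk: W = {sw_stat:.3f}, p = {sw_p:.3f}")
print(f"Kolmogorov-Smirnov: D = {ks_stat:.3f}, p = {ks_p:.3f}")
print(f"Anderson-Darling: A2 = {ad.statistic:.3f}, "
      f"critical values = {ad.critical_values}")
```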

Some published works recommend the Jarque–Bera test, [2] [3] but the test has a weakness: it has low power for distributions with short tails, especially for bimodal distributions. [4] Some authors have declined to include its results in their studies because of its poor overall performance. [5]

Historically, the third and fourth standardized moments (skewness and kurtosis) were the basis of some of the earliest tests for normality. The Lin–Mudholkar test specifically targets asymmetric alternatives. [6] The Jarque–Bera test is itself derived from skewness and kurtosis estimates. Mardia's multivariate skewness and kurtosis tests generalize the moment tests to the multivariate case. [7] Other early test statistics include the ratio of the mean absolute deviation to the standard deviation and the ratio of the range to the standard deviation. [8]
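For illustration, the Jarque–Bera statistic can be computed directly from the sample skewness S and excess kurtosis K as JB = (n/6)(S² + K²/4), and cross-checked against SciPy's implementation; this is a sketch on illustrative data, and small differences can arise depending on which moment estimators are used:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.normal(size=500)  # illustrative data
n = x.size

S = stats.skew(x)                    # sample skewness
K = stats.kurtosis(x, fisher=True)   # excess kurtosis (0 for a normal distribution)

jb_manual = n / 6.0 * (S**2 + K**2 / 4.0)
jb_scipy, p_value = stats.jarque_bera(x)

print(f"manual JB = {jb_manual:.3f}, scipy JB = {jb_scipy:.3f}, p = {p_value:.3f}")
```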

More recent tests of normality include the energy test [9] (Székely and Rizzo) and tests based on the empirical characteristic function (ECF) (e.g. Epps and Pulley, [10] Henze–Zirkler, [11] the BHEP test [12] ). The energy and ECF tests are powerful tests that apply to both univariate and multivariate normality and are statistically consistent against general alternatives.

The normal distribution has the highest entropy of any distribution for a given standard deviation. There are a number of normality tests based on this property, the first attributable to Vasicek. [13]
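The construction behind an entropy-based test can be sketched with Vasicek's spacing estimator of differential entropy, which is then compared to the maximum entropy attainable by a normal distribution with the sample variance. This is only an illustration of the idea, not a complete test: the window size m and the critical values of the resulting statistic would in practice be taken from published tables or obtained by simulation.

```python
import numpy as np

def vasicek_entropy(sample, m):
    """Vasicek's spacing estimator of differential entropy."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    # Order statistics padded at the boundaries:
    # X_(i) = X_(1) for i < 1 and X_(i) = X_(n) for i > n.
    upper = x[np.minimum(np.arange(n) + m, n - 1)]
    lower = x[np.maximum(np.arange(n) - m, 0)]
    return np.mean(np.log(n / (2.0 * m) * (upper - lower)))

rng = np.random.default_rng(5)
x = rng.normal(size=200)  # illustrative data

h_est = vasicek_entropy(x, m=3)
# The normal distribution maximizes entropy for a given variance:
# H = 0.5 * ln(2 * pi * e * sigma^2). An entropy estimate well below this
# bound (judged against simulated critical values) suggests non-normality.
h_normal_max = 0.5 * np.log(2.0 * np.pi * np.e * x.var(ddof=1))
print(f"estimated entropy = {h_est:.3f}, normal upper bound = {h_normal_max:.3f}")
```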

Bayesian tests

Kullback–Leibler divergences between the whole posterior distributions of the slope and variance do not indicate non-normality. However, the ratio of expectations of these posteriors and the expectation of the ratios give similar results to the Shapiro–Wilk statistic except for very small samples, when non-informative priors are used. [14]

Spiegelhalter suggests using a Bayes factor to compare normality with a different class of distributional alternatives. [15] This approach has been extended by Farrell and Rogers-Stewart. [16]

Applications

One application of normality tests is to the residuals from a linear regression model. [17] If they are not normally distributed, the residuals should not be used in Z tests or in any other tests derived from the normal distribution, such as t tests, F tests and chi-squared tests. If the residuals are not normally distributed, then the dependent variable or at least one explanatory variable may have the wrong functional form, or important variables may be missing, etc. Correcting one or more of these systematic errors may produce residuals that are normally distributed; in other words, non-normality of residuals is often a model deficiency rather than a data problem. [18]
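As a sketch of this workflow with statsmodels and SciPy (the data here are illustrative; in practice one would examine the residuals from the actual fitted model):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(6)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=100)  # illustrative data

# Fit a simple linear regression and extract the residuals
model = sm.OLS(y, sm.add_constant(x)).fit()
residuals = model.resid

# Test the residuals for normality, e.g. with the Shapiro-Wilk test
stat, p = stats.shapiro(residuals)
print(f"Shapiro-Wilk on residuals: W = {stat:.3f}, p = {p:.3f}")
```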

Notes

  1. Razali, Nornadiah; Wah, Yap Bee (2011). "Power comparisons of Shapiro–Wilk, Kolmogorov–Smirnov, Lilliefors and Anderson–Darling tests" (PDF). Journal of Statistical Modeling and Analytics. 2 (1): 21–33. Archived from the original (PDF) on 2015-06-30.
  2. Judge, George G.; Griffiths, W. E.; Hill, R. Carter; Lütkepohl, Helmut; Lee, T. (1988). Introduction to the Theory and Practice of Econometrics (Second ed.). Wiley. pp. 890–892. ISBN 978-0-471-08277-4.
  3. Gujarati, Damodar N. (2002). Basic Econometrics (Fourth ed.). McGraw Hill. pp. 147–148. ISBN 978-0-07-123017-9.
  4. Thadewald, Thorsten; Büning, Herbert (1 January 2007). "Jarque–Bera Test and its Competitors for Testing Normality – A Power Comparison". Journal of Applied Statistics. 34 (1): 87–105. CiteSeerX 10.1.1.507.1186. doi:10.1080/02664760600994539. S2CID 13866566.
  5. Sürücü, Barış (1 September 2008). "A power comparison and simulation study of goodness-of-fit tests". Computers & Mathematics with Applications. 56 (6): 1617–1625. doi:10.1016/j.camwa.2008.03.010.
  6. Lin, C. C.; Mudholkar, G. S. (1980). "A simple test for normality against asymmetric alternatives". Biometrika. 67 (2): 455–461. doi:10.1093/biomet/67.2.455.
  7. Mardia, K. V. (1970). "Measures of multivariate skewness and kurtosis with applications". Biometrika. 57: 519–530.
  8. Filliben, J. J. (February 1975). "The Probability Plot Correlation Coefficient Test for Normality". Technometrics. 17 (1): 111–117. doi:10.2307/1268008. JSTOR 1268008.
  9. Székely, G. J.; Rizzo, M. L. (2005). "A new test for multivariate normality". Journal of Multivariate Analysis. 93: 58–80.
  10. Epps, T. W.; Pulley, L. B. (1983). "A test for normality based on the empirical characteristic function". Biometrika. 70: 723–726.
  11. Henze, N.; Zirkler, B. (1990). "A class of invariant and consistent tests for multivariate normality". Communications in Statistics – Theory and Methods. 19: 3595–3617.
  12. Henze, N.; Wagner, T. (1997). "A new approach to the BHEP tests for multivariate normality". Journal of Multivariate Analysis. 62: 1–23.
  13. Vasicek, Oldrich (1976). "A Test for Normality Based on Sample Entropy". Journal of the Royal Statistical Society. Series B (Methodological). 38 (1): 54–59. JSTOR 2984828.
  14. Young, K. D. S. (1993). "Bayesian diagnostics for checking assumptions of normality". Journal of Statistical Computation and Simulation. 47 (3–4): 167–180.
  15. Spiegelhalter, D. J. (1980). "An omnibus test for normality for small samples". Biometrika. 67 (2): 493–496. doi:10.1093/biomet/67.2.493.
  16. Farrell, P. J.; Rogers-Stewart, K. (2006). "Comprehensive study of tests for normality and symmetry: extending the Spiegelhalter test". Journal of Statistical Computation and Simulation. 76 (9): 803–816. doi:10.1080/10629360500109023.
  17. Portney, L. G.; Watkins, M. P. (2000). Foundations of Clinical Research: Applications to Practice. New Jersey: Prentice Hall Health. pp. 516–517. ISBN 0838526950.
  18. Pek, Jolynn; Wong, Octavia; Wong, Augustine C. M. (2018-11-06). "How to Address Non-normality: A Taxonomy of Approaches, Reviewed, and Illustrated". Frontiers in Psychology. 9: 2104. doi:10.3389/fpsyg.2018.02104. ISSN 1664-1078. PMC 6232275. PMID 30459683.
