Reduced chi-squared statistic

In statistics, the reduced chi-square statistic is used extensively in goodness of fit testing. It is also known as mean squared weighted deviation (MSWD) in isotopic dating [1] and variance of unit weight in the context of weighted least squares. [2] [3]

Its square root is called regression standard error, [4] standard error of the regression, [5] [6] or standard error of the equation [7] (see Ordinary least squares § Reduced chi-squared).

Definition

It is defined as chi-square per degree of freedom: [8] [9] [10] [11]:85 [12] [13] [14] [15]

$\chi_\nu^2 = \frac{\chi^2}{\nu},$

where the chi-squared is a weighted sum of squared deviations:

$\chi^2 = \sum_i \frac{(O_i - C_i)^2}{\sigma_i^2},$

with inputs: variances $\sigma_i^2$, observations $O_i$, and calculated data $C_i$. [8] The degrees of freedom, $\nu = n - m$, equal the number of observations $n$ minus the number of fitted parameters $m$.

In weighted least squares, the definition is often written in matrix notation as

$\chi_\nu^2 = \frac{r^\mathsf{T} W r}{\nu},$

where $r$ is the vector of residuals, and $W$ is the weight matrix, the inverse of the input (diagonal) covariance matrix of observations. If $W$ is non-diagonal, then generalized least squares applies.

In ordinary least squares, the definition simplifies to

$\chi_\nu^2 = \frac{\mathrm{RSS}}{\nu},$

where the numerator is the residual sum of squares (RSS).

When the fit is just an ordinary mean, then $\chi_\nu = \sqrt{\chi_\nu^2}$ equals the sample standard deviation.
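As a concrete illustration of these definitions, here is a minimal Python sketch (the data and the straight-line model are invented for the example) that computes $\chi^2$ as the weighted sum of squared deviations, checks it against the matrix form $r^\mathsf{T} W r$, and divides by $\nu = n - m$:

    import numpy as np

    # Invented example: n = 6 observations with known measurement errors.
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    O = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 11.9])    # observed data
    sigma = np.array([0.2, 0.2, 0.3, 0.2, 0.3, 0.2])  # known measurement errors

    # Weighted least-squares straight-line fit (m = 2 fitted parameters).
    coeffs = np.polyfit(x, O, deg=1, w=1.0 / sigma)
    C = np.polyval(coeffs, x)                          # calculated (model) data

    # Chi-squared as the weighted sum of squared deviations.
    chi2 = np.sum((O - C) ** 2 / sigma ** 2)

    # Equivalent matrix form: r^T W r, with W the inverse covariance matrix.
    r = O - C
    W = np.diag(1.0 / sigma ** 2)
    assert np.isclose(chi2, r @ W @ r)

    nu = len(O) - len(coeffs)          # degrees of freedom: n - m = 4
    print(chi2 / nu)                   # reduced chi-squared, ~1 for a good fit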

Discussion

As a general rule, when the variance of the measurement error is known a priori, a $\chi_\nu^2 \gg 1$ indicates a poor model fit. A $\chi_\nu^2 > 1$ indicates that the fit has not fully captured the data (or that the error variance has been underestimated). In principle, a value of $\chi_\nu^2$ around $1$ indicates that the extent of the match between observations and estimates is in accord with the error variance. A $\chi_\nu^2 < 1$ indicates that the model is "overfitting" the data: either the model is improperly fitting noise, or the error variance has been overestimated. [11]:89

When the variance of the measurement error is only partially known, the reduced chi-squared may serve as a correction estimated a posteriori.
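One common form of such a correction (a minimal sketch of a widely used convention, not a procedure prescribed by the sources cited above) is to rescale the assumed measurement errors by $\sqrt{\chi_\nu^2}$ so that the reduced chi-squared of the fit becomes one:

    import numpy as np

    def rescale_errors(sigma, chi2_red):
        # If the error model is known only up to a common factor, multiplying
        # every sigma by sqrt(chi2_red) makes the weighted residual scatter
        # consistent with the corrected error variance (reduced chi-squared = 1).
        return np.asarray(sigma) * np.sqrt(chi2_red)

    # Hypothetical example: a fit returned chi2_red = 2.56 with assumed sigmas.
    print(rescale_errors([0.2, 0.2, 0.3], 2.56))   # each sigma grows by 1.6x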

Applications

Geochronology

In geochronology, the MSWD is a measure of goodness of fit that takes into account the relative importance of both the internal and external reproducibility, with most common usage in isotopic dating. [16] [17] [1] [18] [19] [20]

In general:

MSWD = 1 if the age data fit a univariate normal distribution in t (for the arithmetic mean age) or log(t) (for the geometric mean age) space, or if the compositional data fit a bivariate normal distribution in [log(U/He),log(Th/He)]-space (for the central age).

MSWD < 1 if the observed scatter is less than that predicted by the analytical uncertainties. In this case, the data are said to be "underdispersed", indicating that the analytical uncertainties were overestimated.

MSWD > 1 if the observed scatter exceeds that predicted by the analytical uncertainties. In this case, the data are said to be "overdispersed". This situation is the rule rather than the exception in (U-Th)/He geochronology, indicating an incomplete understanding of the isotope system. Several causes have been proposed to explain the overdispersion of (U-Th)/He data, including the uneven distribution of U and Th within crystals and radiation damage. A small simulation after this list illustrates the three regimes.
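In this hedged Python sketch, the synthetic "ages" (not real isotopic data) are drawn with true scatter smaller than, equal to, and larger than the stated analytical uncertainties:

    import numpy as np

    rng = np.random.default_rng(0)
    N = 50
    sigma = np.full(N, 1.0)          # stated analytical uncertainties

    def mswd(ages, sigma):
        # Unweighted MSWD about the arithmetic mean (nu = N - 1).
        return np.sum((ages - ages.mean()) ** 2 / sigma ** 2) / (len(ages) - 1)

    for true_scatter, label in [(0.5, "underdispersed"),
                                (1.0, "consistent"),
                                (2.0, "overdispersed")]:
        ages = 100.0 + rng.normal(0.0, true_scatter, N)   # synthetic ages
        print(label, mswd(ages, sigma))
    # MSWD comes out well below 1, near 1, and well above 1, respectively.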

Often the geochronologist will determine a series of age measurements on a single sample, with each measured value $x_i$ having a weighting $w_i$ and an associated error $\sigma_{x_i}$. As regards weighting, one can either weight all of the measured ages equally, or weight them by the proportion of the sample that they represent. For example, if two thirds of the sample was used for the first measurement and one third for the second and final measurement, then one might weight the first measurement twice as heavily as the second.

The arithmetic mean of the age determinations is

$\overline{x} = \frac{\sum_{i=1}^N x_i}{N},$

but this value can be misleading, unless each determination of the age is of equal significance.

When each measured value can be assumed to have the same weighting, or significance, the biased and unbiased (or "sample" and "population", respectively) estimators of the variance are computed as follows:

$s_b^2 = \frac{\sum_{i=1}^N (x_i - \overline{x})^2}{N} \qquad \text{and} \qquad s^2 = \frac{\sum_{i=1}^N (x_i - \overline{x})^2}{N - 1}.$

The standard deviation is the square root of the variance.

When individual determinations of an age are not of equal significance, it is better to use a weighted mean to obtain an "average" age, as follows:

$\overline{x}^* = \frac{\sum_{i=1}^N w_i x_i}{\sum_{i=1}^N w_i}.$

The biased weighted estimator of variance can be shown to be

$s_b^2 = \frac{\sum_{i=1}^N w_i (x_i - \overline{x}^*)^2}{\sum_{i=1}^N w_i},$

which can be computed as

$s_b^2 = \frac{\sum_{i=1}^N w_i x_i^2 \cdot \sum_{i=1}^N w_i - \left(\sum_{i=1}^N w_i x_i\right)^2}{\left(\sum_{i=1}^N w_i\right)^2}.$

The unbiased weighted estimator of the sample variance can be computed as follows:

$s^2 = \frac{\sum_{i=1}^N w_i}{\left(\sum_{i=1}^N w_i\right)^2 - \sum_{i=1}^N w_i^2} \sum_{i=1}^N w_i (x_i - \overline{x}^*)^2.$

Again, the corresponding standard deviation is the square root of the variance.

The unbiased weighted estimator of the sample variance can also be computed on the fly as follows:

$s^2 = \frac{\sum_{i=1}^N w_i x_i^2 \cdot \sum_{i=1}^N w_i - \left(\sum_{i=1}^N w_i x_i\right)^2}{\left(\sum_{i=1}^N w_i\right)^2 - \sum_{i=1}^N w_i^2}.$
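Because only the running sums $\sum w_i$, $\sum w_i x_i$, $\sum w_i x_i^2$, and $\sum w_i^2$ are needed, this form lends itself to a single-pass implementation. A minimal Python sketch (the helper name and the two-point example are invented):

    def weighted_variance_online(pairs):
        # Unbiased weighted sample variance from one pass over (x, w) pairs:
        # s^2 = (sum(w x^2) sum(w) - sum(w x)^2) / (sum(w)^2 - sum(w^2)).
        sw = swx = swx2 = sw2 = 0.0
        for x, w in pairs:
            sw += w
            swx += w * x
            swx2 += w * x * x
            sw2 += w * w
        return (swx2 * sw - swx * swx) / (sw * sw - sw2)

    # Two age determinations weighted by sample fraction (2/3 and 1/3).
    print(weighted_variance_online([(100.4, 2.0), (101.2, 1.0)]))   # 0.32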

The unweighted mean square of the weighted deviations (unweighted MSWD) can then be computed as follows:

$\mathrm{MSWD}_u = \frac{1}{N - 1} \sum_{i=1}^N \frac{(x_i - \overline{x})^2}{\sigma_{x_i}^2}.$

By analogy, the weighted mean square of the weighted deviations (weighted MSWD) can be computed as follows:

$\mathrm{MSWD}_w = \frac{\sum_{i=1}^N w_i}{\left(\sum_{i=1}^N w_i\right)^2 - \sum_{i=1}^N w_i^2} \sum_{i=1}^N \frac{w_i (x_i - \overline{x}^*)^2}{\sigma_{x_i}^2}.$
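Putting the pieces together, the following hedged Python sketch (ages, weights, and analytical errors are invented for illustration) computes the weighted mean and both MSWD variants as defined above:

    import numpy as np

    # Invented example: three age determinations, weights, analytical errors.
    x = np.array([100.2, 100.9, 101.5])   # measured ages
    w = np.array([2.0, 1.0, 1.0])         # weights (e.g. sample fractions)
    sigma = np.array([0.4, 0.5, 0.4])     # analytical error of each age

    N = len(x)
    mean = x.mean()                        # arithmetic mean
    wmean = np.sum(w * x) / np.sum(w)      # weighted mean

    # Unweighted MSWD about the arithmetic mean.
    mswd_u = np.sum((x - mean) ** 2 / sigma ** 2) / (N - 1)

    # Weighted MSWD about the weighted mean, with the unbiased weighted
    # normalization sum(w) / (sum(w)^2 - sum(w^2)).
    norm = np.sum(w) / (np.sum(w) ** 2 - np.sum(w ** 2))
    mswd_w = norm * np.sum(w * (x - wmean) ** 2 / sigma ** 2)

    print(wmean, mswd_u, mswd_w)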

Rasch analysis

In data analysis based on the Rasch model, the reduced chi-squared statistic is called the outfit mean-square statistic, and the information-weighted reduced chi-squared statistic is called the infit mean-square statistic. [21]
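For intuition only, here is a small sketch of how these two mean-squares are typically computed from Rasch residuals; the dichotomous responses, person abilities, and item difficulty below are invented, and E and V denote the model expectation and variance of each response:

    import numpy as np

    def outfit_infit(x, E, V):
        # Squared standardized residuals z^2 = (x - E)^2 / V.
        z2 = (x - E) ** 2 / V
        outfit = z2.mean()                  # unweighted mean square (a reduced chi-squared)
        infit = np.sum(V * z2) / np.sum(V)  # information-weighted mean square
        return outfit, infit

    # Invented dichotomous example: five persons answering one item.
    theta = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])   # person abilities
    b = 0.2                                          # item difficulty
    P = 1.0 / (1.0 + np.exp(-(theta - b)))           # Rasch success probability
    x = np.array([0.0, 1.0, 0.0, 1.0, 1.0])          # observed responses
    print(outfit_infit(x, E=P, V=P * (1.0 - P)))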


References

  1. Wendt, I.; Carl, C. (1991). "The statistical distribution of the mean squared weighted deviation". Chemical Geology, 275–285.
  2. Strang, Gilbert; Borre, Kae (1997). Linear Algebra, Geodesy, and GPS. Wellesley-Cambridge Press. p. 301. ISBN 9780961408862.
  3. Koch, Karl-Rudolf (2013). Parameter Estimation and Hypothesis Testing in Linear Models. Springer Berlin Heidelberg. Section 3.2.5. ISBN 9783662039762.
  4. Faraway, Julian (2000). Practical Regression and Anova using R.
  5. Kenney, J.; Keeping, E. S. (1963). Mathematics of Statistics. van Nostrand. p. 187.
  6. Zwillinger, D. (1995). Standard Mathematical Tables and Formulae. Chapman & Hall/CRC. p. 626. ISBN 0-8493-2479-3.
  7. Hayashi, Fumio (2000). Econometrics. Princeton University Press. ISBN 0-691-01018-8.
  8. Laub, Charlie; Kuhl, Tonya L. (n.d.). How Bad is Good? A Critical Look at the Fitting of Reflectivity Models using the Reduced Chi-Square Statistic (PDF). University of California, Davis. Archived from the original (PDF) on 6 October 2016; retrieved 30 May 2015.
  9. Taylor, John Robert (1997). An Introduction to Error Analysis. University Science Books. p. 268.
  10. Kirkman, T. W. (n.d.). Chi-Square Curve Fitting. Retrieved 30 May 2015.
  11. Bevington, Philip R. (1969). Data Reduction and Error Analysis for the Physical Sciences. New York: McGraw-Hill.
  12. Hughes, Ifan; Hase, Thomas. Measurements and Their Uncertainties: A Practical Guide to Modern Error Analysis.
  13. Drosg, Manfred. Dealing with Uncertainties: A Guide to Error Analysis.
  14. Wall, J. V.; Jenkins, C. R. Practical Statistics for Astronomers.
  15. Wong, Samuel Shaw Ming. Computational Methods in Physics and Engineering.
  16. Dickin, A. P. (1995). Radiogenic Isotope Geology. Cambridge, UK: Cambridge University Press. ISBN 0-521-43151-4, ISBN 0-521-59891-5.
  17. McDougall, I.; Harrison, T. M. (1988). Geochronology and Thermochronology by the 40Ar/39Ar Method. Oxford University Press.
  18. Black, Lance P.; Kamo, Sandra L.; Allen, Charlotte M.; Aleinikoff, John N.; Davis, Donald W.; Korsch, Russell J.; Foudoulis, Chris (2003). "TEMORA 1: a new zircon standard for Phanerozoic U–Pb geochronology". Chemical Geology 200, 155–170.
  19. Streule, M. J.; Phillips, R. J.; Searle, M. P.; Waters, D. J.; Horstwood, M. S. A. (2009). "Evolution and chronology of the Pangong Metamorphic Complex adjacent to the Karakoram Fault, Ladakh: constraints from thermobarometry, metamorphic modelling and U-Pb geochronology". Journal of the Geological Society 166, 919–932. doi:10.1144/0016-76492008-117.
  20. Powell, Roger; Hergt, Janet; Woodhead, Jon (2002). "Improving isochron calculations with robust statistics and the bootstrap". Chemical Geology 185, 191–204.
  21. Linacre, J. M. (2002). "What do Infit and Outfit, Mean-square and Standardized mean?". Rasch Measurement Transactions. 16 (2): 878.