Ancillary statistic

An ancillary statistic is a measure of a sample whose distribution (or whose pmf or pdf) does not depend on the parameters of the model. [1] [2] [3] An ancillary statistic is a pivotal quantity that is also a statistic. Ancillary statistics can be used to construct prediction intervals. They are also used in connection with Basu's theorem to prove independence between statistics. [4]

This concept was first introduced by Ronald Fisher in the 1920s, [5] but its formal definition was only provided in 1964 by Debabrata Basu. [6] [7]

Examples

Suppose X1, ..., Xn are independent and identically distributed, and are normally distributed with unknown expected value μ and known variance 1. Let

$$\bar{X}_n = \frac{X_1 + \cdots + X_n}{n}$$

be the sample mean.

The following statistical measures of dispersion of the sample, namely the range $\max(X_1, \ldots, X_n) - \min(X_1, \ldots, X_n)$, the interquartile range, and the sample variance $\hat{\sigma}^2 = \tfrac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X}_n)^2$, are all ancillary statistics, because their sampling distributions do not change as μ changes. Computationally, this is because in the formulas the μ terms cancel: adding a constant to the distribution (and hence to every observation) shifts the sample maximum and minimum by the same amount, so it does not change their difference, and likewise for the other statistics. These measures of dispersion do not depend on location.

Conversely, given i.i.d. normal variables with known mean 1 and unknown variance σ², the sample mean is not an ancillary statistic of the variance, as the sampling distribution of the sample mean is N(1, σ²/n), which does depend on σ²; this measure of location (specifically, its standard error) depends on dispersion. [8]
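
Both points can be checked by direct simulation. The sketch below is a minimal illustration using NumPy; the sample size, number of replications, and parameter values are arbitrary choices for illustration, not from the source. It compares the empirical distribution of the sample range at two values of μ, and then the spread of the sample mean at two values of σ.

    import numpy as np

    rng = np.random.default_rng(0)
    n, reps = 10, 100_000

    # The range is ancillary for mu (variance fixed at 1): its distribution is
    # unchanged when every observation is shifted by the same amount.
    for mu in (0.0, 5.0):
        x = rng.normal(loc=mu, scale=1.0, size=(reps, n))
        r = x.max(axis=1) - x.min(axis=1)
        print(f"mu={mu}: range mean={r.mean():.3f}, sd={r.std():.3f}")

    # The sample mean is not ancillary for sigma^2 (mean fixed at 1): its
    # sampling distribution is N(1, sigma^2/n), so its spread grows with sigma.
    for sigma in (1.0, 3.0):
        x = rng.normal(loc=1.0, scale=sigma, size=(reps, n))
        m = x.mean(axis=1)
        print(f"sigma={sigma}: sd of sample mean={m.std():.3f} (theory {sigma/np.sqrt(n):.3f})")

The printed summaries for the range agree across the two values of μ (up to simulation noise), while the spread of the sample mean clearly tracks σ.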

In location-scale families

In a location family of distributions, $(X_1 - X_n,\ X_2 - X_n,\ \ldots,\ X_{n-1} - X_n)$ is an ancillary statistic.

In a scale family of distributions, $\left(\frac{X_1}{X_n},\ \frac{X_2}{X_n},\ \ldots,\ \frac{X_{n-1}}{X_n}\right)$ is an ancillary statistic.

In a location-scale family of distributions, $\frac{X_i - \bar{X}_n}{S}$, where $S^2$ is the sample variance, is an ancillary statistic. [3] [9]
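
These constructions are easy to verify numerically: shifting, rescaling, or affinely transforming the data leaves the corresponding statistic unchanged, which is why its distribution cannot involve the location and/or scale parameter. The sketch below is a minimal illustration; the function names and sample values are illustrative assumptions, not from the source.

    import numpy as np

    def location_ancillary(x):
        # Differences X_i - X_n: invariant under x -> x + c.
        return x[:-1] - x[-1]

    def scale_ancillary(x):
        # Ratios X_i / X_n: invariant under x -> c * x with c > 0.
        return x[:-1] / x[-1]

    def location_scale_ancillary(x):
        # Standardized residuals (X_i - mean) / S: invariant under x -> a + b * x, b > 0.
        return (x - x.mean()) / x.std(ddof=1)

    x = np.array([2.3, 1.7, 4.1, 3.0])
    print(location_ancillary(x + 10.0) - location_ancillary(x))                    # all ~0
    print(scale_ancillary(3.0 * x) - scale_ancillary(x))                           # all ~0
    print(location_scale_ancillary(10.0 + 3.0 * x) - location_scale_ancillary(x))  # all ~0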

In recovery of information

It turns out that, if $T_1$ is a non-sufficient statistic and $T_2$ is ancillary, one can sometimes recover all the information about the unknown parameter contained in the entire data by reporting $T_1$ while conditioning on the observed value of $T_2$. This is known as conditional inference. [3]

For example, suppose that $X_1, X_2$ are independent and follow the $N(\theta, 1)$ distribution, where $\theta$ is unknown. Note that, even though $X_1$ is not sufficient for $\theta$ (since its Fisher information is 1, whereas the Fisher information of the complete statistic $\bar{X} = (X_1 + X_2)/2$ is 2), by additionally reporting the ancillary statistic $X_1 - X_2$, one obtains a joint distribution with Fisher information 2. [3]
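
Under the setup as reconstructed above (independent $X_1, X_2 \sim N(\theta, 1)$ with ancillary statistic $A = X_1 - X_2$), the recovery of information can be made explicit; the following short calculation is a sketch under those assumptions. Since $X_1$ and $A$ are jointly normal with $\operatorname{Cov}(X_1, A) = 1$ and $\operatorname{Var}(A) = 2$,

$$X_1 \mid A = a \;\sim\; N\!\left(\theta + \tfrac{a}{2},\ \tfrac{1}{2}\right),$$

and the Fisher information about $\theta$ carried by a normal observation with known variance $\tfrac{1}{2}$ is $1 / \tfrac{1}{2} = 2$. Reporting $X_1$ together with the observed value of the ancillary statistic $A$ therefore recovers the full information 2, even though $X_1$ alone carries only information 1.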

Ancillary complement

Given a statistic T that is not sufficient, an ancillary complement is a statistic U that is ancillary and such that (T, U) is sufficient. [2] Intuitively, an ancillary complement "adds the missing information" (without duplicating any).

This notion is particularly useful when one takes T to be a maximum likelihood estimator, which in general will not be sufficient; one can then ask for an ancillary complement. In this case, Fisher argues that one must condition on an ancillary complement to determine the information content: one should base the Fisher information content of T not on the marginal distribution of T, but on the conditional distribution of T given U: how much information does T add? This is not possible in general, as no ancillary complement need exist, and if one exists, it need not be unique, nor need a maximal ancillary complement exist.

Example

In baseball, suppose a scout observes a batter in N at-bats. Suppose (unrealistically) that the number N is chosen by some random process that is independent of the batter's ability, say a coin is tossed after each at-bat and the result determines whether the scout will stay to watch the batter's next at-bat. The eventual data are the number N of at-bats and the number X of hits: the data (X, N) are a sufficient statistic. The observed batting average X/N fails to convey all of the information available in the data because it fails to report the number N of at-bats (e.g., a batting average of 0.400, which is very high, based on only five at-bats does not inspire anywhere near as much confidence in the player's ability as a 0.400 average based on 100 at-bats). The number N of at-bats is an ancillary statistic because it is part of the observable data (it is a statistic) and its probability distribution does not depend on the batter's ability, since it was determined by a random process independent of that ability.

This ancillary statistic is an ancillary complement to the observed batting average X/N, i.e., the batting average X/N is not a sufficient statistic, in that it conveys less than all of the relevant information in the data, but conjoined with N, it becomes sufficient.
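
A small simulation sketch of this setup follows; the stopping rule, hit probabilities, and replication count below are illustrative assumptions, not from the source. The distribution of N comes out the same whether the batter hits at 0.250 or 0.400, confirming that N alone carries no information about ability, while the pair (X, N) still reflects the difference.

    import numpy as np

    rng = np.random.default_rng(1)

    def scout_one_batter(p_hit, p_stay=0.5):
        """Watch at-bats until the post-at-bat coin toss says to leave."""
        hits, at_bats = 0, 0
        while True:
            at_bats += 1
            hits += int(rng.random() < p_hit)
            if rng.random() >= p_stay:   # coin toss decides whether the scout stays
                return hits, at_bats

    for p in (0.250, 0.400):
        data = np.array([scout_one_batter(p) for _ in range(50_000)])
        hits, at_bats = data[:, 0], data[:, 1]
        print(f"p={p}: mean N={at_bats.mean():.2f}, P(N=1)={np.mean(at_bats == 1):.3f}, "
              f"mean X/N={np.mean(hits / at_bats):.3f}")

    # The summaries of N are identical across abilities (up to noise), so N by
    # itself says nothing about p; the hit count X, read together with N, does.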

Notes

  1. Lehmann, E. L.; Scholz, F. W. (1992). "Ancillarity" (PDF). Institute of Mathematical Statistics Lecture Notes - Monograph Series. 17: 32–51. doi:10.1214/lnms/1215458837. ISBN 0-940600-24-2. ISSN 0749-2170. JSTOR 4355624.
  2. Ghosh, M.; Reid, N.; Fraser, D. A. S. (2010). "Ancillary statistics: A review". Statistica Sinica. 20 (4): 1309–1332. ISSN 1017-0405. JSTOR 24309506.
  3. Mukhopadhyay, Nitis (2000). Probability and Statistical Inference. United States of America: Marcel Dekker, Inc. pp. 309–318. ISBN 0-8247-0379-0.
  4. Dawid, Philip (2011). "Basu on Ancillarity". In DasGupta, Anirban (ed.). Selected Works of Debabrata Basu. New York, NY: Springer. pp. 5–8. doi:10.1007/978-1-4419-5825-9_2. ISBN 978-1-4419-5825-9.
  5. Fisher, R. A. (1925). "Theory of Statistical Estimation". Mathematical Proceedings of the Cambridge Philosophical Society. 22 (5): 700–725. Bibcode:1925PCPS...22..700F. doi:10.1017/S0305004100009580. hdl:2440/15186. ISSN 0305-0041.
  6. Basu, D. (1964). "Recovery of Ancillary Information". Sankhyā: The Indian Journal of Statistics, Series A. 26 (1): 3–16. ISSN 0581-572X. JSTOR 25049300.
  7. Stigler, Stephen M. (2001). "Ancillary history". Institute of Mathematical Statistics Lecture Notes - Monograph Series. Beachwood, OH: Institute of Mathematical Statistics. pp. 555–567. doi:10.1214/lnms/1215090089. ISBN 978-0-940600-50-8.
  8. Buehler, Robert J. (1982). "Some Ancillary Statistics and Their Properties". Journal of the American Statistical Association. 77 (379): 581–589. doi:10.1080/01621459.1982.10477850. hdl:11299/199392. ISSN 0162-1459.
  9. "Ancillary statistics" (PDF).
