# V-statistic

V-statistics are a class of statistics named after Richard von Mises, who developed their asymptotic distribution theory in a fundamental paper in 1947. [1] V-statistics are closely related to U-statistics [2] [3] (U for "unbiased"), introduced by Wassily Hoeffding in 1948. [4] A V-statistic is a statistical function (of a sample) defined by a particular statistical functional of a probability distribution.

## Statistical functions

Statistics that can be represented as functionals ${\displaystyle T(F_{n})}$ of the empirical distribution function ${\displaystyle F_{n}}$ are called statistical functionals. [5] Differentiability of the functional T plays a key role in the von Mises approach; thus von Mises considers differentiable statistical functionals. [1]

### Examples of statistical functions

1. The k-th central moment is the functional ${\displaystyle T(F)=\int (x-\mu )^{k}\,dF(x)}$, where ${\displaystyle \mu =E[X]}$ is the expected value of X. The associated statistical function is the sample k-th central moment,
${\displaystyle T_{n}=m_{k}=T(F_{n})={\frac {1}{n}}\sum _{i=1}^{n}(x_{i}-{\overline {x}})^{k}.}$
2. The chi-squared goodness-of-fit statistic is a statistical function T(Fn), corresponding to the statistical functional
${\displaystyle T(F)=\sum _{i=1}^{k}{\frac {(\int _{A_{i}}\,dF-p_{i})^{2}}{p_{i}}},}$

where Ai are the k cells and pi are the specified probabilities of the cells under the null hypothesis.

3. The Cramér–von Mises and Anderson–Darling goodness-of-fit statistics are based on the functional
${\displaystyle T(F)=\int (F(x)-F_{0}(x))^{2}\,w(x;F_{0})\,dF_{0}(x),}$

where w(x; F0) is a specified weight function and F0 is a specified null distribution. If w is the identity function then T(Fn) is the well-known Cramér–von Mises goodness-of-fit statistic; if ${\displaystyle w(x;F_{0})=[F_{0}(x)(1-F_{0}(x))]^{-1}}$ then T(Fn) is the Anderson–Darling statistic.
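
Example 1 is easy to verify numerically. The following Python sketch (function and variable names are illustrative, not from the source) evaluates the k-th central moment functional at the empirical distribution:

```python
def central_moment(xs, k):
    """Plug-in estimate T(F_n): the sample k-th central moment."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** k for x in xs) / n

# Second central moment of a small sample with mean 3.5:
m2 = central_moment([1.0, 2.0, 4.0, 7.0], 2)  # 21/4 = 5.25
```

The first central moment is identically zero, since the deviations from the sample mean sum to zero.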

### Representation as a V-statistic

Suppose x1, ..., xn is a sample. In typical applications the statistical function has a representation as the V-statistic

${\displaystyle V_{mn}={\frac {1}{n^{m}}}\sum _{i_{1}=1}^{n}\cdots \sum _{i_{m}=1}^{n}h(x_{i_{1}},x_{i_{2}},\dots ,x_{i_{m}}),}$

where h is a symmetric kernel function. Serfling [6] discusses how to find the kernel in practice. Vmn is called a V-statistic of degree m.
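
The definition above can be evaluated by brute force, averaging the kernel over all n^m index tuples. A minimal Python sketch (names are illustrative):

```python
from itertools import product

def v_statistic(xs, h, m):
    """Degree-m V-statistic: average of h over all n^m index tuples."""
    n = len(xs)
    return sum(h(*(xs[i] for i in idx))
               for idx in product(range(n), repeat=m)) / n ** m

# Degree 1 with h(x) = x is the sample mean; degree 2 with
# h(x, y) = (x - y)^2 / 2 is the second central moment.
xs = [1.0, 2.0, 4.0, 7.0]
mean = v_statistic(xs, lambda x: x, 1)                    # 3.5
var = v_statistic(xs, lambda x, y: (x - y) ** 2 / 2, 2)   # 5.25
```

The cost is O(n^m), so this direct evaluation is only practical for small m; structured kernels often admit closed forms, as in the variance example below.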

A symmetric kernel of degree 2 is a function h(x, y) such that h(x, y) = h(y, x) for all x and y in the domain of h. For a sample x1, ..., xn, the corresponding V-statistic is defined as

${\displaystyle V_{2,n}={\frac {1}{n^{2}}}\sum _{i=1}^{n}\sum _{j=1}^{n}h(x_{i},x_{j}).}$

### Example of a V-statistic

1. An example of a degree-2 V-statistic is the second central moment m2. If ${\displaystyle h(x,y)=(x-y)^{2}/2}$, the corresponding V-statistic is
${\displaystyle V_{2,n}={\frac {1}{n^{2}}}\sum _{i=1}^{n}\sum _{j=1}^{n}{\frac {1}{2}}(x_{i}-x_{j})^{2}={\frac {1}{n}}\sum _{i=1}^{n}(x_{i}-{\bar {x}})^{2},}$

which is the maximum likelihood estimator of variance. With the same kernel, the corresponding U-statistic is the (unbiased) sample variance:

${\displaystyle s^{2}={n \choose 2}^{-1}\sum _{i<j}{\frac {1}{2}}(x_{i}-x_{j})^{2}={\frac {1}{n-1}}\sum _{i=1}^{n}(x_{i}-{\bar {x}})^{2}.}$
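
The V/U relationship for this kernel is easy to check numerically. In this sketch (helper names are illustrative), the V-statistic comes out as exactly (n − 1)/n times the U-statistic:

```python
def v_var(xs):
    """Degree-2 V-statistic with h(x, y) = (x - y)^2 / 2: the biased m2."""
    n = len(xs)
    return sum((x - y) ** 2 / 2 for x in xs for y in xs) / n ** 2

def u_var(xs):
    """U-statistic with the same kernel: the unbiased sample variance s^2."""
    n = len(xs)
    pairs = [(xs[i] - xs[j]) ** 2 / 2
             for i in range(n) for j in range(i + 1, n)]
    return sum(pairs) / len(pairs)

xs = [1.0, 2.0, 4.0, 7.0]
# v_var(xs) = 5.25, u_var(xs) = 7.0, and 5.25 = (3/4) * 7.0
```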

## Asymptotic distribution

In examples 1–3 above, the asymptotic distributions of the statistics differ: in (1) it is normal, in (2) it is chi-squared, and in (3) it is a weighted sum of chi-squared variables.

Von Mises' approach is a unifying theory that covers all of the cases above. [1] Informally, the type of asymptotic distribution of a statistical function depends on the order of "degeneracy," which is determined by which term is the first non-vanishing term in the Taylor expansion of the functional T. In case it is the linear term, the limit distribution is normal; otherwise higher order types of distributions arise (under suitable conditions such that a central limit theorem holds).

There is a hierarchy of cases, parallel to the asymptotic theory of U-statistics. [7] Let A(m) be the property defined by:

A(m):
1. Var(h(X1, ..., Xk)) = 0 for k < m, and Var(h(X1, ..., Xk)) > 0 for k = m;
2. ${\displaystyle n^{m/2}R_{mn}}$ tends to zero (in probability), where ${\displaystyle R_{mn}}$ is the remainder term in the Taylor series for T.

Case m = 1 (Non-degenerate kernel):

If A(1) is true, the statistic behaves asymptotically like a sample mean, and the Central Limit Theorem implies that T(Fn) is asymptotically normal.

In the variance example above, m2 is asymptotically normal with mean ${\displaystyle \sigma ^{2}}$ and variance ${\displaystyle (\mu _{4}-\sigma ^{4})/n}$, where ${\displaystyle \mu _{4}=E(X-E(X))^{4}}$.
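
This normal limit can be checked by simulation. The sketch below (a Monte Carlo check under assumed N(0, 1) data, where σ² = 1 and μ4 = 3, so the asymptotic variance is 2/n) uses only the standard library:

```python
import random

random.seed(0)

def m2(xs):
    """Biased sample variance: the degree-2 V-statistic."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / n

# Draw many samples of size n from N(0, 1); m2 should concentrate
# around sigma^2 = 1 with variance close to (mu4 - sigma^4) / n = 2 / n.
n = 200
estimates = [m2([random.gauss(0, 1) for _ in range(n)]) for _ in range(2000)]
mean_est = sum(estimates) / len(estimates)
var_est = sum((e - mean_est) ** 2 for e in estimates) / len(estimates)
```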

Case m = 2 (Degenerate kernel):

Suppose A(2) is true, and ${\displaystyle E[h^{2}(X_{1},X_{2})]<\infty ,\,E|h(X_{1},X_{1})|<\infty ,}$ and ${\displaystyle E[h(x,X_{1})]\equiv 0}$. Then nV2,n converges in distribution to a weighted sum of independent chi-squared variables:

${\displaystyle nV_{2,n}{\stackrel {d}{\longrightarrow }}\sum _{k=1}^{\infty }\lambda _{k}Z_{k}^{2},}$

where ${\displaystyle Z_{k}}$ are independent standard normal variables and ${\displaystyle \lambda _{k}}$ are constants that depend on the distribution F and the functional T. In this case the asymptotic distribution is called a quadratic form of centered Gaussian random variables. The statistic V2,n is called a degenerate kernel V-statistic. The V-statistic associated with the Cramér–von Mises functional [1] (Example 3) is an example of a degenerate kernel V-statistic. [8]
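
To make the degenerate case concrete, the Cramér–von Mises statistic from Example 3 (with w the identity) has a standard computing formula based on the ordered values u(i) = F0(x(i)). A Python sketch (the function name is illustrative):

```python
def cramer_von_mises(xs, F0):
    """Cramér–von Mises statistic n*omega^2 against a fully specified
    null CDF F0, via the standard computing formula."""
    n = len(xs)
    u = sorted(F0(x) for x in xs)
    return 1 / (12 * n) + sum((ui - (2 * i - 1) / (2 * n)) ** 2
                              for i, ui in enumerate(u, start=1))

# Against the uniform null F0(x) = x on [0, 1]:
t = cramer_von_mises([0.25, 0.5, 0.75], lambda x: x)  # 1/24
```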

## Notes

1. von Mises (1947), p. 309; Serfling (1980), p. 210.
2. Serfling (1980, Section 6.5)
3. Serfling (1980, Ch. 5–6); Lee (1990, Ch. 3)
4. See Lee (1990, p. 160) for the kernel function.


## References

• Hoeffding, W. (1948). "A class of statistics with asymptotically normal distribution". Annals of Mathematical Statistics. 19 (3): 293–325. JSTOR 2235637.
• Koroljuk, V.S.; Borovskich, Yu.V. (1994). Theory of U-statistics (English translation by P.V. Malyshev and D.V. Malyshev from the 1989 Ukrainian ed.). Dordrecht: Kluwer Academic Publishers. ISBN 0-7923-2608-3.
• Lee, A.J. (1990). U-Statistics: Theory and Practice. New York: Marcel Dekker, Inc. ISBN 0-8247-8253-4.
• Neuhaus, G. (1977). "Functional limit theorems for U-statistics in the degenerate case". Journal of Multivariate Analysis. 7 (3): 424–439.
• Rosenblatt, M. (1952). "Limit theorems associated with variants of the von Mises statistic". Annals of Mathematical Statistics. 23 (4): 617–623. JSTOR 2236587.
• Serfling, R.J. (1980). Approximation Theorems of Mathematical Statistics. New York: John Wiley & Sons. ISBN 0-471-02403-1.
• Taylor, R.L.; Daffer, P.Z.; Patterson, R.F. (1985). Limit Theorems for Sums of Exchangeable Random Variables. New Jersey: Rowman and Allanheld.
• von Mises, R. (1947). "On the asymptotic distribution of differentiable statistical functions". Annals of Mathematical Statistics. 18 (2): 309–348. JSTOR 2235734.