# Asymptotic theory (statistics)

In statistics, asymptotic theory, or large sample theory, is a framework for assessing properties of estimators and statistical tests. Within this framework, it is often assumed that the sample size n may grow indefinitely; the properties of estimators and tests are then evaluated in the limit n → ∞. In practice, a limit evaluation is considered to be approximately valid for large finite sample sizes too. [1]

## Overview

Most statistical problems begin with a dataset of size n. Asymptotic theory proceeds by assuming that it is possible (in principle) to keep collecting additional data, so that the sample size grows without bound, i.e. n → ∞. Under this assumption, many results can be obtained that are unavailable for samples of finite size. An example is the weak law of large numbers. The law states that for a sequence of independent and identically distributed (IID) random variables X₁, X₂, ..., if one value is drawn from each random variable and the average of the first n values is computed as ${\displaystyle {\bar {X}}_{n}}$, then ${\displaystyle {\bar {X}}_{n}}$ converges in probability to the population mean E[Xᵢ] as n → ∞. [2]
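The weak law of large numbers can be illustrated numerically. The following is a minimal sketch, not a proof: it assumes Exponential(1) data (population mean 1), a tolerance of 0.1, and 1000 Monte Carlo replications, all of which are illustrative choices rather than part of the theory. It estimates the probability that the sample mean misses the population mean by more than the tolerance, for increasing n.

```python
import numpy as np

def deviation_probability(n, eps=0.1, reps=1000, seed=0):
    """Monte Carlo estimate of P(|sample mean - mu| > eps) for IID Exponential(1) data."""
    rng = np.random.default_rng(seed)
    mu = 1.0  # population mean of Exponential(scale=1); an illustrative choice
    samples = rng.exponential(scale=1.0, size=(reps, n))
    sample_means = samples.mean(axis=1)  # one sample mean per replication
    return float(np.mean(np.abs(sample_means - mu) > eps))

# The estimated deviation probability should shrink towards 0 as n grows.
probs = {n: deviation_probability(n) for n in (10, 100, 1000, 4000)}
for n, p in probs.items():
    print(f"n={n:>5}  estimated P(|mean - mu| > 0.1) = {p:.3f}")
```

Convergence in probability says exactly that this probability tends to zero for every fixed tolerance; the simulation only shows the trend for one tolerance and one distribution.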

In asymptotic theory, the standard approach is n → ∞. For some statistical models, slightly different approaches to asymptotics may be used. For example, with panel data, it is commonly assumed that one dimension of the data remains fixed while the other grows: T = constant and N → ∞, or vice versa. [2]

Besides the standard approach to asymptotics, several alternative approaches exist:

• Within the local asymptotic normality framework, it is assumed that the value of the "true parameter" in the model varies slightly with n, such that the n-th model corresponds to θₙ = θ + h/√n. This approach lets us study the regularity of estimators.
• When statistical tests are studied for their power to distinguish alternatives that are close to the null hypothesis, this is done within the so-called "local alternatives" framework: the null hypothesis is H₀: θ = θ₀ and the alternative is H₁: θ = θ₀ + h/√n. This approach is especially popular for unit root tests.
• There are models where the dimension of the parameter space Θn slowly expands with n, reflecting the fact that the more observations there are, the more structural effects can be feasibly incorporated in the model.
• In kernel density estimation and kernel regression, an additional parameter is introduced: the bandwidth h. In those models, it is typically assumed that h → 0 as n → ∞. The rate at which h shrinks must be chosen carefully, though; usually ${\displaystyle h\propto n^{-1/5}}$.
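The "local alternatives" framework in the list above can be made concrete with a small calculation. The sketch below assumes a one-sided z-test for a normal mean with known standard deviation σ and a local alternative shrinking at rate 1/√n; the function name `z_test_power` and the values h = 2, σ = 1, and the fixed gap 0.5 are illustrative assumptions. It compares the power against a fixed alternative with the power against a local one.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(theta, theta0, sigma, n, alpha=0.05):
    """Power of the one-sided z-test of H0: theta = theta0 when the true mean is theta.

    The statistic sqrt(n) * (sample mean - theta0) / sigma is N(shift, 1) under theta,
    and H0 is rejected when it exceeds the (1 - alpha) standard normal quantile.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)
    shift = sqrt(n) * (theta - theta0) / sigma
    return 1 - std_normal.cdf(z_alpha - shift)

theta0, sigma, h = 0.0, 1.0, 2.0  # illustrative values, not from the source
fixed_power, local_power = {}, {}
for n in (10, 100, 10_000):
    fixed_power[n] = z_test_power(theta0 + 0.5, theta0, sigma, n)           # fixed alternative
    local_power[n] = z_test_power(theta0 + h / sqrt(n), theta0, sigma, n)   # local alternative
    print(f"n={n:>6}  fixed: {fixed_power[n]:.4f}  local: {local_power[n]:.4f}")
```

Against a fixed alternative the power tends to one, so every reasonable test looks equally good in the limit. Under the local alternative the shift √n(θ − θ₀)/σ stays equal to h/σ, so the power settles at a value strictly between α and one, which is what makes comparisons between tests informative.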

In many cases, highly accurate results for finite samples can be obtained via numerical methods (i.e. computers); even in such cases, though, asymptotic analysis can be useful. This point was made by Small (2010, §1.4), as follows.

> A primary goal of asymptotic analysis is to obtain a deeper qualitative understanding of quantitative tools. The conclusions of an asymptotic analysis often supplement the conclusions which can be obtained by numerical methods.

## Asymptotic properties

### Estimators

#### Consistency

A sequence of estimates is said to be consistent if it converges in probability to the true value of the parameter being estimated:

${\displaystyle {\hat {\theta }}_{n}\ {\xrightarrow {p}}\ \theta _{0}.}$

That is, roughly speaking, with a sufficiently large amount of data the estimator (the formula for generating the estimates) gives a result arbitrarily close to the true parameter value with probability approaching one. [2]
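Consistency is a statement about the limit and does not require unbiasedness at any finite n. As a hedged illustration, the sketch below uses the variance estimator that divides by n (biased in finite samples) on normal data; the true variance 4, the tolerance 0.5, and the replication count are assumptions chosen for the example.

```python
import numpy as np

def simulate(n, reps=1000, true_var=4.0, eps=0.5, seed=42):
    """Return (average estimate, estimated P(|estimate - true_var| > eps)) for sample size n."""
    rng = np.random.default_rng(seed)
    x = rng.normal(loc=0.0, scale=np.sqrt(true_var), size=(reps, n))
    estimates = x.var(axis=1)  # NumPy default ddof=0, i.e. divide by n
    miss = np.abs(estimates - true_var) > eps
    return float(estimates.mean()), float(miss.mean())

results = {n: simulate(n) for n in (5, 50, 500, 5000)}
for n, (avg, miss_prob) in results.items():
    print(f"n={n:>5}  average estimate = {avg:.3f}  P(miss by > 0.5) = {miss_prob:.3f}")
```

For small n the average estimate sits visibly below the true variance, reflecting the bias factor (n − 1)/n, yet the probability of missing by more than the tolerance still shrinks towards zero as n grows.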

#### Asymptotic distribution

If it is possible to find sequences of non-random constants {aₙ}, {bₙ} (possibly depending on the value of θ₀), and a non-degenerate distribution G such that

${\displaystyle b_{n}({\hat {\theta }}_{n}-a_{n})\ {\xrightarrow {d}}\ G,}$

then the sequence of estimators ${\displaystyle \textstyle {\hat {\theta }}_{n}}$ is said to have the asymptotic distribution G.

Most often, the estimators encountered in practice are asymptotically normal, meaning their asymptotic distribution is the normal distribution, with aₙ = θ₀, bₙ = √n, and G = N(0, V):

${\displaystyle {\sqrt {n}}({\hat {\theta }}_{n}-\theta _{0})\ {\xrightarrow {d}}\ {\mathcal {N}}(0,V).}$
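A simulation can make the √n scaling tangible. The following is a minimal sketch under assumed settings: the estimator is the sample mean of Exponential(1) data, so θ₀ = 1 and V = 1, with n = 400 and 5000 replications chosen for illustration. It checks whether the scaled errors √n(θ̂ₙ − θ₀) behave like draws from N(0, V).

```python
import numpy as np

def scaled_errors(n, reps=5000, seed=1):
    """Draw reps values of sqrt(n) * (sample mean - 1) for IID Exponential(1) data."""
    rng = np.random.default_rng(seed)
    x = rng.exponential(scale=1.0, size=(reps, n))
    return np.sqrt(n) * (x.mean(axis=1) - 1.0)

z = scaled_errors(n=400)
empirical_var = float(z.var())
# Under N(0, 1), about 95% of the mass lies within +/- 1.96.
coverage = float(np.mean(np.abs(z) <= 1.96))
print(f"variance of scaled errors: {empirical_var:.3f} (limit V = 1)")
print(f"share within +/-1.96:      {coverage:.3f} (normal limit: 0.95)")
```

Although the underlying data are strongly skewed, the scaled errors are already close to normal at this sample size; how large n must be in general depends on the distribution and the estimator.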

## References

1. Höpfner, R. (2014), Asymptotic Statistics, Walter de Gruyter, 286 pp. ISBN 3110250241, ISBN 978-3110250244
2. DasGupta, A. (2008), Asymptotic Theory of Statistics and Probability, 756 pp. ISBN 0387759700, ISBN 978-0387759708