Local asymptotic normality

In statistics, local asymptotic normality is a property of a sequence of statistical models which allows this sequence to be asymptotically approximated by a normal location model, after an appropriate rescaling of the parameter. An important example in which local asymptotic normality holds is the case of i.i.d. sampling from a regular parametric model.

The notion of local asymptotic normality was introduced by Le Cam (1960) and is fundamental in the treatment of estimator and test efficiency. [1]

Definition

A sequence of parametric statistical models {P_{n,θ} : θ ∈ Θ} is said to be locally asymptotically normal (LAN) at θ if there exist matrices r_n and I_θ and a random vector Δ_{n,θ} ~ N(0, I_θ) such that, for every converging sequence h_n → h, [2]

$$\ln \frac{dP_{n,\theta + r_n^{-1} h_n}}{dP_{n,\theta}} \;=\; h^\top \Delta_{n,\theta} \;-\; \frac{1}{2}\, h^\top I_\theta\, h \;+\; o_{P_{n,\theta}}(1),$$

where the derivative here is a Radon–Nikodym derivative, a formalised version of the likelihood ratio, and where the remainder o_{P_{n,θ}}(1) denotes a term that converges to zero in probability (small o in probability notation). In other words, the local log-likelihood ratio must converge in distribution to a normal random variable whose mean is equal to minus one half its variance:

$$\ln \frac{dP_{n,\theta + r_n^{-1} h_n}}{dP_{n,\theta}} \;\xrightarrow{\;d\;}\; \mathcal{N}\!\left(-\tfrac{1}{2}\, h^\top I_\theta\, h,\; h^\top I_\theta\, h\right).$$

The sequences of distributions P_{n,θ + r_n^{-1} h_n} and P_{n,θ} are contiguous. [2]
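As a quick numerical sanity check of this definition (not part of the source; the Exponential(rate θ) model, the sample size, and the variable names below are illustrative assumptions), the following Python sketch simulates the local log-likelihood ratio for a scalar parameter with the rescaling r_n = √n and compares its empirical mean and variance with the LAN predictions −½h²I_θ and h²I_θ.

```python
# Minimal sketch: LAN for an i.i.d. Exponential(rate = theta) model, where I_theta = 1 / theta^2.
# The local log-likelihood ratio should be approximately N(-h^2 I_theta / 2, h^2 I_theta).
import numpy as np

rng = np.random.default_rng(0)
theta, h, n, reps = 2.0, 1.0, 1_000, 5_000
I_theta = 1.0 / theta**2          # Fisher information of Exponential(rate theta)

def log_likelihood(x, lam):
    # log f(x; lam) = log(lam) - lam * x for the Exponential(rate lam) density, summed over a sample
    return np.sum(np.log(lam) - lam * x, axis=-1)

# samples drawn under P_{n, theta}; numpy parameterizes the exponential by its scale = 1 / rate
x = rng.exponential(scale=1.0 / theta, size=(reps, n))

# local log-likelihood ratio with the rescaling r_n = sqrt(n)
llr = log_likelihood(x, theta + h / np.sqrt(n)) - log_likelihood(x, theta)

print("empirical mean of log-LR:", llr.mean(), "(LAN prediction:", -0.5 * h**2 * I_theta, ")")
print("empirical var  of log-LR:", llr.var(), "(LAN prediction:", h**2 * I_theta, ")")
```

For these choices the empirical mean and variance should be close to −0.125 and 0.25, respectively, with the discrepancy shrinking as n grows.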

Example

The most straightforward example of a LAN model is an iid model whose likelihood is twice continuously differentiable. Suppose {X1, X2, …, Xn} is an iid sample, where each Xi has density function f(x, θ). The likelihood function of the model is equal to

$$\ell_n(\theta \mid x_1, \ldots, x_n) \;=\; \prod_{i=1}^n f(x_i, \theta).$$

If f is twice continuously differentiable in θ, then

$$\ln \ell_n(\theta + \delta\theta) - \ln \ell_n(\theta) \;\approx\; \delta\theta^\top \frac{\partial \ln \ell_n(\theta)}{\partial \theta} \;+\; \frac{1}{2}\, \delta\theta^\top \frac{\partial^2 \ln \ell_n(\theta)}{\partial \theta\, \partial \theta^\top}\, \delta\theta.$$

Plugging in δθ = h/√n gives

$$\ln \ell_n\!\left(\theta + \tfrac{h}{\sqrt{n}}\right) - \ln \ell_n(\theta) \;=\; h^\top \left(\frac{1}{\sqrt{n}} \sum_{i=1}^n \frac{\partial \ln f(x_i, \theta)}{\partial \theta}\right) \;+\; \frac{1}{2}\, h^\top \left(\frac{1}{n} \sum_{i=1}^n \frac{\partial^2 \ln f(x_i, \theta)}{\partial \theta\, \partial \theta^\top}\right) h \;+\; o_P(1).$$

By the central limit theorem, the first term (in parentheses) converges in distribution to a normal random variable Δθ ~ N(0, Iθ), whereas by the law of large numbers the expression in the second parentheses converges in probability to −Iθ, where Iθ is the Fisher information matrix:

$$I_\theta \;=\; -\,\mathrm{E}\!\left[\frac{\partial^2 \ln f(X_i, \theta)}{\partial \theta\, \partial \theta^\top}\right] \;=\; \mathrm{E}\!\left[\left(\frac{\partial \ln f(X_i, \theta)}{\partial \theta}\right)\left(\frac{\partial \ln f(X_i, \theta)}{\partial \theta}\right)^{\!\top}\right].$$

Thus, with the rescaling r_n = √n, the definition of local asymptotic normality is satisfied, and we have confirmed that a parametric model with iid observations and a twice continuously differentiable likelihood has the LAN property.
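The two limiting statements used above can also be illustrated numerically. The sketch below is an illustration rather than part of the article; the Poisson(θ) model and the chosen constants are assumptions. It simulates, for a scalar parameter, the normalized score (the first parenthesized term) and the averaged second derivative (the second parenthesized term) and compares them with N(0, I_θ) and −I_θ.

```python
# Minimal sketch: CLT for the normalized score and LLN for the averaged Hessian
# in an i.i.d. Poisson(theta) model, where ln f(x, theta) = x ln(theta) - theta - ln(x!)
# and the Fisher information is I_theta = 1 / theta.
import numpy as np

rng = np.random.default_rng(1)
theta, n, reps = 3.0, 1_000, 5_000
I_theta = 1.0 / theta             # Fisher information of Poisson(theta)

x = rng.poisson(lam=theta, size=(reps, n))

# first parenthesized term: (1/sqrt(n)) * sum of d/d(theta) ln f(x_i, theta) = x_i/theta - 1
score_term = (x / theta - 1.0).sum(axis=1) / np.sqrt(n)

# second parenthesized term: (1/n) * sum of d^2/d(theta)^2 ln f(x_i, theta) = -x_i/theta^2
hessian_term = (-x / theta**2).mean(axis=1)

print("normalized score: mean", score_term.mean(), "var", score_term.var(),
      "(targets 0 and", I_theta, ")")
print("averaged Hessian: mean", hessian_term.mean(), "(target", -I_theta, ")")
```

Under these assumptions the normalized score should have empirical mean near 0 and variance near 1/3, while the averaged second derivative should concentrate around −1/3.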

See also

Notes

  1. van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge University Press. ISBN 978-0-511-80225-6.
  2. van der Vaart (1998), pp. 103–104.


References