Delta method

In statistics, the delta method is a method of deriving the asymptotic distribution of a random variable. It is applicable when the random variable being considered can be defined as a differentiable function of a random variable which is asymptotically Gaussian.

History

The delta method was derived from propagation of error, and the idea behind it was known in the early 20th century. [1] Its statistical application can be traced as far back as 1928, to T. L. Kelley. [2] A formal description of the method was presented by J. L. Doob in 1935. [3] Robert Dorfman also described a version of it in 1938. [4]

Univariate delta method

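While the delta method generalizes easily to a multivariate setting, careful motivation of the technique is more easily demonstrated in univariate terms. Roughly, if there is a sequence of random variables $X_n$ satisfying

$$\sqrt{n}\,[X_n - \theta] \;\xrightarrow{D}\; \mathcal{N}(0, \sigma^2),$$

where $\theta$ and $\sigma^2$ are finite valued constants and $\xrightarrow{D}$ denotes convergence in distribution, then

$$\sqrt{n}\,[g(X_n) - g(\theta)] \;\xrightarrow{D}\; \mathcal{N}\!\left(0, \sigma^2 [g'(\theta)]^2\right)$$

for any function $g$ satisfying the property that its first derivative, evaluated at $\theta$, exists and is non-zero valued.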

The intuition of the delta method is that any such function g can be approximated, over a "small enough" range, by a first-order Taylor expansion, which is a linear function. If the random variable is roughly normal, then a linear transformation of it is also roughly normal. The range is small when the function is approximated around the mean and the variance is "small enough". When g is applied to a random variable such as the sample mean, the delta method tends to work better as the sample size increases, because the variance shrinks and the Taylor approximation is therefore applied over a smaller range of g around the point of interest.
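The univariate statement can be checked numerically. The following is a minimal simulation sketch rather than a definitive implementation. It assumes (these choices are not in the text above) that X_n is the sample mean of n Exponential(1) draws, so that θ = 1 and σ² = 1, and that g(x) = x², so that g′(θ) = 2; the helper name `simulate_scaled_errors` is hypothetical.

```python
import random
import statistics


def simulate_scaled_errors(g, theta, n, reps, draw, seed=0):
    """Return `reps` draws of sqrt(n) * (g(X_n) - g(theta)), with X_n a sample mean."""
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        x_bar = sum(draw(rng) for _ in range(n)) / n
        out.append(n ** 0.5 * (g(x_bar) - g(theta)))
    return out


# Assumed setup: Exponential(1) data, so theta = 1 and sigma^2 = 1.
theta, sigma2 = 1.0, 1.0


def g(x):
    return x * x  # g'(x) = 2x, so g'(theta) = 2


g_prime_theta = 2.0 * theta

errors = simulate_scaled_errors(
    g, theta, n=400, reps=2000, draw=lambda rng: rng.expovariate(1.0)
)
empirical_var = statistics.pvariance(errors)
delta_var = sigma2 * g_prime_theta ** 2  # sigma^2 * [g'(theta)]^2

print(f"empirical variance of sqrt(n)(g(X_n) - g(theta)): {empirical_var:.3f}")
print(f"delta-method asymptotic variance: {delta_var:.3f}")
```

With these settings the empirical variance should land close to the delta-method value of 4, and the agreement should tighten as n and the number of replications grow.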

Proof in the univariate case

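Demonstration of this result is fairly straightforward under the assumption that $g(x)$ is differentiable in a neighborhood of $\theta$ and $g'(x)$ is continuous at $\theta$ with $g'(\theta) \neq 0$. To begin, we use the mean value theorem (i.e., the first-order approximation of a Taylor series using Taylor's theorem):

$$g(X_n) = g(\theta) + g'(\tilde{\theta})(X_n - \theta),$$

where $\tilde{\theta}$ lies between $X_n$ and $\theta$. Note that since $X_n \xrightarrow{P} \theta$ and $|\tilde{\theta} - \theta| \le |X_n - \theta|$, it must be that $\tilde{\theta} \xrightarrow{P} \theta$, and since $g'$ is continuous at $\theta$, applying the continuous mapping theorem yields

$$g'(\tilde{\theta}) \;\xrightarrow{P}\; g'(\theta),$$

where $\xrightarrow{P}$ denotes convergence in probability.

Rearranging the terms and multiplying by $\sqrt{n}$ gives

$$\sqrt{n}\,[g(X_n) - g(\theta)] = g'(\tilde{\theta})\,\sqrt{n}\,[X_n - \theta].$$

Since

$$\sqrt{n}\,[X_n - \theta] \;\xrightarrow{D}\; \mathcal{N}(0, \sigma^2)$$

by assumption, it follows immediately from appeal to Slutsky's theorem that

$$\sqrt{n}\,[g(X_n) - g(\theta)] \;\xrightarrow{D}\; \mathcal{N}\!\left(0, \sigma^2 [g'(\theta)]^2\right).$$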

This concludes the proof.

Proof with an explicit order of approximation

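Alternatively, one can add one more step at the end, to obtain the order of approximation:

$$\begin{aligned}
\sqrt{n}\,[g(X_n) - g(\theta)] &= g'(\tilde{\theta})\,\sqrt{n}\,[X_n - \theta] \\
&= \sqrt{n}\,[X_n - \theta]\left[g'(\tilde{\theta}) + g'(\theta) - g'(\theta)\right] \\
&= \sqrt{n}\,[X_n - \theta]\,g'(\theta) + \sqrt{n}\,[X_n - \theta]\left[g'(\tilde{\theta}) - g'(\theta)\right] \\
&= \sqrt{n}\,[X_n - \theta]\,g'(\theta) + O_p(1)\cdot o_p(1) \\
&= \sqrt{n}\,[X_n - \theta]\,g'(\theta) + o_p(1).
\end{aligned}$$

This shows that the error in the approximation converges to 0 in probability.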

Multivariate delta method

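By definition, a consistent estimator $B$ converges in probability to its true value $\beta$, and often a central limit theorem can be applied to obtain asymptotic normality:

$$\sqrt{n}\,(B - \beta) \;\xrightarrow{D}\; \mathcal{N}(0, \Sigma),$$

where $n$ is the number of observations and $\Sigma$ is a (symmetric positive semi-definite) covariance matrix. Suppose we want to estimate the variance of a scalar-valued function $h$ of the estimator $B$. Keeping only the first two terms of the Taylor series, and using vector notation for the gradient, we can estimate $h(B)$ as

$$h(B) \approx h(\beta) + \nabla h(\beta)^{T} (B - \beta),$$

which implies the variance of $h(B)$ is approximately

$$\begin{aligned}
\operatorname{Var}(h(B)) &\approx \operatorname{Var}\!\left(h(\beta) + \nabla h(\beta)^{T}(B - \beta)\right) \\
&= \operatorname{Var}\!\left(\nabla h(\beta)^{T} B\right) \\
&= \nabla h(\beta)^{T} \operatorname{Cov}(B)\, \nabla h(\beta) \\
&= \nabla h(\beta)^{T}\, \frac{\Sigma}{n}\, \nabla h(\beta).
\end{aligned}$$

One can use the mean value theorem (for real-valued functions of many variables) to see that this does not rely on taking a first-order approximation.

The delta method therefore implies that

$$\sqrt{n}\,(h(B) - h(\beta)) \;\xrightarrow{D}\; \mathcal{N}\!\left(0, \nabla h(\beta)^{T}\, \Sigma\, \nabla h(\beta)\right),$$

or in univariate terms,

$$\sqrt{n}\,(h(B) - h(\beta)) \;\xrightarrow{D}\; \mathcal{N}\!\left(0, \sigma^2 [h'(\beta)]^2\right).$$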
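The multivariate variance approximation, the quadratic form of the gradient of h with Σ/n, can be sketched in a few lines. This is an illustration under stated assumptions, not a reference implementation: the function names `numerical_gradient` and `delta_variance` are hypothetical, the gradient is approximated by central differences rather than derived analytically, and the ratio h(b) = b₁/b₂ with the values of β and Σ below is an invented example.

```python
def numerical_gradient(h, beta, eps=1e-6):
    """Central-difference approximation to the gradient of h at beta."""
    grad = []
    for i in range(len(beta)):
        up = list(beta)
        down = list(beta)
        up[i] += eps
        down[i] -= eps
        grad.append((h(up) - h(down)) / (2 * eps))
    return grad


def delta_variance(h, beta, sigma, n):
    """Approximate Var(h(B)) by grad^T (Sigma / n) grad, evaluated at beta."""
    grad = numerical_gradient(h, beta)
    k = len(beta)
    quad = sum(grad[i] * sigma[i][j] * grad[j] for i in range(k) for j in range(k))
    return quad / n


def ratio(b):
    return b[0] / b[1]  # hypothetical h: ratio of two parameters


beta = [2.0, 4.0]                 # assumed true values
sigma = [[1.0, 0.0], [0.0, 4.0]]  # assumed asymptotic covariance matrix
approx_var = delta_variance(ratio, beta, sigma, n=100)

# Analytic check: grad = (1/4, -1/8), so grad^T Sigma grad = 1/16 + 4/64 = 0.125.
print(approx_var)
```

In practice β and Σ are unknown and are replaced by estimates, which is a further approximation not covered by this sketch.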

Example: the binomial proportion

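Suppose $X_n$ is binomial with parameters $p \in (0, 1]$ and $n$. Since

$$\sqrt{n}\left[\frac{X_n}{n} - p\right] \;\xrightarrow{D}\; \mathcal{N}(0, p(1-p)),$$

we can apply the delta method with $g(\theta) = \log(\theta)$ to see

$$\sqrt{n}\left[\log\!\left(\frac{X_n}{n}\right) - \log(p)\right] \;\xrightarrow{D}\; \mathcal{N}\!\left(0, p(1-p)\left[\frac{1}{p}\right]^2\right).$$

Hence, even though for any finite $n$ the variance of $\log\!\left(\frac{X_n}{n}\right)$ does not actually exist (since $X_n$ can be zero), the asymptotic variance of $\log\!\left(\frac{X_n}{n}\right)$ does exist and is equal to

$$\frac{1-p}{np}.$$

Note that since $p > 0$, $\Pr\!\left(\frac{X_n}{n} > 0\right) \to 1$ as $n \to \infty$, so with probability converging to one, $\log\!\left(\frac{X_n}{n}\right)$ is finite for large $n$.

Moreover, if $\hat{p}$ and $\hat{q}$ are estimates of different group rates from independent samples of sizes $n$ and $m$ respectively, then the logarithm of the estimated relative risk $\frac{\hat{p}}{\hat{q}}$ has asymptotic variance equal to

$$\frac{1-p}{p\,n} + \frac{1-q}{q\,m}.$$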

This is useful to construct a hypothesis test or to make a confidence interval for the relative risk.
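As an illustration of the confidence interval mentioned above, the following sketch builds a Wald-type interval for the relative risk on the log scale. It assumes that the unknown p and q in the asymptotic variance are replaced by the plug-in estimates, and that both event counts are positive; the function name `log_relative_risk_ci` and the counts in the example are hypothetical.

```python
import math
from statistics import NormalDist


def log_relative_risk_ci(x, n, y, m, level=0.95):
    """Relative risk estimate with a delta-method interval built on the log scale.

    x of n events in group 1, y of m events in group 2 (both counts must be > 0).
    """
    p_hat, q_hat = x / n, y / m
    rr = p_hat / q_hat
    # Plug-in version of (1 - p) / (p n) + (1 - q) / (q m).
    var_log_rr = (1 - p_hat) / (p_hat * n) + (1 - q_hat) / (q_hat * m)
    se = math.sqrt(var_log_rr)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    # Interval for log(RR), mapped back to the RR scale.
    return rr, se, rr * math.exp(-z * se), rr * math.exp(z * se)


rr, se, lower, upper = log_relative_risk_ci(x=30, n=100, y=15, m=100)
print(f"RR = {rr:.3f}, SE(log RR) = {se:.4f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

Working on the log scale and exponentiating the endpoints keeps the interval inside the positive range, which is one reason the log transformation is commonly used here.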

Alternative form

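The delta method is often used in a form that is essentially identical to that above, but without the assumption that $X_n$ or $B$ is asymptotically normal. Often the only context is that the variance is "small". The results then just give approximations to the means and covariances of the transformed quantities. For example, the formulae presented in Klein (1953, p. 258) are: [5]

$$\begin{aligned}
\operatorname{Var}(h_r) &\approx \sum_i \left(\frac{\partial h_r}{\partial B_i}\right)^2 \operatorname{Var}(B_i) + \sum_i \sum_{j \neq i} \left(\frac{\partial h_r}{\partial B_i}\right)\left(\frac{\partial h_r}{\partial B_j}\right) \operatorname{Cov}(B_i, B_j), \\
\operatorname{Cov}(h_r, h_s) &\approx \sum_i \left(\frac{\partial h_r}{\partial B_i}\right)\left(\frac{\partial h_s}{\partial B_i}\right) \operatorname{Var}(B_i) + \sum_i \sum_{j \neq i} \left(\frac{\partial h_r}{\partial B_i}\right)\left(\frac{\partial h_s}{\partial B_j}\right) \operatorname{Cov}(B_i, B_j),
\end{aligned}$$

where $h_r$ is the $r$th element of $h(B)$ and $B_i$ is the $i$th element of $B$.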
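Formulae of this kind amount to propagating a covariance matrix through the matrix of partial derivatives: entry (r, s) of the result is the double sum over i and j of (∂h_r/∂B_i)(∂h_s/∂B_j) Cov(B_i, B_j), where the i = j terms supply the variance part. The sketch below is only an illustration; the function name `propagate_covariance`, the transformation h(B) = (B₁ + B₂, B₁B₂) and the numbers used are assumptions made for the example.

```python
def propagate_covariance(jacobian, cov):
    """Approximate covariance of h(B): entry (r, s) is
    sum_i sum_j dh_r/dB_i * dh_s/dB_j * Cov(B_i, B_j)."""
    k = len(cov)
    return [
        [
            sum(row_r[i] * row_s[j] * cov[i][j] for i in range(k) for j in range(k))
            for row_s in jacobian
        ]
        for row_r in jacobian
    ]


# Hypothetical example: h(B) = (B1 + B2, B1 * B2), evaluated at B = (2, 3).
b1, b2 = 2.0, 3.0
jacobian = [
    [1.0, 1.0],  # partial derivatives of h_1 = B1 + B2
    [b2, b1],    # partial derivatives of h_2 = B1 * B2
]
cov_b = [[0.04, 0.01], [0.01, 0.09]]  # assumed covariance matrix of B

cov_h = propagate_covariance(jacobian, cov_b)
for row in cov_h:
    print([round(v, 4) for v in row])
```

The diagonal of the output holds the approximate variances of h₁ and h₂, and the off-diagonal entries hold their approximate covariance.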

Second-order delta method

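When $g'(\theta) = 0$ the delta method cannot be applied. However, if $g''(\theta)$ exists and is not zero, the second-order delta method can be applied. By the Taylor expansion, $n[g(X_n) - g(\theta)] = \frac{1}{2}\, n\,[X_n - \theta]^2\, g''(\theta) + o_p(1)$, so that the variance of $g(X_n)$ relies on up to the 4th moment of $X_n$.

The second-order delta method is also useful for obtaining a more accurate approximation of the distribution of $g(X_n)$ when the sample size is small: $\sqrt{n}\,[g(X_n) - g(\theta)] = \sqrt{n}\,[X_n - \theta]\, g'(\theta) + \frac{1}{2} \sqrt{n}\,[X_n - \theta]^2\, g''(\theta) + o_p(1)$. For example, when $X_n$ follows the standard normal distribution, $g(X_n)$ can be approximated as the weighted sum of a standard normal and a chi-square with 1 degree of freedom.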
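A small simulation can illustrate the degenerate case g′(θ) = 0. A standard consequence of the expansion above is that n[g(X_n) − g(θ)] converges in distribution to (σ² g″(θ)/2) times a chi-square variable with 1 degree of freedom. The sketch below assumes, purely for illustration, that X_n is the mean of n standard normal draws (θ = 0, σ² = 1) and that g(x) = cos(x), so that g′(0) = 0 and g″(0) = −1; the limit is then −½χ²₁, with mean −0.5 and variance 0.5.

```python
import math
import random
import statistics

rng = random.Random(1)
n, reps = 50, 4000

scaled = []
for _ in range(reps):
    # X_n: mean of n standard normal draws, so theta = 0 and sigma^2 = 1.
    x_bar = sum(rng.gauss(0.0, 1.0) for _ in range(n)) / n
    # Note the scaling by n, not sqrt(n), because g'(theta) = 0.
    scaled.append(n * (math.cos(x_bar) - math.cos(0.0)))

mean_scaled = statistics.fmean(scaled)
var_scaled = statistics.pvariance(scaled)

print(f"simulated mean: {mean_scaled:.3f} (limit: -0.5)")
print(f"simulated variance: {var_scaled:.3f} (limit: 0.5)")
```

Every simulated value is non-positive, which reflects the one-sided, chi-square-like shape of the limit rather than a normal one.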

Nonparametric delta method

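A version of the delta method exists in nonparametric statistics. Let $X_i \sim F$ be independent and identically distributed random variables forming a sample of size $n$ with empirical distribution function $\hat{F}_n$, and let $T$ be a functional. If $T$ is Hadamard differentiable with respect to the Chebyshev metric, then

$$\frac{T(\hat{F}_n) - T(F)}{\widehat{\text{se}}} \;\xrightarrow{D}\; \mathcal{N}(0, 1),$$

where $\widehat{\text{se}} = \frac{\hat{\tau}}{\sqrt{n}}$ and $\hat{\tau}^2 = \frac{1}{n} \sum_{i=1}^{n} \hat{L}^2(X_i)$, with $\hat{L}(x) = L_{\hat{F}_n}(\delta_x)$ denoting the empirical influence function for $T$. A nonparametric $(1-\alpha)$ pointwise asymptotic confidence interval for $T(F)$ is therefore given by

$$T(\hat{F}_n) \pm z_{\alpha/2}\, \widehat{\text{se}},$$

where $z_q$ denotes the $q$-quantile of the standard normal. See Wasserman (2006) p. 19f. for details and examples.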
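For a concrete case, take the mean functional T(F) = ∫ x dF(x), whose empirical influence function is L̂(x) = x − x̄. The sketch below computes the corresponding interval under that assumption; the function name `nonparametric_delta_ci` and the sample are hypothetical, and other functionals would need their own influence functions.

```python
import math
from statistics import NormalDist, fmean


def nonparametric_delta_ci(sample, alpha=0.05):
    """Interval T(F_hat) +/- z * se_hat for the mean functional."""
    n = len(sample)
    estimate = fmean(sample)                    # T(F_hat_n) for the mean
    influence = [x - estimate for x in sample]  # empirical influence values L_hat(X_i)
    tau2_hat = sum(v * v for v in influence) / n
    se_hat = math.sqrt(tau2_hat / n)            # se_hat = tau_hat / sqrt(n)
    z = NormalDist().inv_cdf(1 - alpha / 2)     # magnitude of the alpha/2 quantile
    return estimate, se_hat, (estimate - z * se_hat, estimate + z * se_hat)


estimate, se_hat, (lower, upper) = nonparametric_delta_ci([1.0, 2.0, 3.0, 4.0, 5.0])
print(f"estimate = {estimate:.3f}, se = {se_hat:.4f}, CI = ({lower:.3f}, {upper:.3f})")
```

For the mean, this reduces to the familiar normal-approximation interval with the variance computed using a divisor of n.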

References

  1. Portnoy, Stephen (2013). "Letter to the Editor". The American Statistician. 67 (3): 190. doi:10.1080/00031305.2013.820668. S2CID 219596186.
  2. Kelley, Truman L. (1928). Crossroads in the Mind of Man: A Study of Differentiable Mental Abilities. pp. 49–50. ISBN 978-1-4338-0048-1.
  3. Doob, J. L. (1935). "The Limiting Distributions of Certain Statistics". Annals of Mathematical Statistics. 6 (3): 160–169. doi:10.1214/aoms/1177732594. JSTOR 2957546.
  4. Ver Hoef, J. M. (2012). "Who invented the delta method?". The American Statistician. 66 (2): 124–127. doi:10.1080/00031305.2012.687494. JSTOR 23339471.
  5. Klein, L. R. (1953). A Textbook of Econometrics. p. 258.