Empirical likelihood

Last updated November 12, 2024

In probability theory and statistics, empirical likelihood (EL) is a nonparametric method for estimating the parameters of statistical models. It requires fewer assumptions about the error distribution while retaining some of the merits of likelihood-based inference. The estimation method requires that the data are independent and identically distributed (iid). It performs well even when the distribution is asymmetric or censored.^[1] EL methods can also handle constraints and prior information on parameters. Art Owen pioneered work in this area with his 1988 paper.^[2]

Definition

Given a set of $n$ i.i.d. realizations $y_{i}$ of random variables $Y_{i}$, the empirical distribution function is ${\hat {F}}(y):=\sum _{i=1}^{n}\pi _{i}I(Y_{i}<y)$, with the indicator function $I$ and the (normalized) weights $\pi _{i}$. The empirical likelihood is then:^[3]

$L:=\prod _{i=1}^{n}{\frac {{\hat {F}}(y_{i})-{\hat {F}}(y_{i}-\delta y)}{\delta y}},$

where $\delta y$ is a small number (potentially the difference to the next smaller sample).

Empirical likelihood estimation can be augmented with side information by imposing further constraints on the empirical distribution function (similar to the generalized estimating equations approach). For example, a moment constraint of the form $E[h(Y;\theta )]=\int _{-\infty }^{\infty }h(y;\theta )dF=0$ can be incorporated using a Lagrange multiplier; its empirical counterpart is ${\hat {E}}[h(y;\theta )]=\sum _{i=1}^{n}h(y_{i};\theta )\pi _{i}=0$.

With similar constraints, we could also model correlation.

Discrete random variables

The empirical-likelihood method can also be employed for discrete distributions.^[4] Define $p_{i}:={\hat {F}}(y_{i})-{\hat {F}}(y_{i}-\delta y),\ i=1,...,n$, such that $p_{i}\geq 0$ and $\sum _{i=1}^{n}p_{i}=1$.

Then the empirical likelihood is again $L(p_{1},...,p_{n})=\prod _{i=1}^{n}\ p_{i}$ .

Using the Lagrangian multiplier method to maximize the logarithm of the empirical likelihood subject to the trivial normalization constraint, we find $p_{i}=1/n$ as a maximum. Therefore, ${\hat {F}}$ is the empirical distribution function.
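The maximization can be written out as a short derivation. This is a sketch under the stated normalization constraint only; the multiplier symbol $\mu$ is borrowed from the Lagrangian used in the estimation procedure later in this article.

```latex
\begin{aligned}
\mathcal{L}(p_{1},\dots,p_{n},\mu) &= \sum_{i=1}^{n}\ln p_{i}+\mu\Bigl(1-\sum_{i=1}^{n}p_{i}\Bigr),\\
\frac{\partial\mathcal{L}}{\partial p_{i}} &= \frac{1}{p_{i}}-\mu=0
  \;\Longrightarrow\; p_{i}=\frac{1}{\mu},\\
\sum_{i=1}^{n}p_{i}=1 &\;\Longrightarrow\; \frac{n}{\mu}=1
  \;\Longrightarrow\; \mu=n,\quad p_{i}=\frac{1}{n}.
\end{aligned}
```

Because $\sum _{i}\ln p_{i}$ is strictly concave in the $p_{i}$, this stationary point is the unique maximizer.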

Estimation Procedure

EL estimates are calculated by maximizing the empirical likelihood function (see above) subject to constraints based on the estimating function and the trivial assumption that the probability weights of the likelihood function sum to 1.^[5] This procedure is represented as:

$\max _{\pi _{i},\theta }\ln(L)=\max _{\pi _{i},\theta }\sum _{i=1}^{n}\ln \pi _{i}$

subject to the constraints

$\sum _{i=1}^{n}\pi _{i}=1,\quad \sum _{i=1}^{n}\pi _{i}h(y_{i};\theta )=0,\quad \pi _{i}\geq 0\quad \forall i\in \{1,\dots ,n\}.$

^[6]: Equation (73)

The value of the parameter $\theta$ can be found from the stationarity conditions of the Lagrangian function

${\mathcal {L}}=\sum _{i=1}^{n}\ln \pi _{i}+\mu \left(1-\sum _{i=1}^{n}\pi _{i}\right)-n\tau '\sum _{i=1}^{n}\pi _{i}h(y_{i};\theta ).$

^[6]: Equation (74)

There is a clear analogy between this maximization problem and the one solved for maximum entropy.

The parameters $\pi _{i}$ are nuisance parameters.
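For a fixed $\theta$ and a scalar estimating function, the first-order conditions of the Lagrangian above give $\mu =n$ and $\pi _{i}=1/\bigl(n(1+\tau h(y_{i};\theta ))\bigr)$, where $\tau$ solves $\sum _{i}h_{i}/(1+\tau h_{i})=0$. The following is a minimal sketch of that inner step, assuming a scalar $h$ and using plain bisection; the function name `el_weights` and the example data are hypothetical and not taken from the cited references.

```python
import numpy as np


def el_weights(h, tol=1e-12, max_iter=200):
    """Empirical-likelihood weights for fixed theta and scalar h_i = h(y_i; theta).

    Returns (pi, tau) with pi_i = 1 / (n * (1 + tau * h_i)).
    """
    h = np.asarray(h, dtype=float)
    n = h.size
    if not (h.min() < 0.0 < h.max()):
        # Otherwise no positive weights can satisfy sum(pi_i * h_i) = 0.
        raise ValueError("0 must lie strictly inside the range of h")
    # tau must keep every 1 + tau * h_i strictly positive.
    lo, hi = -1.0 / h.max(), -1.0 / h.min()
    # g(tau) = sum(h_i / (1 + tau * h_i)) is strictly decreasing on (lo, hi),
    # so its unique root can be bracketed and bisected.
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if np.sum(h / (1.0 + mid * h)) > 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    tau = 0.5 * (lo + hi)
    pi = 1.0 / (n * (1.0 + tau * h))
    return pi, tau


# Hypothetical example: estimating function for the mean, h(y; theta) = y - theta.
y = np.array([1.0, 2.0, 4.0, 7.0])
theta = 3.0
pi, tau = el_weights(y - theta)
log_el = np.sum(np.log(pi))  # profile log empirical likelihood at theta
```

Maximizing `log_el` over $\theta$ in an outer loop would give the EL estimate; for the mean constraint used here the maximum is attained at the sample mean, where $\tau =0$ and all weights equal $1/n$.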

Empirical Likelihood Ratio (ELR)

An empirical likelihood ratio function is defined and used to obtain confidence intervals for the parameter of interest $\theta$, similar to parametric likelihood ratio confidence intervals.^[7]^[8] Let $L(F)$ be the empirical likelihood of a distribution function $F$ and let $F_{n}$ denote the empirical distribution function; then the ELR is:

$R(F)=L(F)/L(F_{n})$ .

Consider sets of the form

$C=\{T(F)|R(F)\geq r\}$ .

Under such conditions, a test of $T(F)=t$ rejects when $t$ does not belong to $C$, that is, when no distribution $F$ with $T(F)=t$ has likelihood $L(F)\geq rL(F_{n})$ .

The central result is for the mean of X. Clearly, some restrictions on $F$ are needed, or else $C=\mathbb {R} ^{p}$ whenever $r<1$ . To see this, let:

$F=\epsilon \delta _{x}+(1-\epsilon )F_{n}$

If $\epsilon$ is small enough and $\epsilon >0$ , then $R(F)\geq r$ .
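This claim can be checked numerically. The sketch below assumes the one-dimensional case, takes $L(F)$ to be the product of the probability masses that $F$ assigns to the observations, and assumes that $x$ is not itself a sample point; the function name and the data are hypothetical.

```python
import numpy as np


def contaminated_ratio_and_mean(y, x, eps):
    """R(F) and the mean of F = eps * delta_x + (1 - eps) * F_n."""
    y = np.asarray(y, dtype=float)
    n = y.size
    # F puts mass (1 - eps) / n on each observation, F_n puts 1 / n,
    # so the likelihood ratio is ((1 - eps) / n)^n / (1 / n)^n.
    ratio = (1.0 - eps) ** n
    mean = eps * x + (1.0 - eps) * y.mean()
    return ratio, mean


y = np.array([1.0, 2.0, 4.0, 7.0])
r = 0.5
# Any eps below 1 - r**(1/n) keeps R(F) >= r; take 90% of that bound.
eps = 0.9 * (1.0 - r ** (1.0 / y.size))
# Choose the contamination point x so that the mean of F hits an arbitrary target.
target = 1000.0
x = (target - (1.0 - eps) * y.mean()) / eps
ratio, mean = contaminated_ratio_and_mean(y, x, eps)
```

Here `ratio` stays above `r` although `mean` equals the arbitrary target, far outside the range of the data.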

But then, as $x$ ranges through $\mathbb {R} ^{p}$ , so does the mean of $F$ , tracing out $C=\mathbb {R} ^{p}$ . The problem can be solved by restricting to distributions $F$ that are supported in a bounded set. It turns out to be possible to restrict attention to distributions with support in the sample, in other words, to distributions $F\ll F_{n}$ . This restriction is convenient because the statistician might not be willing to specify a bounded support for $F$ , and because it converts the construction of $C$ into a finite-dimensional problem.
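With $F\ll F_{n}$ the ratio for the mean reduces to $R(\mu )=\prod _{i}n\pi _{i}$, where the weights solve the constrained maximization from the estimation procedure. The following sketch computes $-2\ln R(\mu )$ for a scalar mean and scans a grid for the set $C$. It assumes the usual chi-squared calibration with one degree of freedom for the cutoff, a result from Owen's work that is not derived in this article; the function names, the grid size and the simulated data are hypothetical.

```python
import numpy as np

# 0.95 quantile of the chi-squared distribution with 1 degree of freedom
# (assumed calibration for -2 log R at the true mean).
CHI2_1_95 = 3.841458820694124


def neg2_log_elr(y, mu, iters=200):
    """-2 log R(mu) for the mean, over distributions supported on the sample."""
    h = np.asarray(y, dtype=float) - mu
    if not (h.min() < 0.0 < h.max()):
        # mu is outside the sample range: no F << F_n has this mean.
        return np.inf
    lo, hi = -1.0 / h.max(), -1.0 / h.min()
    # Bisection for tau solving sum(h_i / (1 + tau * h_i)) = 0.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if np.sum(h / (1.0 + mid * h)) > 0.0:
            lo = mid
        else:
            hi = mid
    tau = 0.5 * (lo + hi)
    # n * pi_i = 1 / (1 + tau * h_i), hence -2 log R = 2 * sum(log(1 + tau * h_i)).
    return 2.0 * np.sum(np.log1p(tau * h))


def el_interval(y, cutoff=CHI2_1_95, grid=2001):
    """Grid approximation of C = {mu : -2 log R(mu) <= cutoff}."""
    y = np.asarray(y, dtype=float)
    mus = np.linspace(y.min(), y.max(), grid)
    stats = np.array([neg2_log_elr(y, m) for m in mus])
    inside = mus[stats <= cutoff]
    return inside.min(), inside.max()


rng = np.random.default_rng(0)
y = rng.exponential(size=50)  # skewed simulated data
lower, upper = el_interval(y)
```

The resulting interval need not be symmetric about the sample mean; its shape is determined by the data rather than by a normal approximation.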

Other Applications

The use of empirical likelihood is not limited to confidence intervals. In efficient quantile regression, an EL-based categorization^[9] procedure helps determine the shape of the true discrete distribution at level p, and also provides a way of formulating a consistent estimator. In addition, EL can be used in place of parametric likelihood to form model selection criteria.^[10] Empirical likelihood can also be applied naturally in survival analysis^[11] and in regression problems.^[12]

Literature

Nordman, Daniel J., and Soumendra N. Lahiri. "A review of empirical likelihood methods for time series." Journal of Statistical Planning and Inference 155 (2014): 1-18. https://doi.org/10.1016/j.jspi.2013.10.001


References

[1] Owen, Art B. (2001). Empirical likelihood. Boca Raton, Fla. ISBN 978-1-4200-3615-2. OCLC 71012491.
[2] Owen, Art B. (1988). "Empirical likelihood ratio confidence intervals for a single functional". Biometrika. 75 (2): 237–249. doi:10.1093/biomet/75.2.237. ISSN 0006-3444.
[3] ${\frac {{\hat {F}}(y_{i})-{\hat {F}}(y_{i}-\delta y)}{\delta y}}$ is an estimate of the probability density (compare histogram).
[4] Wang, Dong; Chen, Song Xi (2009-02-01). "Empirical likelihood for estimating equations with missing values". The Annals of Statistics. 37 (1). arXiv:0903.0726. doi:10.1214/07-aos585. ISSN 0090-5364. S2CID 5427751.
[5] Mittelhammer, Judge, and Miller (2000), 292.
[6] Bera, Anil K.; Bilias, Yannis (2002). "The MM, ME, ML, EL, EF and GMM approaches to estimation: a synthesis". Journal of Econometrics. 107 (1–2): 51–86. doi:10.1016/S0304-4076(01)00113-0.
[7] Owen, Art (1990-03-01). "Empirical Likelihood Ratio Confidence Regions". The Annals of Statistics. 18 (1). doi:10.1214/aos/1176347494. ISSN 0090-5364.
[8] Dong, Lauren Bin; Giles, David E. A. (2007-01-30). "An Empirical Likelihood Ratio Test for Normality". Communications in Statistics - Simulation and Computation. 36 (1): 197–215. doi:10.1080/03610910601096544. ISSN 0361-0918. S2CID 16866055.
[9] Chen, Jien; Lazar, Nicole A. (2010-01-27). "Quantile estimation for discrete data via empirical likelihood". Journal of Nonparametric Statistics. 22 (2): 237–255. doi:10.1080/10485250903301525. ISSN 1048-5252. S2CID 119684596.
[10] Chen, Chixiang; Wang, Ming; Wu, Rongling; Li, Runze (2022). "A Robust Consistent Information Criterion for Model Selection Based on Empirical Likelihood". Statistica Sinica. arXiv:2006.13281. doi:10.5705/ss.202020.0254 (inactive 2024-11-11). ISSN 1017-0405. S2CID 220042083.
[11] Zhou, M. (2015). Empirical Likelihood Method in Survival Analysis (1st ed.). Chapman and Hall/CRC. https://doi.org/10.1201/b18598
[12] Chen, Song Xi, and Ingrid Van Keilegom. "A review on empirical likelihood methods for regression." TEST volume 18, pages 415–447, (2009) https://doi.org/10.1007/s11749-009-0159-5

