Hellinger distance


In probability and statistics, the Hellinger distance (closely related to, although different from, the Bhattacharyya distance) is used to quantify the similarity between two probability distributions. It is a type of f-divergence. The Hellinger distance is defined in terms of the Hellinger integral, which was introduced by Ernst Hellinger in 1909. [1] [2]

It is sometimes called the Jeffreys distance. [3] [4]

Definition

Measure theory

To define the Hellinger distance in terms of measure theory, let $P$ and $Q$ denote two probability measures on a measure space $\mathcal{X}$ that are absolutely continuous with respect to an auxiliary measure $\lambda$. Such a measure always exists, e.g. $\lambda = \tfrac{1}{2}(P + Q)$. The square of the Hellinger distance between $P$ and $Q$ is defined as the quantity

$$H^2(P, Q) = \frac{1}{2} \int_{\mathcal{X}} \left( \sqrt{p(x)} - \sqrt{q(x)} \right)^2 \lambda(dx).$$

Here, $p = \frac{dP}{d\lambda}$ and $q = \frac{dQ}{d\lambda}$, i.e. $p$ and $q$ are the Radon–Nikodym derivatives of $P$ and $Q$ respectively with respect to $\lambda$. This definition does not depend on $\lambda$, i.e. the Hellinger distance between $P$ and $Q$ does not change if $\lambda$ is replaced with a different probability measure with respect to which both $P$ and $Q$ are absolutely continuous. For compactness, the above formula is often written as

$$H^2(P, Q) = \frac{1}{2} \int_{\mathcal{X}} \left( \sqrt{dP} - \sqrt{dQ} \right)^2.$$
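To see why, assume for simplicity that $\lambda$ is absolutely continuous with respect to the alternative measure $\mu$; the chain rule for Radon–Nikodym derivatives then gives (a standard argument, supplied here for completeness)

$$\int \left( \sqrt{\tfrac{dP}{d\mu}} - \sqrt{\tfrac{dQ}{d\mu}} \right)^2 d\mu = \int \left( \sqrt{\tfrac{dP}{d\lambda}} - \sqrt{\tfrac{dQ}{d\lambda}} \right)^2 \frac{d\lambda}{d\mu}\, d\mu = \int \left( \sqrt{\tfrac{dP}{d\lambda}} - \sqrt{\tfrac{dQ}{d\lambda}} \right)^2 d\lambda.$$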

Probability theory using Lebesgue measure

To define the Hellinger distance in terms of elementary probability theory, we take $\lambda$ to be the Lebesgue measure, so that $dP/d\lambda$ and $dQ/d\lambda$ are simply probability density functions. If we denote the densities as $f$ and $g$, respectively, the squared Hellinger distance can be expressed as a standard calculus integral

$$H^2(f, g) = \frac{1}{2} \int \left( \sqrt{f(x)} - \sqrt{g(x)} \right)^2 dx = 1 - \int \sqrt{f(x)\, g(x)}\, dx,$$

where the second form can be obtained by expanding the square and using the fact that the integral of a probability density over its domain equals 1.
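As an illustration, the second form lends itself to direct numerical evaluation. The following is a minimal sketch (not from the original article) using SciPy quadrature on two example densities:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm, laplace

# Squared Hellinger distance H^2(f, g) = 1 - integral of sqrt(f * g),
# evaluated by numerical quadrature over the real line.
def hellinger_sq(f, g):
    bc, _ = quad(lambda x: np.sqrt(f(x) * g(x)), -np.inf, np.inf)
    return 1.0 - bc

# Example: a standard normal density vs. a standard Laplace density.
print(hellinger_sq(norm(0, 1).pdf, laplace(0, 1).pdf))
```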

The Hellinger distance H(P, Q) satisfies the property (derivable from the Cauchy–Schwarz inequality)

$$0 \le H(P, Q) \le 1.$$
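The bound follows in one line from the second form of the integral (a standard step, supplied here for completeness): by the Cauchy–Schwarz inequality,

$$0 \le \int \sqrt{f(x)\, g(x)}\, dx \le \sqrt{\int f(x)\, dx} \sqrt{\int g(x)\, dx} = 1,$$

so $H^2(f, g) = 1 - \int \sqrt{fg}\, dx$ lies in $[0, 1]$.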

Discrete distributions

For two discrete probability distributions $P = (p_1, \ldots, p_k)$ and $Q = (q_1, \ldots, q_k)$, their Hellinger distance is defined as

$$H(P, Q) = \frac{1}{\sqrt{2}} \sqrt{\sum_{i=1}^{k} \left( \sqrt{p_i} - \sqrt{q_i} \right)^2},$$

which is directly related to the Euclidean norm of the difference of the square root vectors, i.e.

$$H(P, Q) = \frac{1}{\sqrt{2}} \left\| \sqrt{P} - \sqrt{Q} \right\|_2.$$

Also, expanding the square as in the continuous case,

$$H^2(P, Q) = 1 - \sum_{i=1}^{k} \sqrt{p_i q_i}.$$
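Both formulas transcribe directly into NumPy. The following is an illustrative sketch (the probability vectors are arbitrary examples, not from the original article):

```python
import numpy as np

# Hellinger distance between two discrete distributions given as
# probability vectors over the same support.
def hellinger(p, q):
    p, q = np.asarray(p), np.asarray(q)
    return np.linalg.norm(np.sqrt(p) - np.sqrt(q)) / np.sqrt(2)

p = [0.1, 0.4, 0.5]
q = [0.2, 0.2, 0.6]
print(hellinger(p, q))                                   # via the norm formula
print(np.sqrt(1 - np.sum(np.sqrt(np.multiply(p, q)))))   # via 1 - sum of sqrt(p_i q_i)
```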

Properties

The Hellinger distance forms a bounded metric on the space of probability distributions over a given probability space.

The maximum distance 1 is achieved when P assigns probability zero to every set to which Q assigns a positive probability, and vice versa.

Sometimes the factor $1/2$ in front of the integral is omitted, in which case the Hellinger distance ranges from zero to the square root of two.

The Hellinger distance is related to the Bhattacharyya coefficient $BC(P, Q)$, as it can be defined as

$$H(P, Q) = \sqrt{1 - BC(P, Q)}.$$

Hellinger distances are used in the theory of sequential and asymptotic statistics. [5] [6]

The squared Hellinger distance between two normal distributions $P \sim \mathcal{N}(\mu_1, \sigma_1^2)$ and $Q \sim \mathcal{N}(\mu_2, \sigma_2^2)$ is:

$$H^2(P, Q) = 1 - \sqrt{\frac{2\sigma_1\sigma_2}{\sigma_1^2 + \sigma_2^2}}\, e^{-\frac{1}{4}\frac{(\mu_1 - \mu_2)^2}{\sigma_1^2 + \sigma_2^2}}.$$
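As a sanity check, the closed form can be compared against direct numerical integration of $1 - \int \sqrt{fg}$. This is an illustrative sketch; the parameter values are arbitrary:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

mu1, s1, mu2, s2 = 0.0, 1.0, 1.5, 2.0

# Closed form for two univariate normals.
closed = 1 - np.sqrt(2 * s1 * s2 / (s1**2 + s2**2)) * np.exp(
    -0.25 * (mu1 - mu2)**2 / (s1**2 + s2**2))

# Direct numerical evaluation of 1 - integral of sqrt(f * g).
f, g = norm(mu1, s1).pdf, norm(mu2, s2).pdf
numeric = 1 - quad(lambda x: np.sqrt(f(x) * g(x)), -np.inf, np.inf)[0]

print(closed, numeric)  # the two values agree to quadrature precision
```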

The squared Hellinger distance between two multivariate normal distributions $P \sim \mathcal{N}(\mu_1, \Sigma_1)$ and $Q \sim \mathcal{N}(\mu_2, \Sigma_2)$ is [7]

$$H^2(P, Q) = 1 - \frac{\det(\Sigma_1)^{1/4} \det(\Sigma_2)^{1/4}}{\det\!\left(\frac{\Sigma_1 + \Sigma_2}{2}\right)^{1/2}} \exp\left\{ -\frac{1}{8} (\mu_1 - \mu_2)^{\mathsf{T}} \left( \frac{\Sigma_1 + \Sigma_2}{2} \right)^{-1} (\mu_1 - \mu_2) \right\}.$$
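The formula transcribes directly into NumPy. The following is a minimal sketch (the parameter values are arbitrary assumptions for illustration):

```python
import numpy as np

# Squared Hellinger distance between N(mu1, S1) and N(mu2, S2),
# transcribing the closed form above.
def hellinger_sq_mvn(mu1, S1, mu2, S2):
    Sbar = (S1 + S2) / 2
    d = mu1 - mu2
    coeff = (np.linalg.det(S1) ** 0.25 * np.linalg.det(S2) ** 0.25
             / np.sqrt(np.linalg.det(Sbar)))
    return 1 - coeff * np.exp(-0.125 * d @ np.linalg.solve(Sbar, d))

mu1, S1 = np.zeros(2), np.eye(2)
mu2, S2 = np.array([1.0, 0.0]), np.array([[2.0, 0.3], [0.3, 1.0]])
print(hellinger_sq_mvn(mu1, S1, mu2, S2))
```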

The squared Hellinger distance between two exponential distributions $P \sim \mathrm{Exp}(\alpha)$ and $Q \sim \mathrm{Exp}(\beta)$ is:

$$H^2(P, Q) = 1 - \frac{2\sqrt{\alpha\beta}}{\alpha + \beta}.$$

The squared Hellinger distance between two Weibull distributions $P \sim \mathrm{W}(k, \alpha)$ and $Q \sim \mathrm{W}(k, \beta)$ (where $k$ is a common shape parameter and $\alpha, \beta$ are the scale parameters respectively) is:

$$H^2(P, Q) = 1 - \frac{2 (\alpha\beta)^{k/2}}{\alpha^k + \beta^k}.$$

The squared Hellinger distance between two Poisson distributions with rate parameters $\alpha$ and $\beta$, so that $P \sim \mathrm{Poisson}(\alpha)$ and $Q \sim \mathrm{Poisson}(\beta)$, is:

$$H^2(P, Q) = 1 - e^{-\frac{1}{2}\left( \sqrt{\alpha} - \sqrt{\beta} \right)^2}.$$
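This identity can be checked against the discrete definition by truncating the sum over the Poisson support once the tail mass is negligible (an illustrative sketch with arbitrary rates):

```python
import numpy as np
from scipy.stats import poisson

a, b = 3.0, 7.0

# Closed form.
closed = 1 - np.exp(-0.5 * (np.sqrt(a) - np.sqrt(b))**2)

# Discrete definition: 1 - sum over k of sqrt(p_k * q_k), truncated at large k.
k = np.arange(0, 200)
numeric = 1 - np.sum(np.sqrt(poisson.pmf(k, a) * poisson.pmf(k, b)))

print(closed, numeric)  # should agree up to the truncation error
```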

The squared Hellinger distance between two beta distributions $P \sim \mathrm{Beta}(a_1, b_1)$ and $Q \sim \mathrm{Beta}(a_2, b_2)$ is:

$$H^2(P, Q) = 1 - \frac{B\!\left( \frac{a_1 + a_2}{2}, \frac{b_1 + b_2}{2} \right)}{\sqrt{B(a_1, b_1)\, B(a_2, b_2)}},$$

where $B$ is the beta function.

The squared Hellinger distance between two gamma distributions $P \sim \mathrm{Gamma}(a_1, b_1)$ and $Q \sim \mathrm{Gamma}(a_2, b_2)$ (in the shape–rate parameterization) is:

$$H^2(P, Q) = 1 - \Gamma\!\left( \frac{a_1 + a_2}{2} \right) \left( \frac{b_1 + b_2}{2} \right)^{-\frac{a_1 + a_2}{2}} \sqrt{\frac{b_1^{a_1} b_2^{a_2}}{\Gamma(a_1)\, \Gamma(a_2)}},$$

where $\Gamma$ is the gamma function.
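As with the normal case, the closed form can be checked by quadrature. One caveat in this sketch (parameter values arbitrary): SciPy parameterizes the gamma distribution by shape and scale, so the rate $b_i$ enters as scale $1/b_i$:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma as G
from scipy.stats import gamma

a1, b1, a2, b2 = 2.0, 1.0, 3.0, 2.5  # shapes a_i, rates b_i

# Closed form from above.
closed = 1 - G((a1 + a2) / 2) * ((b1 + b2) / 2) ** (-(a1 + a2) / 2) \
           * np.sqrt(b1**a1 * b2**a2 / (G(a1) * G(a2)))

# Numerical check over the positive half-line.
f = gamma(a1, scale=1 / b1).pdf
g = gamma(a2, scale=1 / b2).pdf
numeric = 1 - quad(lambda x: np.sqrt(f(x) * g(x)), 0, np.inf)[0]

print(closed, numeric)
```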

Connection with total variation distance

The Hellinger distance $H(P, Q)$ and the total variation distance (or statistical distance) $\delta(P, Q)$ are related as follows: [8]

$$H^2(P, Q) \le \delta(P, Q) \le \sqrt{2}\, H(P, Q).$$

The constants in this inequality may change depending on which renormalization you choose ($1/2$ or $1/\sqrt{2}$).

These inequalities follow immediately from the inequalities between the 1-norm and the 2-norm.
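The inequalities are easy to probe numerically on discrete distributions; the following sketch uses randomly drawn probability vectors:

```python
import numpy as np

rng = np.random.default_rng(0)

# Random probability vectors on a common support of size 5.
p = rng.dirichlet(np.ones(5))
q = rng.dirichlet(np.ones(5))

H = np.linalg.norm(np.sqrt(p) - np.sqrt(q)) / np.sqrt(2)   # Hellinger distance
TV = 0.5 * np.abs(p - q).sum()                             # total variation distance

# Check H^2 <= TV <= sqrt(2) * H.
print(H**2 <= TV <= np.sqrt(2) * H)  # True
```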


Notes

  1. Nikulin, M.S. (2001) [1994], "Hellinger distance", Encyclopedia of Mathematics, EMS Press.
  2. Hellinger, Ernst (1909), "Neue Begründung der Theorie quadratischer Formen von unendlichvielen Veränderlichen" [New foundation of the theory of quadratic forms in infinitely many variables], Journal für die reine und angewandte Mathematik (in German), 1909 (136): 210–271, doi:10.1515/crll.1909.136.210, JFM 40.0393.01, S2CID 121150138.
  3. "Jeffreys distance", Encyclopedia of Mathematics, encyclopediaofmath.org. Retrieved 2022-05-24.
  4. Jeffreys, Harold (1946-09-24). "An invariant form for the prior probability in estimation problems". Proceedings of the Royal Society of London. Series A, Mathematical and Physical Sciences. 186 (1007): 453–461. Bibcode:1946RSPSA.186..453J. doi:10.1098/rspa.1946.0056. ISSN 0080-4630. PMID 20998741. S2CID 19490929.
  5. Torgerson, Erik (1991). "Comparison of Statistical Experiments". Encyclopedia of Mathematics. Vol. 36. Cambridge University Press.
  6. Liese, Friedrich; Miescke, Klaus-J. (2008). Statistical Decision Theory: Estimation, Testing, and Selection. Springer. ISBN 978-0-387-73193-3.
  7. Pardo, L. (2006). Statistical Inference Based on Divergence Measures. New York: Chapman and Hall/CRC. p. 51. ISBN 1-58488-600-5.
  8. Harsha, Prahladh (September 23, 2011). "Lecture notes on communication complexity" (PDF).
