In statistics, the Bhattacharyya distance is a quantity which represents a notion of similarity between two probability distributions. [1] It is closely related to the Bhattacharyya coefficient, which is a measure of the amount of overlap between two statistical samples or populations.
It is not a metric, despite being named a "distance", since it does not obey the triangle inequality.
Both the Bhattacharyya distance and the Bhattacharyya coefficient are named after Anil Kumar Bhattacharyya, a statistician who worked in the 1930s at the Indian Statistical Institute. [2] He developed the method through a series of papers. [3] [4] [5] He first devised the measure for the distance between two non-normal distributions and illustrated it with classical multinomial populations; [3] although this work was submitted for publication in 1941, it appeared almost five years later in Sankhya. [3] [2] Bhattacharyya then worked on a distance measure for probability distributions that are absolutely continuous with respect to the Lebesgue measure, publishing his progress in 1942 in the Proceedings of the Indian Science Congress [4] and the final work in 1943 in the Bulletin of the Calcutta Mathematical Society. [5]
For probability distributions $P$ and $Q$ on the same domain $\mathcal{X}$, the Bhattacharyya distance is defined as

$$D_B(P,Q) = -\ln\left(BC(P,Q)\right)$$

where

$$BC(P,Q) = \sum_{x \in \mathcal{X}} \sqrt{P(x)\,Q(x)}$$

is the Bhattacharyya coefficient for discrete probability distributions.

For continuous probability distributions, with $P(dx) = p(x)\,dx$ and $Q(dx) = q(x)\,dx$ where $p(x)$ and $q(x)$ are the probability density functions, the Bhattacharyya coefficient is defined as

$$BC(P,Q) = \int_{\mathcal{X}} \sqrt{p(x)\,q(x)}\;dx.$$
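As an illustration of the discrete definition, the following Python sketch (helper names are our own, not from the cited sources) computes the Bhattacharyya coefficient and distance for two distributions given as probability vectors over the same finite domain.

    import numpy as np

    def bhattacharyya_coefficient(p, q):
        """Sum of sqrt(P(x) * Q(x)) over a common finite domain."""
        p = np.asarray(p, dtype=float)
        q = np.asarray(q, dtype=float)
        return float(np.sum(np.sqrt(p * q)))

    def bhattacharyya_distance(p, q):
        """D_B(P, Q) = -ln(BC(P, Q)); infinite when the supports are disjoint."""
        bc = bhattacharyya_coefficient(p, q)
        return float(np.inf) if bc == 0.0 else -float(np.log(bc))

    # Example: two distributions over the same three outcomes.
    P = [0.2, 0.5, 0.3]
    Q = [0.1, 0.4, 0.5]
    print(bhattacharyya_coefficient(P, Q))  # close to 1: large overlap
    print(bhattacharyya_distance(P, Q))     # correspondingly small distance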
More generally, given two probability measures $P, Q$ on a measurable space $(\mathcal{X}, \mathcal{B})$, let $\lambda$ be a ($\sigma$-finite) measure such that $P$ and $Q$ are absolutely continuous with respect to $\lambda$, i.e. such that $P(dx) = p(x)\,\lambda(dx)$ and $Q(dx) = q(x)\,\lambda(dx)$ for probability density functions $p, q$ with respect to $\lambda$ defined $\lambda$-almost everywhere. Such a measure, even such a probability measure, always exists, e.g. $\lambda = \tfrac{1}{2}(P + Q)$. Then define the Bhattacharyya measure on $(\mathcal{X}, \mathcal{B})$ by

$$bc(dx \mid P,Q) = \sqrt{p(x)\,q(x)}\,\lambda(dx) = \sqrt{\frac{P(dx)}{\lambda(dx)}\,\frac{Q(dx)}{\lambda(dx)}}\,\lambda(dx).$$

It does not depend on the measure $\lambda$, for if we choose a measure $\mu$ such that $\lambda$ and another measure choice $\lambda'$ are absolutely continuous, i.e. $\lambda = l(x)\,\mu$ and $\lambda' = l'(x)\,\mu$, then

$$P(dx) = p(x)\,\lambda(dx) = p'(x)\,\lambda'(dx) = p(x)\,l(x)\,\mu(dx) = p'(x)\,l'(x)\,\mu(dx),$$

and similarly for $Q$. We then have

$$bc(dx \mid P,Q) = \sqrt{p(x)\,q(x)}\,\lambda(dx) = \sqrt{p(x)\,q(x)}\;l(x)\,\mu(dx) = \sqrt{p(x)\,l(x)\,q(x)\,l(x)}\,\mu(dx) = \sqrt{p'(x)\,l'(x)\,q'(x)\,l'(x)}\,\mu(dx) = \sqrt{p'(x)\,q'(x)}\,\lambda'(dx).$$

We finally define the Bhattacharyya coefficient

$$BC(P,Q) = \int_{\mathcal{X}} bc(dx \mid P,Q) = \int_{\mathcal{X}} \sqrt{p(x)\,q(x)}\,\lambda(dx).$$

By the above, the quantity $BC(P,Q)$ does not depend on $\lambda$, and by the Cauchy inequality $0 \le BC(P,Q) \le 1$. Using $D_B(P,Q) = -\ln(BC(P,Q))$ and $0 \le BC(P,Q) \le 1$, it follows that $0 \le D_B(P,Q) \le \infty$.
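As a concrete special case (an illustration added here, not a claim from the cited sources): if $P$ is absolutely continuous with respect to $Q$, one may take $\lambda = Q$, so that $q(x) = 1$ and $p(x) = \frac{dP}{dQ}(x)$ is the Radon–Nikodym derivative, giving

$$BC(P,Q) = \int_{\mathcal{X}} \sqrt{\frac{dP}{dQ}(x)}\;Q(dx) = \operatorname{E}_Q\!\left[\sqrt{\tfrac{dP}{dQ}}\right],$$

which makes the independence from the choice of $\lambda$ explicit in this case.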
Let $p \sim \mathcal{N}(\mu_p, \sigma_p^2)$ and $q \sim \mathcal{N}(\mu_q, \sigma_q^2)$, where $\mathcal{N}(\mu, \sigma^2)$ is the normal distribution with mean $\mu$ and variance $\sigma^2$; then

$$D_B(p,q) = \frac{1}{4}\ln\left(\frac{1}{4}\left(\frac{\sigma_p^2}{\sigma_q^2} + \frac{\sigma_q^2}{\sigma_p^2} + 2\right)\right) + \frac{1}{4}\,\frac{(\mu_p - \mu_q)^2}{\sigma_p^2 + \sigma_q^2}.$$

And in general, given two multivariate normal distributions $p_i = \mathcal{N}(\boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i)$,

$$D_B(p_1,p_2) = \frac{1}{8}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)^{\mathsf T}\,\boldsymbol{\Sigma}^{-1}\,(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2) + \frac{1}{2}\ln\left(\frac{\det \boldsymbol{\Sigma}}{\sqrt{\det \boldsymbol{\Sigma}_1 \det \boldsymbol{\Sigma}_2}}\right),$$

where $\boldsymbol{\Sigma} = \frac{\boldsymbol{\Sigma}_1 + \boldsymbol{\Sigma}_2}{2}$. [6] Note that the first term is a squared Mahalanobis distance.
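The closed form above translates directly into code. The Python sketch below (illustrative only; the function name is our own) evaluates the multivariate-normal Bhattacharyya distance from the means and covariance matrices, with a univariate case as a sanity check against the scalar formula.

    import numpy as np

    def bhattacharyya_gaussian(mu1, sigma1, mu2, sigma2):
        """Closed-form D_B between N(mu1, sigma1) and N(mu2, sigma2)."""
        mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
        sigma1, sigma2 = np.asarray(sigma1, float), np.asarray(sigma2, float)
        sigma = 0.5 * (sigma1 + sigma2)                 # averaged covariance
        diff = mu1 - mu2
        mahalanobis_sq = diff @ np.linalg.solve(sigma, diff)
        _, logdet = np.linalg.slogdet(sigma)
        _, logdet1 = np.linalg.slogdet(sigma1)
        _, logdet2 = np.linalg.slogdet(sigma2)
        return 0.125 * mahalanobis_sq + 0.5 * (logdet - 0.5 * (logdet1 + logdet2))

    # Univariate sanity check: means 0 and 1, variances 1 and 4.
    print(bhattacharyya_gaussian([0.0], [[1.0]], [1.0], [[4.0]]))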
$$0 \le BC(p,q) \le 1 \quad\text{and}\quad 0 \le D_B(p,q) \le \infty.$$

$D_B$ does not obey the triangle inequality, though the Hellinger distance $\sqrt{1 - BC(p,q)}$ does.
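To see the failure of the triangle inequality concretely, here is a small illustrative check (our own example, not from the cited sources) on three two-point distributions; for this triple the Bhattacharyya distance violates the inequality while the Hellinger distance satisfies it.

    import numpy as np

    def bc(p, q):
        return float(np.sum(np.sqrt(np.asarray(p) * np.asarray(q))))

    def d_b(p, q):        # Bhattacharyya distance
        b = bc(p, q)
        return np.inf if b == 0 else -np.log(b)

    def hellinger(p, q):  # Hellinger distance sqrt(1 - BC)
        return np.sqrt(1.0 - bc(p, q))

    P, Q, R = [1.0, 0.0], [0.5, 0.5], [0.0, 1.0]
    # Disjoint supports make D_B(P, R) infinite, but D_B(P, Q) + D_B(Q, R) is finite.
    print(d_b(P, R), d_b(P, Q) + d_b(Q, R))                    # inf vs. about 0.693
    # The Hellinger distances for the same triple respect the triangle inequality.
    print(hellinger(P, R), hellinger(P, Q) + hellinger(Q, R))  # 1.0 vs. about 1.082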
The Bhattacharyya distance can be used to upper and lower bound the Bayes error rate:

$$\frac{1}{2} - \frac{1}{2}\sqrt{1 - 4\rho^2} \;\le\; L^{*} \;\le\; \rho,$$

where $\rho = \operatorname{E}\!\left[\sqrt{\eta(X)\,(1 - \eta(X))}\right]$ and $\eta(X) = P(Y = 1 \mid X)$ is the posterior probability. [7]
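For intuition, the following Monte Carlo sketch (our own illustration, with an assumed toy setup of two equally likely unit-variance Gaussian classes) estimates $L^{*}$ and $\rho$ and checks that the bound holds numerically.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 200_000

    # Assumed toy problem: Y ~ Bernoulli(1/2), X | Y=0 ~ N(0,1), X | Y=1 ~ N(1,1).
    y = rng.integers(0, 2, size=n)
    x = rng.normal(loc=y.astype(float), scale=1.0)

    # Class-conditional densities and the posterior eta(x) = P(Y = 1 | x).
    f0 = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)
    f1 = np.exp(-0.5 * (x - 1.0)**2) / np.sqrt(2.0 * np.pi)
    eta = f1 / (f0 + f1)

    bayes_error = np.mean(np.minimum(eta, 1.0 - eta))  # Monte Carlo estimate of L*
    rho = np.mean(np.sqrt(eta * (1.0 - eta)))          # Monte Carlo estimate of rho
    lower_bound = 0.5 - 0.5 * np.sqrt(1.0 - 4.0 * rho**2)

    print(lower_bound, "<=", bayes_error, "<=", rho)   # both inequalities hold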
The Bhattacharyya coefficient quantifies the "closeness" of two random statistical samples.
Given two sequences of samples drawn from distributions $P$ and $Q$, bin them into $n$ buckets, and let the frequency of samples from $P$ in bucket $i$ be $p_i$, and similarly $q_i$ for $Q$; then the sample Bhattacharyya coefficient is

$$BC(\mathbf{p}, \mathbf{q}) = \sum_{i=1}^{n} \sqrt{p_i\, q_i},$$

which is an estimator of $BC(P,Q)$. The quality of the estimate depends on the choice of buckets: too few buckets would overestimate $BC(P,Q)$, while too many would underestimate it.
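A minimal sketch of this histogram estimator, assuming two 1-D samples and a shared set of equal-width bins (names and bin count are our own choices):

    import numpy as np

    def sample_bhattacharyya_coefficient(x, y, n_bins=20):
        """Histogram-based estimate of BC from two 1-D samples."""
        # Shared bin edges so both samples are compared on the same partition.
        edges = np.histogram_bin_edges(np.concatenate([x, y]), bins=n_bins)
        p, _ = np.histogram(x, bins=edges)
        q, _ = np.histogram(y, bins=edges)
        p = p / p.sum()   # relative frequency of each bucket
        q = q / q.sum()
        return float(np.sum(np.sqrt(p * q)))

    rng = np.random.default_rng(1)
    x = rng.normal(0.0, 1.0, size=10_000)
    y = rng.normal(0.5, 1.0, size=10_000)
    print(sample_bhattacharyya_coefficient(x, y))  # roughly exp(-1/32) for these Gaussians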
A common task in classification is estimating the separability of classes. Up to a multiplicative factor, the squared Mahalanobis distance is a special case of the Bhattacharyya distance when the two classes are normally distributed with the same variances. When two classes have similar means but significantly different variances, the Mahalanobis distance would be close to zero, while the Bhattacharyya distance would not be.
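The contrast can be made concrete with a small numeric check (our own illustration): for two univariate normal classes with the same mean but variances 1 and 9, the Mahalanobis-style mean-difference term vanishes while the Bhattacharyya distance stays clearly positive.

    import numpy as np

    mu1, var1 = 0.0, 1.0
    mu2, var2 = 0.0, 9.0   # same mean, very different spread

    # Mean-difference (Mahalanobis-style) term: zero because the means coincide.
    mahalanobis_term = 0.25 * (mu1 - mu2) ** 2 / (var1 + var2)

    # Full univariate Bhattacharyya distance, including the variance-ratio term.
    d_b = mahalanobis_term + 0.25 * np.log(0.25 * (var1 / var2 + var2 / var1 + 2.0))

    print(mahalanobis_term)  # 0.0
    print(d_b)               # about 0.26, driven entirely by the variance mismatch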
The Bhattacharyya coefficient is used in the construction of polar codes. [8]
The Bhattacharyya distance is used in feature extraction and selection, [9] image processing, [10] speaker recognition, [11] phone clustering, [12] and in genetics. [13]