In probability theory, Chernoff's distribution, named after Herman Chernoff, is the probability distribution of the random variable

Z = \underset{s \in \mathbb{R}}{\operatorname{argmax}}\ \left( W(s) - s^2 \right),
where W is a "two-sided" Wiener process (or two-sided "Brownian motion") satisfying W(0) = 0. If

V(a, c) = \underset{s \in \mathbb{R}}{\operatorname{argmax}}\ \left( W(s) - c(s - a)^2 \right),
then V(0, c) has density

f_c(t) = \tfrac{1}{2}\, g_c(t)\, g_c(-t),
where g_c has Fourier transform given by

\hat{g}_c(s) = \frac{(2/c)^{1/3}}{\operatorname{Ai}\!\left( i (2c^2)^{-1/3} s \right)}, \qquad s \in \mathbb{R},
and where Ai is the Airy function. Thus f_c is symmetric about 0 and the density of Z is f_Z = f_1. Groeneboom (1989) [1] shows that

f_Z(z) \sim \frac{1}{2} \frac{4^{4/3} |z|}{\operatorname{Ai}'(\tilde{a}_1)} \exp\!\left( -\tfrac{2}{3} |z|^3 + 2^{1/3} \tilde{a}_1 |z| \right) \quad \text{as } z \to \infty,
where \tilde{a}_1 \approx -2.3381 is the largest zero of the Airy function Ai and where \operatorname{Ai}'(\tilde{a}_1) \approx 0.7012. In the same paper, Groeneboom also gives an analysis of the process \{V(a, 1) : a \in \mathbb{R}\}. The connection with the statistical problem of estimating a monotone density is discussed in Groeneboom (1985). [2] Chernoff's distribution is now known to appear in a wide range of monotone problems, including isotonic regression. [3]
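As a rough illustration of the definition above, the following is a minimal Monte Carlo sketch (not one of the analytic methods cited in this article): it approximates Z = argmax_s (W(s) − s²) by discretizing the two-sided Brownian motion on a truncated grid. The function name, grid half-width, and step size are illustrative choices, not values from the literature.

```python
# Minimal Monte Carlo sketch of Chernoff's distribution: discretize a
# two-sided Brownian motion with W(0) = 0, subtract the parabolic drift s^2,
# and record the location of the maximum. Truncation and step size are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def sample_chernoff(n_samples=2000, half_width=3.0, dt=0.001):
    t = np.arange(0.0, half_width + dt, dt)   # grid for each half-line
    drift = t ** 2                            # parabolic drift s^2
    samples = np.empty(n_samples)
    for i in range(n_samples):
        # Two independent Brownian paths, one for s >= 0 and one for s <= 0,
        # both pinned to W(0) = 0.
        right = np.concatenate(([0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), t.size - 1))))
        left = np.concatenate(([0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), t.size - 1))))
        vals = np.concatenate(((left - drift)[::-1], (right - drift)[1:]))
        s_grid = np.concatenate((-t[::-1], t[1:]))
        samples[i] = s_grid[np.argmax(vals)]  # location of the maximum
    return samples

z = sample_chernoff()
print("sample standard deviation of Z:", z.std())
```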
The Chernoff distribution should not be confused with the Chernoff geometric distribution [4] (called the Chernoff point in information geometry) induced by the Chernoff information.
Groeneboom, Lalley and Temme [5] state that the first investigation of this distribution was probably by Chernoff in 1964, [6] who studied the behavior of a certain estimator of a mode. In his paper, Chernoff characterized the distribution through an analytic representation involving the heat equation with suitable boundary conditions. Initial attempts at approximating Chernoff's distribution by solving the heat equation, however, did not achieve satisfactory precision due to the nature of the boundary conditions. [5] The computation of the distribution is addressed, for example, in Groeneboom and Wellner (2001). [7]
The connection of Chernoff's distribution with Airy functions was also found independently by Daniels and Skyrme [8] and Temme, [9] as cited in Groeneboom, Lalley and Temme. These two papers, along with Groeneboom (1989), were all written in 1984. [5]
The Cauchy distribution, named after Augustin Cauchy, is a continuous probability distribution. It is also known, especially among physicists, as the Lorentz distribution, Cauchy–Lorentz distribution, Lorentz(ian) function, or Breit–Wigner distribution. The Cauchy distribution is the distribution of the x-intercept of a ray issuing from a fixed point above the x-axis with a uniformly distributed angle. It is also the distribution of the ratio of two independent normally distributed random variables with mean zero.
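A small simulation sketch of the two characterizations just mentioned: the ratio of two independent standard normals and the tangent of a uniform angle both follow the standard Cauchy distribution. The sample size and seed are arbitrary.

```python
# Compare empirical quartiles of two Cauchy constructions with scipy's
# standard Cauchy (median 0, interquartile range 2).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 100_000

ratio = rng.standard_normal(n) / rng.standard_normal(n)  # ratio of independent normals
angle = rng.uniform(-np.pi / 2, np.pi / 2, n)
intercept = np.tan(angle)                                 # tangent of a uniform angle

for name, x in [("ratio of normals", ratio), ("tan of uniform angle", intercept)]:
    q25, q50, q75 = np.percentile(x, [25, 50, 75])
    print(f"{name}: median {q50:.3f}, IQR {q75 - q25:.3f}")
print("standard Cauchy IQR:", stats.cauchy.ppf(0.75) - stats.cauchy.ppf(0.25))
```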
In probability theory, the central limit theorem (CLT) establishes that, in many situations, for independent and identically distributed random variables, the sampling distribution of the standardized sample mean tends towards the standard normal distribution even if the original variables themselves are not normally distributed.
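The following is a brief simulation sketch of the central limit theorem under illustrative settings: standardized means of exponential samples, which are far from normal, have tail probabilities close to those of the standard normal.

```python
# Standardize sample means of exponential(1) data (mean 1, variance 1) and
# compare a few tail probabilities against the standard normal.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, reps = 200, 50_000                              # sample size and number of replications
x = rng.exponential(scale=1.0, size=(reps, n))

z = (x.mean(axis=1) - 1.0) / (1.0 / np.sqrt(n))    # standardized sample means

for c in (1.0, 1.96, 2.5):
    print(f"P(Z > {c}): empirical {np.mean(z > c):.4f}, normal {1 - stats.norm.cdf(c):.4f}")
```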
In probability theory, a probability density function (PDF), density function, or density of an absolutely continuous random variable is a function whose value at any given sample in the sample space can be interpreted as providing a relative likelihood that the value of the random variable would be equal to that sample. Probability density is the probability per unit length: while the absolute likelihood that a continuous random variable takes on any particular value is 0, the value of the PDF at two different samples can be used to infer, in any particular draw of the random variable, how much more likely it is that the random variable would be close to one sample than to the other.
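A tiny sketch of that "relative likelihood" reading, using an arbitrary standard normal example: although P(X = x) is 0 for any single x, the ratio of PDF values says how much more densely draws cluster near one point than near another.

```python
# Ratio of standard normal density values at two points.
from scipy import stats

pdf = stats.norm(loc=0.0, scale=1.0).pdf
a, b = 0.0, 2.0
print(f"f({a}) / f({b}) = {pdf(a) / pdf(b):.3f}")  # about 7.4: draws fall near 0 far more often than near 2
```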
In probability theory and statistics, the multivariate normal distribution, multivariate Gaussian distribution, or joint normal distribution is a generalization of the one-dimensional (univariate) normal distribution to higher dimensions. One definition is that a random vector is said to be k-variate normally distributed if every linear combination of its k components has a univariate normal distribution. Its importance derives mainly from the multivariate central limit theorem. The multivariate normal distribution is often used to describe, at least approximately, any set of (possibly) correlated real-valued random variables each of which clusters around a mean value.
In probability theory and statistics, covariance is a measure of the joint variability of two random variables. If the greater values of one variable mainly correspond with the greater values of the other variable, and the same holds for the lesser values, the covariance is positive. In the opposite case, when the greater values of one variable mainly correspond to the lesser values of the other, the covariance is negative. The sign of the covariance therefore shows the tendency in the linear relationship between the variables. The magnitude of the covariance is the geometric mean of the variances that are in common between the two random variables. The correlation coefficient normalizes the covariance by dividing by the geometric mean of the total variances of the two random variables.
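A short numpy sketch of the normalization described above, on synthetic data constructed to be positively related: the correlation coefficient is the covariance divided by the geometric mean of the two variances (i.e., the product of the standard deviations).

```python
# Compute covariance and correlation by hand and check against numpy.
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=10_000)
y = 0.6 * x + 0.8 * rng.normal(size=10_000)   # positively related to x by construction

cov = np.cov(x, y)[0, 1]
corr = cov / np.sqrt(np.var(x, ddof=1) * np.var(y, ddof=1))
print(f"covariance {cov:.3f}, correlation {corr:.3f}")
print("numpy corrcoef:", np.corrcoef(x, y)[0, 1])
```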
In physics, a Langevin equation is a stochastic differential equation describing how a system evolves when subjected to a combination of deterministic and fluctuating ("random") forces. The dependent variables in a Langevin equation typically are collective (macroscopic) variables changing only slowly in comparison to the other (microscopic) variables of the system. The fast (microscopic) variables are responsible for the stochastic nature of the Langevin equation. One application is to Brownian motion, which models the fluctuating motion of a small particle in a fluid.
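As a concrete illustration, the following is a minimal Euler–Maruyama sketch of a Langevin-type equation for an overdamped particle in a harmonic potential, dx = −k·x·dt + σ·dW; the parameter values are illustrative and not tied to any particular physical system.

```python
# Euler-Maruyama integration of an overdamped Langevin equation with a
# deterministic restoring force and a fluctuating (random) force.
import numpy as np

rng = np.random.default_rng(4)
k, sigma, dt, steps = 1.0, 0.5, 0.01, 100_000

x = 0.0
trajectory = np.empty(steps)
for i in range(steps):
    x += -k * x * dt + sigma * np.sqrt(dt) * rng.standard_normal()  # drift + noise
    trajectory[i] = x

# Stationary variance of this Ornstein-Uhlenbeck-type process is sigma^2 / (2k).
print("empirical variance:  ", trajectory[steps // 10:].var())
print("theoretical variance:", sigma ** 2 / (2 * k))
```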
In statistics, the Wishart distribution is a generalization to multiple dimensions of the gamma distribution. It is named in honor of John Wishart, who first formulated the distribution in 1928. Other names include the Wishart ensemble or Wishart–Laguerre ensemble (in random matrix theory, also denoted LOE, LUE, or LSE).
In the physical sciences, the Airy function (or Airy function of the first kind) Ai(x) is a special function named after the British astronomer George Biddell Airy (1801–1892). The function Ai(x) and the related function Bi(x) are linearly independent solutions to the differential equation

y'' - x y = 0,

known as the Airy equation or the Stokes equation.
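A brief numerical check sketch using scipy: scipy.special.airy returns the tuple (Ai, Ai′, Bi, Bi′), and a central finite difference confirms that both Ai and Bi satisfy y'' = x·y to numerical accuracy. The grid and step size are arbitrary.

```python
# Verify the Airy differential equation numerically for Ai and Bi.
import numpy as np
from scipy.special import airy

x = np.linspace(-5, 5, 11)
h = 1e-4
for name, idx in (("Ai", 0), ("Bi", 2)):
    y = airy(x)[idx]
    y_plus = airy(x + h)[idx]
    y_minus = airy(x - h)[idx]
    second_deriv = (y_plus - 2 * y + y_minus) / h ** 2   # central finite difference
    print(name, "max |y'' - x*y|:", np.max(np.abs(second_deriv - x * y)))
```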
In statistics, an expectation–maximization (EM) algorithm is an iterative method to find (local) maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models, where the model depends on unobserved latent variables. The EM iteration alternates between performing an expectation (E) step, which creates a function for the expectation of the log-likelihood evaluated using the current estimate for the parameters, and a maximization (M) step, which computes parameters maximizing the expected log-likelihood found on the E step. These parameter-estimates are then used to determine the distribution of the latent variables in the next E step.
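The following is a compact sketch of the E-step / M-step alternation for a deliberately simple case: a two-component one-dimensional Gaussian mixture with both variances fixed at 1. The data, starting values, and iteration count are illustrative.

```python
# EM for a two-component 1-D Gaussian mixture with known unit variances.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
data = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 1.0, 700)])

pi, mu1, mu2 = 0.5, -1.0, 1.0                 # initial mixing weight and means
for _ in range(50):
    # E step: posterior responsibility of component 1 for each point,
    # evaluated at the current parameter estimates.
    p1 = pi * stats.norm.pdf(data, mu1, 1.0)
    p2 = (1 - pi) * stats.norm.pdf(data, mu2, 1.0)
    r = p1 / (p1 + p2)
    # M step: parameters maximizing the expected complete-data log-likelihood.
    pi = r.mean()
    mu1 = np.sum(r * data) / np.sum(r)
    mu2 = np.sum((1 - r) * data) / np.sum(1 - r)

print(f"weight {pi:.3f}, means {mu1:.3f}, {mu2:.3f}")  # roughly 0.3, -2, 3
```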
In probability theory, a Chernoff bound is an exponentially decreasing upper bound on the tail of a random variable based on its moment generating function. The minimum of all such exponential bounds forms the Chernoff or Chernoff-Cramér bound, which may decay faster than exponential. It is especially useful for sums of independent random variables, such as sums of Bernoulli random variables.
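A small sketch for a sum of independent Bernoulli variables under illustrative parameters: the bound is P(X ≥ a) ≤ min over t > 0 of e^{−ta}·E[e^{tX}], with E[e^{tX}] = (1 − p + p·e^t)^n, minimized numerically and compared against the exact binomial tail.

```python
# Chernoff bound for the upper tail of a Binomial(n, p) sum of Bernoullis,
# optimized over t in log space to avoid overflow, versus the exact tail.
import numpy as np
from scipy import stats
from scipy.optimize import minimize_scalar

n, p, a = 100, 0.5, 70

def log_bound(t):
    # log of e^{-t a} * (1 - p + p e^t)^n
    return -t * a + n * np.log(1 - p + p * np.exp(t))

res = minimize_scalar(log_bound, bounds=(1e-6, 10.0), method="bounded")
print("Chernoff bound:     ", np.exp(res.fun))
print("exact tail P(X>=a): ", stats.binom.sf(a - 1, n, p))
```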
Variational Bayesian methods are a family of techniques for approximating intractable integrals arising in Bayesian inference and machine learning. They are typically used in complex statistical models consisting of observed variables as well as unknown parameters and latent variables, with various sorts of relationships among the three types of random variables, as might be described by a graphical model. As is typical in Bayesian inference, the parameters and latent variables are grouped together as "unobserved variables". Variational Bayesian methods are primarily used for two purposes: to provide an analytical approximation to the posterior probability of the unobserved variables, in order to do statistical inference over these variables, and to derive a lower bound for the marginal likelihood (sometimes called the evidence) of the observed data.
In mathematics, a π-system on a set Ω is a non-empty collection P of subsets of Ω that is closed under finite intersections: if A and B are in P, then A ∩ B is also in P.
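A tiny sketch of that definition as a programmatic check (closure under pairwise, hence finite, intersection); the helper name and example collections are purely illustrative.

```python
# Check whether a finite collection of sets is a pi-system.
from itertools import combinations

def is_pi_system(collection):
    sets = [frozenset(s) for s in collection]
    if not sets:
        return False                       # must be non-empty
    return all(a & b in sets for a, b in combinations(sets, 2))

print(is_pi_system([{1}, {1, 2}, {1, 2, 3}]))  # True: nested sets are closed under intersection
print(is_pi_system([{1}, {2, 3}]))             # False: {1} & {2, 3} is the empty set, not in the collection
```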
In statistics, the multivariate t-distribution is a multivariate probability distribution. It is a generalization to random vectors of the Student's t-distribution, which is a distribution applicable to univariate random variables. While the case of a random matrix could be treated within this structure, the matrix t-distribution is distinct and makes particular use of the matrix structure.
In probability theory and statistics, Wallenius' noncentral hypergeometric distribution is a generalization of the hypergeometric distribution where items are sampled with bias.
In statistics, the Q-function is the tail distribution function of the standard normal distribution. In other words, Q(x) is the probability that a normal (Gaussian) random variable will obtain a value larger than x standard deviations above the mean. Equivalently, Q(x) is the probability that a standard normal random variable takes a value larger than x.
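A short sketch relating the Q-function to the complementary error function and to the standard normal survival function, via the identity Q(x) = 0.5·erfc(x/√2).

```python
# Q-function via erfc, cross-checked against scipy's normal survival function.
import numpy as np
from scipy import stats
from scipy.special import erfc

def q_function(x):
    return 0.5 * erfc(x / np.sqrt(2.0))

for x in (0.0, 1.0, 1.96, 3.0):
    print(f"Q({x}) = {q_function(x):.6f}, norm.sf = {stats.norm.sf(x):.6f}")
```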
In probability theory and statistics, the Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space if these events occur with a known constant mean rate and independently of the time since the last event. It is named after French mathematician Siméon Denis Poisson. The Poisson distribution can also be used for the number of events in other specified interval types such as distance, area, or volume. It plays an important role for discrete-stable distributions.
In auction theory, particularly Bayesian-optimal mechanism design, a virtual valuation of an agent is a function that measures the surplus that can be extracted from that agent.
In statistics, the complex Wishart distribution is a complex version of the Wishart distribution. It is the distribution of n times the sample Hermitian covariance matrix of n zero-mean independent complex Gaussian random variables. It has support on Hermitian positive definite matrices.
The hyperbolastic functions, also known as hyperbolastic growth models, are mathematical functions that are used in medical statistical modeling. These models were originally developed to capture the growth dynamics of multicellular tumor spheroids, and were introduced in 2005 by Mohammad Tabatabai, David Williams, and Zoran Bursac. The precision of hyperbolastic functions in modeling real-world problems is due in part to the flexibility of their point of inflection. These functions can be used in a wide variety of modeling problems such as tumor growth, stem cell proliferation, pharmacokinetics, cancer growth, sigmoid activation functions in neural networks, and epidemiological disease progression or regression.
Petrus (Piet) Groeneboom is a Dutch statistician who made major advances in the field of shape-constrained statistical inference such as isotonic regression, and also worked in probability theory.