Von Mises–Fisher distribution

In directional statistics, the von Mises–Fisher distribution (named after Richard von Mises and Ronald Fisher) is a probability distribution on the $(p-1)$-sphere in $\mathbb{R}^p$. If $p = 2$, the distribution reduces to the von Mises distribution on the circle.

Definition

The probability density function of the von Mises–Fisher distribution for the random $p$-dimensional unit vector $\mathbf{x}$ is given by:

$$f_p(\mathbf{x}; \boldsymbol\mu, \kappa) = C_p(\kappa) \exp\left(\kappa\,\boldsymbol\mu^\mathsf{T}\mathbf{x}\right),$$

where $\kappa \geq 0$, $\|\boldsymbol\mu\| = 1$, and the normalization constant $C_p(\kappa)$ is equal to

$$C_p(\kappa) = \frac{\kappa^{p/2-1}}{(2\pi)^{p/2}\, I_{p/2-1}(\kappa)},$$

where $I_v$ denotes the modified Bessel function of the first kind at order $v$. If $p = 3$, the normalization constant reduces to

$$C_3(\kappa) = \frac{\kappa}{4\pi\sinh\kappa} = \frac{\kappa}{2\pi\left(e^{\kappa} - e^{-\kappa}\right)}.$$

The parameters $\boldsymbol\mu$ and $\kappa$ are called the mean direction and concentration parameter, respectively. The greater the value of $\kappa$, the higher the concentration of the distribution around the mean direction $\boldsymbol\mu$. The distribution is unimodal for $\kappa > 0$, and is uniform on the sphere for $\kappa = 0$.

The von Mises–Fisher distribution for $p = 3$ is also called the Fisher distribution. [1] [2] It was first used to model the interaction of electric dipoles in an electric field. [3] Other applications are found in geology, bioinformatics, and text mining.
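A minimal numerical sketch of this density, assuming NumPy and SciPy (the helper name vmf_log_pdf is illustrative; SciPy's exponentially scaled Bessel function ive avoids overflow for large $\kappa$):

```python
import numpy as np
from scipy.special import ive  # exponentially scaled Bessel: ive(v, k) = iv(v, k) * exp(-k)

def vmf_log_pdf(x, mu, kappa):
    """Log-density of VMF(mu, kappa) on S^{p-1} w.r.t. Lebesgue measure.

    x, mu: unit vectors in R^p; kappa: concentration, assumed > 0 here.
    log I_v(kappa) is computed as log(ive(v, kappa)) + kappa to avoid overflow.
    """
    p = len(mu)
    v = p / 2 - 1
    log_bessel = np.log(ive(v, kappa)) + kappa
    log_c = v * np.log(kappa) - (p / 2) * np.log(2 * np.pi) - log_bessel
    return log_c + kappa * np.dot(mu, x)

# example: density at the mode of a VMF on S^2 with kappa = 10
mu = np.array([0.0, 0.0, 1.0])
print(np.exp(vmf_log_pdf(mu, mu, 10.0)))
```

Working in log space is a deliberate choice here, since $I_{p/2-1}(\kappa)$ grows like $e^\kappa$ and overflows quickly in double precision.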

Note on the normalization constant

In the textbook Directional Statistics [3] by Mardia and Jupp, the normalization constant given for the von Mises–Fisher probability density is apparently different from the one given here, $C_p(\kappa)$. In that book, the normalization constant is specified as:

$$c_p(\kappa) = \left(\frac{\kappa}{2}\right)^{p/2-1} \frac{1}{\Gamma\!\left(\frac{p}{2}\right) I_{p/2-1}(\kappa)},$$

where $\Gamma$ is the gamma function. This is resolved by noting that Mardia and Jupp give the density "with respect to the uniform distribution", while the density here is specified in the usual way, with respect to Lebesgue measure. The density (w.r.t. Lebesgue measure) of the uniform distribution is the reciprocal of the surface area of the $(p-1)$-sphere, so that the uniform density function is given by the constant:

$$C_p(0) = \frac{\Gamma\!\left(\frac{p}{2}\right)}{2\pi^{p/2}}.$$

It then follows that:

$$C_p(\kappa) = c_p(\kappa)\, C_p(0).$$

While the value for $C_p(0)$ was derived above via the surface area, the same result may be obtained by setting $\kappa = 0$ in the above formula for $C_p(\kappa)$. This can be done by noting that the series expansion for $I_{p/2-1}(\kappa)$, divided by $\kappa^{p/2-1}$, has but one non-zero term at $\kappa = 0$. (To evaluate that term, one needs to use the definition $0^0 = 1$.)
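This identity can be checked numerically; a minimal sketch assuming NumPy and SciPy:

```python
import numpy as np
from scipy.special import gamma, iv

p, kappa = 5, 2.7

C_p  = kappa**(p/2 - 1) / ((2*np.pi)**(p/2) * iv(p/2 - 1, kappa))   # w.r.t. Lebesgue measure
c_p  = (kappa/2)**(p/2 - 1) / (gamma(p/2) * iv(p/2 - 1, kappa))     # Mardia & Jupp, w.r.t. uniform
C_p0 = gamma(p/2) / (2 * np.pi**(p/2))                              # uniform density on S^{p-1}

assert np.isclose(C_p, c_p * C_p0)   # C_p(kappa) = c_p(kappa) * C_p(0)
```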

Support

The support of the Von Mises–Fisher distribution is the hypersphere, or more specifically, the $(p-1)$-sphere, denoted as

$$S^{p-1} = \left\{\mathbf{x} \in \mathbb{R}^p : \|\mathbf{x}\| = 1\right\}.$$

This is a $(p-1)$-dimensional manifold embedded in $p$-dimensional Euclidean space, $\mathbb{R}^p$.

Relation to normal distribution

Starting from a normal distribution with isotropic covariance $\kappa^{-1}\mathbf{I}$ and mean $r\boldsymbol\mu$ of length $r > 0$, whose density function is:

$$p(\mathbf{x}) = \left(\frac{\kappa}{2\pi}\right)^{p/2} \exp\left(-\frac{\kappa}{2}\left(\mathbf{x} - r\boldsymbol\mu\right)^\mathsf{T}\left(\mathbf{x} - r\boldsymbol\mu\right)\right),$$

the Von Mises–Fisher distribution is obtained by conditioning on $\|\mathbf{x}\| = 1$. By expanding

$$\left(\mathbf{x} - r\boldsymbol\mu\right)^\mathsf{T}\left(\mathbf{x} - r\boldsymbol\mu\right) = \mathbf{x}^\mathsf{T}\mathbf{x} + r^2 - 2r\boldsymbol\mu^\mathsf{T}\mathbf{x} = 1 + r^2 - 2r\boldsymbol\mu^\mathsf{T}\mathbf{x},$$

and using the fact that the first two right-hand-side terms are fixed, the Von Mises–Fisher density $f_p(\mathbf{x}; \boldsymbol\mu, r\kappa)$ is recovered by recomputing the normalization constant by integrating $\mathbf{x}$ over the unit sphere. If $r = 0$, we get the uniform distribution, with density $C_p(0)$.

More succinctly, the restriction of any isotropic multivariate normal density to the unit hypersphere gives a Von Mises–Fisher density, up to normalization.

This construction can be generalized by starting with a normal distribution with a general covariance matrix, in which case conditioning on $\|\mathbf{x}\| = 1$ gives the Fisher–Bingham distribution.
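This restriction property can be checked numerically. A minimal sketch, assuming NumPy and the parameterization above (isotropic covariance $\kappa^{-1}\mathbf{I}$, mean $r\boldsymbol\mu$):

```python
import numpy as np

rng = np.random.default_rng(0)
p, r, kappa = 4, 2.0, 1.5
mu = np.array([1.0, 0.0, 0.0, 0.0])

def log_normal_density(x):
    """Isotropic normal with mean r*mu and covariance (1/kappa) I, up to a constant."""
    return -0.5 * kappa * np.sum((x - r * mu)**2)

# two arbitrary points on the unit sphere
x1, x2 = rng.normal(size=(2, p))
x1 /= np.linalg.norm(x1)
x2 /= np.linalg.norm(x2)

# on the sphere, the normal log-density equals kappa*r*mu^T x plus a constant,
# i.e. the conditional distribution is VMF(mu, r*kappa)
lhs = log_normal_density(x1) - log_normal_density(x2)
rhs = r * kappa * mu @ (x1 - x2)
assert np.isclose(lhs, rhs)
```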

Estimation of parameters

Mean direction

A series of $N$ independent unit vectors $\mathbf{x}_i$ are drawn from a von Mises–Fisher distribution. The maximum likelihood estimate of the mean direction is simply the normalized arithmetic mean, a sufficient statistic: [3]

$$\hat{\boldsymbol\mu} = \frac{\bar{\mathbf{x}}}{\bar{R}}, \qquad\text{where}\qquad \bar{\mathbf{x}} = \frac{1}{N}\sum_{i=1}^N \mathbf{x}_i \quad\text{and}\quad \bar{R} = \|\bar{\mathbf{x}}\|.$$

Concentration parameter

Use the modified Bessel function of the first kind, $I_v$, to define

$$A_p(\kappa) = \frac{I_{p/2}(\kappa)}{I_{p/2-1}(\kappa)}.$$

Then the maximum likelihood estimate $\hat\kappa$ of the concentration satisfies:

$$A_p(\hat\kappa) = \bar{R}.$$

Thus $\hat\kappa$ is the solution to

$$A_p(\hat\kappa) = \frac{\left\|\sum_{i=1}^N \mathbf{x}_i\right\|}{N} = \bar{R}.$$

A simple approximation to $\hat\kappa$ is (Sra, 2011)

$$\hat\kappa = \frac{\bar{R}\left(p - \bar{R}^2\right)}{1 - \bar{R}^2}.$$

A more accurate inversion can be obtained by iterating the Newton method a few times:

$$\hat\kappa_{n+1} = \hat\kappa_n - \frac{A_p(\hat\kappa_n) - \bar{R}}{1 - A_p(\hat\kappa_n)^2 - \frac{p-1}{\hat\kappa_n} A_p(\hat\kappa_n)}.$$
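Combining the estimators above, a minimal sketch assuming NumPy and SciPy (the helper names A_p and fit_vmf are illustrative):

```python
import numpy as np
from scipy.special import ive

def A_p(kappa, p):
    """Bessel ratio I_{p/2}(kappa) / I_{p/2-1}(kappa); the exp scaling of ive cancels."""
    return ive(p/2, kappa) / ive(p/2 - 1, kappa)

def fit_vmf(X, newton_steps=3):
    """Estimate (mu, kappa) from an N-by-p array of unit vectors."""
    N, p = X.shape
    xbar = X.mean(axis=0)
    Rbar = np.linalg.norm(xbar)
    mu_hat = xbar / Rbar
    # Sra's closed-form approximation ...
    kappa = Rbar * (p - Rbar**2) / (1 - Rbar**2)
    # ... refined by Newton iterations on A_p(kappa) = Rbar
    for _ in range(newton_steps):
        a = A_p(kappa, p)
        kappa -= (a - Rbar) / (1 - a**2 - (p - 1) * a / kappa)
    return mu_hat, kappa
```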

Standard error

For $N \geq 25$, the estimated spherical standard error of the sample mean direction can be computed as: [4]

$$\hat\sigma = \left(\frac{d}{N\bar{R}^2}\right)^{1/2},$$

where

$$d = 1 - \frac{1}{N}\sum_{i=1}^N \left(\hat{\boldsymbol\mu}^\mathsf{T}\mathbf{x}_i\right)^2.$$

It is then possible to approximate a $100(1-\alpha)\%$ spherical confidence interval (a confidence cone) about $\boldsymbol\mu$ with semi-vertical angle:

$$q = \arcsin\left(e_\alpha^{1/2}\,\hat\sigma\right),$$

where

$$e_\alpha = -\ln(\alpha).$$

For example, for a 95% confidence cone, $\alpha = 0.05$, $e_\alpha = -\ln(0.05) = 2.996$, and thus $q = \arcsin(1.731\,\hat\sigma)$.
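A minimal sketch of this computation, assuming NumPy (the function name confidence_cone is illustrative):

```python
import numpy as np

def confidence_cone(X, alpha=0.05):
    """Semi-vertical angle (radians) of the ~100(1-alpha)% confidence cone.

    X: N-by-p array of unit vectors (the large-sample formula assumes N >= 25).
    """
    N, _ = X.shape
    xbar = X.mean(axis=0)
    Rbar = np.linalg.norm(xbar)
    mu_hat = xbar / Rbar
    d = 1 - np.mean((X @ mu_hat)**2)
    sigma_hat = np.sqrt(d / (N * Rbar**2))
    return np.arcsin(np.sqrt(-np.log(alpha)) * sigma_hat)
```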

Expected value

The expected value of the Von Mises–Fisher distribution is not on the unit hypersphere, but instead has a length of less than one. This length is given by $A_p(\kappa)$ as defined above. For a Von Mises–Fisher distribution with mean direction $\boldsymbol\mu$ and concentration $\kappa \geq 0$, the expected value is:

$$A_p(\kappa)\,\boldsymbol\mu.$$

For $\kappa = 0$, the expected value is at the origin. For finite $\kappa > 0$, the length of the expected value is strictly between zero and one and is a monotonic rising function of $\kappa$.

The empirical mean (arithmetic average) of a collection of points on the unit hypersphere behaves in a similar manner, being close to the origin for widely spread data and close to the sphere for concentrated data. Indeed, for the Von Mises–Fisher distribution, the expected value of the maximum-likelihood estimate based on a collection of points is equal to the empirical mean of those points.

Entropy and KL divergence

The expected value can be used to compute differential entropy and KL divergence.

The differential entropy of $\text{VMF}(\boldsymbol\mu, \kappa)$ is:

$$\left\langle -\log f_p(\mathbf{x}; \boldsymbol\mu, \kappa) \right\rangle = -\log C_p(\kappa) - \kappa\,\boldsymbol\mu^\mathsf{T}\!\left\langle \mathbf{x} \right\rangle = -\log C_p(\kappa) - \kappa A_p(\kappa),$$

where the angle brackets denote expectation. Notice that the entropy is a function of $\kappa$ only.

The KL divergence between $\text{VMF}(\boldsymbol\mu_0, \kappa_0)$ and $\text{VMF}(\boldsymbol\mu_1, \kappa_1)$ is:

$$\left\langle \log\frac{f_p(\mathbf{x}; \boldsymbol\mu_0, \kappa_0)}{f_p(\mathbf{x}; \boldsymbol\mu_1, \kappa_1)} \right\rangle = \left(\kappa_0\boldsymbol\mu_0 - \kappa_1\boldsymbol\mu_1\right)^\mathsf{T} A_p(\kappa_0)\,\boldsymbol\mu_0 + \log C_p(\kappa_0) - \log C_p(\kappa_1),$$

where the expectation is taken w.r.t. the first distribution.
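Both quantities depend on the parameters only through $\log C_p(\kappa)$ and $A_p(\kappa)$, so they can be evaluated directly; a minimal sketch assuming NumPy and SciPy:

```python
import numpy as np
from scipy.special import ive

def log_C(kappa, p):
    """log C_p(kappa), the log-normalization constant (kappa > 0)."""
    return (p/2 - 1) * np.log(kappa) - (p/2) * np.log(2*np.pi) \
           - (np.log(ive(p/2 - 1, kappa)) + kappa)

def A_p(kappa, p):
    return ive(p/2, kappa) / ive(p/2 - 1, kappa)

def vmf_entropy(kappa, p):
    return -log_C(kappa, p) - kappa * A_p(kappa, p)

def vmf_kl(mu0, kappa0, mu1, kappa1):
    p = len(mu0)
    mean0 = A_p(kappa0, p) * mu0               # expected value under the first distribution
    return (kappa0 * mu0 - kappa1 * mu1) @ mean0 + log_C(kappa0, p) - log_C(kappa1, p)
```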

Transformation

Von Mises-Fisher (VMF) distributions are closed under orthogonal linear transforms. Let $\mathbf{U}$ be a $p$-by-$p$ orthogonal matrix. Let $\mathbf{x} \sim \text{VMF}(\boldsymbol\mu, \kappa)$ and apply the invertible linear transform $\mathbf{y} = \mathbf{U}\mathbf{x}$. The inverse transform is $\mathbf{x} = \mathbf{U}^\mathsf{T}\mathbf{y}$, because the inverse of an orthogonal matrix is its transpose: $\mathbf{U}^{-1} = \mathbf{U}^\mathsf{T}$. The Jacobian of the transform is $\mathbf{U}$, for which the absolute value of its determinant is 1, also because of the orthogonality. Using these facts and the form of the VMF density, it follows that:

$$\mathbf{y} \sim \text{VMF}(\mathbf{U}\boldsymbol\mu, \kappa).$$

One may verify that since $\boldsymbol\mu$ and $\mathbf{x}$ are unit vectors, then by the orthogonality, so are $\mathbf{U}\boldsymbol\mu$ and $\mathbf{y} = \mathbf{U}\mathbf{x}$.

Pseudo-random number generation

General case

An algorithm for drawing pseudo-random samples from the Von Mises Fisher (VMF) distribution was given by Ulrich [5] and later corrected by Wood. [6] An implementation in R is given by Hornik and Grün, [7] and a fast Python implementation is described by Pinzón and Jung. [8]

To simulate from a VMF distribution on the $(p-1)$-dimensional unit sphere, $S^{p-1}$, with mean direction $\boldsymbol\mu$, these algorithms use the following radial-tangential decomposition for a point $\mathbf{x} \in S^{p-1}$:

$$\mathbf{x} = t\,\boldsymbol\mu + \sqrt{1 - t^2}\,\mathbf{v},$$

where $\mathbf{v}$ lives in the tangential $(p-2)$-dimensional unit subsphere that is centered at and perpendicular to $\boldsymbol\mu$, while $-1 \leq t \leq 1$. To draw a sample $\mathbf{x}$ from a VMF with parameters $\boldsymbol\mu$ and $\kappa$, $\mathbf{v}$ must be drawn from the uniform distribution on the tangential subsphere, and the radial component, $t$, must be drawn independently from the distribution with density:

$$f_{\text{radial}}(t; \kappa, p) = \frac{(\kappa/2)^\nu}{I_\nu(\kappa)\,\Gamma\!\left(\nu + \frac12\right)\Gamma\!\left(\frac12\right)}\, e^{\kappa t}\left(1 - t^2\right)^{\nu - \frac12},$$

where $\nu = \frac{p}{2} - 1$. The normalization constant for this density may be verified by using:

$$I_\nu(\kappa) = \frac{(\kappa/2)^\nu}{\Gamma\!\left(\nu + \frac12\right)\Gamma\!\left(\frac12\right)} \int_{-1}^1 e^{\kappa t}\left(1 - t^2\right)^{\nu - \frac12}\,dt,$$

as given in Appendix 1 (A.3) in Directional Statistics. [3] Drawing the $t$ samples from this density by using a rejection sampling algorithm is explained in the above references. To draw the uniform $\mathbf{v}$ samples perpendicular to $\boldsymbol\mu$, see the algorithm in [8], or otherwise a Householder transform can be used, as explained in Algorithm 1 in. [9]
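A minimal Python sketch of this scheme, assuming NumPy: the radial component is drawn with the Ulrich–Wood rejection sampler (envelope constants $b$, $x_0$, $c$ from Wood [6]), and the tangential component by projecting a Gaussian vector orthogonally to $\boldsymbol\mu$ and normalizing. This is an illustration of the published algorithm, not any of the reference implementations cited above:

```python
import numpy as np

def sample_vmf(mu, kappa, n, rng=None):
    """Draw n samples from VMF(mu, kappa) on S^{p-1} via radial-tangential decomposition."""
    rng = rng if rng is not None else np.random.default_rng()
    mu = np.asarray(mu, dtype=float)
    p = mu.size
    mu = mu / np.linalg.norm(mu)
    # envelope constants for the rejection sampler (Wood, 1994)
    b = (-2*kappa + np.sqrt(4*kappa**2 + (p - 1)**2)) / (p - 1)
    x0 = (1 - b) / (1 + b)
    c = kappa*x0 + (p - 1)*np.log(1 - x0**2)
    samples = np.empty((n, p))
    for i in range(n):
        # rejection-sample the radial component t = mu^T x
        while True:
            z = rng.beta((p - 1)/2, (p - 1)/2)
            t = (1 - (1 + b)*z) / (1 - (1 - b)*z)
            if kappa*t + (p - 1)*np.log(1 - x0*t) - c >= np.log(rng.uniform()):
                break
        # tangential component: uniform on the subsphere perpendicular to mu
        v = rng.normal(size=p)
        v -= (v @ mu) * mu
        v /= np.linalg.norm(v)
        samples[i] = t*mu + np.sqrt(1 - t**2)*v
    return samples
```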

3-D sphere

To generate a Von Mises–Fisher distributed pseudo-random spherical 3-D unit vector $\mathbf{x}_{\boldsymbol\mu,\kappa}$ [10] [11] on the sphere $S^2$ for a given $\boldsymbol\mu$ and $\kappa$, define

$$\mathbf{x}_{\boldsymbol\mu,\kappa} = (\theta, \phi, r),$$

where $\theta$ is the polar angle, $\phi$ the azimuthal angle, and $r = 1$ the distance to the center of the sphere.

The pseudo-random triplet is then given by

$$\mathbf{x}_{\boldsymbol\mu_0,\kappa} = (\arccos W, V, 1),$$

where $V$ is sampled from the continuous uniform distribution with lower bound $0$ and upper bound $2\pi$,

$$V \sim U(0, 2\pi),$$

and

$$W = 1 + \frac{1}{\kappa}\ln\!\left(\xi + (1 - \xi)\,e^{-2\kappa}\right),$$

where $\xi$ is sampled from the standard continuous uniform distribution,

$$\xi \sim U(0, 1).$$

Here, $\boldsymbol\mu_0$ should be set to $(0, 0, 1)$ in Cartesian coordinates, and the result rotated to match any other desired $\boldsymbol\mu$.
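A minimal NumPy sketch of this recipe, returning Cartesian coordinates for $\boldsymbol\mu_0 = (0, 0, 1)$ (assumes $\kappa > 0$; the function name is illustrative):

```python
import numpy as np

def sample_vmf_s2(kappa, n, rng=None):
    """Samples on S^2 with mean direction mu0 = (0, 0, 1), in Cartesian coordinates."""
    rng = rng if rng is not None else np.random.default_rng()
    xi = rng.uniform(size=n)                                    # xi ~ U(0, 1)
    w = 1 + np.log(xi + (1 - xi) * np.exp(-2 * kappa)) / kappa  # W = cos(polar angle)
    v = rng.uniform(0.0, 2*np.pi, size=n)                       # azimuthal angle V ~ U(0, 2*pi)
    s = np.sqrt(np.clip(1 - w**2, 0.0, None))                   # sin(polar angle)
    return np.stack([s*np.cos(v), s*np.sin(v), w], axis=1)
```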

Distribution of polar angle

For $p = 3$, the angle $\theta$ between $\mathbf{x}$ and $\boldsymbol\mu$ satisfies $\cos\theta = \boldsymbol\mu^\mathsf{T}\mathbf{x}$. It has the distribution

$$p(\theta) = 2\pi\, C_3(\kappa)\, e^{\kappa\cos\theta}\sin\theta,$$

which can be easily evaluated as

$$p(\theta) = \frac{\kappa}{2\sinh\kappa}\, e^{\kappa\cos\theta}\sin\theta.$$

For the general case, $p \geq 2$, the distribution for the cosine of this angle,

$$t = \cos\theta = \boldsymbol\mu^\mathsf{T}\mathbf{x},$$

is given by $f_{\text{radial}}(t; \kappa, p)$, as explained above.

The uniform hypersphere distribution

When $\kappa = 0$, the Von Mises–Fisher distribution on $S^{p-1} \subset \mathbb{R}^p$ simplifies to the uniform distribution on $S^{p-1}$. The density is constant with value $C_p(0)$. Pseudo-random samples can be generated by generating samples in $\mathbb{R}^p$ from the standard multivariate normal distribution, followed by normalization to unit norm.
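For example, assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 8, 10_000
z = rng.standard_normal((n, p))                     # samples from N(0, I) in R^p
x = z / np.linalg.norm(z, axis=1, keepdims=True)    # normalized: uniform on S^{p-1}
```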

Component marginal of uniform distribution

For $1 \leq i \leq p$, let $x_i$ be any component of $\mathbf{x} \in S^{p-1}$, uniformly distributed. The marginal distribution for $x_i$ has the density: [12] [13]

$$f_i(x_i; p) = \frac{\left(1 - x_i^2\right)^{\frac{p-3}{2}}}{B\!\left(\frac12, \frac{p-1}{2}\right)}, \qquad -1 \leq x_i \leq 1,$$

where $B$ is the beta function. This distribution may be better understood by highlighting its relation to the beta distribution:

$$\frac{x_i + 1}{2} \sim \text{Beta}\!\left(\frac{p-1}{2}, \frac{p-1}{2}\right) \qquad\text{and}\qquad x_i^2 \sim \text{Beta}\!\left(\frac12, \frac{p-1}{2}\right),$$

where the Legendre duplication formula is useful to understand the relationships between the normalization constants of the various densities above.

Note that the components of $\mathbf{x}$ are not independent, so that the uniform density is not the product of the marginal densities, and $\mathbf{x}$ cannot be assembled by independent sampling of the components.
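The beta relations can be checked by simulation, e.g. with a Kolmogorov–Smirnov test; a minimal sketch assuming NumPy and SciPy:

```python
import numpy as np
from scipy.stats import beta, kstest

rng = np.random.default_rng(0)
p, n = 6, 50_000
z = rng.standard_normal((n, p))
x = z / np.linalg.norm(z, axis=1, keepdims=True)    # uniform samples on S^{p-1}
xi = x[:, 0]                                        # a single component

# (x_i + 1)/2 ~ Beta((p-1)/2, (p-1)/2)  and  x_i^2 ~ Beta(1/2, (p-1)/2)
print(kstest((xi + 1) / 2, beta((p - 1)/2, (p - 1)/2).cdf))
print(kstest(xi**2, beta(1/2, (p - 1)/2).cdf))
```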

Distribution of dot-products

In machine learning, especially in image classification, to-be-classified inputs (e.g. images) are often compared using cosine similarity, which is the dot product between intermediate representations in the form of unit vectors (termed embeddings). The dimensionality is typically high, with at least several hundred components. The deep neural networks that extract embeddings for classification should learn to spread the classes as far apart as possible, and ideally this should give classes that are uniformly distributed on $S^{p-1}$. [14] For a better statistical understanding of across-class cosine similarity, the distribution of dot-products between unit vectors independently sampled from the uniform distribution may be helpful.


Let $\mathbf{x}, \mathbf{y} \in S^{p-1}$ be unit vectors, independently sampled from the uniform distribution. Define:

$$t = \mathbf{x}^\mathsf{T}\mathbf{y} \in [-1, 1], \qquad r = \frac{1 + t}{2} \in [0, 1], \qquad s = \operatorname{logit}(r) = \log\frac{1 + t}{1 - t},$$

where $t$ is the dot-product and $r$ and $s$ are transformed versions of it. Then the distribution for $t$ is the same as the marginal component distribution given above; [13] the distribution for $r$ is symmetric beta, and the distribution for $s$ is symmetric logistic-beta:

$$r \sim \text{Beta}\!\left(\frac{p-1}{2}, \frac{p-1}{2}\right), \qquad s \sim B_\sigma\!\left(\frac{p-1}{2}, \frac{p-1}{2}\right).$$

The means and variances are:

$$E[t] = E[s] = 0, \qquad E[r] = \frac12,$$

and

$$\operatorname{var}[t] = \frac{1}{p}, \qquad \operatorname{var}[r] = \frac{1}{4p}, \qquad \operatorname{var}[s] = 2\psi^{(1)}\!\left(\frac{p-1}{2}\right),$$

where $\psi^{(1)}$ is the first polygamma function. The variances decrease, the distributions of all three variables become more Gaussian, and the Gaussian approximation gets better as the dimensionality, $p$, is increased.
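These moments can be confirmed by simulation; a minimal sketch assuming NumPy and SciPy:

```python
import numpy as np
from scipy.special import polygamma

rng = np.random.default_rng(0)
p, n = 128, 100_000
x, y = rng.standard_normal((2, n, p))
x /= np.linalg.norm(x, axis=1, keepdims=True)
y /= np.linalg.norm(y, axis=1, keepdims=True)

t = np.sum(x * y, axis=1)         # dot products of independent uniform unit vectors
r = (1 + t) / 2
s = np.log((1 + t) / (1 - t))

print(t.var(), 1 / p)                            # var[t] = 1/p
print(r.var(), 1 / (4 * p))                      # var[r] = 1/(4p)
print(s.var(), 2 * polygamma(1, (p - 1) / 2))    # var[s] = 2 * psi'((p-1)/2)
```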

Generalizations

Matrix Von Mises-Fisher

The matrix von Mises-Fisher distribution (also known as matrix Langevin distribution [15] [16] ) has the density

$$f(\mathbf{X}) \propto \exp\!\left(\operatorname{tr}\!\left(\mathbf{F}^\mathsf{T}\mathbf{X}\right)\right),$$

supported on the Stiefel manifold of $n \times p$ orthonormal $p$-frames $\mathbf{X}$, where $\mathbf{F}$ is an arbitrary $n \times p$ real matrix. [17] [18]

Saw distributions

Ulrich, [5] in designing an algorithm for sampling from the VMF distribution, makes use of a family of distributions named after and explored by John G. Saw. [19] A Saw distribution is a distribution on the $(p-1)$-sphere, $S^{p-1}$, with modal vector $\boldsymbol\mu \in S^{p-1}$ and concentration $\kappa \geq 0$, and of which the density function has the form:

$$p_{\mathbf{x}}(\mathbf{x}; \boldsymbol\mu, \kappa) = \frac{g\!\left(\kappa\,\boldsymbol\mu^\mathsf{T}\mathbf{x}\right)}{K_p(\kappa)},$$

where $g$ is a non-negative, increasing function and $K_p(\kappa)$ is the normalization constant. The above-mentioned radial-tangential decomposition generalizes to the Saw family, and the radial component, $t = \boldsymbol\mu^\mathsf{T}\mathbf{x}$, has the density:

$$p_t(t; \kappa) = \frac{2\pi^{\frac{p-1}{2}}}{\Gamma\!\left(\frac{p-1}{2}\right)} \cdot \frac{g(\kappa t)\left(1 - t^2\right)^{\frac{p-3}{2}}}{K_p(\kappa)},$$

where $\Gamma$ is the gamma function. Also notice that the left-hand factor of the radial density is the surface area of $S^{p-2}$.

By setting $g\!\left(\kappa\,\boldsymbol\mu^\mathsf{T}\mathbf{x}\right) = \exp\!\left(\kappa\,\boldsymbol\mu^\mathsf{T}\mathbf{x}\right)$, one recovers the VMF distribution.

Weighted Rademacher Distribution

The definition of the Von Mises-Fisher distribution can be extended to include also the case where $p = 1$, so that the support is the 0-dimensional hypersphere, which when embedded into 1-dimensional Euclidean space is the discrete set, $S^0 = \{-1, +1\}$. The mean direction is $\mu \in \{-1, +1\}$ and the concentration is $\kappa \geq 0$. The probability mass function, for $x \in \{-1, +1\}$, is:

$$f(x; \mu, \kappa) = \sigma(2\kappa\,\mu x),$$

where $\sigma$ is the logistic sigmoid. The expected value is $\mu\tanh\kappa$. In the uniform case, at $\kappa = 0$, this distribution degenerates to the Rademacher distribution.
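A minimal sketch of this case, assuming NumPy (the function name is illustrative):

```python
import numpy as np

def weighted_rademacher_pmf(x, mu, kappa):
    """PMF for the p = 1 case: x, mu in {-1, +1}, kappa >= 0."""
    return 1 / (1 + np.exp(-2 * kappa * mu * x))   # logistic sigmoid of 2*kappa*mu*x

mu, kappa = 1, 0.8
values = np.array([-1, 1])
probs = weighted_rademacher_pmf(values, mu, kappa)
assert np.isclose(probs.sum(), 1.0)                      # valid pmf
assert np.isclose(values @ probs, mu * np.tanh(kappa))   # expected value is mu*tanh(kappa)
```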


References

  1. Fisher, R. A. (1953). "Dispersion on a sphere". Proc. R. Soc. Lond. A. 217 (1130): 295–305. Bibcode:1953RSPSA.217..295F. doi:10.1098/rspa.1953.0064. S2CID 123166853.
  2. Watson, G. S. (1980). "Distributions on the Circle and on the Sphere". J. Appl. Probab. 19: 265–280. doi:10.2307/3213566. JSTOR 3213566. S2CID 222325569.
  3. Mardia, Kanti; Jupp, P. E. (1999). Directional Statistics. John Wiley & Sons Ltd. ISBN 978-0-471-95333-3.
  4. Fisher, N. I.; Lewis, T.; Embleton, B. J. J. (1993). Statistical Analysis of Spherical Data (1st pbk. ed.). Cambridge: Cambridge University Press. pp. 115–116. ISBN 0-521-45699-1.
  5. Ulrich, Gary (1984). "Computer generation of distributions on the m-sphere". Applied Statistics. 33 (2): 158–163. doi:10.2307/2347441. JSTOR 2347441.
  6. Wood, Andrew T. (1994). "Simulation of the Von Mises Fisher distribution". Communications in Statistics – Simulation and Computation. 23 (1): 157–164. doi:10.1080/03610919408813161.
  7. Hornik, Kurt; Grün, Bettina (2014). "movMF: An R Package for Fitting Mixtures of Von Mises-Fisher Distributions". Journal of Statistical Software. 58 (10). doi:10.18637/jss.v058.i10. S2CID 13171102.
  8. Pinzón, Carlos; Jung, Kangsoo (2023-03-03). Fast Python sampler for the von Mises Fisher distribution. Retrieved 2023-03-30.
  9. De Cao, Nicola; Aziz, Wilker (13 Feb 2023). "The Power Spherical distribution". arXiv:2006.04437 [stat.ML].
  10. Pakyuz-Charrier, Evren; Lindsay, Mark; Ogarko, Vitaliy; Giraud, Jeremie; Jessell, Mark (2018-04-06). "Monte Carlo simulation for uncertainty estimation on structural data in implicit 3-D geological modeling, a guide for disturbance distribution selection and parameterization". Solid Earth. 9 (2): 385–402. Bibcode:2018SolE....9..385P. doi:10.5194/se-9-385-2018. ISSN 1869-9510.
  11. Wood, Andrew T. A. (1992). Simulation of the Von Mises Fisher distribution. Centre for Mathematics & its Applications, Australian National University. OCLC 221030477.
  12. Gosmann, J.; Eliasmith, C. (2016). "Optimizing Semantic Pointer Representations for Symbol-Like Processing in Spiking Neural Networks". PLOS ONE. 11 (2): e0149928. Bibcode:2016PLoSO..1149928G. doi:10.1371/journal.pone.0149928. PMC 4762696. PMID 26900931.
  13. Voelker, Aaron R.; Gosmann, Jan; Stewart, Terrence C. (2017). "Efficiently sampling vectors and coordinates from the n-sphere and n-ball" (PDF). Centre for Theoretical Neuroscience – Technical Report. Retrieved 22 April 2023.
  14. Wang, Tongzhou; Isola, Phillip (2020). "Understanding Contrastive Representation Learning through Alignment and Uniformity on the Hypersphere". International Conference on Machine Learning (ICML). arXiv:2005.10242.
  15. Pal, Subhadip; Sengupta, Subhajit; Mitra, Riten; Banerjee, Arunava (2020). "Conjugate Priors and Posterior Inference for the Matrix Langevin Distribution on the Stiefel Manifold". Bayesian Analysis. 15 (3): 871–908. doi:10.1214/19-BA1176. ISSN 1936-0975.
  16. Chikuse, Yasuko (1 May 2003). "Concentrated matrix Langevin distributions". Journal of Multivariate Analysis. 85 (2): 375–394. doi:10.1016/S0047-259X(02)00065-9. ISSN 0047-259X.
  17. Jupp (1979). "Maximum likelihood estimators for the matrix von Mises-Fisher and Bingham distributions". The Annals of Statistics. 7 (3): 599–606. doi:10.1214/aos/1176344681.
  18. Downs (1972). "Orientational statistics". Biometrika. 59 (3): 665–676. doi:10.1093/biomet/59.3.665.
  19. Saw, John G. (1978). "A family of distributions on the m-sphere and some hypothesis tests". Biometrika. 65 (1): 69–73. doi:10.2307/2335278. JSTOR 2335278.
