Central limit theorem for directional statistics

Last updated August 20, 2022

In probability theory, the central limit theorem states conditions under which the average of a sufficiently large number of independent random variables, each with finite mean and variance, will be approximately normally distributed.^[1]

Directional statistics is the subdiscipline of statistics that deals with directions (unit vectors in Rⁿ), axes (lines through the origin in Rⁿ) or rotations in Rⁿ. The means and variances of directional quantities are all finite, so that the central limit theorem may be applied to the particular case of directional statistics.^[2]

This article will deal only with unit vectors in 2-dimensional space (R²) but the method described can be extended to the general case.

The central limit theorem

A sample of angles $\theta _{i}$ is measured, and since each angle is defined only up to an additive integer multiple of $2\pi$ , the unambiguous complex quantity $z_{i}=e^{i\theta _{i}}=\cos(\theta _{i})+i\sin(\theta _{i})$ is used as the random variate. The probability distribution from which the sample is drawn may be characterized by its moments, which may be expressed in Cartesian and polar form:

m_{n}=E(z^{n})=C_{n}+iS_{n}=R_{n}e^{i\theta _{n}}\,

It follows that:

C_{n}=E(\cos(n\theta ))\,

S_{n}=E(\sin(n\theta ))\,

R_{n}=|E(z^{n})|={\sqrt {C_{n}^{2}+S_{n}^{2}}}\,

\theta _{n}=\arg(E(z^{n}))\,

Sample moments for N trials are:

{\overline {m_{n}}}={\frac {1}{N}}\sum _{i=1}^{N}z_{i}^{n}={\overline {C_{n}}}+i{\overline {S_{n}}}={\overline {R_{n}}}e^{i{\overline {\theta _{n}}}}

where

{\overline {C_{n}}}={\frac {1}{N}}\sum _{i=1}^{N}\cos(n\theta _{i})

{\overline {S_{n}}}={\frac {1}{N}}\sum _{i=1}^{N}\sin(n\theta _{i})

{\overline {R_{n}}}=|{\overline {m_{n}}}|={\sqrt {{\overline {C_{n}}}^{2}+{\overline {S_{n}}}^{2}}}

{\overline {\theta _{n}}}=\arg({\overline {m_{n}}})
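
The sample moments can be computed directly from the complex variates. The following is a minimal sketch using NumPy; the function name `sample_moment` and the example angles are illustrative assumptions, not part of the source. Here $\overline{R_n}$ and $\overline{\theta_n}$ are taken as the modulus and argument of $\overline{m_n}$, following the polar form of the sample moment.

```python
import numpy as np

def sample_moment(theta, n=1):
    """Return (C_bar, S_bar, R_bar, theta_bar) for the n-th sample moment.

    Hypothetical helper: R_bar and theta_bar are the modulus and argument
    of m_bar = (1/N) * sum(z_i ** n), with z_i = exp(i * theta_i).
    """
    theta = np.asarray(theta, dtype=float)
    m_bar = np.mean(np.exp(1j * n * theta))
    return m_bar.real, m_bar.imag, np.abs(m_bar), np.angle(m_bar)

# Illustrative angles clustered around 0 but recorded on both sides
# of the 0 / 2*pi cut.
angles = np.array([0.1, 2 * np.pi - 0.1, 0.2, 2 * np.pi - 0.2])
C1, S1, R1, theta1 = sample_moment(angles)
```

For these angles the mean direction `theta1` is 0 up to rounding error, whereas the arithmetic mean of the raw angles is $\pi$, which points the opposite way. This is why the complex variate is used instead of the angle itself.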

The vector [ ${\overline {C_{1}}},{\overline {S_{1}}}$ ] may be used as a representation of the sample mean $({\overline {m_{1}}})$ and may be taken as a 2-dimensional random variate.^[2] The bivariate central limit theorem states that the joint probability distribution for ${\overline {C_{1}}}$ and ${\overline {S_{1}}}$ in the limit of a large number of samples is given by:

[{\overline {C_{1}}},{\overline {S_{1}}}]{\xrightarrow {d}}{\mathcal {N}}([C_{1},S_{1}],\Sigma /N)

where ${\mathcal {N}}()$ is the bivariate normal distribution and $\Sigma$ is the covariance matrix for the circular distribution:

\Sigma ={\begin{bmatrix}\sigma _{CC}&\sigma _{CS}\\\sigma _{SC}&\sigma _{SS}\end{bmatrix}}\quad

\sigma _{CC}=E(\cos ^{2}\theta )-E(\cos \theta )^{2}\,

\sigma _{CS}=\sigma _{SC}=E(\cos \theta \sin \theta )-E(\cos \theta )E(\sin \theta )\,

\sigma _{SS}=E(\sin ^{2}\theta )-E(\sin \theta )^{2}\,

Note that the bivariate normal distribution is defined over the entire plane, while the sample mean is confined to the closed unit disk (on or inside the unit circle). The integral of the limiting (bivariate normal) distribution over the unit disk is therefore less than unity for finite N, and approaches unity only as N approaches infinity.
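
The statement can be checked numerically. The sketch below is a Monte Carlo illustration under assumed conditions: the angles are drawn from a von Mises distribution with arbitrarily chosen parameters, and $\Sigma$ is estimated from a large independent sample instead of being computed in closed form. It compares the covariance of many simulated sample means with $\Sigma /N$.

```python
import numpy as np

rng = np.random.default_rng(0)
N, trials = 200, 5000          # sample size and number of repeated samples
mu, kappa = 0.5, 2.0           # illustrative von Mises parameters (assumed)

# Each row is one sample of N angles; each row yields one mean vector.
theta = rng.vonmises(mu, kappa, size=(trials, N))
means = np.column_stack([np.cos(theta).mean(axis=1),
                         np.sin(theta).mean(axis=1)])

# Estimate the population covariance of (cos(theta), sin(theta))
# from a large independent sample.
big = rng.vonmises(mu, kappa, size=1_000_000)
Sigma = np.cov(np.vstack([np.cos(big), np.sin(big)]))

empirical = np.cov(means.T)    # covariance of the simulated sample means
predicted = Sigma / N          # covariance appearing in the limit theorem
```

With these settings `empirical` and `predicted` should agree to within the Monte Carlo error, and a scatter plot of `means` should resemble an elliptical Gaussian cloud centred near $[C_{1},S_{1}]$.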

The next step is to express the limiting bivariate distribution in terms of the moments of the circular distribution.

Covariance matrix in terms of moments

Using the double-angle trigonometric identities:^[2]

C_{2}=E(\cos(2\theta ))=E(2\cos ^{2}\theta -1)=E(1-2\sin ^{2}\theta )\,

S_{2}=E(\sin(2\theta ))=E(2\cos \theta \sin \theta )\,

It follows that:

\sigma _{CC}=E(\cos ^{2}\theta )-E(\cos \theta )^{2}={\frac {1}{2}}\left(1+C_{2}-2C_{1}^{2}\right)

\sigma _{CS}=E(\cos \theta \sin \theta )-E(\cos \theta )E(\sin \theta )={\frac {1}{2}}\left(S_{2}-2C_{1}S_{1}\right)

\sigma _{SS}=E(\sin ^{2}\theta )-E(\sin \theta )^{2}={\frac {1}{2}}\left(1-C_{2}-2S_{1}^{2}\right)

The covariance matrix is now expressed in terms of the moments of the circular distribution.
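
The three expressions above translate directly into code. The following sketch (the function name `covariance_from_moments` and the test sample are illustrative assumptions) builds $\Sigma$ from the first two moments and compares it with the covariance computed directly from the cosines and sines of a sample.

```python
import numpy as np

def covariance_from_moments(C1, S1, C2, S2):
    """Covariance matrix of (cos(theta), sin(theta)) from the first two moments."""
    s_cc = 0.5 * (1.0 + C2 - 2.0 * C1**2)
    s_cs = 0.5 * (S2 - 2.0 * C1 * S1)
    s_ss = 0.5 * (1.0 - C2 - 2.0 * S1**2)
    return np.array([[s_cc, s_cs], [s_cs, s_ss]])

# Arbitrary illustrative sample of angles (an assumption for this sketch).
rng = np.random.default_rng(1)
theta = rng.uniform(0.0, 1.5, size=10_000)

# Sample moments of order 1 and 2.
C1, S1 = np.cos(theta).mean(), np.sin(theta).mean()
C2, S2 = np.cos(2 * theta).mean(), np.sin(2 * theta).mean()

Sigma_moments = covariance_from_moments(C1, S1, C2, S2)
# bias=True divides by N, matching the plain sample averages used above.
Sigma_direct = np.cov(np.vstack([np.cos(theta), np.sin(theta)]), bias=True)
```

Because the double-angle identities hold for every angle, the two matrices agree up to rounding error when the same sample averages are used. For a circular uniform distribution all four moments vanish and the formulas give $\Sigma$ equal to one half of the identity matrix.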

The central limit theorem may also be expressed in terms of the polar components of the mean. If $P({\overline {C_{1}}},{\overline {S_{1}}})d{\overline {C_{1}}}d{\overline {S_{1}}}$ is the probability of finding the mean in area element $d{\overline {C_{1}}}d{\overline {S_{1}}}$ , then that probability may also be written $P({\overline {R_{1}}}\cos({\overline {\theta _{1}}}),{\overline {R_{1}}}\sin({\overline {\theta _{1}}})){\overline {R_{1}}}d{\overline {R_{1}}}d{\overline {\theta _{1}}}$ .
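
The change to polar variables can be sketched numerically as follows. The code evaluates the limiting bivariate normal density in terms of $\overline{R_1}$ and $\overline{\theta_1}$, including the Jacobian factor $\overline{R_1}$, and integrates it over the unit disk with a simple midpoint rule. The helper names, the von Mises test distribution, and the grid sizes are all assumptions made for illustration.

```python
import numpy as np

def polar_density(R, theta, mean, Sigma, N):
    """Limiting density in (R, theta): bivariate normal times the Jacobian R."""
    cov = np.asarray(Sigma, dtype=float) / N
    inv = np.linalg.inv(cov)
    dx = R * np.cos(theta) - mean[0]
    dy = R * np.sin(theta) - mean[1]
    quad = inv[0, 0] * dx**2 + 2.0 * inv[0, 1] * dx * dy + inv[1, 1] * dy**2
    norm = 2.0 * np.pi * np.sqrt(np.linalg.det(cov))
    return R * np.exp(-0.5 * quad) / norm

def mass_in_unit_disk(mean, Sigma, N, n_r=400, n_t=720):
    """Midpoint-rule integral of the polar density over the unit disk."""
    r = (np.arange(n_r) + 0.5) / n_r
    t = -np.pi + (np.arange(n_t) + 0.5) * (2.0 * np.pi / n_t)
    rr, tt = np.meshgrid(r, t, indexing="ij")
    cell = (1.0 / n_r) * (2.0 * np.pi / n_t)
    return polar_density(rr, tt, mean, Sigma, N).sum() * cell

# Mean vector and covariance estimated from an assumed von Mises sample.
rng = np.random.default_rng(2)
sample = rng.vonmises(0.5, 2.0, size=100_000)
xy = np.vstack([np.cos(sample), np.sin(sample)])
mean, Sigma = xy.mean(axis=1), np.cov(xy)

mass_small = mass_in_unit_disk(mean, Sigma, N=2)
mass_large = mass_in_unit_disk(mean, Sigma, N=100)
```

Under these assumptions `mass_large` is very close to 1, while `mass_small` falls visibly short of 1 because much of the normal approximation lies outside the unit disk when N is small.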


References

[1] Rice, John A. (1995). Mathematical Statistics and Data Analysis (2nd ed.). Duxbury Press.
[2] Jammalamadaka, S. Rao; SenGupta, A. (2001). Topics in circular statistics. New Jersey: World Scientific. ISBN 978-981-02-3778-3.
