Sub-Gaussian distribution

In probability theory, a sub-Gaussian distribution, the distribution of a sub-Gaussian random variable, is a probability distribution with strong tail decay. More specifically, the tails of a sub-Gaussian distribution are dominated by (i.e. decay at least as fast as) the tails of a Gaussian. This property gives sub-Gaussian distributions their name.

Formally, the probability distribution of a random variable $X$ is called sub-Gaussian if there is a positive constant $C$ such that for every $t \geq 0$,

$$\operatorname{P}(|X| \geq t) \leq 2\exp\left(-\frac{t^2}{C^2}\right).$$
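
For example, any bounded random variable with $|X| \leq M$ almost surely is sub-Gaussian: $\operatorname{P}(|X| \geq t) = 0$ for $t > M$, while for $t \leq M$ the right-hand side is at least $2e^{-M^2/C^2} \geq 1$ once $C^2 \geq M^2/\ln 2$. By contrast, an exponentially distributed random variable is not sub-Gaussian, since its tail probability $e^{-t}$ eventually exceeds $2e^{-t^2/C^2}$ for every fixed $C$.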

Alternatively, a random variable is considered sub-Gaussian if its tail probabilities are bounded from above (up to a constant) by those of a Gaussian. Specifically, we say that $X$ is sub-Gaussian if for all $t \geq 0$ we have that

$$\operatorname{P}(|X| \geq t) \leq c\,\operatorname{P}(|Z| \geq t),$$

where $c$ is a positive constant and $Z$ is a mean zero Gaussian random variable.[1]: Theorem 2.6

Early works [2][3] considered the following definition of the sub-Gaussian property, based on the moment-generating function of $X$: for some $s > 0$ and for all $t \in \mathbb{R}$,

$$\operatorname{E}\left[e^{tX}\right] \leq e^{\frac{s^2 t^2}{2}}.$$
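
The Gaussian case shows that this bound is tight: if $X \sim \mathcal{N}(0,\sigma^2)$, completing the square in the Gaussian integral gives

$$\operatorname{E}\left[e^{tX}\right] = \frac{1}{\sqrt{2\pi\sigma^2}}\int_{-\infty}^{\infty} e^{tx - \frac{x^2}{2\sigma^2}}\,dx = e^{\frac{\sigma^2 t^2}{2}},$$

so the condition holds with equality for $s = \sigma$.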

Definitions

The sub-Gaussian norm of $X$, denoted as $\Vert X \Vert_{\psi_2}$, is defined by

$$\Vert X \Vert_{\psi_2} = \inf\left\{t > 0 : \operatorname{E}\left[e^{X^2/t^2}\right] \leq 2\right\},$$

which is the Orlicz norm of $X$ generated by the Orlicz function $\Psi_2(u) = e^{u^2} - 1$. By condition $(2)$ in the properties below, sub-Gaussian random variables can be characterized as exactly those random variables with finite sub-Gaussian norm.
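
As a worked example, for a standard normal random variable $X$ the Gaussian integral gives $\operatorname{E}\left[e^{X^2/t^2}\right] = \left(1 - 2/t^2\right)^{-1/2}$ whenever $t^2 > 2$, and setting this expression equal to $2$ yields

$$\Vert X \Vert_{\psi_2} = \sqrt{8/3} \approx 1.63.$$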

Following the definition based on the moment-generating function, any $s^2$ such that $\operatorname{E}\left[e^{tX}\right] \leq e^{\frac{s^2 t^2}{2}}$ for all $t \in \mathbb{R}$ is called a variance proxy, and the smallest such $s^2$ is called the optimal variance proxy and denoted by $\sigma_{\mathrm{opt}}^2(X)$. A variance proxy is always lower bounded by the variance of $X$, and a random variable satisfying $\sigma_{\mathrm{opt}}^2(X) = \operatorname{Var}(X)$ is called strictly sub-Gaussian.
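
For example, a symmetric Bernoulli (Rademacher) variable $X$, taking the values $\pm 1$ with probability $1/2$ each, satisfies $\operatorname{E}\left[e^{tX}\right] = \cosh t \leq e^{t^2/2}$ for all $t$, so $s^2 = 1$ is a variance proxy; since $\operatorname{Var}(X) = 1$ and a variance proxy can never be smaller than the variance, the optimal variance proxy is $\sigma_{\mathrm{opt}}^2(X) = 1$ and $X$ is strictly sub-Gaussian.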

Sub-Gaussian properties

Let $X$ be a random variable. The following conditions are equivalent:

  1. $\operatorname{P}(|X| \geq t) \leq 2\exp\left(-t^2/K_1^2\right)$ for all $t \geq 0$, where $K_1$ is a positive constant;
  2. $\operatorname{E}\left[e^{X^2/K_2^2}\right] \leq 2$, where $K_2$ is a positive constant;
  3. $\operatorname{E}\left[|X|^p\right]^{1/p} \leq K_3\sqrt{p}$ for all $p \geq 1$, where $K_3$ is a positive constant.

Proof. $(1)\Rightarrow(3)$: By the layer cake representation,

$$\operatorname{E}\left[|X|^p\right] = \int_0^\infty \operatorname{P}\left(|X|^p \geq u\right)du = \int_0^\infty p\,t^{p-1}\operatorname{P}(|X| \geq t)\,dt \leq \int_0^\infty 2p\,t^{p-1} e^{-t^2/K_1^2}\,dt.$$

After a change of variables $u = t^2/K_1^2$, we find that

$$\operatorname{E}\left[|X|^p\right] \leq p\,K_1^p \int_0^\infty u^{p/2-1} e^{-u}\,du = p\,K_1^p\,\Gamma\!\left(\tfrac{p}{2}\right),$$

so that $\operatorname{E}\left[|X|^p\right]^{1/p} \leq K_1\left(p\,\Gamma(p/2)\right)^{1/p} \leq 3K_1\sqrt{p}$, which is $(3)$ with $K_3 = 3K_1$.

$(3)\Rightarrow(2)$: Using the Taylor series for $e^x$:

$$e^x = 1 + \sum_{n=1}^{\infty} \frac{x^n}{n!},$$

we obtain that

$$\operatorname{E}\left[e^{X^2/K^2}\right] = 1 + \sum_{n=1}^{\infty}\frac{\operatorname{E}\left[X^{2n}\right]}{n!\,K^{2n}} \leq 1 + \sum_{n=1}^{\infty}\frac{\left(2n\,K_3^2\right)^n}{n!\,K^{2n}} \leq 1 + \sum_{n=1}^{\infty}\left(\frac{2e\,K_3^2}{K^2}\right)^n,$$

using the moment bound $\operatorname{E}\left[X^{2n}\right] \leq K_3^{2n}(2n)^n$ from $(3)$ and the inequality $n! \geq (n/e)^n$. The last expression is a geometric series, which is less than or equal to $2$ for $K^2 \geq 4e\,K_3^2$. Take $K_2 = 2\sqrt{e}\,K_3$; then $\operatorname{E}\left[e^{X^2/K_2^2}\right] \leq 2$, which is $(2)$.

$(2)\Rightarrow(1)$: By Markov's inequality,

$$\operatorname{P}(|X| \geq t) = \operatorname{P}\left(e^{X^2/K_2^2} \geq e^{t^2/K_2^2}\right) \leq e^{-t^2/K_2^2}\,\operatorname{E}\left[e^{X^2/K_2^2}\right] \leq 2e^{-t^2/K_2^2},$$

which is $(1)$ with $K_1 = K_2$.

More equivalent definitions

The following properties are equivalent: [4]

  - The distribution of $X$ is sub-Gaussian.
  - Tail condition: $\operatorname{P}(|X| \geq t) \leq 2\exp\left(-t^2/K_1^2\right)$ for all $t \geq 0$.
  - Moment condition: $\operatorname{E}\left[|X|^p\right]^{1/p} \leq K_2\sqrt{p}$ for all $p \geq 1$.
  - Moment-generating-function condition for the square: $\operatorname{E}\left[\exp\left(\lambda^2 X^2\right)\right] \leq \exp\left(K_3^2\lambda^2\right)$ for all $\lambda$ with $|\lambda| \leq 1/K_3$.
  - Exponential-square moment condition: $\operatorname{E}\left[\exp\left(X^2/K_4^2\right)\right] \leq 2$.
  - If additionally $\operatorname{E}[X] = 0$: moment-generating-function condition, $\operatorname{E}\left[\exp(\lambda X)\right] \leq \exp\left(K_5^2\lambda^2\right)$ for all $\lambda \in \mathbb{R}$.

The constants $K_1, \dots, K_5$ appearing in these conditions differ from one another by at most absolute multiplicative factors.
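
The centering assumption in the last condition cannot be dropped. For example, the constant random variable $X \equiv 1$ is sub-Gaussian (its tail probabilities vanish for $t > 1$), yet for any fixed $K$ the bound $\operatorname{E}\left[e^{\lambda X}\right] = e^{\lambda} \leq e^{K^2\lambda^2}$ fails for all sufficiently small $\lambda > 0$, since $\lambda > K^2\lambda^2$ whenever $0 < \lambda < 1/K^2$.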

Examples

A standard normal random variable is a sub-Gaussian random variable.
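
Indeed, for $X \sim \mathcal{N}(0,1)$ the Chernoff bound gives, for every $\lambda, t > 0$,

$$\operatorname{P}(X \geq t) \leq e^{-\lambda t}\,\operatorname{E}\left[e^{\lambda X}\right] = e^{\lambda^2/2 - \lambda t},$$

and choosing $\lambda = t$ yields $\operatorname{P}(X \geq t) \leq e^{-t^2/2}$. By symmetry, $\operatorname{P}(|X| \geq t) \leq 2e^{-t^2/2}$, so the tail condition in the definition holds with $C = \sqrt{2}$.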

The optimal variance proxy is known for many standard probability distributions, including the beta, Bernoulli, Dirichlet [5], Kumaraswamy, triangular [6], truncated Gaussian, and truncated exponential [7].

Let $X$ be a random variable with symmetric Bernoulli distribution (or Rademacher distribution). That is, $X$ takes values $-1$ and $1$ with probabilities $1/2$ each. Since $X^2 = 1$, it follows that

$$\Vert X \Vert_{\psi_2} = \inf\left\{t > 0 : \operatorname{E}\left[e^{X^2/t^2}\right] \leq 2\right\} = \inf\left\{t > 0 : e^{1/t^2} \leq 2\right\} = \frac{1}{\sqrt{\ln 2}},$$

and hence $X$ is a sub-Gaussian random variable.

Maximum of sub-Gaussian random variables

Consider a finite collection of zero-mean sub-Gaussian random variables $X_1, \dots, X_n$ with corresponding sub-Gaussian parameters $\sigma_1, \dots, \sigma_n$, and set $\sigma = \max_i \sigma_i$. The random variable $M_n = \max(X_1, \dots, X_n)$ represents the maximum of this collection. Its expectation can be bounded above as

$$\operatorname{E}[M_n] \leq \sqrt{2\sigma^2\log n}.$$

Note that no independence assumption is needed to form this bound.[1]
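
This bound can be obtained by the standard argument combining Jensen's inequality with a union of moment-generating-function bounds. For any $\lambda > 0$,

$$e^{\lambda \operatorname{E}[M_n]} \leq \operatorname{E}\left[e^{\lambda M_n}\right] \leq \sum_{i=1}^{n} \operatorname{E}\left[e^{\lambda X_i}\right] \leq n\, e^{\lambda^2\sigma^2/2},$$

where the first inequality is Jensen's inequality and the second uses only the pointwise bound $e^{\lambda M_n} \leq \sum_i e^{\lambda X_i}$, which is why no independence is required. Taking logarithms and dividing by $\lambda$ gives $\operatorname{E}[M_n] \leq \frac{\log n}{\lambda} + \frac{\lambda\sigma^2}{2}$, and choosing $\lambda = \sqrt{2\log n}/\sigma$ yields the stated bound.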

Notes

  1. Wainwright, M. J. (2019). High-Dimensional Statistics: A Non-Asymptotic Viewpoint. Cambridge: Cambridge University Press. doi:10.1017/9781108627771. ISBN 9781108627771.
  2. Kahane, J.-P. (1960). "Propriétés locales des fonctions à séries de Fourier aléatoires". Studia Mathematica. 19: 1–25. doi:10.4064/sm-19-1-1-25.
  3. Buldygin, V. V.; Kozachenko, Yu. V. (1980). "Sub-Gaussian random variables". Ukrainian Mathematical Journal. 32 (6): 483–489. doi:10.1007/BF01087176.
  4. Vershynin, R. (2018). High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge: Cambridge University Press. pp. 33–34.
  5. Marchal, O.; Arbel, J. (2017). "On the sub-Gaussianity of the Beta and Dirichlet distributions". Electronic Communications in Probability. 22: 1–14. doi:10.1214/17-ECP92.
  6. Arbel, J.; Marchal, O.; Nguyen, H. D. (2020). "On strict sub-Gaussianity, optimal proxy variance and symmetry for bounded random variables". ESAIM: Probability and Statistics. 24: 39–55. doi:10.1051/ps/2019018.
  7. Barreto, M.; Marchal, O.; Arbel, J. (2024). "Optimal sub-Gaussian variance proxy for truncated Gaussian and exponential random variables". doi:10.48550/arXiv.2403.08628.
