Rademacher distribution

Rademacher
Support         k ∈ {−1, +1}
PMF             f(k) = 1/2 for k = −1;  1/2 for k = +1
CDF             F(k) = 0 for k < −1;  1/2 for −1 ≤ k < 1;  1 for k ≥ 1
Mean            0
Median          0
Mode            N/A
Variance        1
Skewness        0
Ex. kurtosis    −2
Entropy         ln 2
MGF             cosh(t)
CF              cos(t)

In probability theory and statistics, the Rademacher distribution (named after Hans Rademacher) is a discrete probability distribution in which a random variate X takes the value +1 with probability 1/2 and the value −1 with probability 1/2. [1]

A series (that is, a sum) of Rademacher distributed variables can be regarded as a simple symmetrical random walk where the step size is 1.
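
A minimal NumPy sketch of this construction (the step count and random seed are arbitrary illustrative choices, not from the source):

    import numpy as np

    rng = np.random.default_rng(seed=0)      # arbitrary seed for reproducibility
    n_steps = 1000

    # Draw Rademacher variates: +1 or -1, each with probability 1/2.
    steps = rng.choice([-1, 1], size=n_steps)

    # Partial sums of Rademacher steps form a simple symmetric random walk.
    walk = np.cumsum(steps)

    print(steps[:10])    # first ten Rademacher steps
    print(walk[-1])      # final position of the walk after n_steps steps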

Mathematical formulation

The probability mass function of this distribution is

    f(k) = \begin{cases} 1/2 & \text{if } k = -1, \\ 1/2 & \text{if } k = +1, \\ 0 & \text{otherwise.} \end{cases}

It can also be written in terms of the Dirac delta function, as

    f(x) = \frac{1}{2}\left( \delta(x - 1) + \delta(x + 1) \right).

Bounds on sums of independent Rademacher variables

There are various results in probability theory concerning sums of i.i.d. Rademacher variables, including concentration inequalities such as Bernstein inequalities, as well as anti-concentration results such as Tomaszewski's conjecture.

Concentration inequalities

Let {xi} be a set of random variables with a Rademacher distribution. Let {ai} be a sequence of real numbers. Then

    \Pr\!\left( \sum_i x_i a_i > t \, \|a\|_2 \right) \le e^{-t^2/2},

where ||a||2 is the Euclidean norm of the sequence {ai}, t > 0 is a real number and Pr(Z) is the probability of event Z. [2]
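
A quick Monte Carlo sanity check of this tail bound (the coefficient vector, threshold, and sample size below are arbitrary illustrative choices):

    import numpy as np

    rng = np.random.default_rng(seed=1)            # arbitrary seed
    a = np.array([3.0, 1.0, 0.5, 0.25, 2.0])       # arbitrary coefficients
    a_norm = np.linalg.norm(a)
    t = 1.5                                        # arbitrary threshold, t > 0
    n_trials = 200_000

    # Sample x_i in {-1, +1} independently and form the weighted sums.
    x = rng.choice([-1, 1], size=(n_trials, a.size))
    sums = x @ a

    empirical = np.mean(sums > t * a_norm)
    bound = np.exp(-t**2 / 2)
    print(empirical, "<=", bound)                  # empirical tail vs. e^{-t^2/2}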

Let Y = Σ aixi and let Y be an almost surely convergent series in a Banach space. Then for t > 0 and s ≥ 1 we have [3]

    \Pr\!\left( \|Y\| > st \right) \le \left[ c \, \Pr\!\left( \|Y\| > t \right) \right]^{s^2}

for some constant c.

Let p be a positive real number. Then the Khintchine inequality says that [4]

    c_1 \|a\|_2 \le \left( \operatorname{E}\left| \sum_i a_i x_i \right|^p \right)^{1/p} \le c_2 \|a\|_2,

where c1 and c2 are constants depending only on p. For p ≥ 1, the upper constant c2 can be taken to be at most of order \sqrt{p}.
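
The following Monte Carlo sketch estimates the middle quantity and compares it with ||a||2 (the coefficients, moment order p, and sample size are arbitrary illustrative choices):

    import numpy as np

    rng = np.random.default_rng(seed=2)       # arbitrary seed
    a = np.array([1.0, -2.0, 0.5, 3.0])       # arbitrary coefficients
    p = 4.0                                   # arbitrary moment order
    n_trials = 100_000

    x = rng.choice([-1, 1], size=(n_trials, a.size))
    sums = x @ a

    lp_moment = np.mean(np.abs(sums) ** p) ** (1.0 / p)
    print(lp_moment, np.linalg.norm(a))       # the two quantities are of comparable size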

Tomaszewski’s conjecture

In 1986, Bogusław Tomaszewski posed a question about the distribution of the sum of independent Rademacher variables. A series of works on this question [5] [6] culminated in a 2020 proof by Nathan Keller and Ohad Klein of the following conjecture. [7]

Conjecture. Let X = \sum_{i=1}^{n} a_i x_i, where \sum_{i=1}^{n} a_i^2 = 1 and the x_i's are independent Rademacher variables. Then

    \Pr\!\left( |X| \le 1 \right) \ge \frac{1}{2}.

For example, when a_1 = a_2 = \cdots = a_n = 1/\sqrt{n}, one gets the following bound, first shown by Van Zuijlen. [8]

    \Pr\!\left( \left| \frac{1}{\sqrt{n}} \sum_{i=1}^{n} x_i \right| \le 1 \right) \ge \frac{1}{2}.

The bound is sharp and better than that which can be derived from the normal distribution (approximately Pr > 0.31).
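
A Monte Carlo check of the inequality for one unit-norm coefficient vector (the vector, its length, and the sample size below are arbitrary illustrative choices):

    import numpy as np

    rng = np.random.default_rng(seed=3)               # arbitrary seed
    a = rng.normal(size=8)                            # arbitrary coefficients
    a /= np.linalg.norm(a)                            # normalize so the a_i^2 sum to 1
    n_trials = 200_000

    x = rng.choice([-1, 1], size=(n_trials, a.size))
    X = x @ a

    print(np.mean(np.abs(X) <= 1.0))                  # should be at least 1/2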

Applications

The Rademacher distribution has been used in bootstrapping.
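
One common resampling scheme that draws its multipliers from the Rademacher distribution is the wild bootstrap; the following is a minimal sketch under that assumption (the data, the statistic, and the replicate count are illustrative choices, not from the source):

    import numpy as np

    rng = np.random.default_rng(seed=4)               # arbitrary seed
    y = rng.normal(loc=2.0, scale=1.5, size=50)       # illustrative data
    y_bar = y.mean()
    residuals = y - y_bar

    n_boot = 5_000
    boot_means = np.empty(n_boot)
    for b in range(n_boot):
        w = rng.choice([-1, 1], size=y.size)          # Rademacher multipliers
        boot_means[b] = y_bar + np.mean(w * residuals)

    # Percentile interval for the mean from the wild-bootstrap replicates.
    print(np.percentile(boot_means, [2.5, 97.5]))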

The Rademacher distribution can be used to show that being normally distributed and uncorrelated does not imply being independent.
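
The standard construction takes X standard normal and W Rademacher, independent of X, and sets Y = WX; then Y is also standard normal and uncorrelated with X, yet clearly not independent of it. A small numerical illustration (sample size and seed are arbitrary):

    import numpy as np

    rng = np.random.default_rng(seed=5)               # arbitrary seed
    n = 100_000

    X = rng.normal(size=n)                            # standard normal
    W = rng.choice([-1, 1], size=n)                   # Rademacher, independent of X
    Y = W * X                                         # also standard normal

    print(np.corrcoef(X, Y)[0, 1])                    # near 0: uncorrelated
    print(np.mean(np.abs(Y) == np.abs(X)))            # exactly 1: |Y| = |X|, so not independent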

Random vectors with components sampled independently from the Rademacher distribution are useful for various stochastic approximations, for example:

  - The Hutchinson trace estimator, [9] which approximates the trace of a large matrix that is accessible only through matrix–vector products (see the sketch after this list).
  - Simultaneous perturbation stochastic approximation (SPSA), a derivative-free stochastic optimization method whose random perturbation directions have independent ±1 components.

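A minimal sketch of a Hutchinson-style estimator with Rademacher probe vectors (the matrix, its size, and the number of probes are arbitrary illustrative choices):

    import numpy as np

    rng = np.random.default_rng(seed=6)               # arbitrary seed
    n = 200
    M = rng.normal(size=(n, n))
    A = M @ M.T                                       # illustrative symmetric matrix

    n_probes = 500
    estimates = np.empty(n_probes)
    for k in range(n_probes):
        z = rng.choice([-1, 1], size=n).astype(float) # Rademacher probe vector
        estimates[k] = z @ (A @ z)                    # z^T A z has expectation trace(A)

    print(estimates.mean(), np.trace(A))              # estimate vs. exact trace
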
Rademacher random variables are used in the symmetrization inequality.

References

  1. Hitczenko, P.; Kwapień, S. (1994). "On the Rademacher series". Probability in Banach Spaces. Progress in Probability. Vol. 35. pp. 31–36. doi:10.1007/978-1-4612-0253-0_2. ISBN 978-1-4612-6682-2.
  2. Montgomery-Smith, S. J. (1990). "The distribution of Rademacher sums". Proc. Amer. Math. Soc. 109 (2): 517–522. doi:10.1090/S0002-9939-1990-1013975-0.
  3. Dilworth, S. J.; Montgomery-Smith, S. J. (1993). "The distribution of vector-valued Rademacher series". Ann. Probab. 21 (4): 2046–2052. arXiv:math/9206201. doi:10.1214/aop/1176989010. JSTOR 2244710. S2CID 15159626.
  4. Khintchine, A. (1923). "Über dyadische Brüche". Math. Z. 18 (1): 109–116. doi:10.1007/BF01192399. S2CID 119840766.
  5. Holzman, Ron; Kleitman, Daniel J. (1992-09-01). "On the product of sign vectors and unit vectors". Combinatorica. 12 (3): 303–316. doi:10.1007/BF01285819. ISSN 1439-6912. S2CID 20281665.
  6. Boppana, Ravi B.; Holzman, Ron (2017-08-31). "Tomaszewski's Problem on Randomly Signed Sums: Breaking the 3/8 Barrier". arXiv:1704.00350 [math.CO].
  7. Keller, Nathan; Klein, Ohad (2021-08-03). "Proof of Tomaszewski's Conjecture on Randomly Signed Sums". arXiv:2006.16834 [math.CO].
  8. van Zuijlen, Martien C. A. (2011). "On a conjecture concerning the sum of independent Rademacher random variables". arXiv:1112.4988. Bibcode:2011arXiv1112.4988V.
  9. Avron, H.; Toledo, S. (2011). "Randomized algorithms for estimating the trace of an implicit symmetric positive semidefinite matrix". Journal of the ACM. 58 (2): 8. CiteSeerX 10.1.1.380.9436. doi:10.1145/1944345.1944349. S2CID 5827717.