Multidimensional Chebyshev's inequality

Last updated January 25, 2025

In probability theory, the multidimensional Chebyshev's inequality^[1] is a generalization of Chebyshev's inequality, which puts a bound on the probability of the event that a random variable differs from its expected value by more than a specified amount.

Proof

Since $V$ is positive-definite, so is $V^{-1}$ . Define the random variable

y=(X-\mu )^{T}V^{-1}(X-\mu ).

Since $y$ is positive, Markov's inequality holds:

\Pr \left({\sqrt {(X-\mu )^{T}V^{-1}(X-\mu )}}>t\right)=\Pr({\sqrt {y}}>t)=\Pr(y>t^{2})\leq {\frac {\operatorname {E} [y]}{t^{2}}}.

Finally,

{\begin{aligned}\operatorname {E} [y]&=\operatorname {E} [(X-\mu )^{T}V^{-1}(X-\mu )]\\[6pt]&=\operatorname {E} [\operatorname {trace} (V^{-1}(X-\mu )(X-\mu )^{T})]\\[6pt]&=\operatorname {trace} (V^{-1}V)=N\end{aligned}}.

^[1]^[2]

Infinite dimensions

There is a straightforward extension of the vector version of Chebyshev's inequality to infinite dimensional settings^{[more refs. needed]}.^[3] Let $X$ be a random variable which takes values in a Fréchet space ${\mathcal {X}}$ (equipped with seminorms $|| \cdot || α$ ). This includes most common settings of vector-valued random variables, e.g., when ${\mathcal {X}}$ is a Banach space (equipped with a single norm), a Hilbert space, or the finite-dimensional setting as described above.

Suppose that $X$ is of "strong order two", meaning that

\operatorname {E} \left(\|X\|_{\alpha }^{2}\right)<\infty

for every seminorm $|| \cdot || α$ . This is a generalization of the requirement that $X$ have finite variance, and is necessary for this strong form of Chebyshev's inequality in infinite dimensions. The terminology "strong order two" is due to Vakhania.^[4]

Let $\mu \in {\mathcal {X}}$ be the Pettis integral of $X$ (i.e., the vector generalization of the mean), and let

\sigma _{a}:={\sqrt {\operatorname {E} \|X-\mu \|_{\alpha }^{2}}}

be the standard deviation with respect to the seminorm $|| \cdot || α$ . In this setting we can state the following:

General version of Chebyshev's inequality.

\forall k>0:\quad \Pr \left(\|X-\mu \|_{\alpha }\geq k\sigma _{\alpha }\right)\leq {\frac {1}{k^{2}}}.

Proof. The proof is straightforward, and essentially the same as the finitary version^{[source needed]}. If $σ α = 0$ , then $X$ is constant (and equal to $μ$ ) almost surely, so the inequality is trivial.

If

\|X-\mu \|_{\alpha }\geq k\sigma _{\alpha }

then $|| X - μ || α > 0$ , so we may safely divide by $|| X - μ || α$ . The crucial trick in Chebyshev's inequality is to recognize that $1={\tfrac {\|X-\mu \|_{\alpha }^{2}}{\|X-\mu \|_{\alpha }^{2}}}$ .

The following calculations complete the proof:

{\begin{aligned}\Pr \left(\|X-\mu \|_{\alpha }\geq k\sigma _{\alpha }\right)&=\int _{\Omega }\mathbf {1} _{\|X-\mu \|_{\alpha }\geq k\sigma _{\alpha }}\,\mathrm {d} \Pr \\&=\int _{\Omega }\left({\frac {\|X-\mu \|_{\alpha }^{2}}{\|X-\mu \|_{\alpha }^{2}}}\right)\cdot \mathbf {1} _{\|X-\mu \|_{\alpha }\geq k\sigma _{\alpha }}\,\mathrm {d} \Pr \\[6pt]&\leq \int _{\Omega }\left({\frac {\|X-\mu \|_{\alpha }^{2}}{(k\sigma _{\alpha })^{2}}}\right)\cdot \mathbf {1} _{\|X-\mu \|_{\alpha }\geq k\sigma _{\alpha }}\,\mathrm {d} \Pr \\[6pt]&\leq {\frac {1}{k^{2}\sigma _{\alpha }^{2}}}\int _{\Omega }\|X-\mu \|_{\alpha }^{2}\,\mathrm {d} \Pr &&\mathbf {1} _{\|X-\mu \|_{\alpha }\geq k\sigma _{\alpha }}\leq 1\\[6pt]&={\frac {1}{k^{2}\sigma _{\alpha }^{2}}}\left(\operatorname {E} \|X-\mu \|_{\alpha }^{2}\right)\\[6pt]&={\frac {1}{k^{2}\sigma _{\alpha }^{2}}}\left(\sigma _{\alpha }^{2}\right)\\[6pt]&={\frac {1}{k^{2}}}\end{aligned}}

Related Research Articles

In mathematical physics and mathematics, the Pauli matrices are a set of three $2 \times 2$ complex matrices that are traceless, Hermitian, involutory and unitary. Usually indicated by the Greek letter sigma, they are occasionally denoted by tau when used in connection with isospin symmetries.

In statistics, the standard deviation is a measure of the amount of variation of the values of a variable about its mean. A low standard deviation indicates that the values tend to be close to the mean of the set, while a high standard deviation indicates that the values are spread out over a wider range. The standard deviation is commonly used in the determination of what constitutes an outlier and what does not. Standard deviation may be abbreviated SD or std dev, and is most commonly represented in mathematical texts and equations by the lowercase Greek letter σ (sigma), for the population standard deviation, or the Latin letter s, for the sample standard deviation.

<span class="mw-page-title-main">Central limit theorem</span> Fundamental theorem in probability theory and statistics

In probability theory, the central limit theorem (CLT) states that, under appropriate conditions, the distribution of a normalized version of the sample mean converges to a standard normal distribution. This holds even if the original variables themselves are not normally distributed. There are several versions of the CLT, each applying in the context of different conditions.

In mathematics, the $L p$ spaces are function spaces defined using a natural generalization of the $p$ -norm for finite-dimensional vector spaces. They are sometimes called Lebesgue spaces, named after Henri Lebesgue, although according to the Bourbaki group they were first introduced by Frigyes Riesz.

In probability theory, Chebyshev's inequality provides an upper bound on the probability of deviation of a random variable from its mean. More specifically, the probability that a random variable deviates from its mean by more than $is at most, where is any positive constant and is the standard deviation.$

In mathematical analysis, Hölder's inequality, named after Otto Hölder, is a fundamental inequality between integrals and an indispensable tool for the study of $L p$ spaces.

In probability theory, Markov's inequality gives an upper bound on the probability that a non-negative random variable is greater than or equal to some positive constant. Markov's inequality is tight in the sense that for each chosen positive constant, there exists a random variable such that the inequality is in fact an equality.

In mathematics, the moments of a function are certain quantitative measures related to the shape of the function's graph. If the function represents mass density, then the zeroth moment is the total mass, the first moment is the center of mass, and the second moment is the moment of inertia. If the function is a probability distribution, then the first moment is the expected value, the second central moment is the variance, the third standardized moment is the skewness, and the fourth standardized moment is the kurtosis.

In probability theory, the Vysochanskij–Petunin inequality gives a lower bound for the probability that a random variable with finite variance lies within a certain number of standard deviations of the variable's mean, or equivalently an upper bound for the probability that it lies further away. The sole restrictions on the distribution are that it be unimodal and have finite variance; here unimodal implies that it is a continuous probability distribution except at the mode, which may have a non-zero probability.

In probability theory and statistics, the generalized extreme value (GEV) distribution is a family of continuous probability distributions developed within extreme value theory to combine the Gumbel, Fréchet and Weibull families also known as type I, II and III extreme value distributions. By the extreme value theorem the GEV distribution is the only possible limit distribution of properly normalized maxima of a sequence of independent and identically distributed random variables. that a limit distribution needs to exist, which requires regularity conditions on the tail of the distribution. Despite this, the GEV distribution is often used as an approximation to model the maxima of long (finite) sequences of random variables.

In statistics, the multivariate t-distribution is a multivariate probability distribution. It is a generalization to random vectors of the Student's t-distribution, which is a distribution applicable to univariate random variables. While the case of a random matrix could be treated within this structure, the matrix t-distribution is distinct and makes particular use of the matrix structure.

Expected shortfall (ES) is a risk measure—a concept used in the field of financial risk measurement to evaluate the market risk or credit risk of a portfolio. The "expected shortfall at q% level" is the expected return on the portfolio in the worst $of cases. ES is an alternative to value at risk that is more sensitive to the shape of the tail of the loss distribution.$

A ratio distribution is a probability distribution constructed as the distribution of the ratio of random variables having two other known distributions. Given two random variables X and Y, the distribution of the random variable Z that is formed as the ratio Z = X/Y is a ratio distribution.

In financial mathematics, tail value at risk (TVaR), also known as tail conditional expectation (TCE) or conditional tail expectation (CTE), is a risk measure associated with the more general value at risk. It quantifies the expected value of the loss given that an event outside a given probability level has occurred.

<span class="mw-page-title-main">Normal-inverse-gamma distribution</span>

In probability theory and statistics, the normal-inverse-gamma distribution is a four-parameter family of multivariate continuous probability distributions. It is the conjugate prior of a normal distribution with unknown mean and variance.

In probability theory, Gauss's inequality gives an upper bound on the probability that a unimodal random variable lies more than any given distance from its mode.

<span class="mw-page-title-main">Logit-normal distribution</span> Probability distribution

In probability theory, a logit-normal distribution is a probability distribution of a random variable whose logit has a normal distribution. If Y is a random variable with a normal distribution, and t is the standard logistic function, then X = t(Y) has a logit-normal distribution; likewise, if X is logit-normally distributed, then Y = logit(X)= log (X/(1-X)) is normally distributed. It is also known as the logistic normal distribution, which often refers to a multinomial logit version (e.g.).

In statistical mechanics, the Griffiths inequality, sometimes also called Griffiths–Kelly–Sherman inequality or GKS inequality, named after Robert B. Griffiths, is a correlation inequality for ferromagnetic spin systems. Informally, it says that in ferromagnetic spin systems, if the 'a-priori distribution' of the spin is invariant under spin flipping, the correlation of any monomial of the spins is non-negative; and the two point correlation of two monomial of the spins is non-negative.

For certain applications in linear algebra, it is useful to know properties of the probability distribution of the largest eigenvalue of a finite sum of random matrices. Suppose $is a finite sequence of random matrices. Analogous to the well-known Chernoff bound for sums of scalars, a bound on the following is sought for a given parameter t :$

In statistics and probability theory, the nonparametric skew is a statistic occasionally used with random variables that take real values. It is a measure of the skewness of a random variable's distribution—that is, the distribution's tendency to "lean" to one side or the other of the mean. Its calculation does not require any knowledge of the form of the underlying distribution—hence the name nonparametric. It has some desirable properties: it is zero for any symmetric distribution; it is unaffected by a scale shift; and it reveals either left- or right-skewness equally well. In some statistical samples it has been shown to be less powerful than the usual measures of skewness in detecting departures of the population from normality.

References

1 2 Marshall, Albert W.; Olkin, Ingram (December 1960). "Multivariate Chebyshev Inequalities". The Annals of Mathematical Statistics. 31 (4): 1001–1014. doi:10.1214/aoms/1177705673. ISSN 0003-4851.
↑ Navarro, Jorge (2013-05-24). "A simple proof for the multivariate Chebyshev inequality". arXiv: 1305.5646 [math.ST].
↑ Altomare, Francesco; Campiti, Michele (1994). De Gruyter (ed.). Korovkin-type Approximation Theory and Its Applications. p. 313. doi:10.1515/9783110884586. ISBN 978-3-11-014178-8.
↑ Vakhania, Nikolai Nikolaevich. Probability distributions on linear spaces. New York: North Holland, 1981.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[awm-1] 1 2 Marshall, Albert W.; Olkin, Ingram (December 1960). "Multivariate Chebyshev Inequalities". The Annals of Mathematical Statistics. 31 (4): 1001–1014. doi:10.1214/aoms/1177705673. ISSN 0003-4851.

[2] Navarro, Jorge (2013-05-24). "A simple proof for the multivariate Chebyshev inequality". arXiv: 1305.5646 [math.ST].

[3] Altomare, Francesco; Campiti, Michele (1994). De Gruyter (ed.). Korovkin-type Approximation Theory and Its Applications. p. 313. doi:10.1515/9783110884586. ISBN 978-3-11-014178-8.

[4] Vakhania, Nikolai Nikolaevich. Probability distributions on linear spaces. New York: North Holland, 1981.

[1]

[2]

[3]

[4]

Multidimensional Chebyshev's inequality

Contents

Proof

Infinite dimensions

Related Research Articles

References