Bennett's inequality

In probability theory, Bennett's inequality provides an upper bound on the probability that the sum of independent random variables deviates from its expected value by more than any specified amount. Bennett's inequality was proved by George Bennett of the University of New South Wales in 1962. [1]

Statement

Let X1, …, Xn be independent random variables with finite variance. Further assume |Xi − EXi| ≤ a almost surely for all i, and define Sn = X1 + ⋯ + Xn and σ² = Var(X1) + ⋯ + Var(Xn). Then for any t ≥ 0,

Pr(Sn − ESn ≥ t) ≤ exp(−(σ²/a²) h(at/σ²)),

where h(u) = (1 + u)log(1 + u) − u and log denotes the natural logarithm. [2] [3]
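
A minimal numerical sketch of this bound, assuming nothing beyond the statement above; the function name bennett_bound and the parameter values are illustrative, not taken from the source.

    import math

    def bennett_bound(sigma2, a, t):
        """Bennett's upper bound on Pr(Sn - ESn >= t), where sigma2 is the
        total variance of the sum and a bounds |Xi - EXi| almost surely."""
        def h(u):
            return (1.0 + u) * math.log(1.0 + u) - u
        return math.exp(-(sigma2 / a**2) * h(a * t / sigma2))

    # Illustrative parameters: total variance 25, individual range bound a = 1.
    print(bennett_bound(sigma2=25.0, a=1.0, t=20.0))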

Generalizations and comparisons to other bounds

For generalizations, see Freedman (1975) [4] for a martingale version of Bennett's inequality and Fan, Grama and Liu (2012) [5] for an improvement of it.

Hoeffding's inequality assumes only that the summands are bounded almost surely, and Bennett's inequality offers some improvement over it when the variances of the summands are small compared to their almost sure bounds. However, Hoeffding's inequality entails sub-Gaussian tails, whereas in general Bennett's inequality has Poissonian tails.[citation needed]

Bennett's inequality is most similar to the Bernstein inequalities, the first of which also gives concentration in terms of the variance and almost sure bound on the individual terms. Bennett's inequality is stronger than this bound, but more complicated to compute. [3]
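
For comparison, the first Bernstein inequality in its standard form bounds the same tail probability by

Pr(Sn − ESn ≥ t) ≤ exp(−t² / (2σ² + 2at/3));

since h(u) ≥ u² / (2 + 2u/3) for all u ≥ 0, the exponent (σ²/a²) h(at/σ²) in Bennett's inequality is at least t² / (2σ² + 2at/3), so Bennett's bound is never larger.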

In both inequalities, unlike some other inequalities or limit theorems, there is no requirement that the component variables have identical or similar distributions.[citation needed]

Example

Suppose that X1, …, Xn are independent Bernoulli random variables, each equal to 1 with probability p and to 0 otherwise. Then EXi = p, |Xi − EXi| ≤ 1, and Var(X1) + ⋯ + Var(Xn) = np(1 − p), so Bennett's inequality with a = 1 says that:

Pr(X1 + ⋯ + Xn − np ≥ t) ≤ exp(−np(1 − p) h(t / (np(1 − p)))).

Since h(u) ≥ (u/2)log(1 + u) for all u ≥ 0, this implies

Pr(X1 + ⋯ + Xn − np ≥ t) ≤ (1 + t / (np(1 − p)))^(−t/2)

for all t ≥ 0.

By contrast, Hoeffding's inequality gives a bound of exp(−2t²/n) and the first Bernstein inequality gives a bound of exp(−t² / (2np(1 − p) + 2t/3)). For t large compared to np(1 − p), the exponent in Hoeffding's bound grows like t²/n, in Bernstein's like t, and in Bennett's like t log(t / (np(1 − p))).
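
A minimal numerical sketch of this comparison follows; the helper names and the parameter values (n = 1000, p = 0.1, t = 50) are illustrative choices, not taken from the source.

    import math

    def h(u):
        return (1.0 + u) * math.log(1.0 + u) - u

    def bennett(n, p, t):
        # Bennett's bound for the Bernoulli example, with a = 1 and sigma^2 = n p (1 - p).
        s2 = n * p * (1.0 - p)
        return math.exp(-s2 * h(t / s2))

    def bernstein(n, p, t):
        # First Bernstein inequality with the same variance and a = 1.
        s2 = n * p * (1.0 - p)
        return math.exp(-t**2 / (2.0 * s2 + 2.0 * t / 3.0))

    def hoeffding(n, t):
        # Hoeffding's bound for n summands taking values in [0, 1].
        return math.exp(-2.0 * t**2 / n)

    n, p, t = 1000, 0.1, 50.0
    print("Hoeffding:", hoeffding(n, t))
    print("Bernstein:", bernstein(n, p, t))
    print("Bennett:  ", bennett(n, p, t))

With these illustrative values the total variance np(1 − p) is much smaller than n, so the two variance-based bounds come out substantially smaller than Hoeffding's.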

See also

Concentration inequality

References

  1. Bennett, G. (1962). "Probability Inequalities for the Sum of Independent Random Variables". Journal of the American Statistical Association. 57 (297): 33–45. doi:10.2307/2282438. JSTOR 2282438.
  2. Devroye, Luc; Lugosi, Gábor (2001). Combinatorial Methods in Density Estimation. Springer. p. 11. ISBN 978-0-387-95117-1.
  3. Boucheron, Stéphane; Lugosi, Gábor; Massart, Pascal (2013). Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press. ISBN 978-0-19-953525-5.
  4. Freedman, D. A. (1975). "On tail probabilities for martingales". The Annals of Probability. 3 (1): 100–118. doi:10.1214/aop/1176996452. JSTOR 2959268.
  5. Fan, X.; Grama, I.; Liu, Q. (2012). "Hoeffding's inequality for supermartingales". Stochastic Processes and Their Applications. 122 (10): 3545–3559. arXiv:1109.4359. doi:10.1016/j.spa.2012.06.009. S2CID 13451239.