Note that the inequalities also hold when the Xi have been obtained using sampling without replacement; in this case the random variables are not independent anymore. A proof of this statement can be found in Hoeffding's paper. For slightly better bounds in the case of sampling without replacement, see for instance the paper by Serfling (1974).
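As an illustration, the following Monte Carlo sketch (with a population, sample size, and threshold chosen purely for illustration) checks empirically that the two-sided bound 2·exp(−2nt²) for the mean of [0, 1]-valued observations continues to hold when the sample is drawn without replacement from a finite population:

```python
import numpy as np

# Monte Carlo sketch: tail probability of the sample mean under sampling
# WITHOUT replacement, compared with the two-sided Hoeffding bound
# 2*exp(-2*n*t^2) for observations in [0, 1]. All parameters are illustrative.
rng = np.random.default_rng(0)

population = rng.random(1000)   # finite population of values in [0, 1]
n, t, trials = 50, 0.1, 100_000

mu = population.mean()
deviations = np.empty(trials)
for i in range(trials):
    sample = rng.choice(population, size=n, replace=False)
    deviations[i] = abs(sample.mean() - mu)

empirical_tail = (deviations >= t).mean()
hoeffding_bound = 2 * np.exp(-2 * n * t**2)
print(f"empirical P(|mean - mu| >= {t}): {empirical_tail:.4f}")
print(f"Hoeffding bound:                 {hoeffding_bound:.4f}")
```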
Generalization
Let X_1, ..., X_n be independent observations such that E[X_i] = 0 and a_i ≤ X_i ≤ b_i. Let ε > 0. Then, for any t > 0,[4]

\[ \mathbb{P}\left( \sum_{i=1}^n X_i \ge \varepsilon \right) \le \exp\left( -t\varepsilon + \frac{t^2}{8} \sum_{i=1}^n (b_i - a_i)^2 \right). \]
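For intuition, the following sketch (the uniform distributions and all parameters are illustrative assumptions, not part of the source) compares the simulated tail of a sum of zero-mean bounded variables with heterogeneous ranges against exp(−2ε²/Σ(b_i − a_i)²), the form the bound takes once t is chosen optimally:

```python
import numpy as np

# Monte Carlo sketch for heterogeneous ranges: X_i ~ Uniform(-w_i/2, w_i/2),
# so E[X_i] = 0 and b_i - a_i = w_i; compare the simulated tail of the sum
# with exp(-2*eps**2 / sum(w_i**2)), the bound after optimizing over t.
rng = np.random.default_rng(1)

widths = rng.uniform(0.5, 2.0, size=30)   # w_i = b_i - a_i
eps, trials = 4.0, 200_000

# each row is one realization of (X_1, ..., X_n)
samples = (rng.random((trials, widths.size)) - 0.5) * widths
sums = samples.sum(axis=1)

empirical = (sums >= eps).mean()
bound = np.exp(-2 * eps**2 / np.sum(widths**2))
print(f"empirical P(sum >= {eps}): {empirical:.5f}")
print(f"optimized Hoeffding bound: {bound:.5f}")
```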
Special Case: Bernoulli RVs
Suppose a_i = 0 and b_i = 1 for all i. This can occur when the X_i are independent Bernoulli random variables, though they need not be identically distributed. Then, writing S_n = X_1 + ⋯ + X_n, we get the inequality[5]

\[ \mathbb{P}(S_n - \mathbb{E}[S_n] \ge t) \le e^{-2t^2/n}, \]

or equivalently,

\[ \mathbb{P}\left( \frac{1}{n}\left( S_n - \mathbb{E}[S_n] \right) \ge t \right) \le e^{-2nt^2}, \]

for all t ≥ 0. This is a version of the additive Chernoff bound which is more general, since it allows for random variables that take values between zero and one, but also weaker, since the Chernoff bound gives a better tail bound when the random variables have small variance.
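As a numerical illustration of the slack mentioned above (parameters chosen for illustration; scipy is used only to evaluate the exact binomial tail), one can compare the exact tail probability of a sum of i.i.d. Bernoulli variables with the Hoeffding bound:

```python
import numpy as np
from scipy.stats import binom

# Compare the exact tail P(S_n >= E[S_n] + t) for S_n ~ Binomial(n, p)
# with the Hoeffding bound exp(-2*t**2 / n). Parameters are illustrative.
n, p, t = 100, 0.5, 10

exact = binom.sf(n * p + t - 1, n, p)   # sf(k) = P(S_n > k) = P(S_n >= k + 1)
bound = np.exp(-2 * t**2 / n)
print(f"exact tail:      {exact:.5f}")   # about 0.028
print(f"Hoeffding bound: {bound:.5f}")   # exp(-2), about 0.135
```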
General case of random variables bounded from above
Hoeffding's inequality can be extended to the case of random variables that are bounded from above.[6]
General case of sub-Gaussian random variables
The proof of Hoeffding's inequality can be generalized to any sub-Gaussian distribution, i.e., any random variable X satisfying P(|X| ≥ t) ≤ 2 exp(−ct²) for some c > 0. Equivalently, we can define sub-Gaussian distributions by a variance proxy, defined as follows. If there exists some K ≥ 0 such that E[exp(sX)] ≤ exp(s²K²/2) for all s ∈ ℝ, then K² is called a variance proxy, and the smallest such K² is called the optimal variance proxy, denoted \( \|X\|_{\mathrm{vp}}^2 \). In this form, Hoeffding's inequality states that for independent zero-mean sub-Gaussian random variables X_1, ..., X_n,

\[ \mathbb{P}\left( \left| \sum_{i=1}^n X_i \right| \ge t \right) \le 2 \exp\left( - \frac{t^2}{2 \sum_{i=1}^n \|X_i\|_{\mathrm{vp}}^2} \right). \]
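As an illustration of the variance-proxy definition (the Rademacher example is an addition for illustration, not from the source): for X uniform on {−1, +1}, E[exp(sX)] = cosh(s) ≤ exp(s²/2), so K² = 1 is a variance proxy; it is also optimal, since a variance proxy can never be smaller than the variance. A quick numeric check:

```python
import numpy as np

# Check that cosh(s) <= exp(s**2 / 2) on a grid, i.e. that K^2 = 1 is a
# variance proxy for a Rademacher variable X (uniform on {-1, +1}).
s = np.linspace(-5.0, 5.0, 1001)
mgf = np.cosh(s)                 # E[exp(s*X)] for Rademacher X
proxy_bound = np.exp(s**2 / 2)   # exp(s^2 * K^2 / 2) with K^2 = 1
assert np.all(mgf <= proxy_bound)
print("cosh(s) <= exp(s^2/2) holds on the sampled grid")
```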
Proof
The proof combines Markov's inequality with Hoeffding's lemma, which states that for a random variable X with a ≤ X ≤ b almost surely, E[exp(s(X − E[X]))] ≤ exp(s²(b − a)²/8). For any s > 0, applying Markov's inequality to exp(s(S_n − E[S_n])) and using independence together with the lemma gives

\[ \mathbb{P}(S_n - \mathbb{E}[S_n] \ge t) \le e^{-st} \, \mathbb{E}\left[ e^{s(S_n - \mathbb{E}[S_n])} \right] \le \exp\left( -st + \frac{s^2}{8} \sum_{i=1}^n (b_i - a_i)^2 \right). \]

This upper bound is the best for the value of s minimizing the expression inside the exponential. Since that expression is a quadratic in s, this can be done easily by optimizing a quadratic, giving

\[ s = \frac{4t}{\sum_{i=1}^n (b_i - a_i)^2}. \]

Writing the above bound for this value of s, we get the desired bound:

\[ \mathbb{P}(S_n - \mathbb{E}[S_n] \ge t) \le \exp\left( - \frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2} \right). \]
This proof generalizes easily to the case of sub-Gaussian distributions with a variance proxy.
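The quadratic optimization step in the proof can also be checked symbolically. In the following sketch, D stands for Σ(b_i − a_i)² (the use of sympy here is an illustrative choice):

```python
import sympy as sp

# Minimize the exponent -s*t + (s**2 / 8) * D over s > 0, where
# D = sum((b_i - a_i)**2), and recover the exponent of the final bound.
s, t, D = sp.symbols('s t D', positive=True)
exponent = -s * t + s**2 * D / 8

s_star = sp.solve(sp.diff(exponent, s), s)[0]
print(s_star)                                  # 4*t/D
print(sp.simplify(exponent.subs(s, s_star)))   # -2*t**2/D
```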
Usage
Confidence intervals
Hoeffding's inequality can be used to derive confidence intervals. We consider a coin that shows heads with probability p and tails with probability 1 − p. We toss the coin n times, generating n samples (which are i.i.d. Bernoulli random variables). The expected number of times the coin comes up heads is pn. Furthermore, the probability that the coin comes up heads at least k times can be exactly quantified by the following expression:

\[ \mathbb{P}(H(n) \ge k) = \sum_{i=k}^{n} \binom{n}{i} p^i (1 - p)^{n - i}, \]

where H(n) is the number of heads in n coin tosses.
When k = (p + ε)n for some ε > 0, Hoeffding's inequality bounds this probability by a term that is exponentially small in ε²n:

\[ \mathbb{P}\left( H(n) \ge (p + \varepsilon) n \right) \le e^{-2\varepsilon^2 n}. \]
Since this bound holds for deviations on both sides of the mean, Hoeffding's inequality implies that the number of heads that we see is concentrated around its mean, with exponentially small tails:

\[ \mathbb{P}\left( |H(n) - pn| \ge \varepsilon n \right) \le 2 e^{-2\varepsilon^2 n}. \]
Thinking of p̂ = H(n)/n as the "observed" mean, this probability can be interpreted as the level of significance α (the probability of making an error) for a confidence interval around p of size 2ε:

\[ \alpha = \mathbb{P}\left( \hat{p} \notin [p - \varepsilon, p + \varepsilon] \right) \le 2 e^{-2\varepsilon^2 n}. \]

Solving the above for the smallest n for which the right-hand side does not exceed α gives:

\[ n \ge \frac{\ln(2/\alpha)}{2\varepsilon^2}. \]

Therefore, we require at least ln(2/α)/(2ε²) samples to acquire the (1 − α)-confidence interval p ± ε.
Hence, the cost of acquiring the confidence interval grows only logarithmically in the inverse confidence level 1/α but quadratically in the inverse precision 1/ε. Note that there are more efficient methods of estimating a confidence interval.
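A minimal helper capturing this sample-size calculation (the function name is ours, for illustration):

```python
import math

# Smallest n such that the two-sided Hoeffding bound 2*exp(-2*n*eps**2)
# is at most alpha, i.e. n >= ln(2/alpha) / (2*eps**2).
def hoeffding_sample_size(eps: float, alpha: float) -> int:
    return math.ceil(math.log(2 / alpha) / (2 * eps**2))

# Example: a 95% confidence interval (alpha = 0.05) of half-width eps = 0.01
# requires 18445 samples under this bound.
print(hoeffding_sample_size(eps=0.01, alpha=0.05))
```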
References
Vershynin, Roman (2018). High-Dimensional Probability. Cambridge University Press. ISBN 9781108415194.
Boucheron, Stéphane; Lugosi, Gábor; Massart, Pascal (2013). Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford: Oxford University Press. ISBN 978-0-19-953525-5. OCLC 837517674.
Kahane, J.-P. (1960). "Propriétés locales des fonctions à séries de Fourier aléatoires" [Local properties of functions with random Fourier series]. Studia Mathematica. 19: 1–25.