Binomial approximation

Last updated May 15, 2024

The binomial approximation is useful for approximately calculating powers of sums of 1 and a small number $x$. It states that

(1+x)^{\alpha }\approx 1+\alpha x.

Derivations

Using linear approximation

The function

f(x)=(1+x)^{\alpha }

is a smooth function for x near 0. Thus, standard linear approximation tools from calculus apply: one has

f'(x)=\alpha (1+x)^{\alpha -1}

and so

f'(0)=\alpha .

Thus

f(x)\approx f(0)+f'(0)(x-0)=1+\alpha x.

By Taylor's theorem, the error in this approximation is equal to ${\textstyle {\frac {\alpha (\alpha -1)x^{2}}{2}}\cdot (1+\zeta )^{\alpha -2}}$ for some value of $\zeta$ that lies between 0 and $x$. For example, if $-1<x<0$ and $\alpha \geq 2$, then $0<(1+\zeta )^{\alpha -2}\leq 1$, so the error is at most ${\textstyle {\frac {\alpha (\alpha -1)x^{2}}{2}}}$. In little o notation, one can say that the error is $o(|x|)$, meaning that ${\textstyle \lim _{x\to 0}{\frac {\textrm {error}}{|x|}}=0}$.
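The error bound above can be checked numerically. The following is a minimal sketch, not a definitive implementation: the function names `binomial_approx` and `lagrange_error_bound` and the sample values $x=-0.01$, $\alpha =3$ are assumptions chosen for illustration (a small negative $x$ with $\alpha \geq 2$, so that the stated bound applies).

```python
def binomial_approx(x: float, alpha: float) -> float:
    """Linear (first-order) approximation of (1 + x)**alpha."""
    return 1.0 + alpha * x


def lagrange_error_bound(x: float, alpha: float) -> float:
    """Bound alpha*(alpha-1)*x**2/2, intended for small negative x and alpha >= 2."""
    return alpha * (alpha - 1.0) * x * x / 2.0


# Illustrative values only (assumed, not taken from the article).
x, alpha = -0.01, 3.0

exact = (1.0 + x) ** alpha
approx = binomial_approx(x, alpha)
error = abs(exact - approx)
bound = lagrange_error_bound(x, alpha)

print(f"exact  = {exact:.9f}")
print(f"approx = {approx:.9f}")
print(f"error  = {error:.3e}  (bound {bound:.3e})")
```

For these values the observed error should sit just below the bound, since $(1+\zeta )^{\alpha -2}$ is close to 1 when $x$ is small.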

Using Taylor series

The function

f(x)=(1+x)^{\alpha }

, where $x$ and $\alpha$ may be real or complex, can be expressed as a Taylor series about the point zero:

{\begin{aligned}f(x)&=\sum _{n=0}^{\infty }{\frac {f^{(n)}(0)}{n!}}x^{n}\\f(x)&=f(0)+f'(0)x+{\frac {1}{2}}f''(0)x^{2}+{\frac {1}{6}}f'''(0)x^{3}+{\frac {1}{24}}f^{(4)}(0)x^{4}+\cdots \\(1+x)^{\alpha }&=1+\alpha x+{\frac {1}{2}}\alpha (\alpha -1)x^{2}+{\frac {1}{6}}\alpha (\alpha -1)(\alpha -2)x^{3}+{\frac {1}{24}}\alpha (\alpha -1)(\alpha -2)(\alpha -3)x^{4}+\cdots \end{aligned}}

If $|x|<1$ and $|\alpha x|\ll 1$, then the terms in the series become progressively smaller and the series can be truncated to

(1+x)^{\alpha }\approx 1+\alpha x.

This result from the binomial approximation can always be improved by keeping additional terms from the Taylor series above. This is especially important when $|\alpha x|$ starts to approach one, or when evaluating a more complex expression where the first two terms in the Taylor series cancel (see example).
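The effect of keeping additional terms can be illustrated numerically. This is a sketch under stated assumptions: the helper `binomial_series` is a hypothetical name, and the values $x=0.2$, $\alpha =2.5$ (so $|\alpha x|=0.5$) are picked only to show a case where the two-term truncation is noticeably off.

```python
def binomial_series(x: float, alpha: float, terms: int) -> float:
    """Sum the first `terms` terms of the Taylor series of (1 + x)**alpha about 0.

    Each term is built from the previous one:
    term_{n+1} = term_n * (alpha - n) * x / (n + 1).
    """
    total = 0.0
    term = 1.0  # n = 0 term
    for n in range(terms):
        total += term
        term *= (alpha - n) * x / (n + 1)
    return total


# Illustrative values only (assumed): |alpha * x| = 0.5 is not very small.
x, alpha = 0.2, 2.5
exact = (1.0 + x) ** alpha

for terms in range(2, 6):
    partial = binomial_series(x, alpha, terms)
    print(f"{terms} terms: {partial:.7f}  error = {abs(partial - exact):.2e}")
```

With `terms=2` the function reproduces $1+\alpha x$; each further term should reduce the error for these inputs.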

Sometimes it is wrongly claimed that $|x|\ll 1$ is a sufficient condition for the binomial approximation. A simple counterexample is to let $x=10^{-6}$ and $\alpha =10^{7}$. In this case $(1+x)^{\alpha }>22{,}000$, but the binomial approximation yields $1+\alpha x=11$. For small $|x|$ but large $|\alpha x|$, a better approximation is:

(1+x)^{\alpha }\approx e^{\alpha x}.
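The counterexample can be reproduced directly. The sketch below evaluates the exact power through the identity $(1+x)^{\alpha }=e^{\alpha \ln(1+x)}$, using `math.log1p` for accuracy at small $x$; the exponential form then follows from $\ln(1+x)\approx x$. The variable names are illustrative assumptions.

```python
import math

# Values from the counterexample in the text.
x, alpha = 1e-6, 1e7

# (1 + x)**alpha computed as exp(alpha * ln(1 + x)).
exact = math.exp(alpha * math.log1p(x))

linear = 1.0 + alpha * x            # binomial approximation
exponential = math.exp(alpha * x)   # exponential approximation

print(f"exact       = {exact:.3f}")
print(f"linear      = {linear:.3f}")
print(f"exponential = {exponential:.3f}")
print(f"relative error of exponential form = {abs(exponential - exact) / exact:.2e}")
```

The linear estimate is off by more than three orders of magnitude here, while the exponential form should agree with the exact value to within a small relative error.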

Example

The binomial approximation for the inverse square root, $1/{\sqrt {1+x}}\approx 1-x/2$, can be applied to the following expression,

{\frac {1}{\sqrt {a+b}}}-{\frac {1}{\sqrt {a-b}}}

where $a$ and $b$ are real and $a\gg b$.

The mathematical form for the binomial approximation can be recovered by factoring out the large term $a$ and recalling that an inverse square root is the same as a power of negative one half:

{\begin{aligned}{\frac {1}{\sqrt {a+b}}}-{\frac {1}{\sqrt {a-b}}}&={\frac {1}{\sqrt {a}}}\left(\left(1+{\frac {b}{a}}\right)^{-1/2}-\left(1-{\frac {b}{a}}\right)^{-1/2}\right)\\&\approx {\frac {1}{\sqrt {a}}}\left(\left(1+\left(-{\frac {1}{2}}\right){\frac {b}{a}}\right)-\left(1-\left(-{\frac {1}{2}}\right){\frac {b}{a}}\right)\right)\\&\approx {\frac {1}{\sqrt {a}}}\left(1-{\frac {b}{2a}}-1-{\frac {b}{2a}}\right)\\&\approx -{\frac {b}{a{\sqrt {a}}}}\end{aligned}}

Evidently, the expression is approximately linear in $b$ when $a\gg b$, which is not obvious from the original expression.
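A quick numerical comparison supports the result of this example. This is a minimal sketch; the function names and the sample values $a=100$, $b=0.5$ are assumptions made for illustration.

```python
import math


def exact_difference(a: float, b: float) -> float:
    """Evaluate 1/sqrt(a + b) - 1/sqrt(a - b) directly."""
    return 1.0 / math.sqrt(a + b) - 1.0 / math.sqrt(a - b)


def linear_estimate(a: float, b: float) -> float:
    """Estimate from the binomial approximation: -b / (a * sqrt(a))."""
    return -b / (a * math.sqrt(a))


# Illustrative values only (assumed), chosen so that a is much larger than b.
a, b = 100.0, 0.5

exact = exact_difference(a, b)
estimate = linear_estimate(a, b)

print(f"exact    = {exact:.10f}")
print(f"estimate = {estimate:.10f}")
print(f"relative error = {abs(estimate - exact) / abs(exact):.2e}")
```

Doubling `b` while keeping `a` fixed should roughly double both values, which is the linear dependence on $b$ noted above.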

Generalization

While the binomial approximation is linear, it can be generalized to keep the quadratic term in the Taylor series:

(1+x)^{\alpha }\approx 1+\alpha x+(\alpha /2)(\alpha -1)x^{2}

Applied to the square root, it results in:

{\sqrt {1+x}}\approx 1+x/2-x^{2}/8.
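The gain from the quadratic term can be seen by comparing both square-root approximations against `math.sqrt`. The function names and the sample value $x=0.1$ below are illustrative assumptions.

```python
import math


def sqrt_linear(x: float) -> float:
    """Linear approximation of sqrt(1 + x)."""
    return 1.0 + x / 2.0


def sqrt_quadratic(x: float) -> float:
    """Quadratic approximation of sqrt(1 + x)."""
    return 1.0 + x / 2.0 - x * x / 8.0


# Illustrative value only (assumed).
x = 0.1
exact = math.sqrt(1.0 + x)

linear_error = abs(sqrt_linear(x) - exact)
quadratic_error = abs(sqrt_quadratic(x) - exact)

print(f"exact     = {exact:.8f}")
print(f"linear    = {sqrt_linear(x):.8f}  error = {linear_error:.2e}")
print(f"quadratic = {sqrt_quadratic(x):.8f}  error = {quadratic_error:.2e}")
```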

Quadratic example

Consider the expression:

(1+\epsilon )^{n}-(1-\epsilon )^{-n}

where $|\epsilon |<1$ and $|n\epsilon |\ll 1$. If only the linear term from the binomial approximation, $(1+x)^{\alpha }\approx 1+\alpha x$, is kept, then the expression unhelpfully simplifies to zero:

{\begin{aligned}(1+\epsilon )^{n}-(1-\epsilon )^{-n}&\approx (1+n\epsilon )-(1-(-n)\epsilon )\\&\approx (1+n\epsilon )-(1+n\epsilon )\\&\approx 0.\end{aligned}}

Although the expression is small, it is not exactly zero. Keeping the quadratic term gives:

{\begin{aligned}(1+\epsilon )^{n}-(1-\epsilon )^{-n}&\approx \left(1+n\epsilon +{\frac {1}{2}}n(n-1)\epsilon ^{2}\right)-\left(1+(-n)(-\epsilon )+{\frac {1}{2}}(-n)(-n-1)(-\epsilon )^{2}\right)\\&\approx \left(1+n\epsilon +{\frac {1}{2}}n(n-1)\epsilon ^{2}\right)-\left(1+n\epsilon +{\frac {1}{2}}n(n+1)\epsilon ^{2}\right)\\&\approx {\frac {1}{2}}n(n-1)\epsilon ^{2}-{\frac {1}{2}}n(n+1)\epsilon ^{2}\\&\approx {\frac {1}{2}}n\epsilon ^{2}((n-1)-(n+1))\\&\approx -n\epsilon ^{2}\end{aligned}}

This result is quadratic in $\epsilon$, which is why it did not appear when only the linear terms in $\epsilon$ were kept.
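The quadratic result $-n\epsilon ^{2}$ can be compared with a direct evaluation. This is a sketch under assumed sample values ($\epsilon =10^{-3}$, $n=5$, so $|n\epsilon |=0.005$); the function names are hypothetical.

```python
def exact_expression(eps: float, n: float) -> float:
    """Evaluate (1 + eps)**n - (1 - eps)**(-n) directly."""
    return (1.0 + eps) ** n - (1.0 - eps) ** (-n)


def quadratic_estimate(eps: float, n: float) -> float:
    """Leading-order estimate from the quadratic expansion: -n * eps**2."""
    return -n * eps ** 2


# Illustrative values only (assumed).
eps, n = 1e-3, 5.0

exact = exact_expression(eps, n)
estimate = quadratic_estimate(eps, n)

print(f"exact    = {exact:.6e}")
print(f"estimate = {estimate:.6e}")
print(f"relative error = {abs(estimate - exact) / abs(exact):.2e}")
```

The remaining discrepancy comes from the cubic and higher terms that the quadratic truncation drops, so it should shrink as `eps` is made smaller.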


References

1. For example, calculating the multipole expansion. Griffiths, D. (1999). Introduction to Electrodynamics (Third ed.). Pearson Education, Inc. pp. 146–148.

