Stirling's approximation

[Figure: Comparison of Stirling's approximation with the factorial]

In mathematics, Stirling's approximation (or Stirling's formula) is an asymptotic approximation for factorials. It is a good approximation, leading to accurate results even for small values of $n$. It is named after James Stirling, though a related but less precise result was first stated by Abraham de Moivre. [1][2][3]


One way of stating the approximation involves the logarithm of the factorial:
$$\ln(n!) = n\ln n - n + O(\ln n),$$
where the big O notation means that, for all sufficiently large values of $n$, the difference between $\ln(n!)$ and $n\ln n - n$ will be at most proportional to the logarithm of $n$. In computer science applications such as the worst-case lower bound for comparison sorting, it is convenient to instead use the binary logarithm, giving the equivalent form
$$\log_2(n!) = n\log_2 n - n\log_2 e + O(\log_2 n).$$
The error term in either base can be expressed more precisely as $\tfrac{1}{2}\log(2\pi n) + O(1/n)$, corresponding to an approximate formula for the factorial itself,
$$n! \sim \sqrt{2\pi n}\left(\frac{n}{e}\right)^{n}.$$
Here the sign $\sim$ means that the two quantities are asymptotic, that is, that their ratio tends to 1 as $n$ tends to infinity.
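These statements are easy to check numerically. The following sketch (Python, standard library only; added here for illustration and not part of the source) compares $\ln(n!)$, computed via math.lgamma, with the two approximations:

    import math

    for n in [10, 100, 1000]:
        exact = math.lgamma(n + 1)                         # ln(n!)
        crude = n * math.log(n) - n                        # n ln n - n
        refined = crude + 0.5 * math.log(2 * math.pi * n)  # + (1/2) ln(2 pi n)
        print(n, exact - crude, exact - refined)
    # exact - crude grows like (1/2) ln(2 pi n), i.e. O(log n),
    # while exact - refined shrinks like 1/(12 n).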

Derivation

Roughly speaking, the simplest version of Stirling's formula can be quickly obtained by approximating the sum
$$\ln(n!) = \sum_{k=1}^{n}\ln k$$
with an integral:
$$\sum_{k=1}^{n}\ln k \approx \int_{1}^{n}\ln x\,dx = n\ln n - n + 1.$$
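A quick numerical check of this sum-versus-integral step (a sketch, not from the source):

    import math

    n = 100
    log_sum = sum(math.log(k) for k in range(1, n + 1))  # ln(n!) as a sum of logs
    integral = n * math.log(n) - n + 1                   # integral of ln x from 1 to n
    print(log_sum, integral, log_sum - integral)
    # The difference grows only like (1/2) ln n, so the integral already
    # captures the leading behaviour n ln n - n.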

The full formula, together with precise estimates of its error, can be derived as follows. Instead of approximating $n!$, one considers its natural logarithm, as this is a slowly varying function:
$$\ln(n!) = \ln 1 + \ln 2 + \cdots + \ln n.$$

The right-hand side of this equation minus $\tfrac{1}{2}(\ln 1 + \ln n) = \tfrac{1}{2}\ln n$ is the approximation by the trapezoid rule of the integral
$$\ln(n!) - \frac{\ln n}{2} \approx \int_{1}^{n}\ln x\,dx = n\ln n - n + 1,$$

and the error in this approximation is given by the Euler–Maclaurin formula:
$$\ln(n!) - \frac{\ln n}{2} = n\ln n - n + 1 + \sum_{k=2}^{m}\frac{(-1)^{k}B_{k}}{k(k-1)}\left(\frac{1}{n^{k-1}} - 1\right) + R_{m,n},$$

where $B_{k}$ is a Bernoulli number, and $R_{m,n}$ is the remainder term in the Euler–Maclaurin formula. Take limits to find that
$$\lim_{n\to\infty}\left(\ln(n!) - \frac{\ln n}{2} - n\ln n + n\right) = 1 - \sum_{k=2}^{m}\frac{(-1)^{k}B_{k}}{k(k-1)} + \lim_{n\to\infty}R_{m,n}.$$

Denote this limit as $y$. Because the remainder $R_{m,n}$ in the Euler–Maclaurin formula satisfies
$$R_{m,n} = \lim_{n\to\infty}R_{m,n} + O\left(\frac{1}{n^{m}}\right),$$

where big-O notation is used, combining the equations above yields the approximation formula in its logarithmic form:
$$\ln(n!) = n\ln n - n + \frac{\ln n}{2} + y + \sum_{k=2}^{m}\frac{(-1)^{k}B_{k}}{k(k-1)n^{k-1}} + O\left(\frac{1}{n^{m}}\right).$$

Taking the exponential of both sides and choosing any positive integer $m$, one obtains a formula involving an unknown quantity $e^{y}$. For m = 1, the formula is
$$n! = e^{y}\sqrt{n}\left(\frac{n}{e}\right)^{n}\left(1 + O\left(\frac{1}{n}\right)\right).$$

The quantity $e^{y}$ can be found by taking the limit on both sides as $n$ tends to infinity and using Wallis' product, which shows that $e^{y} = \sqrt{2\pi}$. Therefore, one obtains Stirling's formula:
$$n! = \sqrt{2\pi n}\left(\frac{n}{e}\right)^{n}\left(1 + O\left(\frac{1}{n}\right)\right).$$
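The value of the constant can be corroborated numerically. This sketch (an illustration added here, not from the source) estimates $e^{y} = n!\,e^{n}/n^{n+1/2}$ for growing $n$:

    import math

    for n in [10, 100, 1000, 10000]:
        log_ey = math.lgamma(n + 1) + n - (n + 0.5) * math.log(n)  # ln(n! e^n / n^(n+1/2))
        print(n, math.exp(log_ey))
    print(math.sqrt(2 * math.pi))  # the limit: 2.5066...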

Alternative derivations

An alternative formula for $n!$ using the gamma function is
$$n! = \int_{0}^{\infty}x^{n}e^{-x}\,dx$$
(as can be seen by repeated integration by parts). Rewriting and changing variables x = ny, one obtains
$$n! = \int_{0}^{\infty}e^{n\ln x - x}\,dx = e^{n\ln n}\,n\int_{0}^{\infty}e^{n(\ln y - y)}\,dy.$$
Applying Laplace's method one has
$$\int_{0}^{\infty}e^{n(\ln y - y)}\,dy \sim \sqrt{\frac{2\pi}{n}}\,e^{-n},$$
which recovers Stirling's formula:
$$n! \sim e^{n\ln n}\,n\,\sqrt{\frac{2\pi}{n}}\,e^{-n} = \sqrt{2\pi n}\left(\frac{n}{e}\right)^{n}.$$

Higher orders

In fact, further corrections can also be obtained using Laplace's method. From the previous result, we know that $n! \sim \sqrt{2\pi n}\,(n/e)^{n}$, so we "peel off" this dominant term, then perform two changes of variables, to obtain:
$$e^{n}n^{-n-1}\,n! = \int_{-\infty}^{\infty}e^{n(1+t-e^{t})}\,dt.$$
To verify this: $\int_{-\infty}^{\infty}e^{n(1+t-e^{t})}\,dt \overset{t\to\ln x}{=} e^{n}\int_{0}^{\infty}e^{-nx}x^{n-1}\,dx \overset{x\to x/n}{=} e^{n}n^{-n}\Gamma(n) = e^{n}n^{-n-1}\,n!$.

Now the function $1+t-e^{t}$ is unimodal, with maximum value zero at $t = 0$. Locally around zero, it looks like $-t^{2}/2$, which is why we are able to perform Laplace's method. In order to extend Laplace's method to higher orders, we perform another change of variables by $1+t-e^{t} = -\tau^{2}/2$. This equation cannot be solved in closed form, but it can be solved by serial expansion, which gives us $t = \tau - \tau^{2}/6 + \tau^{3}/36 + a\tau^{4} + O(\tau^{5})$. Now plug back into the equation to obtain
$$e^{n}n^{-n-1}\,n! = \int_{-\infty}^{\infty}e^{-n\tau^{2}/2}\,\frac{dt}{d\tau}\,d\tau = \int_{-\infty}^{\infty}e^{-n\tau^{2}/2}\left(1 - \frac{\tau}{3} + \frac{\tau^{2}}{12} + 4a\tau^{3} + O(\tau^{4})\right)d\tau = \sqrt{\frac{2\pi}{n}}\left(1 + \frac{1}{12n} + O\left(\frac{1}{n^{2}}\right)\right);$$
notice how we don't need to actually find $a$, since it is cancelled out by the integral (odd powers of $\tau$ integrate to zero). Higher orders can be achieved by computing more terms in $t(\tau)$, which can be obtained programmatically. [note 1]

Thus we get Stirling's formula to two orders:
$$n! = \sqrt{2\pi n}\left(\frac{n}{e}\right)^{n}\left(1 + \frac{1}{12n} + O\left(\frac{1}{n^{2}}\right)\right).$$
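A numerical check of the two-order formula (a sketch; the variable names are ours):

    import math

    for n in [5, 10, 20]:
        base = math.sqrt(2 * math.pi * n) * (n / math.e) ** n
        two_orders = base * (1 + 1 / (12 * n))
        exact = math.factorial(n)
        print(n, exact / base - 1, exact / two_orders - 1)
    # The leading formula is off by roughly 1/(12 n); adding the 1/(12 n)
    # correction reduces the relative error to O(1/n^2).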

Complex-analytic version

A complex-analysis version of this method [4] is to consider $\frac{1}{n!}$ as a Taylor coefficient of the exponential function $e^{z} = \sum_{n=0}^{\infty}\frac{z^{n}}{n!}$, computed by Cauchy's integral formula as
$$\frac{1}{n!} = \frac{1}{2\pi i}\oint_{|z|=r}\frac{e^{z}}{z^{n+1}}\,dz.$$

This line integral can then be approximated using the saddle-point method with an appropriate choice of contour radius $r = r(n)$. The dominant portion of the integral near the saddle point is then approximated by a real integral and Laplace's method, while the remaining portion of the integral can be bounded above to give an error term.
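A minimal numerical version of this contour computation (a sketch; the helper name inv_factorial and the parameter num_points are ours, and the radius r = n is the saddle-point choice mentioned above):

    import cmath, math

    def inv_factorial(n, num_points=2000):
        # 1/n! = (1/(2 pi i)) * integral of e^z / z^(n+1) dz over |z| = r.
        # With z = r e^(i theta), dz = i z d(theta), so one power of z cancels
        # and the trapezoid rule in theta gives the sum below.
        r = n  # contour through the saddle point z = n
        total = 0.0
        for k in range(num_points):
            z = r * cmath.exp(2j * math.pi * k / num_points)
            total += (cmath.exp(z) / z ** n).real
        return total / num_points

    print(inv_factorial(10), 1 / math.factorial(10))  # both ≈ 2.7557e-07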

Using the Central Limit Theorem and the Poisson distribution

An alternative version uses the fact that the Poisson distribution converges to a normal distribution by the Central Limit Theorem. [5]

Since the Poisson distribution with parameter $\lambda$ converges to a normal distribution with mean $\lambda$ and variance $\lambda$, their density functions will be approximately the same:
$$\frac{e^{-\lambda}\lambda^{k}}{k!} \approx \frac{1}{\sqrt{2\pi\lambda}}\exp\left(-\frac{(k-\lambda)^{2}}{2\lambda}\right).$$

Evaluating this expression at the mean, $k = \lambda$, at which the approximation is particularly accurate, simplifies this expression to:
$$\frac{e^{-\lambda}\lambda^{\lambda}}{\lambda!} \approx \frac{1}{\sqrt{2\pi\lambda}}.$$

Taking logs then results in:
$$-\lambda + \lambda\ln\lambda - \ln(\lambda!) \approx -\frac{1}{2}\ln(2\pi\lambda),$$

which can easily be rearranged to give:
$$\ln(\lambda!) \approx \lambda\ln\lambda - \lambda + \frac{1}{2}\ln(2\pi\lambda).$$

Evaluating at $\lambda = n$ gives the usual, more precise form of Stirling's approximation.
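The same comparison, done numerically (a sketch; logarithms are used to avoid overflow):

    import math

    lam = 50
    log_pmf_at_mean = lam * math.log(lam) - lam - math.lgamma(lam + 1)  # ln(e^-λ λ^λ / λ!)
    log_normal_peak = -0.5 * math.log(2 * math.pi * lam)                # ln(1 / sqrt(2 π λ))
    print(log_pmf_at_mean, log_normal_peak)
    # Equating these two quantities and rearranging is precisely the
    # Stirling approximation for ln(λ!).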

Speed of convergence and error estimates

[Figure: The relative error in a truncated Stirling series vs. n, for 0 to 5 terms. The kinks in the curves represent points where the truncated series coincides with Γ(n + 1).]

Stirling's formula is in fact the first approximation to the following series (now called the Stirling series): [6]
$$n! \sim \sqrt{2\pi n}\left(\frac{n}{e}\right)^{n}\left(1 + \frac{1}{12n} + \frac{1}{288n^{2}} - \frac{139}{51840n^{3}} - \frac{571}{2488320n^{4}} + \cdots\right).$$

An explicit formula for the coefficients in this series was given by G. Nemes. [7] Further terms are listed in the On-Line Encyclopedia of Integer Sequences as A001163 and A001164. The first graph in this section shows the relative error vs. $n$, for 1 through all 5 terms listed above. Bender and Orszag [8] (p. 218) give an asymptotic formula for the coefficients, which shows that they grow superexponentially and that, by the ratio test, the radius of convergence of the series is zero.

[Figure: The relative error in a truncated Stirling series vs. the number of terms used]

As n → ∞, the error in the truncated series is asymptotically equal to the first omitted term. This is an example of an asymptotic expansion. It is not a convergent series; for any particular value of $n$ there are only so many terms of the series that improve accuracy, after which accuracy worsens. This is shown in the next graph, which shows the relative error versus the number of terms in the series, for larger numbers of terms. More precisely, let S(n, t) be the Stirling series to $t$ terms evaluated at $n$. The graphs show
$$\left|\ln\left(\frac{S(n,t)}{n!}\right)\right|,$$
which, when small, is essentially the relative error.
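This behaviour can be reproduced with a few lines of code (a sketch; the coefficients are the standard ones listed above, and S(n, t) is built exactly as defined):

    import math

    # leading coefficients of the Stirling series 1 + a1/n + a2/n^2 + ...
    coeffs = [1, 1/12, 1/288, -139/51840, -571/2488320, 163879/209018880]

    n = 3
    base = math.sqrt(2 * math.pi * n) * (n / math.e) ** n
    for t in range(1, len(coeffs) + 1):
        S = base * sum(c / n ** k for k, c in enumerate(coeffs[:t]))
        print(t, abs(math.log(S / math.factorial(n))))
    # For fixed n the printed error stops shrinking after a few terms:
    # the series is asymptotic in n, not convergent in t.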

Writing Stirling's series in the form
$$\ln(n!) \approx n\ln n - n + \frac{1}{2}\ln(2\pi n) + \frac{1}{12n} - \frac{1}{360n^{3}} + \frac{1}{1260n^{5}} - \frac{1}{1680n^{7}} + \cdots,$$
it is known that the error in truncating the series is always of the opposite sign and at most the same magnitude as the first omitted term.[citation needed]

Other bounds, due to Robbins, [9] valid for all positive integers $n$, are
$$\sqrt{2\pi n}\left(\frac{n}{e}\right)^{n}e^{\frac{1}{12n+1}} < n! < \sqrt{2\pi n}\left(\frac{n}{e}\right)^{n}e^{\frac{1}{12n}}.$$
This upper bound corresponds to stopping the above series for $\ln(n!)$ after the $\frac{1}{12n}$ term. The lower bound is weaker than the one obtained by stopping the series after the $\frac{1}{360n^{3}}$ term. A looser version of this bound is that $\frac{n!\,e^{n}}{n^{n+1/2}} \in \left(\sqrt{2\pi},\,e\right]$ for all $n \ge 1$.
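A direct check of Robbins's double inequality (a sketch, added for illustration):

    import math

    for n in [1, 2, 5, 10, 50]:
        base = math.sqrt(2 * math.pi * n) * (n / math.e) ** n
        lower = base * math.exp(1 / (12 * n + 1))
        upper = base * math.exp(1 / (12 * n))
        assert lower < math.factorial(n) < upper
        print(n, lower, math.factorial(n), upper)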

Stirling's formula for the gamma function

For all positive integers $n$,
$$n! = \Gamma(n+1),$$
where Γ denotes the gamma function.

However, the gamma function, unlike the factorial, is more broadly defined for all complex numbers other than non-positive integers; nevertheless, Stirling's formula may still be applied. If Re(z) > 0, then
$$\ln\Gamma(z) = z\ln z - z + \frac{1}{2}\ln\frac{2\pi}{z} + \int_{0}^{\infty}\frac{2\arctan\left(\frac{t}{z}\right)}{e^{2\pi t}-1}\,dt.$$

Repeated integration by parts gives
$$\ln\Gamma(z) \sim z\ln z - z + \frac{1}{2}\ln\frac{2\pi}{z} + \sum_{n=1}^{N-1}\frac{B_{2n}}{2n(2n-1)z^{2n-1}},$$

where $B_{2n}$ is the $2n$-th Bernoulli number (note that the limit of the sum as $N \to \infty$ is not convergent, so this formula is just an asymptotic expansion). The formula is valid for $z$ large enough in absolute value, when |arg(z)| < π − ε, where ε is positive, with an error term of O(z−2N+ 1). The corresponding approximation may now be written:
$$\Gamma(z) = \sqrt{\frac{2\pi}{z}}\left(\frac{z}{e}\right)^{z}\left(1 + O\left(\frac{1}{z}\right)\right),$$

where the expansion is identical to that of Stirling's series above for $n!$, except that $n$ is replaced with z − 1. [10]
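The truncated expansion is straightforward to evaluate. This sketch (the helper name ln_gamma_asymptotic is ours; the Bernoulli numbers $B_{2},\dots,B_{8}$ are hardcoded) compares it with math.lgamma:

    import math

    def ln_gamma_asymptotic(z, N=4):
        # truncated Stirling series for ln Gamma(z), using B_2..B_8
        bernoulli = [1/6, -1/30, 1/42, -1/30]  # B_2, B_4, B_6, B_8
        s = (z - 0.5) * math.log(z) - z + 0.5 * math.log(2 * math.pi)
        for n in range(1, N + 1):
            s += bernoulli[n - 1] / (2 * n * (2 * n - 1) * z ** (2 * n - 1))
        return s

    for z in [2.0, 5.0, 10.0]:
        print(z, ln_gamma_asymptotic(z), math.lgamma(z))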

A further application of this asymptotic expansion is for complex argument z with constant Re(z). See, for example, the Stirling formula applied in Im(z) = t of the Riemann–Siegel theta function on the straight line 1/4 + it.

Error bounds

For any positive integer $N$, the following notation is introduced:
$$\ln\Gamma(z) = z\ln z - z + \frac{1}{2}\ln\frac{2\pi}{z} + \sum_{n=1}^{N-1}\frac{B_{2n}}{2n(2n-1)z^{2n-1}} + R_{N}(z)$$
and
$$\Gamma(z) = \sqrt{\frac{2\pi}{z}}\left(\frac{z}{e}\right)^{z}\left(\sum_{n=0}^{N-1}\frac{a_{n}}{z^{n}} + \widetilde{R}_{N}(z)\right).$$

Then, for example, in the sector $|\arg z| \le \frac{\pi}{4}$ the remainder is bounded by the first omitted term, [11] [12]
$$|R_{N}(z)| \le \frac{|B_{2N}|}{2N(2N-1)|z|^{2N-1}};$$
bounds valid in wider sectors carry an additional factor depending on arg z.

For further information and other error bounds, see the cited papers.

A convergent version of Stirling's formula

Thomas Bayes showed, in a letter to John Canton published by the Royal Society in 1763, that Stirling's formula did not give a convergent series. [13] Obtaining a convergent version of Stirling's formula entails evaluating Binet's formula:
$$\int_{0}^{\infty}\frac{2\arctan\left(\frac{t}{x}\right)}{e^{2\pi t}-1}\,dt = \ln\Gamma(x) - x\ln x + x - \frac{1}{2}\ln\frac{2\pi}{x}.$$

One way to do this is by means of a convergent series of inverted rising factorials. If
$$z^{\overline{n}} = z(z+1)\cdots(z+n-1),$$
then
$$\int_{0}^{\infty}\frac{2\arctan\left(\frac{t}{x}\right)}{e^{2\pi t}-1}\,dt = \sum_{n=1}^{\infty}\frac{c_{n}}{(x+1)^{\overline{n}}},$$
where the coefficients $c_{n}$ can be expressed in terms of the Stirling numbers of the first kind s(n, k). From this one obtains a version of Stirling's series which converges when Re(x) > 0. Stirling's formula may also be given in convergent form as [14]
$$\Gamma(x) = \sqrt{2\pi}\,x^{x-\frac{1}{2}}e^{-x+\mu(x)},$$
where
$$\mu(x) = \sum_{n=0}^{\infty}\left(\left(x+n+\frac{1}{2}\right)\ln\left(1+\frac{1}{x+n}\right)-1\right).$$
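The convergent form can be evaluated directly, although the series for $\mu(x)$ converges slowly (its terms decay like $1/(12n^{2})$, so the tail after $M$ terms is of order $1/M$). A sketch, with the helper name gamma_convergent ours:

    import math

    def gamma_convergent(x, terms=200_000):
        # mu(x) summed term by term; convergence is slow but genuine
        mu = sum((x + n + 0.5) * math.log1p(1 / (x + n)) - 1 for n in range(terms))
        return math.sqrt(2 * math.pi) * x ** (x - 0.5) * math.exp(-x + mu)

    print(gamma_convergent(4.0), math.gamma(4.0))  # both ≈ 6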

Versions suitable for calculators

The approximation
$$\Gamma(z) \approx \sqrt{\frac{2\pi}{z}}\left(\frac{z}{e}\sqrt{z\sinh\frac{1}{z}+\frac{1}{810z^{6}}}\right)^{z}$$
and its equivalent form
$$2\ln\Gamma(z) \approx \ln(2\pi) - \ln z + z\left(2\ln z + \ln\left(z\sinh\frac{1}{z}+\frac{1}{810z^{6}}\right) - 2\right)$$
can be obtained by rearranging Stirling's extended formula and observing a coincidence between the resultant power series and the Taylor series expansion of the hyperbolic sine function. This approximation is good to more than 8 decimal digits for z with a real part greater than 8. Robert H. Windschitl suggested it in 2002 for computing the gamma function with fair accuracy on calculators with limited program or register memory. [15]
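As stated, the approximation is simple to program (a sketch; the helper name gamma_windschitl is ours):

    import math

    def gamma_windschitl(z):
        return math.sqrt(2 * math.pi / z) * (
            (z / math.e) * math.sqrt(z * math.sinh(1 / z) + 1 / (810 * z ** 6))
        ) ** z

    print(gamma_windschitl(9.0), math.gamma(9.0))  # Γ(9) = 8! = 40320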

Gergő Nemes proposed in 2007 an approximation which gives the same number of exact digits as the Windschitl approximation but is much simpler: [16]
$$\Gamma(z) \approx \sqrt{\frac{2\pi}{z}}\left(\frac{1}{e}\left(z + \frac{1}{12z-\frac{1}{10z}}\right)\right)^{z},$$
or equivalently,
$$\ln\Gamma(z) \approx \frac{1}{2}\left(\ln(2\pi) - \ln z\right) + z\left(\ln\left(z + \frac{1}{12z-\frac{1}{10z}}\right) - 1\right).$$
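And the Nemes approximation, in the same style (sketch; the helper name gamma_nemes is ours):

    import math

    def gamma_nemes(z):
        return math.sqrt(2 * math.pi / z) * ((z + 1 / (12 * z - 1 / (10 * z))) / math.e) ** z

    print(gamma_nemes(9.0), math.gamma(9.0))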

An alternative approximation for the gamma function stated by Srinivasa Ramanujan in Ramanujan's lost notebook [17] is
$$\Gamma(1+x) \approx \sqrt{\pi}\left(\frac{x}{e}\right)^{x}\left(8x^{3}+4x^{2}+x+\frac{1}{30}\right)^{\frac{1}{6}}$$
for x ≥ 0. The equivalent approximation for ln n! has an asymptotic error of $\frac{1}{1400n^{3}}$ and is given by
$$\ln n! \approx n\ln n - n + \frac{1}{6}\ln\left(8n^{3}+4n^{2}+n+\frac{1}{30}\right) + \frac{1}{2}\ln\pi.$$
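Ramanujan's approximation in code (a sketch; the helper name gamma_ramanujan is ours):

    import math

    def gamma_ramanujan(x):
        # approximates Γ(1 + x) for x >= 0
        return math.sqrt(math.pi) * (x / math.e) ** x * (
            8 * x ** 3 + 4 * x ** 2 + x + 1 / 30
        ) ** (1 / 6)

    for x in [1.0, 5.0, 10.0]:
        print(x, gamma_ramanujan(x), math.gamma(1 + x))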

The approximation may be made precise by giving paired upper and lower bounds; one such inequality is [18] [19] [20] [21]
$$\sqrt{\pi}\left(\frac{x}{e}\right)^{x}\left(8x^{3}+4x^{2}+x+\frac{1}{100}\right)^{\frac{1}{6}} < \Gamma(1+x) < \sqrt{\pi}\left(\frac{x}{e}\right)^{x}\left(8x^{3}+4x^{2}+x+\frac{1}{30}\right)^{\frac{1}{6}},$$
valid for x ≥ 0.

History

The formula was first discovered by Abraham de Moivre [2] in the form
$$n! \sim [\text{constant}]\cdot n^{n+\frac{1}{2}}e^{-n}.$$

De Moivre gave an approximate rational-number expression for the natural logarithm of the constant. Stirling's contribution consisted of showing that the constant is precisely $\sqrt{2\pi}$. [3]


References

  1. Dutka, Jacques (1991), "The early history of the factorial function", Archive for History of Exact Sciences, 43 (3): 225–249, doi:10.1007/BF00389433, S2CID 122237769
  2. Le Cam, L. (1986), "The central limit theorem around 1935", Statistical Science, 1 (1): 78–96, doi:10.1214/ss/1177013818, JSTOR 2245503, MR 0833276; see p. 81: "The result, obtained using a formula originally proved by de Moivre but now called Stirling's formula, occurs in his 'Doctrine of Chances' of 1733."
  3. Pearson, Karl (1924), "Historical note on the origin of the normal curve of errors", Biometrika, 16 (3/4): 402–404 [p. 403], doi:10.2307/2331714, JSTOR 2331714: "I consider that the fact that Stirling showed that De Moivre's arithmetical constant was $\sqrt{2\pi}$ does not entitle him to claim the theorem, [...]"
  4. Flajolet, Philippe; Sedgewick, Robert (2009), Analytic Combinatorics, Cambridge, UK: Cambridge University Press, p. 555, doi:10.1017/CBO9780511801655, ISBN 978-0-521-89806-5, MR 2483235, S2CID 27509971
  5. MacKay, David J. C. (2019). Information Theory, Inference, and Learning Algorithms (22nd printing ed.). Cambridge: Cambridge University Press. ISBN 978-0-521-64298-9.
  6. Olver, F. W. J.; Olde Daalhuis, A. B.; Lozier, D. W.; Schneider, B. I.; Boisvert, R. F.; Clark, C. W.; Miller, B. R. & Saunders, B. V., "5.11 Gamma function properties: Asymptotic Expansions", NIST Digital Library of Mathematical Functions, Release 1.0.13 of 2016-09-16
  7. Nemes, Gergő (2010), "On the coefficients of the asymptotic expansion of n!", Journal of Integer Sequences, 13 (6): 5
  8. Bender, Carl M.; Orszag, Steven A. (2009). Advanced Mathematical Methods for Scientists and Engineers. 1: Asymptotic Methods and Perturbation Theory (reprint ed.). New York, NY: Springer. ISBN 978-0-387-98931-0.
  9. Robbins, Herbert (1955), "A Remark on Stirling's Formula", The American Mathematical Monthly, 62 (1): 26–29, doi:10.2307/2308012, JSTOR 2308012
  10. Spiegel, M. R. (1999), Mathematical Handbook of Formulas and Tables, McGraw-Hill, p. 148
  11. Schäfke, F. W.; Sattler, A. (1990), "Restgliedabschätzungen für die Stirlingsche Reihe" [Remainder estimates for the Stirling series], Note di Matematica, 10 (suppl. 2): 453–470, MR 1221957
  12. Nemes, G. (2015), "Error bounds and exponential improvements for the asymptotic expansions of the gamma function and its reciprocal", Proc. Roy. Soc. Edinburgh Sect. A, 145: 571–596
  13. Bayes, Thomas (24 November 1763), "A letter from the late Reverend Mr. Thomas Bayes, F. R. S. to John Canton, M. A. and F. R. S." (PDF), Philosophical Transactions of the Royal Society of London, Series I, 53: 269, Bibcode:1763RSPT...53..269B, archived (PDF) from the original on 2012-01-28, retrieved 2012-03-01
  14. Artin, Emil (2015). The Gamma Function. Dover. p. 24.
  15. Toth, V. T., Programmable Calculators: Calculators and the Gamma Function (2006), archived 2005-12-31 at the Wayback Machine.
  16. Nemes, Gergő (2010), "New asymptotic expansion for the Gamma function", Archiv der Mathematik, 95 (2): 161–169, doi:10.1007/s00013-010-0146-9, S2CID 121820640
  17. Ramanujan, Srinivasa (14 August 1920), Lost Notebook and Other Unpublished Papers, p. 339, via Internet Archive
  18. Karatsuba, Ekatherina A. (2001), "On the asymptotic representation of the Euler gamma function by Ramanujan", Journal of Computational and Applied Mathematics, 135 (2): 225–240, Bibcode:2001JCoAM.135..225K, doi:10.1016/S0377-0427(00)00586-0, MR 1850542
  19. Mortici, Cristinel (2011), "Ramanujan's estimate for the gamma function via monotonicity arguments", Ramanujan J., 25 (2): 149–154, doi:10.1007/s11139-010-9265-y, S2CID 119530041
  20. Mortici, Cristinel (2011), "Improved asymptotic formulas for the gamma function", Comput. Math. Appl., 61 (11): 3364–3369, doi:10.1016/j.camwa.2011.04.036
  21. Mortici, Cristinel (2011), "On Ramanujan's large argument formula for the gamma function", Ramanujan J., 26 (2): 185–192, doi:10.1007/s11139-010-9281-y, S2CID 120371952

Notes

  1. For example, a program in Mathematica:

    series = tau - tau^2/6 + tau^3/36 + tau^4*a + tau^5*b;
    (* pick the right a, b to make the series equal 0 at higher orders *)
    Series[tau^2/2 + 1 + t - Exp[t] /. t -> series, {tau, 0, 8}]
    (* now do the integral *)
    integral =
      Integrate[Exp[-x*tau^2/2]*D[series /. a -> 0 /. b -> 0, tau],
        {tau, -Infinity, Infinity}];
    Simplify[integral/Sqrt[2*Pi]*Sqrt[x]]