In mathematics, Stirling's approximation (or Stirling's formula) is an asymptotic approximation for factorials. It is a good approximation, leading to accurate results even for small values of $n$. It is named after James Stirling, though a related but less precise result was first stated by Abraham de Moivre. [1] [2] [3]
One way of stating the approximation involves the logarithm of the factorial:
$$\ln(n!) = n\ln n - n + O(\ln n),$$
where the big O notation means that, for all sufficiently large values of $n$, the difference between $\ln(n!)$ and $n\ln n - n$ will be at most proportional to the logarithm of $n$. In computer science applications such as the worst-case lower bound for comparison sorting, it is convenient to instead use the binary logarithm, giving the equivalent form
$$\log_2(n!) = n\log_2 n - n\log_2 e + O(\log_2 n).$$
The error term in either base can be expressed more precisely as $\tfrac12\log(2\pi n) + O(\tfrac1n)$, corresponding to an approximate formula for the factorial itself,
$$n! \sim \sqrt{2\pi n}\left(\frac{n}{e}\right)^{n}.$$
Here the sign $\sim$ means that the two quantities are asymptotic, that is, that their ratio tends to 1 as $n$ tends to infinity.
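As a quick numerical illustration (not part of the original derivation), the following minimal Python sketch, using only the standard library's `math.lgamma`, compares $\ln(n!)$ with the leading terms $n\ln n - n$; the residual gap tracks the refined error term $\tfrac12\ln(2\pi n)$ quoted above.

```python
import math

# Compare ln(n!) with n*ln(n) - n; the gap should grow only logarithmically,
# matching the O(ln n) error term, and closely track 0.5*ln(2*pi*n).
for n in [10, 100, 1000, 10000]:
    exact = math.lgamma(n + 1)        # ln(n!) via the log-gamma function
    approx = n * math.log(n) - n
    gap = exact - approx
    refined = 0.5 * math.log(2 * math.pi * n)  # the refined error term
    print(f"n={n:6d}  gap={gap:.6f}  0.5*ln(2 pi n)={refined:.6f}")
```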
Roughly speaking, the simplest version of Stirling's formula can be quickly obtained by approximating the sum
$$\ln(n!) = \sum_{k=1}^{n} \ln k$$
with an integral:
$$\sum_{k=1}^{n} \ln k \approx \int_1^n \ln x\,dx = n\ln n - n + 1.$$
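A hedged sketch of this sum-versus-integral step, again using only the Python standard library: the leftover difference grows like $\tfrac12\ln n$ plus a constant, which is exactly what the fuller derivation below accounts for.

```python
import math

# ln(n!) as a sum of logs, versus the integral of ln(x) on [1, n],
# which equals n*ln(n) - n + 1. The difference grows like 0.5*ln(n) + const.
n = 1000
log_sum = sum(math.log(k) for k in range(1, n + 1))
integral = n * math.log(n) - n + 1
print(log_sum, integral, log_sum - integral)
```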
The full formula, together with precise estimates of its error, can be derived as follows. Instead of approximating $n!$, one considers its natural logarithm, as this is a slowly varying function:
$$\ln(n!) = \ln 1 + \ln 2 + \cdots + \ln n.$$
The right-hand side of this equation minus $\tfrac12(\ln 1 + \ln n) = \tfrac12\ln n$ is the approximation by the trapezoid rule of the integral
$$\ln(n!) - \tfrac12\ln n \approx \int_1^n \ln x\,dx = n\ln n - n + 1,$$
and the error in this approximation is given by the Euler–Maclaurin formula:
$$\ln(n!) - \tfrac12\ln n = \tfrac12\ln 1 + \ln 2 + \ln 3 + \cdots + \ln(n-1) + \tfrac12\ln n = n\ln n - n + 1 + \sum_{k=2}^{m} \frac{(-1)^k B_k}{k(k-1)}\left(\frac{1}{n^{k-1}} - 1\right) + R_{m,n},$$
where $B_k$ is a Bernoulli number, and $R_{m,n}$ is the remainder term in the Euler–Maclaurin formula. Take limits to find that
$$\lim_{n\to\infty}\left(\ln(n!) - n\ln n + n - \tfrac12\ln n\right) = 1 - \sum_{k=2}^{m} \frac{(-1)^k B_k}{k(k-1)} + \lim_{n\to\infty} R_{m,n}.$$
Denote this limit as $y$. Because the remainder $R_{m,n}$ in the Euler–Maclaurin formula satisfies
$$R_{m,n} = \lim_{n\to\infty} R_{m,n} + O\left(\frac{1}{n^{2m-1}}\right),$$
where big-O notation is used, combining the equations above yields the approximation formula in its logarithmic form:
$$\ln(n!) = n\ln\left(\frac{n}{e}\right) + \tfrac12\ln n + y + \sum_{k=2}^{m} \frac{(-1)^k B_k}{k(k-1)\,n^{k-1}} + O\left(\frac{1}{n^{2m-1}}\right).$$
Taking the exponential of both sides and choosing any positive integer $m$, one obtains a formula involving an unknown quantity $e^y$. For $m = 1$, the formula is
$$n! = e^{y}\,\sqrt{n}\left(\frac{n}{e}\right)^{n}\left(1 + O\left(\frac{1}{n}\right)\right).$$
The quantity $e^y$ can be found by taking the limit on both sides as $n$ tends to infinity and using Wallis' product, which shows that $e^y = \sqrt{2\pi}$. Therefore, one obtains Stirling's formula:
$$n! = \sqrt{2\pi n}\left(\frac{n}{e}\right)^{n}\left(1 + O\left(\frac{1}{n}\right)\right).$$
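The limit defining the constant can be checked numerically. A minimal Python sketch (an illustration, not part of the derivation) computes $n!\,e^n / n^{n+1/2}$ in log space to avoid overflow; it should tend to $\sqrt{2\pi} \approx 2.5066$.

```python
import math

# The constant e^y in the derivation: n! * e^n / n^(n + 1/2) tends to
# sqrt(2*pi) as n grows. Computed in log space so large n does not overflow.
for n in [10, 100, 1000, 10000]:
    log_ratio = math.lgamma(n + 1) + n - (n + 0.5) * math.log(n)
    print(n, math.exp(log_ratio), math.sqrt(2 * math.pi))
```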
An alternative formula for $n!$ using the gamma function is
$$n! = \int_0^\infty x^{n} e^{-x}\,dx$$
(as can be seen by repeated integration by parts). Rewriting and changing variables $x = ny$, one obtains
$$n! = \int_0^\infty e^{n\ln x - x}\,dx = e^{n\ln n}\,n\int_0^\infty e^{n(\ln y - y)}\,dy.$$
Applying Laplace's method one has
$$\int_0^\infty e^{n(\ln y - y)}\,dy \sim \sqrt{\frac{2\pi}{n}}\,e^{-n},$$
which recovers Stirling's formula:
$$n! \sim e^{n\ln n}\,n\,\sqrt{\frac{2\pi}{n}}\,e^{-n} = \sqrt{2\pi n}\left(\frac{n}{e}\right)^{n}.$$
In fact, further corrections can also be obtained using Laplace's method. From the previous result, we know that $\Gamma(x) \sim x^{x} e^{-x}\sqrt{2\pi/x}$, so we "peel off" this dominant term, then perform two changes of variables, to obtain:
$$x^{-x}e^{x}\,\Gamma(x) = \sqrt{x}\int_{\mathbb{R}} e^{x\left(1+t-e^{t}\right)}\,dt.$$
To verify this:
$$\int_{\mathbb{R}} e^{x\left(1+t-e^{t}\right)}\,dt \;\overset{t\to\ln t}{=}\; e^{x}\int_0^\infty e^{-xt}\,t^{x-1}\,dt \;\overset{t\to t/x}{=}\; x^{-x}e^{x}\,\Gamma(x).$$
Now the function $t \mapsto 1+t-e^{t}$ is unimodal, with maximum value zero. Locally around zero, it looks like $-t^2/2$, which is why we are able to perform Laplace's method. In order to extend Laplace's method to higher orders, we perform another change of variables by $1+t-e^{t} = -\tau^2/2$. This equation cannot be solved in closed form, but it can be solved by series expansion, which gives us $t = \tau - \tau^2/6 + \tau^3/36 + a\tau^4 + O(\tau^5)$. Now plug back into the equation to obtain
$$x^{-x}e^{x}\,\Gamma(x) = \sqrt{x}\int_{\mathbb{R}} e^{-x\tau^2/2}\left(1 - \frac{\tau}{3} + \frac{\tau^2}{12} + 4a\tau^3 + O(\tau^4)\right)d\tau = \sqrt{2\pi}\left(1 + \frac{1}{12x}\right) + O\left(\frac{1}{x^2}\right);$$
notice how we don't need to actually find $a$, since it is cancelled out by the integral (odd powers of $\tau$ integrate to zero). Higher orders can be achieved by computing more terms in $t = t(\tau)$, which can be obtained programmatically. [note 1]
Thus we get Stirling's formula to two orders:
$$n! = \sqrt{2\pi n}\left(\frac{n}{e}\right)^{n}\left(1 + \frac{1}{12n} + O\left(\frac{1}{n^2}\right)\right).$$
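The effect of the $\tfrac{1}{12n}$ correction is easy to see numerically. A small illustrative Python sketch (log space, standard library only): the residual relative error drops from $O(1/n)$ to $O(1/n^2)$ once the correction is included.

```python
import math

# With the 1/(12n) correction, the relative error drops from O(1/n) to O(1/n^2).
# Work in log space so large n does not overflow.
for n in [5, 50, 500]:
    log_stirling = 0.5 * math.log(2 * math.pi * n) + n * (math.log(n) - 1)
    log_exact = math.lgamma(n + 1)
    err1 = math.expm1(log_exact - log_stirling)              # ~ 1/(12n)
    err2 = math.expm1(log_exact - log_stirling
                      - math.log1p(1 / (12 * n)))            # ~ O(1/n^2)
    print(n, err1, err2)
```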
A complex-analysis version of this method [4] is to consider $\frac{1}{n!}$ as a Taylor coefficient of the exponential function $e^{z} = \sum_{n=0}^{\infty}\frac{z^n}{n!}$, computed by Cauchy's integral formula as
$$\frac{1}{n!} = \frac{1}{2\pi i}\oint_{|z|=r}\frac{e^{z}}{z^{n+1}}\,dz.$$
This line integral can then be approximated using the saddle-point method with an appropriate choice of contour radius $r = r_n$. The dominant portion of the integral near the saddle point is then approximated by a real integral and Laplace's method, while the remaining portion of the integral can be bounded above to give an error term.
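A hedged numerical sketch of this representation in Python: the contour integral is discretized by the trapezoid rule on the circle $|z| = n$ (an illustrative radius choice near the saddle point; the function name is ours, not from the source), and recovers $1/n!$ accurately.

```python
import cmath
import math

# 1/n! is the n-th Taylor coefficient of e^z, recovered by integrating
# e^z / z^(n+1) over a circle of radius r = n (near the saddle point).
def inverse_factorial(n: int, samples: int = 4096) -> float:
    r = n
    total = 0.0
    for k in range(samples):
        theta = 2 * math.pi * k / samples
        z = r * cmath.exp(1j * theta)
        # with dz = i*z*dtheta, each sample contributes e^z / z^n / samples
        total += (cmath.exp(z) / z ** n).real / samples
    return total

n = 10
print(inverse_factorial(n), 1 / math.factorial(n))
```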
An alternative version uses the fact that the Poisson distribution converges to a normal distribution by the Central Limit Theorem. [5]
Since the Poisson distribution with parameter $\lambda$ converges to a normal distribution with mean $\lambda$ and variance $\lambda$, their density functions will be approximately the same:
$$\frac{\lambda^{k} e^{-\lambda}}{k!} \approx \frac{1}{\sqrt{2\pi\lambda}}\exp\left(-\frac{(k-\lambda)^2}{2\lambda}\right).$$
Evaluating this expression at the mean, $k = \lambda$, at which the approximation is particularly accurate, simplifies this expression to:
$$\frac{\lambda^{\lambda} e^{-\lambda}}{\lambda!} \approx \frac{1}{\sqrt{2\pi\lambda}}.$$
Taking logs then results in:
$$\lambda\ln\lambda - \lambda - \ln(\lambda!) \approx -\tfrac12\ln(2\pi\lambda),$$
which can easily be rearranged to give:
$$\ln(\lambda!) \approx \lambda\ln\lambda - \lambda + \tfrac12\ln(2\pi\lambda).$$
Evaluating at $\lambda = n$ gives the usual, more precise form of Stirling's approximation.
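The ratio of the Poisson pmf at its mean to the normal density at the mean tends to 1, which is Stirling's formula in disguise. A minimal Python check (log space, standard library only):

```python
import math

# Poisson pmf at k = lam versus the normal density 1/sqrt(2*pi*lam) there.
for lam in [10, 100, 1000]:
    log_pmf = lam * math.log(lam) - lam - math.lgamma(lam + 1)
    log_normal = -0.5 * math.log(2 * math.pi * lam)
    print(lam, math.exp(log_pmf - log_normal))  # ratio tends to 1
```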
Stirling's formula is in fact the first approximation to the following series (now called the Stirling series): [6]
$$n! \sim \sqrt{2\pi n}\left(\frac{n}{e}\right)^{n}\left(1 + \frac{1}{12n} + \frac{1}{288n^2} - \frac{139}{51840n^3} - \frac{571}{2488320n^4} + \cdots\right).$$
An explicit formula for the coefficients in this series was given by G. Nemes. [7] Further terms are listed in the On-Line Encyclopedia of Integer Sequences as A001163 and A001164. The first graph in this section shows the relative error vs. $n$, for 1 through all 5 terms listed above. Bender and Orszag [8] (p. 218) give the asymptotic formula for the coefficients:
$$a_{2j+1} \sim (-1)^{j}\,\frac{2\,(2j)!}{(2\pi)^{2(j+1)}},$$
which shows that the coefficients grow superexponentially, and that by the ratio test the radius of convergence is zero.
As $n \to \infty$, the error in the truncated series is asymptotically equal to the first omitted term. This is an example of an asymptotic expansion. It is not a convergent series; for any particular value of $n$ there are only so many terms of the series that improve accuracy, after which accuracy worsens. This is shown in the next graph, which shows the relative error versus the number of terms in the series, for larger numbers of terms. More precisely, let $S(n, t)$ be the Stirling series to $t$ terms evaluated at $n$. The graphs show
$$\left|\ln\left(\frac{S(n,t)}{n!}\right)\right|,$$
which, when small, is essentially the relative error.
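This optimal-truncation behavior can be reproduced directly. The following Python sketch (illustrative; function names are ours) generates Bernoulli numbers exactly and evaluates the logarithmic form of the Stirling series at a small fixed $n$: the error first shrinks, then blows up as more terms are added.

```python
import math
from fractions import Fraction

def bernoulli(m_max):
    """Bernoulli numbers B_0..B_m_max (B_1 = -1/2 convention), exactly."""
    B = [Fraction(1)]
    for m in range(1, m_max + 1):
        s = sum(Fraction(math.comb(m + 1, k)) * B[k] for k in range(m))
        B.append(-s / (m + 1))
    return B

def stirling_log_series(n, terms, B):
    """ln(n!) from the divergent Stirling series truncated to `terms` terms."""
    s = n * math.log(n) - n + 0.5 * math.log(2 * math.pi * n)
    for j in range(1, terms + 1):
        s += float(B[2 * j]) / (2 * j * (2 * j - 1) * n ** (2 * j - 1))
    return s

n = 3
B = bernoulli(60)
exact = math.lgamma(n + 1)
# The error shrinks, reaches a minimum, then diverges: asymptotic, not convergent.
for t in [1, 2, 4, 8, 12, 16, 20, 25, 30]:
    print(t, abs(stirling_log_series(n, t, B) - exact))
```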
Writing Stirling's series in the form
$$\ln(n!) = n\ln n - n + \tfrac12\ln(2\pi n) + \frac{1}{12n} - \frac{1}{360n^3} + \frac{1}{1260n^5} - \frac{1}{1680n^7} + \cdots,$$
it is known that the error in truncating the series is always of the opposite sign and at most the same magnitude as the first omitted term. [citation needed]
Other bounds, due to Robbins, [9] valid for all positive integers $n$, are
$$\sqrt{2\pi n}\left(\frac{n}{e}\right)^{n} e^{\frac{1}{12n+1}} < n! < \sqrt{2\pi n}\left(\frac{n}{e}\right)^{n} e^{\frac{1}{12n}}.$$
This upper bound corresponds to stopping the above series for $\ln(n!)$ after the $\frac{1}{12n}$ term. The lower bound is weaker than that obtained by stopping the series after the $\frac{1}{360n^3}$ term. A looser version of this bound is that $\frac{n!\,e^{n}}{n^{n+\frac12}} \in \left(\sqrt{2\pi}, e\right]$ for all $n \ge 1$.
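Robbins' two-sided bound is tight enough to verify directly in floating point. A minimal Python sketch, restating the bound in logarithmic form:

```python
import math

# Robbins' bound in log form:
#   1/(12n+1) < ln(n!) - [n ln n - n + 0.5 ln(2 pi n)] < 1/(12n)
for n in [1, 2, 5, 10, 100]:
    core = n * math.log(n) - n + 0.5 * math.log(2 * math.pi * n)
    gap = math.lgamma(n + 1) - core
    assert 1 / (12 * n + 1) < gap < 1 / (12 * n)
    print(n, 1 / (12 * n + 1), gap, 1 / (12 * n))
```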
For all positive integers $n$,
$$n! = \Gamma(n+1),$$
where Γ denotes the gamma function.
However, the gamma function, unlike the factorial, is defined more broadly, for all complex numbers other than non-positive integers; nevertheless, Stirling's formula may still be applied. If $\operatorname{Re}(z) > 0$, then
$$\ln\Gamma(z) = z\ln z - z + \tfrac12\ln\frac{2\pi}{z} + \int_0^\infty \frac{2\arctan\left(\frac{t}{z}\right)}{e^{2\pi t}-1}\,dt.$$
Repeated integration by parts gives
$$\ln\Gamma(z) \sim z\ln z - z + \tfrac12\ln\frac{2\pi}{z} + \sum_{n=1}^{N-1}\frac{B_{2n}}{2n(2n-1)z^{2n-1}},$$
where $B_n$ is the $n$th Bernoulli number (note that the limit of the sum as $N \to \infty$ is not convergent, so this formula is just an asymptotic expansion). The formula is valid for $z$ large enough in absolute value, when $|\arg(z)| < \pi - \varepsilon$, where $\varepsilon$ is positive, with an error term of $O\!\left(z^{-2N+1}\right)$. The corresponding approximation may now be written:
$$\Gamma(z) = \sqrt{\frac{2\pi}{z}}\left(\frac{z}{e}\right)^{z}\left(1 + O\left(\frac{1}{z}\right)\right),$$
where the expansion is identical to that of Stirling's series above for $n!$, except that $n$ is replaced with $z-1$. [10]
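On the positive real axis the truncated expansion can be checked against the standard library's `math.lgamma`. A hedged Python sketch (real $z$ only; the error should shrink like $z^{-2N+1}$ as more terms are kept):

```python
import math

# Truncated asymptotic series for ln Gamma(z), checked against math.lgamma.
B = [1 / 6, -1 / 30, 1 / 42, -1 / 30]  # B_2, B_4, B_6, B_8

def lgamma_series(z, N):
    s = z * math.log(z) - z + 0.5 * math.log(2 * math.pi / z)
    for n in range(1, N):
        s += B[n - 1] / (2 * n * (2 * n - 1) * z ** (2 * n - 1))
    return s

for z in [5.0, 10.0, 20.0]:
    print(z, [abs(lgamma_series(z, N) - math.lgamma(z)) for N in (1, 2, 3, 4)])
```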
A further application of this asymptotic expansion is for complex argument $z$ with constant $\operatorname{Re}(z)$. See for example the Stirling formula applied in $\operatorname{Im}(z) = t$ of the Riemann–Siegel theta function on the straight line $\tfrac14 + it$.
For any positive integer $N$, the following notation is introduced:
$$\ln\Gamma(z) = z\ln z - z + \tfrac12\ln\frac{2\pi}{z} + \sum_{n=1}^{N-1}\frac{B_{2n}}{2n(2n-1)z^{2n-1}} + R_N(z)$$
and
$$\Gamma(z) = \sqrt{\frac{2\pi}{z}}\left(\frac{z}{e}\right)^{z}\left(\sum_{n=0}^{N-1}\frac{a_n}{z^n} + \widetilde{R}_N(z)\right).$$
For further information and other error bounds, see the cited papers.
Thomas Bayes showed, in a letter to John Canton published by the Royal Society in 1763, that Stirling's formula did not give a convergent series. [13] Obtaining a convergent version of Stirling's formula entails evaluating Binet's formula:
$$\int_0^\infty \frac{2\arctan\left(\frac{t}{x}\right)}{e^{2\pi t}-1}\,dt = \ln\Gamma(x) - x\ln x + x - \tfrac12\ln\frac{2\pi}{x}.$$
One way to do this is by means of a convergent series of inverted rising factorials. If
$$z^{\bar n} = z(z+1)\cdots(z+n-1),$$
then
$$\int_0^\infty \frac{2\arctan\left(\frac{t}{x}\right)}{e^{2\pi t}-1}\,dt = \sum_{n=1}^{\infty}\frac{c_n}{(x+1)^{\bar n}},$$
where
$$c_n = \frac{1}{n}\int_0^1 x^{\bar n}\left(x - \tfrac12\right)dx = \frac{1}{2n}\sum_{k=1}^{n}\frac{k\,|s(n,k)|}{(k+1)(k+2)},$$
where $s(n, k)$ denotes the Stirling numbers of the first kind. From this one obtains a version of Stirling's series
$$\ln\Gamma(x) = x\ln x - x + \tfrac12\ln\frac{2\pi}{x} + \frac{1}{12(x+1)} + \frac{1}{12(x+1)(x+2)} + \cdots,$$
which converges when $\operatorname{Re}(x) > 0$. Stirling's formula may also be given in convergent form as [14]
$$\Gamma(x) = \sqrt{2\pi}\,x^{x-\frac12}e^{-x+\mu(x)}$$
where
$$\mu(x) = \sum_{n=0}^{\infty}\left(\left(x+n+\tfrac12\right)\ln\left(1+\frac{1}{x+n}\right) - 1\right).$$
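A minimal Python sketch of the convergent form just quoted, assuming the series for $\mu(x)$ above and using `math.gamma` as the reference; the terms decay like $1/(12n^2)$, so convergence is slow and the truncation here gives only about six digits.

```python
import math

# Convergent form Gamma(x) = sqrt(2*pi) * x^(x - 1/2) * exp(-x + mu(x)),
# with mu(x) truncated after many terms (tail ~ 1/(12*terms)).
def mu(x, terms=100000):
    total = 0.0
    for n in range(terms):
        total += (x + n + 0.5) * math.log1p(1 / (x + n)) - 1
    return total

for x in [0.5, 1.0, 3.7]:
    approx = math.sqrt(2 * math.pi) * x ** (x - 0.5) * math.exp(-x + mu(x))
    print(x, approx, math.gamma(x))
```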
The approximation
$$\Gamma(z) \approx \sqrt{\frac{2\pi}{z}}\left(\frac{z}{e}\sqrt{z\sinh\frac{1}{z} + \frac{1}{810z^6}}\right)^{z}$$
and its equivalent form
$$2\ln\Gamma(z) \approx \ln(2\pi) - \ln z + z\left(2\ln z + \ln\left(z\sinh\frac{1}{z} + \frac{1}{810z^6}\right) - 2\right)$$
can be obtained by rearranging Stirling's extended formula and observing a coincidence between the resultant power series and the Taylor series expansion of the hyperbolic sine function. This approximation is good to more than 8 decimal digits for $z$ with a real part greater than 8. Robert H. Windschitl suggested it in 2002 for computing the gamma function with fair accuracy on calculators with limited program or register memory. [15]
Gergő Nemes proposed in 2007 an approximation which gives the same number of exact digits as the Windschitl approximation but is much simpler: [16]
$$\Gamma(z) \approx \sqrt{\frac{2\pi}{z}}\left(\frac{1}{e}\left(z + \frac{1}{12z - \frac{1}{10z}}\right)\right)^{z},$$
or equivalently,
$$\ln\Gamma(z) \approx \tfrac12\left(\ln(2\pi) - \ln z\right) + z\left(\ln\left(z + \frac{1}{12z - \frac{1}{10z}}\right) - 1\right).$$
An alternative approximation for the gamma function stated by Srinivasa Ramanujan in Ramanujan's lost notebook [17] is
$$\Gamma(1+x) \approx \sqrt{\pi}\left(\frac{x}{e}\right)^{x}\left(8x^3 + 4x^2 + x + \frac{1}{30}\right)^{\frac16}$$
for $x \ge 0$. The equivalent approximation for $\ln n!$ has an asymptotic error of $\frac{1}{1400n^3}$ and is given by
$$\ln n! \approx n\ln n - n + \tfrac16\ln\left(8n^3 + 4n^2 + n + \tfrac{1}{30}\right) + \tfrac12\ln\pi.$$
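The three compact approximations quoted above (Windschitl, Nemes, Ramanujan) can be compared side by side. A hedged Python sketch, evaluating each in log form at a single illustrative point and checking against `math.lgamma`:

```python
import math

# Absolute errors in ln Gamma(1 + x) for three compact approximations.
x = 10.0
z = 1 + x  # Windschitl and Nemes approximate Gamma(z); we want Gamma(1 + x)

# Ramanujan: Gamma(1+x) ~ sqrt(pi) (x/e)^x (8x^3 + 4x^2 + x + 1/30)^(1/6)
ramanujan = (0.5 * math.log(math.pi) + x * (math.log(x) - 1)
             + math.log(8 * x**3 + 4 * x**2 + x + 1 / 30) / 6)

# Windschitl: Gamma(z) ~ sqrt(2 pi / z) ((z/e) sqrt(z sinh(1/z) + 1/(810 z^6)))^z
windschitl = (0.5 * math.log(2 * math.pi / z)
              + z * (math.log(z) - 1
                     + 0.5 * math.log(z * math.sinh(1 / z) + 1 / (810 * z**6))))

# Nemes: Gamma(z) ~ sqrt(2 pi / z) ((z + 1/(12z - 1/(10z))) / e)^z
nemes = (0.5 * math.log(2 * math.pi / z)
         + z * (math.log(z + 1 / (12 * z - 1 / (10 * z))) - 1))

exact = math.lgamma(1 + x)
for name, val in [("Ramanujan", ramanujan), ("Windschitl", windschitl),
                  ("Nemes", nemes)]:
    print(name, abs(val - exact))
```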
The approximation may be made precise by giving paired upper and lower bounds; one such inequality is [18] [19] [20] [21]
$$\sqrt{\pi}\left(\frac{x}{e}\right)^{x}\left(8x^3 + 4x^2 + x + \frac{1}{100}\right)^{\frac16} < \Gamma(1+x) < \sqrt{\pi}\left(\frac{x}{e}\right)^{x}\left(8x^3 + 4x^2 + x + \frac{1}{30}\right)^{\frac16}.$$
The formula was first discovered by Abraham de Moivre [2] in the form
$$n! \sim [\text{constant}]\cdot n^{n+\frac12}\,e^{-n}.$$
De Moivre gave an approximate rational-number expression for the natural logarithm of the constant. Stirling's contribution consisted of showing that the constant is precisely $\sqrt{2\pi}$. [3]
I consider that the fact that Stirling showed that De Moivre's arithmetical constant was $\sqrt{2\pi}$ does not entitle him to claim the theorem, [...]
series = tau - tau^2/6 + tau^3/36 + tau^4*a + tau^5*b;
(* pick the right a, b to make the series equal 0 at higher orders *)
Series[tau^2/2 + 1 + t - Exp[t] /. t -> series, {tau, 0, 8}]
(* now do the integral *)
integral = Integrate[
  Exp[-x*tau^2/2] * D[series /. a -> 0 /. b -> 0, tau],
  {tau, -Infinity, Infinity}];
Simplify[integral/Sqrt[2*Pi]*Sqrt[x]]