Stirling numbers and exponential generating functions in symbolic combinatorics

The use of exponential generating functions (EGFs) to study the properties of Stirling numbers is a classical exercise in combinatorial mathematics and possibly the canonical example of how symbolic combinatorics is used. It also illustrates the parallels in the construction of these two types of numbers, lending support to the binomial-style notation that is used for them.

This article uses the coefficient extraction operator $[z^n]$ for formal power series, as well as the (labelled) operators $\mathfrak{C}$ (for cycles) and $\mathfrak{P}$ (for sets) on combinatorial classes, which are explained on the page for symbolic combinatorics. Given a combinatorial class, the cycle operator creates the class obtained by placing objects from the source class along a cycle of some length, where cyclical symmetries are taken into account, and the set operator creates the class obtained by placing objects from the source class in a set (symmetries from the symmetric group are factored out, i.e. an "unstructured bag"). The two combinatorial classes (shown without additional markers) are

$$\mathcal{P} = \mathfrak{P}(\mathfrak{C}(\mathcal{Z}))$$

and

$$\mathcal{B} = \mathfrak{P}(\mathfrak{P}_{\ge 1}(\mathcal{Z})),$$

where $\mathcal{Z}$ is the singleton class.

Warning: The notation used here for the Stirling numbers is not that of the Wikipedia articles on Stirling numbers; square brackets denote the signed Stirling numbers here.

Stirling numbers of the first kind

The unsigned Stirling numbers of the first kind count the number of permutations of $[n]$ with $k$ cycles. A permutation is a set of cycles, and hence the set $\mathcal{P}$ of permutations is given by

$$\mathcal{P} = \mathfrak{P}(\mathcal{U} \times \mathfrak{C}(\mathcal{Z})),$$

where the singleton $\mathcal{U}$ marks cycles. This decomposition is examined in some detail on the page on the statistics of random permutations.

Translating to generating functions, we obtain the mixed generating function of the unsigned Stirling numbers of the first kind:

$$G(z, u) = \exp\left(u \log \frac{1}{1-z}\right) = \left(\frac{1}{1-z}\right)^{u} = \sum_{n=0}^{\infty} \sum_{k=0}^{n} \left|\left[{n \atop k}\right]\right| u^k \, \frac{z^n}{n!}.$$
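As a concrete check, the unsigned numbers can be read off this mixed EGF as $\left|\left[{n \atop k}\right]\right| = n! \, [z^n][u^k] \, G(z,u)$. The following is a minimal sketch, assuming SymPy is available (any computer algebra system would do):

```python
# Expand G(z, u) = (1/(1 - z))^u as a series in z and read off
# n! [z^n][u^k] G(z, u), the unsigned Stirling numbers of the first kind.
from sympy import symbols, log, exp, expand, factorial

z, u = symbols('z u')
N = 5

G = exp(u * log(1 / (1 - z)))
ser = expand(G.series(z, 0, N + 1).removeO())

for n in range(N + 1):
    coeff_zn = expand(ser.coeff(z, n))  # a polynomial in u
    row = [factorial(n) * coeff_zn.coeff(u, k) for k in range(n + 1)]
    print(n, row)
# Expected triangle: [1], [0, 1], [0, 1, 1], [0, 2, 3, 1], [0, 6, 11, 6, 1], ...
```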
Now the signed Stirling numbers of the first kind are obtained from the unsigned ones through the relation

$$\left[{n \atop k}\right] = (-1)^{n-k} \left|\left[{n \atop k}\right]\right|.$$

Hence the generating function $H(z, u)$ of these numbers is

$$H(z, u) = G(-z, -u) = \left(\frac{1}{1+z}\right)^{-u} = (1+z)^{u} = \sum_{n=0}^{\infty} \sum_{k=0}^{n} \left[{n \atop k}\right] u^k \, \frac{z^n}{n!}.$$
A variety of identities may be derived by manipulating this generating function:

$$(1+z)^{u} = e^{u \log(1+z)} = \sum_{n=0}^{\infty} \binom{u}{n} z^n = \sum_{n=0}^{\infty} \frac{z^n}{n!} \sum_{k=0}^{n} \left[{n \atop k}\right] u^k = \sum_{k=0}^{\infty} \frac{u^k}{k!} \left(\log(1+z)\right)^k.$$

In particular, the order of summation may be exchanged, derivatives may be taken, and then $z$ or $u$ may be fixed.
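For instance, comparing the coefficients of $z^n$ in the binomial series with those of the double sum shows that

$$\sum_{k=0}^{n} \left[{n \atop k}\right] u^k = n! \binom{u}{n} = u (u-1) \cdots (u-n+1),$$

i.e. the signed Stirling numbers of the first kind are the coefficients of the falling factorial. A sketch of this check at $n = 5$, again assuming SymPy (including its stirling helper):

```python
# Expand the falling factorial u(u-1)(u-2)(u-3)(u-4) and compare its
# coefficients with the signed Stirling numbers of the first kind s(5, k).
import operator
from functools import reduce
from sympy import symbols, expand
from sympy.functions.combinatorial.numbers import stirling

u = symbols('u')
n = 5
falling = expand(reduce(operator.mul, [u - i for i in range(n)]))
for k in range(n + 1):
    assert falling.coeff(u, k) == stirling(n, k, kind=1, signed=True)
print(falling)  # u**5 - 10*u**4 + 35*u**3 - 50*u**2 + 24*u
```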

Finite sums

A simple sum is

$$\sum_{k=0}^{n} (-1)^k \left[{n \atop k}\right] = (-1)^n \, n!.$$

This formula holds because the exponential generating function of the sum is

$$H(z, -1) = \frac{1}{1+z} \qquad \text{and hence} \qquad n! \, [z^n] H(z, -1) = (-1)^n \, n!.$$
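This can be confirmed numerically; the sketch below (pure Python) generates the signed numbers from the standard recurrence $s(n,k) = s(n-1,k-1) - (n-1)\,s(n-1,k)$, which is assumed here rather than derived above:

```python
# Check sum_k (-1)^k s(n, k) = (-1)^n n! for small n, building the signed
# Stirling triangle of the first kind from its two-term recurrence.
from math import factorial

def signed_stirling1(N):
    """Rows 0..N of the signed Stirling triangle of the first kind."""
    s = [[1]]  # s(0, 0) = 1
    for n in range(1, N + 1):
        row = [0] * (n + 1)
        for k in range(1, n + 1):
            above = s[n - 1][k] if k <= n - 1 else 0
            row[k] = s[n - 1][k - 1] - (n - 1) * above
        s.append(row)
    return s

s = signed_stirling1(8)
for n in range(9):
    lhs = sum((-1) ** k * s[n][k] for k in range(n + 1))
    assert lhs == (-1) ** n * factorial(n), (n, lhs)
print("identity verified for n = 0..8")
```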
Infinite sums

Some infinite sums include

$$\sum_{n=k}^{\infty} \left[{n \atop k}\right] \frac{z^n}{n!} = \frac{\left(\log(1+z)\right)^k}{k!},$$

where $|z| < 1$ (the singularity nearest to $z = 0$ of $\log(1+z)$ is at $z = -1$).

This relation holds because

$$[u^k] H(z, u) = [u^k] e^{u \log(1+z)} = \frac{\left(\log(1+z)\right)^k}{k!}.$$
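One can likewise expand this column EGF directly and compare coefficients; a sketch for $k = 3$, assuming SymPy and its stirling helper:

```python
# The z^n coefficient of log(1+z)^k / k! should equal s(n, k) / n!.
from sympy import symbols, log, factorial
from sympy.functions.combinatorial.numbers import stirling

z = symbols('z')
N, k = 8, 3

col = (log(1 + z) ** k / factorial(k)).series(z, 0, N + 1).removeO()
for n in range(N + 1):
    assert col.coeff(z, n) == stirling(n, k, kind=1, signed=True) / factorial(n)
print("column EGF matches s(n, 3)/n! for n = 0..8")
```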
Stirling numbers of the second kind

These numbers count the number of partitions of $[n]$ into $k$ nonempty subsets. First consider the total number of partitions, i.e. $B_n$, where

$$B_n = \sum_{k=1}^{n} \left\{{n \atop k}\right\} \quad \text{and} \quad B_0 = 1,$$

i.e. the Bell numbers. The Flajolet–Sedgewick fundamental theorem applies (labelled case). The set $\mathcal{B}$ of partitions into non-empty subsets is given by the "set of non-empty sets of singletons"

$$\mathcal{B} = \mathfrak{P}(\mathfrak{P}_{\ge 1}(\mathcal{Z})).$$
This decomposition is entirely analogous to the construction of the set $\mathcal{P}$ of permutations from cycles, which is given by

$$\mathcal{P} = \mathfrak{P}(\mathfrak{C}(\mathcal{Z}))$$

and yields the Stirling numbers of the first kind. Hence the name "Stirling numbers of the second kind."

The decomposition is equivalent to the EGF

$$B(z) = \exp\left(\exp z - 1\right).$$

Differentiate to obtain

$$\frac{d}{dz} B(z) = \exp\left(\exp z - 1\right) \exp z = B(z) \exp z,$$

which implies that

$$B_{n+1} = \sum_{k=0}^{n} \binom{n}{k} B_k,$$

by convolution of exponential generating functions, and because differentiating an EGF drops the first coefficient and shifts $B_{n+1}$ to $z^n/n!$.
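A short pure-Python sketch of this recurrence, cross-checked against Bell numbers obtained by summing rows of second-kind Stirling numbers (generated from the standard recurrence $\left\{{n \atop k}\right\} = k \left\{{n-1 \atop k}\right\} + \left\{{n-1 \atop k-1}\right\}$, assumed here rather than derived above):

```python
# Verify B_{n+1} = sum_{k=0}^{n} C(n, k) B_k against an independent
# computation of the Bell numbers as row sums of the second-kind triangle.
from math import comb

def bell_via_recurrence(N):
    B = [1]  # B_0 = 1
    for n in range(N):
        B.append(sum(comb(n, k) * B[k] for k in range(n + 1)))
    return B

def bell_via_stirling(N):
    S = [[1]]
    for n in range(1, N + 1):
        row = [0] * (n + 1)
        for k in range(1, n + 1):
            above = S[n - 1][k] if k <= n - 1 else 0
            row[k] = k * above + S[n - 1][k - 1]
        S.append(row)
    return [sum(row) for row in S]

assert bell_via_recurrence(10) == bell_via_stirling(10)
print(bell_via_recurrence(10))  # [1, 1, 2, 5, 15, 52, 203, 877, 4140, 21147, 115975]
```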

The EGF of the Stirling numbers of the second kind is obtained by marking every subset that goes into the partition with the term $\mathcal{U}$, giving

$$\mathcal{B} = \mathfrak{P}(\mathcal{U} \times \mathfrak{P}_{\ge 1}(\mathcal{Z})).$$

Translating to generating functions, we obtain

$$B(z, u) = \exp\left(u \left(\exp z - 1\right)\right) = \sum_{n=0}^{\infty} \sum_{k=0}^{n} \left\{{n \atop k}\right\} u^k \, \frac{z^n}{n!}.$$
This EGF yields the formula for the Stirling numbers of the second kind:

$$\left\{{n \atop k}\right\} = n! \, [z^n] [u^k] \exp\left(u \left(\exp z - 1\right)\right) = n! \, [z^n] \frac{\left(\exp z - 1\right)^k}{k!},$$

or

$$\left\{{n \atop k}\right\} = \frac{n!}{k!} \, [z^n] \sum_{j=0}^{k} \binom{k}{j} (-1)^{k-j} \exp(jz),$$

which simplifies to

$$\left\{{n \atop k}\right\} = \frac{1}{k!} \sum_{j=0}^{k} \binom{k}{j} (-1)^{k-j} j^n.$$
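The closed form is easy to test; a minimal sketch, using SymPy's stirling helper (whose default kind is the second) as an independent reference:

```python
# Check {n, k} = (1/k!) sum_j C(k, j) (-1)^(k-j) j^n for 0 <= k <= n <= 9.
from math import comb, factorial
from sympy.functions.combinatorial.numbers import stirling

def stirling2_closed(n, k):
    total = sum(comb(k, j) * (-1) ** (k - j) * j ** n for j in range(k + 1))
    return total // factorial(k)  # the sum is always divisible by k!

for n in range(10):
    for k in range(n + 1):
        assert stirling2_closed(n, k) == stirling(n, k), (n, k)
print("closed form matches stirling(n, k) for 0 <= k <= n <= 9")
```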