Chinese restaurant process

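Chinese restaurant table distribution
Parameters	$\theta >0$ ; $n\in \{1,2,\ldots \}$
Support	$k\in \{1,2,\ldots ,n\}$
PMF	${\frac {\Gamma (\theta )}{\Gamma (n+\theta )}}|s(n,k)|\theta ^{k}$
Mean	$\theta (\Psi (\theta +n)-\Psi (\theta ))$ (see digamma function)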

Last updated December 07, 2024

In probability theory, the Chinese restaurant process is a discrete-time stochastic process, analogous to seating customers at tables in a restaurant. Imagine a restaurant with an infinite number of circular tables, each with infinite capacity. Customer 1 sits at the first table. The next customer either sits at the same table as customer 1, or the next table. This continues, with each customer choosing to either sit at an occupied table with a probability proportional to the number of customers already there (i.e., they are more likely to sit at a table with many customers than few), or an unoccupied table. At time n, the n customers have been partitioned among m ≤ n tables (or blocks of the partition). The results of this process are exchangeable, meaning the order in which the customers sit does not affect the probability of the final distribution. This property greatly simplifies a number of problems in population genetics, linguistic analysis, and image recognition.

An equivalent partition process was published by Fred Hoppe in 1984,^[3] using an "urn scheme" akin to Pólya's urn. In comparison with Hoppe's urn model, the Chinese restaurant process has the advantage that it naturally lends itself to describing random permutations via their cycle structure, in addition to describing random partitions.

Formal definition

For any positive integer $n$ , let ${\mathcal {P}}_{n}$ denote the set of all partitions of the set $\{1,2,3,...,n\}\triangleq [n]$ . The Chinese restaurant process takes values in the infinite Cartesian product $\prod _{n\geq 1}{\mathcal {P}}_{n}$ .

The value of the process at time $n$ is a partition $B_{n}$ of the set $[n]$ , whose probability distribution is determined as follows. At time $n=1$ , the trivial partition $B_{1}=\{\{1\}\}$ is obtained (with probability one). At time $n+1$ the element " $n+1$ " is either:

added to one of the blocks of the partition $B_{n}$ , where each block $b$ is chosen with probability $|b|/(n+1)$ , with $|b|$ the size of the block (i.e. number of elements), or
added to the partition $B_{n}$ as a new singleton block, with probability $1/(n+1)$ .

The random partition so generated has some special properties. It is exchangeable in the sense that relabeling $\{1,...,n\}$ does not change the distribution of the partition, and it is consistent in the sense that the law of the partition of $[n-1]$ obtained by removing the element $n$ from the random partition $B_{n}$ is the same as the law of the random partition $B_{n-1}$ .

The probability assigned to any particular partition (ignoring the order in which customers sit around any particular table) is

\Pr(B_{n}=B)={\frac {\prod _{b\in B}(|b|-1)!}{n!}},\qquad B\in {\mathcal {P}}_{n}

where $b$ is a block in the partition $B$ and $|b|$ is the size of $b$ .
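The formula can be checked by brute force for small $n$. The sketch below enumerates every partition of $[n]$ and evaluates the probability exactly with rational arithmetic; the helper names `set_partitions` and `partition_probability_theta1` are illustrative and not taken from any library.

```python
from fractions import Fraction
from math import factorial


def set_partitions(elements):
    """Yield every partition of a list as a list of blocks (lists)."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for partition in set_partitions(rest):
        # Put `first` into each existing block in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or into a new singleton block.
        yield [[first]] + partition


def partition_probability_theta1(partition):
    """Pr(B_n = B) = prod_b (|b| - 1)! / n! for the process defined above."""
    n = sum(len(block) for block in partition)
    numerator = 1
    for block in partition:
        numerator *= factorial(len(block) - 1)
    return Fraction(numerator, factorial(n))
```

Summing `partition_probability_theta1` over `set_partitions([1, 2, 3, 4])` gives exactly one, as it must for a probability distribution on ${\mathcal {P}}_{4}$.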

The definition can be generalized by introducing a parameter $\theta >0$ which modifies the probability of the new customer sitting at a new table to ${\frac {\theta }{n+\theta }}$ and correspondingly modifies the probability of them sitting at a table of size $|b|$ to ${\frac {|b|}{n+\theta }}$ . The vanilla process introduced above can be recovered by setting $\theta =1$ . Intuitively, $\theta$ can be interpreted as the effective number of customers sitting at the first empty table.
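A minimal simulation sketch of the seating rule with parameter $\theta$ follows. The function name `sample_crp` and its arguments are assumptions made for illustration; the first customer always opens the first table, and afterwards an occupied table $b$ carries weight $|b|$ while a new table carries weight $\theta$.

```python
import random


def sample_crp(n, theta=1.0, rng=None):
    """Seat customers 1..n; return the tables as lists of customer labels."""
    rng = rng or random.Random()
    tables = []
    for customer in range(1, n + 1):
        seated = customer - 1
        # Total weight is seated + theta: |b| per occupied table, theta for a new one.
        u = rng.random() * (seated + theta)
        cumulative = 0.0
        for table in tables:
            cumulative += len(table)
            if u < cumulative:
                table.append(customer)
                break
        else:
            # No occupied table was selected, so open a new one.
            tables.append([customer])
    return tables
```

For example, `sample_crp(100, theta=2.0, rng=random.Random(0))` returns one random partition of $[100]$; larger values of $\theta$ tend to produce more tables.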

Alternative definition

An equivalent but subtly different way to define the Chinese restaurant process is to let new customers choose companions rather than tables.^[4] Customer $n+1$ chooses to sit at the same table as any one of the $n$ seated customers with probability ${\frac {1}{n+\theta }}$ each, or chooses to sit at a new, unoccupied table with probability ${\frac {\theta }{n+\theta }}$ . Notice that in this formulation the customer chooses a table without having to count table occupancies: $|b|$ is not needed.
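The companion-based rule can be sketched as follows. The name `sample_crp_by_companion` and the dictionary representation are illustrative assumptions. A table of size $|b|$ is reached through any of its $|b|$ occupants, so it is chosen with total probability $|b|/(n+\theta )$, which matches the table-based rule.

```python
import random


def sample_crp_by_companion(n, theta=1.0, rng=None):
    """Return a dict mapping each customer 1..n to a table index 0, 1, 2, ..."""
    rng = rng or random.Random()
    table_of = {}
    num_tables = 0
    for customer in range(1, n + 1):
        seated = customer - 1
        u = rng.random() * (seated + theta)
        if u < seated:
            # Each seated customer is picked with probability 1 / (seated + theta).
            companion = int(u) + 1
            table_of[customer] = table_of[companion]
        else:
            # With probability theta / (seated + theta), open a new table.
            table_of[customer] = num_tables
            num_tables += 1
    return table_of
```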

Distribution of the number of tables

The Chinese restaurant table distribution (CRT) is the probability distribution on the number of tables in the Chinese restaurant process.^[5] It can be understood as the sum of $n$ independent Bernoulli random variables, each with a different parameter:

{\begin{aligned}K&=\sum _{i=1}^{n}b_{i}\\[4pt]b_{i}&\sim \operatorname {Bernoulli} \left({\frac {\theta }{i-1+\theta }}\right)\end{aligned}}

The probability mass function of $K$ is given by^[6]

f(k)={\frac {\Gamma (\theta )}{\Gamma (n+\theta )}}|s(n,k)|\theta ^{k},\quad k=1,\dots ,n,

where $s$ denotes Stirling numbers of the first kind.
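As a sanity check, the two descriptions of $K$ (the sum of Bernoulli variables and the Stirling-number formula) can be compared exactly for small $n$. The sketch below uses the standard recurrence $|s(m,k)|=(m-1)|s(m-1,k)|+|s(m-1,k-1)|$ and the identity $\Gamma (n+\theta )/\Gamma (\theta )=\prod _{i=0}^{n-1}(\theta +i)$ ; the function names are illustrative.

```python
from fractions import Fraction


def unsigned_stirling_first(n):
    """Return [|s(n,0)|, ..., |s(n,n)|], unsigned Stirling numbers of the first kind."""
    row = [1]  # n = 0
    for m in range(1, n + 1):
        new = [0] * (m + 1)
        for k in range(m + 1):
            keep = (m - 1) * row[k] if k < len(row) else 0
            open_new = row[k - 1] if k >= 1 else 0
            new[k] = keep + open_new
        row = new
    return row


def crt_pmf(n, theta):
    """PMF of K from the closed form Gamma(theta)/Gamma(n+theta) |s(n,k)| theta^k."""
    theta = Fraction(theta)
    rising = Fraction(1)
    for i in range(n):
        rising *= theta + i  # equals Gamma(n + theta) / Gamma(theta)
    stirling = unsigned_stirling_first(n)
    return {k: stirling[k] * theta**k / rising for k in range(1, n + 1)}


def crt_pmf_bernoulli(n, theta):
    """PMF of K obtained by convolving the n independent Bernoulli variables."""
    theta = Fraction(theta)
    dist = [Fraction(1)]  # distribution of the running sum, starting at 0
    for i in range(1, n + 1):
        p = theta / (i - 1 + theta)
        new = [Fraction(0)] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            new[k] += mass * (1 - p)
            new[k + 1] += mass * p
        dist = new
    return {k: dist[k] for k in range(1, n + 1)}
```

Because both functions work with `Fraction`, their outputs can be compared with `==` rather than with a floating-point tolerance.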

Two-parameter generalization

This construction can be generalized to a model with two parameters, $\theta$ and $\alpha$ ,^[2]^[7] commonly called the strength (or concentration) and discount parameters respectively. At time $n+1$ , the next customer to arrive finds $|B|$ occupied tables and decides to sit at an empty table with probability

{\frac {\theta +|B|\alpha }{n+\theta }},

or at an occupied table $b$ of size $|b|$ with probability

{\frac {|b|-\alpha }{n+\theta }}.

In order for the construction to define a valid probability measure it is necessary to suppose that either $\alpha <0$ and $\theta =-L\alpha$ for some $L\in \{1,2,\ldots \}$ ; or that $0\leq \alpha <1$ and $\theta >-\alpha$ .
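The two-parameter seating rule and its parameter constraints can be sketched as below. The names `check_parameters` and `sample_crp2`, and the floating-point tolerance used to test whether $-\theta /\alpha$ is an integer, are assumptions of this sketch. The first customer is seated at the first table unconditionally, and the new-table weight is clipped at zero to guard against round-off when $\alpha <0$ .

```python
import random


def check_parameters(theta, alpha):
    """True if (theta, alpha) satisfies the constraints stated above."""
    if 0 <= alpha < 1:
        return theta > -alpha
    if alpha < 0:
        ratio = -theta / alpha  # candidate value of L
        return ratio >= 1 and abs(ratio - round(ratio)) < 1e-9
    return False


def sample_crp2(n, theta, alpha, rng=None):
    """Seat n customers; return the list of table sizes in order of creation."""
    if not check_parameters(theta, alpha):
        raise ValueError("invalid (theta, alpha)")
    rng = rng or random.Random()
    sizes = []
    for seated in range(n):
        if seated == 0:
            sizes.append(1)  # the first customer always opens the first table
            continue
        # Weight |b| - alpha per occupied table, theta + |B| alpha for a new one.
        weights = [size - alpha for size in sizes]
        weights.append(max(theta + len(sizes) * alpha, 0.0))
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(sizes):
            sizes.append(1)
        else:
            sizes[choice] += 1
    return sizes
```

With `theta=3.0, alpha=-1.0` (so $L=3$ ) the sampler never opens more than three tables, since the new-table weight is zero once three tables are occupied.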

Under this model the probability assigned to any particular partition $B$ of $[n]$ can be expressed, for any values of $\theta ,\alpha$ that satisfy the above-mentioned constraints, in terms of the Pochhammer k-symbol as

\Pr(B_{n}=B\mid \theta ,\alpha )={\frac {(\theta +\alpha )_{|B|-1,\alpha }}{(\theta +1)_{n-1,1}}}\prod _{b\in B}(1-\alpha )_{|b|-1,1}

where the Pochhammer k-symbol is defined as follows: by convention, $(a)_{0,k}=1$ , and for $m>0$

(a)_{m,k}=\prod _{i=0}^{m-1}(a+ik)={\begin{cases}a^{m}&{\text{if }}k=0,\\\\k^{m}\,({\frac {a}{k}})^{\overline {m}}&{\text{if }}k>0,\\\\\left|k\right|^{m}\,({\frac {a}{\left|k\right|}})^{\underline {m}}&{\text{if }}k<0\end{cases}}

where $x^{\overline {m}}=\prod _{i=0}^{m-1}(x+i)$ is the rising factorial and $x^{\underline {m}}=\prod _{i=0}^{m-1}(x-i)$ is the falling factorial. For the parameter setting where $\alpha <0$ and $\theta =-L\alpha$ , we have $(\theta +\alpha )_{|B|-1,\alpha }=(|\alpha |(L-1))_{|B|-1,\alpha }$ , which evaluates to zero whenever $|B|>L$ , so that $L$ is an upper bound on the number of blocks in the partition; see the subsection on the Dirichlet-categorical model below for more details.
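The Pochhammer k-symbol and the partition probability built from it translate directly into code. The sketch below evaluates the product form $\prod _{i=0}^{m-1}(a+ik)$ exactly with rational numbers; the function names are illustrative, and only the block sizes are needed because the probability depends on the partition through them alone.

```python
from fractions import Fraction


def pochhammer_k(a, m, k):
    """(a)_{m,k} = prod_{i=0}^{m-1} (a + i k); the empty product equals 1."""
    result = Fraction(1)
    for i in range(m):
        result *= Fraction(a) + i * Fraction(k)
    return result


def partition_probability(block_sizes, theta, alpha):
    """Pr(B_n = B | theta, alpha) for a partition with the given block sizes."""
    n = sum(block_sizes)
    num_blocks = len(block_sizes)
    prob = pochhammer_k(theta + alpha, num_blocks - 1, alpha)
    prob /= pochhammer_k(theta + 1, n - 1, 1)
    for size in block_sizes:
        prob *= pochhammer_k(1 - alpha, size - 1, 1)
    return prob
```

For instance, with $\theta =2$ and $\alpha =-1$ (so $L=2$ ), `partition_probability([1, 1, 1], 2, -1)` is zero, reflecting the bound of two blocks.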

For the case when $\theta >0$ and $0<\alpha <1$ , the partition probability can be rewritten in terms of the Gamma function as

\Pr(B_{n}=B\mid \theta ,\alpha )={\frac {\Gamma (\theta )}{\Gamma (\theta +n)}}{\dfrac {\alpha ^{|B|}\,\Gamma (\theta /\alpha +|B|)}{\Gamma (\theta /\alpha )}}\prod _{b\in B}{\dfrac {\Gamma (|b|-\alpha )}{\Gamma (1-\alpha )}}.

In the one-parameter case, where $\alpha$ is zero, and $\theta >0$ this simplifies to

\Pr(B_{n}=B\mid \theta )={\frac {\Gamma (\theta )\,\theta ^{|B|}}{\Gamma (\theta +n)}}\prod _{b\in B}\Gamma (|b|).

When instead $\theta$ is zero and $0<\alpha <1$ , it becomes

\Pr(B_{n}=B\mid \alpha )={\frac {\alpha ^{|B|-1}\,\Gamma (|B|)}{\Gamma (n)}}\prod _{b\in B}{\frac {\Gamma (|b|-\alpha )}{\Gamma (1-\alpha )}}.
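In numerical work these Gamma-function forms are usually evaluated in log space to avoid overflow. A minimal sketch using `math.lgamma` from the Python standard library follows; the function names are illustrative, and the first function assumes $\theta >0$ and $0<\alpha <1$ as stated above.

```python
from math import exp, lgamma, log


def log_partition_probability(block_sizes, theta, alpha):
    """log Pr(B_n = B | theta, alpha), valid for theta > 0 and 0 < alpha < 1."""
    n = sum(block_sizes)
    k = len(block_sizes)
    log_p = lgamma(theta) - lgamma(theta + n)
    log_p += k * log(alpha) + lgamma(theta / alpha + k) - lgamma(theta / alpha)
    for size in block_sizes:
        log_p += lgamma(size - alpha) - lgamma(1 - alpha)
    return log_p


def log_ewens_probability(block_sizes, theta):
    """log Pr(B_n = B | theta) in the one-parameter case alpha = 0, theta > 0."""
    n = sum(block_sizes)
    log_p = lgamma(theta) + len(block_sizes) * log(theta) - lgamma(theta + n)
    return log_p + sum(lgamma(size) for size in block_sizes)
```

For a very small discount the first function approaches the second, which is one way to see that the one-parameter formula is the $\alpha \to 0$ limit of the two-parameter one.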

As before, the probability assigned to any particular partition depends only on the block sizes, so as before the random partition is exchangeable in the sense described above. The consistency property still holds, as before, by construction.

If $\alpha =0$ , the probability distribution of the random partition of the integer $n$ thus generated is the Ewens distribution with parameter $\theta$ , used in population genetics and the unified neutral theory of biodiversity.

Figure: animation of a Chinese restaurant process with scaling parameter $\theta =0.5,\ \alpha =0$ . Tables are hidden once the customers of a table cannot be displayed anymore; however, every table has infinitely many seats. (Recording of an interactive animation.^[8])

Derivation

Here is one way to derive this partition probability. Let $C_{i}$ be the random block into which the number $i$ is added, for $i=1,2,3,...$ . Then

\Pr(C_{i}=c\mid C_{1},\ldots ,C_{i-1})={\begin{cases}{\dfrac {\theta +|B|\alpha }{\theta +i-1}}&{\text{if }}c\in {\text{new block}},\\\\{\dfrac {|b|-\alpha }{\theta +i-1}}&{\text{if }}c\in b;\end{cases}}

The probability that $B_{n}$ is any particular partition of the set $\{1,...,n\}$ is the product of these probabilities as $i$ runs from $1$ to $n$ . Now consider the size of block $b$ : it increases by one each time we add one element into it. When the last element in block $b$ is to be added in, the block size is $|b|-1$ . For example, consider this sequence of choices: (generate a new block $b$ )(join $b$ )(join $b$ )(join $b$ ). In the end, block $b$ has 4 elements and, taking $\alpha =0$ for simplicity, the product of the numerators in the above equation is $\theta \cdot 1\cdot 2\cdot 3$ ; for general $\alpha$ it is $(\theta +|B|\alpha )(1-\alpha )(2-\alpha )(3-\alpha )$ , where $|B|$ is the number of blocks that exist when $b$ is created. Following this logic, we obtain $\Pr(B_{n}=B)$ as above.
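The derivation can be replayed numerically by multiplying the conditional probabilities along one seating order. In the sketch below, `labels[i]` is an arbitrary name for the table chosen by customer $i+1$ ; the function names are illustrative. Exact rational arithmetic makes it possible to confirm that every reordering of the same labels gives the same product, which is the exchangeability property.

```python
from fractions import Fraction
from itertools import permutations


def sequential_probability(labels, theta, alpha):
    """Product of the conditional seating probabilities for one arrival order."""
    theta, alpha = Fraction(theta), Fraction(alpha)
    sizes = {}
    prob = Fraction(1)
    for seated, label in enumerate(labels):
        if seated == 0:
            sizes[label] = 1  # the first customer opens the first table
            continue
        denom = theta + seated
        if label in sizes:
            prob *= (sizes[label] - alpha) / denom  # join occupied table b
            sizes[label] += 1
        else:
            prob *= (theta + len(sizes) * alpha) / denom  # open a new table
            sizes[label] = 1
    return prob


def is_order_invariant(labels, theta, alpha):
    """True if all reorderings of the labels give the same probability."""
    probs = {sequential_probability(order, theta, alpha)
             for order in permutations(labels)}
    return len(probs) == 1
```

With $\theta =2$ and $\alpha =0$ , the sequence (new)(join)(join)(join) from the example gives `sequential_probability("aaaa", 2, 0) == Fraction(1, 10)`, in agreement with the one-parameter formula.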

Expected number of tables

For the one-parameter case, with $\alpha =0$ and $0<\theta <\infty$ , the number of tables is distributed according to the Chinese restaurant table distribution. The expected value of this random variable, given that there are $n$ seated customers, is^[9]

{\begin{aligned}\sum _{k=1}^{n}{\frac {\theta }{\theta +k-1}}=\theta \cdot (\Psi (\theta +n)-\Psi (\theta ))\end{aligned}}

where $\Psi (\theta )$ is the digamma function. For the two-parameter case, for $\alpha \neq 0$ , the expected number of occupied tables is^[7]

{\begin{aligned}{\frac {(\theta +\alpha )^{\overline {n}}}{\alpha (\theta +1)^{\overline {n-1}}}}-{\frac {\theta }{\alpha }},\end{aligned}}

where $x^{\overline {m}}$ is the rising factorial (as defined above).
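Both expectations can be cross-checked against a simple recursion: since the new-table probability $(\theta +|B|\alpha )/(n+\theta )$ is linear in $|B|$ , the expected table count satisfies $E_{n+1}=E_{n}+(\theta +\alpha E_{n})/(n+\theta )$ with $E_{1}=1$ . The sketch below is illustrative; the closed form is evaluated through `math.lgamma` and is assumed to be used only for $0<\alpha <1$ and $\theta >-\alpha$ , so that all Gamma arguments are positive.

```python
from math import exp, lgamma


def expected_tables_one_param(n, theta):
    """Sum_{k=1}^{n} theta / (theta + k - 1), the alpha = 0 case."""
    return sum(theta / (theta + k - 1) for k in range(1, n + 1))


def expected_tables_two_param(n, theta, alpha):
    """Closed form for 0 < alpha < 1 and theta > -alpha."""
    # Rising factorials via x^(m) = Gamma(x + m) / Gamma(x).
    log_num = lgamma(theta + alpha + n) - lgamma(theta + alpha)
    log_den = lgamma(theta + n) - lgamma(theta + 1)
    return exp(log_num - log_den) / alpha - theta / alpha


def expected_tables_recursive(n, theta, alpha):
    """E_{m+1} = E_m + (theta + alpha E_m) / (m + theta), starting from E_1 = 1."""
    expected = 1.0
    for seated in range(1, n):
        expected += (theta + alpha * expected) / (seated + theta)
    return expected
```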

The Dirichlet-categorical model

For the parameter choice $\alpha <0$ and $\theta =-L\alpha$ , where $L\in \{1,2,3,\ldots \}$ , the two-parameter Chinese restaurant process is equivalent to the Dirichlet-categorical model, which is a hierarchical model that can be defined as follows. Notice that for this parameter setting, the probability of occupying a new table, when there are already $L$ occupied tables, is zero; so that the number of occupied tables is upper bounded by $L$ . If we choose to identify tables with labels that take values in $\{1,2,\ldots ,L\}$ , then to generate a random partition of the set $[n]=\{1,2,\ldots ,n\}$ , the hierarchical model first draws a categorical label distribution, $\mathbf {p} =(p_{1},p_{2},\ldots ,p_{L})$ from the symmetric Dirichlet distribution, with concentration parameter $\gamma =-\alpha >0$ . Then, independently for each of the $n$ customers, the table label is drawn from the categorical $\mathbf {p}$ . Since the Dirichlet distribution is conjugate to the categorical, the hidden variable $\mathbf {p}$ can be marginalized out to obtain the posterior predictive distribution for the next label state, $\ell _{n+1}$ , given $n$ previous labels

P(\ell _{n+1}=i\mid \ell _{1},\ldots ,\ell _{n})={\frac {\gamma +\left|{b_{i}}\right|}{L\gamma +n}}

where $\left|{b_{i}}\right|\geq 0$ is the number of customers that are already seated at table $i$ . With $\alpha =-\gamma$ and $\theta =L\gamma$ , this agrees with the above general formula, ${\frac {|b_{i}|-\alpha }{n+\theta }}$ , for the probability of sitting at an occupied table when $|b_{i}|\geq 1$ . The probability of sitting at any of the $L-|B|$ unoccupied tables also agrees with the general formula and is given by

\sum _{i:|b_{i}|=0}P(\ell _{n+1}=i\mid \ell _{1},\ldots ,\ell _{n})={\frac {(L-|B|)\gamma }{n+L\gamma }}={\frac {\theta +|B|\alpha }{n+\theta }}

The marginal probability for the labels is given by

P(\ell _{1},\ldots ,\ell _{n})=P(\ell _{1})\prod _{t=1}^{n-1}P(\ell _{t+1}\mid \ell _{1},\ldots ,\ell _{t})={\frac {\prod _{i=1}^{L}\gamma ^{\overline {\left|{b_{i}}\right|}}}{(L\gamma )^{\overline {n}}}}

where $P(\ell _{1})={\frac {1}{L}}$ and $x^{\overline {m}}=\prod _{i=0}^{m-1}(x+i)$ is the rising factorial. In general, however, multiple label states correspond to the same partition. For a given partition, $B$ , which has $\left|B\right|\leq L$ blocks, the number of label states that all correspond to this partition is given by the falling factorial, $L^{\underline {\left|B\right|}}=\prod _{i=0}^{\left|B\right|-1}(L-i)$ . Taking this into account, the probability for the partition is

{\text{Pr}}(B_{n}=B\mid \gamma ,L)=L^{\underline {\left|B\right|}}\,{\frac {\prod _{i=1}^{L}\gamma ^{\overline {\left|{b_{i}}\right|}}}{(L\gamma )^{\overline {n}}}}

which can be verified to agree with the general version of the partition probability that is given above in terms of the Pochhammer k-symbol. Notice again, that if $B$ is outside of the support, i.e. $|B|>L$ , the falling factorial, $L^{\underline {|B|}}$ evaluates to zero as it should. (Practical implementations that evaluate the log probability for partitions via $\log L^{\underline {|B|}}=\log \left|\Gamma (L+1)\right|-\log \left|\Gamma (L+1-|B|)\right|$ will return $-\infty$ , whenever $|B|>L$ , as required.)
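The claimed agreement can be verified exactly for small cases. The sketch below evaluates the Dirichlet-categorical partition probability and the general two-parameter formula (written out as products) with rational arithmetic; the function names are illustrative, and the comparison uses $\alpha =-\gamma$ and $\theta =L\gamma$ .

```python
from fractions import Fraction


def rising(x, m):
    """Rising factorial x (x + 1) ... (x + m - 1)."""
    result = Fraction(1)
    for i in range(m):
        result *= x + i
    return result


def falling(x, m):
    """Falling factorial x (x - 1) ... (x - m + 1)."""
    result = Fraction(1)
    for i in range(m):
        result *= x - i
    return result


def dirichlet_categorical_partition_probability(block_sizes, gamma, L):
    """L^(falling |B|) * prod_i gamma^(rising |b_i|) / (L gamma)^(rising n)."""
    gamma = Fraction(gamma)
    n = sum(block_sizes)
    label_prob = Fraction(1)
    for size in block_sizes:  # empty tables contribute a factor of 1
        label_prob *= rising(gamma, size)
    label_prob /= rising(L * gamma, n)
    return falling(Fraction(L), len(block_sizes)) * label_prob


def crp_partition_probability(block_sizes, theta, alpha):
    """General two-parameter formula, with the k-symbols expanded as products."""
    theta, alpha = Fraction(theta), Fraction(alpha)
    n, k = sum(block_sizes), len(block_sizes)
    prob = Fraction(1)
    for i in range(k - 1):
        prob *= theta + alpha + i * alpha
    for i in range(n - 1):
        prob /= theta + 1 + i
    for size in block_sizes:
        prob *= rising(1 - alpha, size - 1)
    return prob
```

With $\gamma =1/2$ and $L=3$ both functions return $2/21$ for block sizes $(3,1)$ and zero for $(1,1,1,1)$ , which has more than $L$ blocks.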

Relationship between Dirichlet-categorical and one-parameter CRP

Consider on the one hand, the one-parameter Chinese restaurant process, with $\alpha =0$ and $\theta >0$ , which we denote ${\text{CRP}}(\alpha =0,\theta )$ ; and on the other hand the Dirichlet-categorical model with $L$ a positive integer and where we choose $\gamma ={\frac {\theta }{L}}$ , which as shown above, is equivalent to ${\text{CRP}}(\alpha =-{\frac {\theta }{L}},\theta )$ . As $L\to \infty$ with $\theta$ fixed, the discount $\alpha =-{\frac {\theta }{L}}$ tends to zero, so the Dirichlet-categorical model can be made arbitrarily close to ${\text{CRP}}(0,\theta )$ by making $L$ large.
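To make the limit concrete, the seating probabilities of ${\text{CRP}}(-\theta /L,\theta )$ can be written out for $n$ seated customers at $|B|$ occupied tables. This is only a worked restatement of the two-parameter seating rule with $\alpha =-\theta /L$ substituted, not an additional result.

```latex
\Pr(\text{new table}) = \frac{\theta + |B|\alpha}{n+\theta}
  = \frac{\theta\,(1 - |B|/L)}{n+\theta}
  \;\xrightarrow[L\to\infty]{}\; \frac{\theta}{n+\theta},
\qquad
\Pr(\text{table } b) = \frac{|b| - \alpha}{n+\theta}
  = \frac{|b| + \theta/L}{n+\theta}
  \;\xrightarrow[L\to\infty]{}\; \frac{|b|}{n+\theta}.
```

Both limits are the seating probabilities of the one-parameter process with parameter $\theta$ .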

Stick-breaking process

The two-parameter Chinese restaurant process can equivalently be defined in terms of a stick-breaking process.^[10] For the case where $0\leq \alpha <1$ and $\theta >-\alpha$ , the stick breaking process can be described as a hierarchical model, much like the above Dirichlet-categorical model, except that there is an infinite number of label states. The table labels are drawn independently from the infinite categorical distribution $\mathbf {p} =(p_{1},p_{2},\ldots )$ , the components of which are sampled using stick breaking: start with a stick of length 1 and randomly break it in two, the length of the left half is $p_{1}$ and the right half is broken again recursively to give $p_{2},p_{3},\ldots$ . More precisely, the left fraction, $f_{k}$ , of the $k$ -th break is sampled from the beta distribution:

f_{k}\sim B(1-\alpha ,\theta +k\alpha ),\;{\text{for }}k\geq 1{\text{ and }}0\leq \alpha <1

The categorical probabilities are:

p_{k}=f_{k}\prod _{i=1}^{k-1}(1-f_{i}),\;{\text{where the empty product evaluates to one.}}

For the parameter settings $\alpha <0$ and $\theta =-\alpha L$ , where $L$ is a positive integer, the categorical distribution is finite: $\mathbf {p} =(p_{1},\ldots ,p_{L})$ . In this case $\mathbf {p}$ can be sampled from an ordinary Dirichlet distribution as explained above, but it can also be sampled with a truncated stick-breaking recipe, where the formula for sampling the fractions is modified to:

f_{k}\sim B(-\alpha ,\theta +k\alpha ),\;{\text{for }}1\leq k\leq L-1{\text{ and }}\alpha <0

and $f_{L}=1$ .
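Both stick-breaking recipes can be sketched with `random.betavariate` from the Python standard library. The function names and the choice to return only the first `num_sticks` weights of the infinite sequence are assumptions of this sketch. The variable `remaining` holds the running product $\prod _{i<k}(1-f_{i})$ , i.e. the length of stick not yet assigned.

```python
import random


def stick_breaking_weights(theta, alpha, num_sticks, rng=None):
    """First num_sticks weights p_1, p_2, ... for 0 <= alpha < 1, theta > -alpha."""
    rng = rng or random.Random()
    weights, remaining = [], 1.0
    for k in range(1, num_sticks + 1):
        f_k = rng.betavariate(1 - alpha, theta + k * alpha)
        weights.append(f_k * remaining)  # p_k = f_k * prod_{i<k} (1 - f_i)
        remaining *= 1 - f_k
    return weights


def finite_stick_breaking_weights(alpha, L, rng=None):
    """All L weights for alpha < 0 and theta = -alpha * L."""
    rng = rng or random.Random()
    theta = -alpha * L
    weights, remaining = [], 1.0
    for k in range(1, L):
        f_k = rng.betavariate(-alpha, theta + k * alpha)
        weights.append(f_k * remaining)
        remaining *= 1 - f_k
    weights.append(remaining)  # f_L = 1 takes whatever is left of the stick
    return weights
```

In the infinite case the returned weights sum to less than one, with the shortfall equal to the mass of the tables not yet generated; in the finite case they sum to one up to floating-point error.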

The Indian buffet process

It is possible to adapt the model such that each data point is no longer uniquely associated with a class (i.e., we are no longer constructing a partition), but may be associated with any combination of the classes. This strains the restaurant-tables analogy and so is instead likened to a process in which a series of diners samples from some subset of an infinite selection of dishes on offer at a buffet. The probability that a particular diner samples a particular dish is proportional to the popularity of the dish among diners so far, and in addition the diner may sample from the untested dishes. This has been named the Indian buffet process and can be used to infer latent features in data.^[11]

Applications

The Chinese restaurant process is closely connected to Dirichlet processes and Pólya's urn scheme, and is therefore useful in applications of Bayesian statistics, including nonparametric Bayesian methods. The generalized (two-parameter) Chinese restaurant process is closely related to the Pitman–Yor process. These processes have been used in many applications, including modeling text, clustering biological microarray data,^[12] biodiversity modelling, and image reconstruction.^[13]^[14]


References

[1] Aldous, D. J. (1985). "Exchangeability and related topics". École d'Été de Probabilités de Saint-Flour XIII – 1983. Lecture Notes in Mathematics. Vol. 1117. pp. 1–198. doi:10.1007/BFb0099421. ISBN 978-3-540-15203-3. The restaurant process is described on page 92.
[2] Pitman, Jim (1995). "Exchangeable and Partially Exchangeable Random Partitions". Probability Theory and Related Fields. 102 (2): 145–158. doi:10.1007/BF01213386. MR 1337249. S2CID 16849229.
[3] Hoppe, Fred M. (1984). "Pólya-like urns and the Ewens' sampling formula". Journal of Mathematical Biology. 20: 91–94.
[4] Blei, David M.; Frazier, Peter I. (2011). "Distance Dependent Chinese Restaurant Processes" (PDF). Journal of Machine Learning Research. 12: 2461–2488.
[5] Zhou, Mingyuan; Carin, Lawrence (2012). "Negative Binomial Process Count and Mixture Modeling". IEEE Transactions on Pattern Analysis and Machine Intelligence. 37 (2): 307–20. arXiv:1209.3442. Bibcode:2012arXiv1209.3442Z. doi:10.1109/TPAMI.2013.211. PMID 26353243. S2CID 1937045.
[6] Antoniak, Charles E. (1974). "Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems". The Annals of Statistics. 2 (6): 1152–1174. doi:10.1214/aos/1176342871.
[7] Pitman, Jim (2006). Combinatorial Stochastic Processes. Vol. 1875. Berlin: Springer-Verlag. ISBN 9783540309901. Archived from the original on 2012-09-25. Retrieved 2011-05-11.
[8] "Dirichlet Process and Dirichlet Distribution -- Polya Restaurant Scheme and Chinese Restaurant Process".
[9] Zhang, Xinhua (September 2008). "A Very Gentle Note on the Construction of Dirichlet Process". The Australian National University, Canberra. Online: http://users.cecs.anu.edu.au/~xzhang/pubDoc/notes/dirichlet_process.pdf (archived April 11, 2011, at the Wayback Machine).
[10] Ishwaran, Hemant; James, Lancelot F. (2001). "Gibbs Sampling Methods for Stick-Breaking Priors". Journal of the American Statistical Association. 96 (453): 161–173. ISSN 0162-1459.
[11] Griffiths, T. L.; Ghahramani, Z. (2005). "Infinite Latent Feature Models and the Indian Buffet Process". Gatsby Unit Technical Report GCNU-TR-2005-001 (archived 2008-10-31 at the Wayback Machine).
[12] Qin, Zhaohui S. (2006). "Clustering microarray gene expression data using weighted Chinese restaurant process". Bioinformatics. 22 (16): 1988–1997. doi:10.1093/bioinformatics/btl284. PMID 16766561.
[13] White, J. T.; Ghosal, S. (2011). "Bayesian smoothing of photon-limited images with applications in astronomy" (PDF). Journal of the Royal Statistical Society, Series B (Statistical Methodology). 73 (4): 579–599. CiteSeerX 10.1.1.308.7922. doi:10.1111/j.1467-9868.2011.00776.x. S2CID 2342134.
[14] Li, M.; Ghosal, S. (2014). "Bayesian multiscale smoothing of Gaussian noised images". Bayesian Analysis. 9 (3): 733–758. doi:10.1214/14-ba871.


