Indian buffet process

Last updated June 19, 2019

In the mathematical theory of probability, the Indian buffet process (IBP) is a stochastic process defining a probability distribution over sparse binary matrices with a finite number of rows and an infinite number of columns. This distribution is suitable to use as a prior for models with potentially infinite number of features. The form of the prior ensures that only a finite number of features will be present in any finite set of observations but more features may appear as more data points are observed.

Stochastic process mathematical object usually defined as a collection of random variables

In probability theory and related fields, a stochastic or random process is a mathematical object usually defined as a family of random variables. Historically, the random variables were associated with or indexed by a set of numbers, usually viewed as points in time, giving the interpretation of a stochastic process representing numerical values of some system randomly changing over time, such as the growth of a bacterial population, an electrical current fluctuating due to thermal noise, or the movement of a gas molecule. Stochastic processes are widely used as mathematical models of systems and phenomena that appear to vary in a random manner. They have applications in many disciplines including sciences such as biology, chemistry, ecology, neuroscience, and physics as well as technology and engineering fields such as image processing, signal processing, information theory, computer science, cryptography and telecommunications. Furthermore, seemingly random changes in financial markets have motivated the extensive use of stochastic processes in finance.

In probability theory and statistics, a probability distribution is a mathematical function that provides the probabilities of occurrence of different possible outcomes in an experiment. In more technical terms, the probability distribution is a description of a random phenomenon in terms of the probabilities of events. For instance, if the random variable $X$ is used to denote the outcome of a coin toss, then the probability distribution of $X$ would take the value 0.5 for $X = heads$ , and 0.5 for $X = tails$ . Examples of random phenomena can include the results of an experiment or survey.

In numerical analysis and scientific computing, a sparse matrix or sparse array is a matrix in which most of the elements are zero. By contrast, if most of the elements are nonzero, then the matrix is considered dense. The number of zero-valued elements divided by the total number of elements is called the sparsity of the matrix. Using those definitions, a matrix will be sparse when its sparsity is greater than 0.5.

Indian buffet process prior

Let $Z$ be an $N\times K$ binary matrix indicating the presence or absence of a latent feature. The IBP places the following prior on $Z$ :

p(Z)={\frac {\alpha ^{K}}{\prod _{i=1}^{N}K_{1}^{(i)}!}}\exp\{-\alpha H_{N}\}\prod _{k=1}^{K}{\frac {(N-m_{k})!(m_{k}-1)!}{N!}}

where $K$ is the number of non-zero columns in $Z$ , $m_{k}$ is the number of ones in column $k$ of $Z$ , $H_{N}$ is the Nth harmonic number, and $K_{h}$ is the number of occurrences of the non-zero binary vector $h$ among the columns in $Z$ . The parameter $\alpha$ controls the expected number of features present in each observation.

Harmonic number Sum of the first n whole number reciprocals; 1/1 + 1/2 + 1/3 + ... + 1/n

In mathematics, the $n$ -th harmonic number is the sum of the reciprocals of the first $n$ natural numbers:

A bit array is an array data structure that compactly stores bits. It can be used to implement a simple set data structure. A bit array is effective at exploiting bit-level parallelism in hardware to perform operations quickly. A typical bit array stores kw bits, where w is the number of bits in the unit of storage, such as a byte or word, and k is some nonnegative integer. If w does not divide the number of bits to be stored, some space is wasted due to internal fragmentation.

In the Indian buffet process, the rows of $Z$ correspond to customers and the columns correspond to dishes in an infinitely long buffet. The first customer takes the first $\mathrm {Poisson} (\alpha )$ dishes. The $i$ -th customer then takes dishes that have been previously sampled with probability $m_{k}/i$ , where $m_{k}$ is the number of people who have already sampled dish $k$ . He also takes $\mathrm {Poisson} (\alpha /i)$ new dishes. Therefore, $z_{nk}$ is one if customer $n$ tried the $k$ -th dish and zero otherwise.

This process is infinitely exchangeable for an equivalence class of binary matrices defined by a left-ordered many-to-one function. $\operatorname {lof} (Z)$ is obtained by ordering the columns of the binary matrix $Z$ from left to right by the magnitude of the binary number expressed by that column, taking the first row as the most significant bit.

In mathematics, when the elements of some set $S$ have a notion of equivalence defined on them, then one may naturally split the set $S$ into equivalence classes. These equivalence classes are constructed so that elements $a$ and $b$ belong to the same equivalence class if and only if $a$ and $b$ are equivalent.

Related Research Articles

In probability theory and statistics, the exponential distribution is the probability distribution of the time between events in a Poisson point process, i.e., a process in which events occur continuously and independently at a constant average rate. It is a particular case of the gamma distribution. It is the continuous analogue of the geometric distribution, and it has the key property of being memoryless. In addition to being used for the analysis of Poisson point processes it is found in various other contexts.

The Pareto distribution, named after the Italian civil engineer, economist, and sociologist Vilfredo Pareto, is a power-law probability distribution that is used in description of social, scientific, geophysical, actuarial, and many other types of observable phenomena. Originally applied to describing the distribution of wealth in a society, fitting the trend that a large portion of wealth is held by a small fraction of the population, the Pareto distribution has colloquially become known and referred to as the Pareto principle, or "80-20 rule", and is sometimes called the "Matthew principle". This rule states that, for example, 80% of the wealth of a society is held by 20% of its population. However, one should not conflate the Pareto distribution for the Pareto Principle as the former only produces this result for a particular power value, $(α = log 4 5 \approx 1.16). While is variable, empirical observation has found the 80-20 distribution to fit a wide range of cases, including natural phenomena and human activities.$

In probability and statistics, a Bernoulli process is a finite or infinite sequence of binary random variables, so it is a discrete-time stochastic process that takes only two values, canonically 0 and 1. The component Bernoulli variablesX_i are identically distributed and independent. Prosaically, a Bernoulli process is a repeated coin flipping, possibly with an unfair coin. Every variable X_i in the sequence is associated with a Bernoulli trial or experiment. They all have the same Bernoulli distribution. Much of what can be said about the Bernoulli process can also be generalized to more than two outcomes ; this generalization is known as the Bernoulli scheme.

An orthogonal matrix is a square matrix whose columns and rows are orthogonal unit vectors, i.e.

In probability theory and statistics, the gamma distribution is a two-parameter family of continuous probability distributions. The exponential distribution, Erlang distribution, and chi-squared distribution are special cases of the gamma distribution. There are three different parametrizations in common use:

With a shape parameter k and a scale parameter θ.
With a shape parameter α = k and an inverse scale parameter β = 1/θ, called a rate parameter.
With a shape parameter k and a mean parameter μ = kθ = α/β.

In probability theory, a compound Poisson distribution is the probability distribution of the sum of a number of independent identically-distributed random variables, where the number of terms to be added is itself a Poisson-distributed variable. In the simplest cases, the result can be either a continuous or a discrete distribution.

In signal processing, a comb filter is a filter implemented by adding a delayed version of a signal to itself, causing constructive and destructive interference. The frequency response of a comb filter consists of a series of regularly spaced notches, giving the appearance of a comb.

In information theory, the Rényi entropy generalizes the Hartley entropy, the Shannon entropy, the collision entropy and the min-entropy. Entropies quantify the diversity, uncertainty, or randomness of a system. The Rényi entropy is named after Alfréd Rényi. In the context of fractal dimension estimation, the Rényi entropy forms the basis of the concept of generalized dimensions.

Quantum tomography Reconstruction of quantum states based on measurements

Quantum tomography or quantum state tomography is the process of reconstructing the quantum state for a source of quantum systems by measurements on the systems coming from the source. The source may be any device or system which prepares quantum states either consistently into quantum pure states or otherwise into general mixed states. To be able to uniquely identify the state, the measurements must be tomographically complete. That is, the measured operators must form an operator basis on the Hilbert space of the system, providing all the information about the state. Such a set of observations is sometimes called a quorum.

In probability theory and statistics, the Jensen–Shannon divergence is a method of measuring the similarity between two probability distributions. It is also known as information radius (IRad) or total divergence to the average. It is based on the Kullback–Leibler divergence, with some notable differences, including that it is symmetric and it always has a finite value. The square root of the Jensen–Shannon divergence is a metric often referred to as Jensen-Shannon distance.

In mathematics, the projective unitary group $PU(n)$ is the quotient of the unitary group $U(n)$ by the right multiplication of its center, $U(1)$ , embedded as scalars. Abstractly, it is the holomorphic isometry group of complex projective space, just as the projective orthogonal group is the isometry group of real projective space.

In probability theory, the Chinese restaurant process is a discrete-time stochastic process, analogous to seating customers at tables in a Chinese restaurant. Imagine a Chinese restaurant with an infinite number of circular tables, each with infinite capacity. Customer 1 sits at the first table. The next customer either sits at the same table as customer 1, or the next table. This continues, with each customer choosing to either sit at an occupied table with a probability proportional to the number of customers already there, or an unoccupied table. At time n, the n customers have been partitioned among m ≤ n tables. The results of this process are exchangeable, meaning the order in which the customers sit does not affect the probability of the final distribution. This property greatly simplifies a number of problems in population genetics, linguistic analysis, and image recognition.

In quantum computing, quantum finite automata (QFA) or quantum state machines are a quantum analog of probabilistic automata or a Markov decision process. They are related to quantum computers in a similar fashion as finite automata are related to Turing machines. Several types of automata may be defined, including measure-once and measure-many automata. Quantum finite automata can also be understood as the quantization of subshifts of finite type, or as a quantization of Markov chains. QFAs are, in turn, special cases of geometric finite automata or topological finite automata.

In probability theory, Dirichlet processes are a family of stochastic processes whose realizations are probability distributions. In other words, a Dirichlet process is a probability distribution whose range is itself a set of probability distributions. It is often used in Bayesian inference to describe the prior knowledge about the distribution of random variables—how likely it is that the random variables are distributed according to one or another particular distribution.

In probability and statistics, the Tweedie distributions are a family of probability distributions which include the purely continuous normal and gamma distributions, the purely discrete scaled Poisson distribution, and the class of mixed compound Poisson–gamma distributions which have positive mass at zero, but are otherwise continuous. For any random variable Y that obeys a Tweedie distribution, the variance var(Y) relates to the mean E(Y) by the power law,

In mathematics, particularly in the theory of C*-algebras, a uniformly hyperfinite, or UHF, algebra is a C*-algebra that can be written as the closure, in the norm topology, of an increasing union of finite-dimensional full matrix algebras.

In statistics, additive smoothing, also called Laplace smoothing, or Lidstone smoothing, is a technique used to smooth categorical data. Given an observation $from a multinomial distribution with trials, a "smoothed" version of the data gives the estimator:$

In probability theory and statistics, the Poisson distribution, named after French mathematician Siméon Denis Poisson, is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time or space if these events occur with a known constant rate and independently of the time since the last event. The Poisson distribution can also be used for the number of events in other specified intervals such as distance, area or volume.

In the mathematics of free probability theory, the free Poisson distribution is a counterpart of the Poisson distribution in conventional probability theory.

In statistics, a zero-inflated model is a statistical model based on a zero-inflated probability distribution, i.e. a distribution that allows for frequent zero-valued observations.

References

T.L. Griffiths and Z. Ghahramani The Indian Buffet Process: An Introduction and Review, Journal of Machine Learning Research, pp. 1185–1224, 2011.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

Indian buffet process

Contents

Indian buffet process prior

See also

Related Research Articles

References