Zero-truncated Poisson distribution

Last updated

In probability theory, the zero-truncated Poisson (ZTP) distribution is a certain discrete probability distribution whose support is the set of positive integers. This distribution is also known as the conditional Poisson distribution [1] or the positive Poisson distribution. [2] It is the conditional probability distribution of a Poisson-distributed random variable, given that the value of the random variable is not zero. Thus it is impossible for a ZTP random variable to be zero. Consider for example the random variable of the number of items in a shopper's basket at a supermarket checkout line. Presumably a shopper does not stand in line with nothing to buy (i.e., the minimum purchase is 1 item), so this phenomenon may follow a ZTP distribution. [3]

Contents

Since the ZTP is a truncated distribution with the truncation stipulated as k > 0, one can derive the probability mass function g(k;λ) from a standard Poisson distribution f(k;λ) as follows: [4]

The mean is

and the variance is

Parameter estimation

The method of moments estimator for the parameter is obtained by solving

where is the sample mean. [1]

This equation has a solution in terms of the Lambert W function. In practice, a solution may be found using numerical methods.

Examples

Insurance claims:

Imagine navigating the intricate landscape of auto insurance claims, where each claim signifies a unique event – an accident or damage occurrence. The ZTP distribution seamlessly aligns with this scenario, excluding the possibility of policyholders with zero claims.

Let X denote the random variable representing the number of insurance claims. If λ is the average rate of claims, the ZTP probability mass function takes the form:

for k= 1,2,3,...

This formula encapsulates the probability of observing k claims given that at least one claim has transpired. The denominator ensures the exclusion of the improbable zero-claim scenario. By utilizing the zero-truncated Poisson distribution, the manufacturing company can analyze and predict the frequency of defects in their products while focusing on instances where defects exist. This distribution helps in understanding and improving the quality control process, especially when it's crucial to account for at least one defect.

Generating zero-truncated Poisson-distributed random variables

Random variables sampled from the zero-truncated Poisson distribution may be achieved using algorithms derived from Poisson distribution sampling algorithms. [5]

init:     Let k ← 1, t ← e−λ / (1 - e−λ) * λ, s ← t.     Generate uniform random number u in [0,1]. while s < u do:     k ← k + 1.     t ← t * λ / k.     s ← s + t. return k.

The cost of the procedure above is linear in k, which may be large for large values of . Given access to an efficient sampler for non-truncated Poisson random variates, a non-iterative approach involves sampling from a truncated exponential distribution representing the time of the first event in a Poisson point process, conditional on such an event existing. [6] A simple NumPy implementation is:

defsample_zero_truncated_poisson(rate):u=np.random.uniform(np.exp(-rate),1)t=-np.log(u)return1+np.random.poisson(rate-t)

Related Research Articles

<span class="mw-page-title-main">Binomial distribution</span> Probability distribution

In probability theory and statistics, the binomial distribution with parameters n and p is the discrete probability distribution of the number of successes in a sequence of n independent experiments, each asking a yes–no question, and each with its own Boolean-valued outcome: success or failure. A single success/failure experiment is also called a Bernoulli trial or Bernoulli experiment, and a sequence of outcomes is called a Bernoulli process; for a single trial, i.e., n = 1, the binomial distribution is a Bernoulli distribution. The binomial distribution is the basis for the popular binomial test of statistical significance.

<span class="mw-page-title-main">Negative binomial distribution</span> Probability distribution

In probability theory and statistics, the negative binomial distribution is a discrete probability distribution that models the number of failures in a sequence of independent and identically distributed Bernoulli trials before a specified (non-random) number of successes occurs. For example, we can define rolling a 6 on a dice as a success, and rolling any other number as a failure, and ask how many failure rolls will occur before we see the third success. In such a case, the probability distribution of the number of failures that appear will be a negative binomial distribution.

<span class="mw-page-title-main">Exponential distribution</span> Probability distribution

In probability theory and statistics, the exponential distribution or negative exponential distribution is the probability distribution of the distance between events in a Poisson point process, i.e., a process in which events occur continuously and independently at a constant average rate; the distance parameter could be any meaningful mono-dimensional measure of the process, such as time between production errors, or length along a roll of fabric in the weaving manufacturing process. It is a particular case of the gamma distribution. It is the continuous analogue of the geometric distribution, and it has the key property of being memoryless. In addition to being used for the analysis of Poisson point processes it is found in various other contexts.

<span class="mw-page-title-main">Geometric distribution</span> Probability distribution

In probability theory and statistics, the geometric distribution is either one of two discrete probability distributions:

<span class="mw-page-title-main">Weibull distribution</span> Continuous probability distribution

In probability theory and statistics, the Weibull distribution is a continuous probability distribution. It models a broad range of random variables, largely in the nature of a time to failure or time between events. Examples are maximum one-day rainfalls and the time a user spends on a web page.

In statistics, the Rao–Blackwell theorem, sometimes referred to as the Rao–Blackwell–Kolmogorov theorem, is a result that characterizes the transformation of an arbitrarily crude estimator into an estimator that is optimal by the mean-squared-error criterion or any of a variety of similar criteria.

In probability theory, a compound Poisson distribution is the probability distribution of the sum of a number of independent identically-distributed random variables, where the number of terms to be added is itself a Poisson-distributed variable. The result can be either a continuous or a discrete distribution.

Variational Bayesian methods are a family of techniques for approximating intractable integrals arising in Bayesian inference and machine learning. They are typically used in complex statistical models consisting of observed variables as well as unknown parameters and latent variables, with various sorts of relationships among the three types of random variables, as might be described by a graphical model. As typical in Bayesian inference, the parameters and latent variables are grouped together as "unobserved variables". Variational Bayesian methods are primarily used for two purposes:

  1. To provide an analytical approximation to the posterior probability of the unobserved variables, in order to do statistical inference over these variables.
  2. To derive a lower bound for the marginal likelihood of the observed data. This is typically used for performing model selection, the general idea being that a higher marginal likelihood for a given model indicates a better fit of the data by that model and hence a greater probability that the model in question was the one that generated the data.

In probability theory and statistics, the factorial moment generating function (FMGF) of the probability distribution of a real-valued random variable X is defined as

<span class="mw-page-title-main">Inverse Gaussian distribution</span> Family of continuous probability distributions

In probability theory, the inverse Gaussian distribution is a two-parameter family of continuous probability distributions with support on (0,∞).

A ratio distribution is a probability distribution constructed as the distribution of the ratio of random variables having two other known distributions. Given two random variables X and Y, the distribution of the random variable Z that is formed as the ratio Z = X/Y is a ratio distribution.

In probability and statistics, the Tweedie distributions are a family of probability distributions which include the purely continuous normal, gamma and inverse Gaussian distributions, the purely discrete scaled Poisson distribution, and the class of compound Poisson–gamma distributions which have positive mass at zero, but are otherwise continuous. Tweedie distributions are a special case of exponential dispersion models and are often used as distributions for generalized linear models.

<span class="mw-page-title-main">Conway–Maxwell–Poisson distribution</span> Probability distribution

In probability theory and statistics, the Conway–Maxwell–Poisson distribution is a discrete probability distribution named after Richard W. Conway, William L. Maxwell, and Siméon Denis Poisson that generalizes the Poisson distribution by adding a parameter to model overdispersion and underdispersion. It is a member of the exponential family, has the Poisson distribution and geometric distribution as special cases and the Bernoulli distribution as a limiting case.

In probability theory, the law of rare events or Poisson limit theorem states that the Poisson distribution may be used as an approximation to the binomial distribution, under certain conditions. The theorem was named after Siméon Denis Poisson (1781–1840). A generalization of this theorem is Le Cam's theorem.

<span class="mw-page-title-main">Poisson distribution</span> Discrete probability distribution

In probability theory and statistics, the Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time if these events occur with a known constant mean rate and independently of the time since the last event. It can also be used for the number of events in other types of intervals than time, and in dimension greater than 1.

In statistics, a zero-inflated model is a statistical model based on a zero-inflated probability distribution, i.e. a distribution that allows for frequent zero-valued observations.

In machine learning, the kernel embedding of distributions comprises a class of nonparametric methods in which a probability distribution is represented as an element of a reproducing kernel Hilbert space (RKHS). A generalization of the individual data-point feature mapping done in classical kernel methods, the embedding of distributions into infinite-dimensional feature spaces can preserve all of the statistical features of arbitrary distributions, while allowing one to compare and manipulate distributions using Hilbert space operations such as inner products, distances, projections, linear transformations, and spectral analysis. This learning framework is very general and can be applied to distributions over any space on which a sensible kernel function may be defined. For example, various kernels have been proposed for learning from data which are: vectors in , discrete classes/categories, strings, graphs/networks, images, time series, manifolds, dynamical systems, and other structured objects. The theory behind kernel embeddings of distributions has been primarily developed by Alex Smola, Le Song , Arthur Gretton, and Bernhard Schölkopf. A review of recent works on kernel embedding of distributions can be found in.

In probability theory, a sub-Gaussian distribution, the distribution of a sub-Gaussian random variable, is a probability distribution with strong tail decay. More specifically, the tails of a sub-Gaussian distribution are dominated by the tails of a Gaussian. This property gives sub-Gaussian distributions their name.

A mixed Poisson distribution is a univariate discrete probability distribution in stochastics. It results from assuming that the conditional distribution of a random variable, given the value of the rate parameter, is a Poisson distribution, and that the rate parameter itself is considered as a random variable. Hence it is a special case of a compound probability distribution. Mixed Poisson distributions can be found in actuarial mathematics as a general approach for the distribution of the number of claims and is also examined as an epidemiological model. It should not be confused with compound Poisson distribution or compound Poisson process.

<span class="mw-page-title-main">Neyman Type A distribution</span> Compound Poisson-family discrete probability distribution

In statistics and probability, the Neyman Type A distribution is a discrete probability distribution from the family of Compound Poisson distribution. First of all, to easily understand this distribution we will demonstrate it with the following example explained in Univariate Discret Distributions; we have a statistical model of the distribution of larvae in a unit area of field by assuming that the variation in the number of clusters of eggs per unit area could be represented by a Poisson distribution with parameter , while the number of larvae developing per cluster of eggs are assumed to have independent Poisson distribution all with the same parameter . If we want to know how many larvae there are, we define a random variable Y as the sum of the number of larvae hatched in each group. Therefore, Y = X1 + X2 + ... Xj, where X1,...,Xj are independent Poisson variables with parameter and .

References

  1. 1 2 Cohen, A. Clifford (1960). "Estimating parameters in a conditional Poisson distribution". Biometrics. 16 (2): 203–211. doi:10.2307/2527552. JSTOR   2527552.
  2. Singh, Jagbir (1978). "A characterization of positive Poisson distribution and its application". SIAM Journal on Applied Mathematics. 34: 545–548. doi:10.1137/0134043.
  3. "Stata Data Analysis Examples: Zero-Truncated Poisson Regression". UCLA Institute for Digital Research and Education. Retrieved 7 August 2013.
  4. Johnson, Norman L.; Kemp, Adrianne W.; Kotz, Samuel (2005). Univariate Discrete Distributions (third ed.). Hoboken, NJ: Wiley-Interscience.
  5. Borje, Gio (2016-06-01). "Zero-Truncated Poisson Distribution Sampling Algorithm". Archived from the original on 2018-08-26.
  6. Hardie, Ted (1 May 2005). "[R] simulate zero-truncated Poisson distribution". r-help (Mailing list). Retrieved 27 May 2022.