Discrete Weibull distribution

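Discrete Weibull
Parameters: $\alpha > 0$ (scale), $\beta > 0$ (shape)
Support: $x \in \{0, 1, 2, \ldots\}$
PMF: $\exp\left[-\left(\frac{x}{\alpha}\right)^{\beta}\right] - \exp\left[-\left(\frac{x+1}{\alpha}\right)^{\beta}\right]$
CDF: $1 - \exp\left[-\left(\frac{x+1}{\alpha}\right)^{\beta}\right]$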

In probability theory and statistics, the discrete Weibull distribution is the discrete analog of the continuous Weibull distribution. First introduced by Toshio Nakagawa and Shunji Osaki, it is used predominantly in reliability engineering, where it is particularly applicable to failure data measured in discrete units such as cycles or shocks. More generally, it suits scenarios in which the time to an event is counted in distinct intervals rather than measured continuously.

Alternative parametrizations

In their original paper, Nakagawa and Osaki used the parametrization $q = e^{-\alpha^{-\beta}}$ with $q \in (0, 1)$, where $\alpha > 0$ is the scale and $\beta > 0$ is the shape. This makes the cumulative distribution function

$$F(x; q, \beta) = 1 - q^{(x+1)^{\beta}}, \qquad x \in \{0, 1, 2, \ldots\}.$$

Figure: The CDF of the discrete Weibull distribution with a q value of 0.5 and k values of 1 through 5. The β values are as follows: Red = 0.5, Green = 1.0, Blue = 1.5, Purple = 2.0, Orange = 2.5.

The probability mass function is

$$f(x; q, \beta) = q^{x^{\beta}} - q^{(x+1)^{\beta}}.$$

Figure: The PMF of the discrete Weibull distribution with a q value of 0.5 and k values of 1 through 5. The β values are as follows: Red = 0.5, Green = 1.0, Blue = 1.5, Purple = 2.0, Orange = 2.5.

Setting $\beta = 1$ makes the relationship with the geometric distribution apparent, since the probability mass function then reduces to $(1 - q)\,q^{x}$. [1]
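The following is a minimal Python sketch of the Nakagawa and Osaki form, assuming the CDF $1 - q^{(x+1)^{\beta}}$ and the PMF $q^{x^{\beta}} - q^{(x+1)^{\beta}}$ on the support $\{0, 1, 2, \ldots\}$. The function names `dw_cdf`, `dw_pmf` and `geometric_pmf` are illustrative and do not come from any library.

```python
def dw_cdf(x: int, q: float, beta: float) -> float:
    """P(X <= x) = 1 - q**((x + 1)**beta) for x = 0, 1, 2, ..."""
    if x < 0:
        return 0.0
    return 1.0 - q ** ((x + 1) ** beta)


def dw_pmf(x: int, q: float, beta: float) -> float:
    """P(X = x) = q**(x**beta) - q**((x + 1)**beta) for x = 0, 1, 2, ..."""
    if x < 0:
        return 0.0
    return q ** (x ** beta) - q ** ((x + 1) ** beta)


def geometric_pmf(x: int, p: float) -> float:
    """Geometric PMF on {0, 1, 2, ...}: x failures before the first success."""
    return (1.0 - p) ** x * p


if __name__ == "__main__":
    q = 0.5
    # With beta = 1 the two columns should agree (success probability 1 - q).
    for x in range(5):
        print(x, dw_pmf(x, q, 1.0), geometric_pmf(x, 1.0 - q))
```

With `beta = 1.0` the discrete Weibull PMF coincides with a geometric PMF whose success probability is `1 - q`, and summing the PMF from 0 to x telescopes to the CDF.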

An alternative parametrization — related to the Pareto distribution — has been used to estimate parameters in infectious disease modelling. [2] This parametrization introduces a parameter , meaning that the term can be replaced with . Therefore, the probability mass function can be expressed as

,

and the cumulative mass function can be expressed as

.

Location-scale transformation

The continuous Weibull distribution has a close relationship with the Gumbel distribution which is easy to see when log-transforming the variable. A similar transformation can be made on the discrete Weibull.

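Define $Y = \log(X + 1)$, so that $X = e^{Y} - 1$ and (unconventionally) $Y \in \{\log 1, \log 2, \ldots\}$, and define the parameters $\mu = \log \alpha$ and $\sigma = 1/\beta$, where $\alpha$ and $\beta$ are the scale and shape of the discrete Weibull distribution. Rewriting $x$ in the cumulative distribution function in terms of $\log(x + 1)$ gives

$$\Pr(X \le x) = 1 - \exp\left[-\left(\frac{x+1}{\alpha}\right)^{\beta}\right] = 1 - \exp\left[-\exp\left(\beta\left(\log(x+1) - \log\alpha\right)\right)\right].$$

We see that we get a location-scale parametrization:

$$\Pr\left(Y \le \log(x+1)\right) = 1 - \exp\left[-\exp\left(\frac{\log(x+1) - \mu}{\sigma}\right)\right],$$

in which $\mu$ acts as a location and $\sigma$ as a scale for $\log(X + 1)$. This form is convenient in estimation settings and opens up the possibility of regression with frameworks developed for Weibull regression and extreme value theory. [3]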
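The equivalence of the two forms can be checked numerically. The sketch below assumes the scale-shape CDF $1 - \exp[-((x+1)/\alpha)^{\beta}]$ and the mapping $\mu = \log\alpha$, $\sigma = 1/\beta$; the function names are illustrative only.

```python
import math


def cdf_scale_shape(x: int, alpha: float, beta: float) -> float:
    """CDF in the scale-shape form: 1 - exp(-((x + 1) / alpha)**beta)."""
    return 1.0 - math.exp(-(((x + 1) / alpha) ** beta))


def to_location_scale(alpha: float, beta: float) -> tuple[float, float]:
    """Map (alpha, beta) to (mu, sigma) = (log(alpha), 1 / beta)."""
    return math.log(alpha), 1.0 / beta


def cdf_location_scale(x: int, mu: float, sigma: float) -> float:
    """CDF in the location-scale form, written in terms of log(x + 1)."""
    z = (math.log(x + 1) - mu) / sigma
    return 1.0 - math.exp(-math.exp(z))


if __name__ == "__main__":
    alpha, beta = 3.0, 1.7  # arbitrary illustrative values
    mu, sigma = to_location_scale(alpha, beta)
    for x in range(5):
        print(x, cdf_scale_shape(x, alpha, beta), cdf_location_scale(x, mu, sigma))
```

Both columns should agree up to floating-point rounding, which is what allows a linear predictor to be placed on $\mu$ in a regression setting.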

Comparison to Other Discrete Distributions

The discrete Weibull distribution can be compared with other common discrete distributions such as the Poisson, geometric, and negative binomial distributions, each of which has unique characteristics and applications.

Discrete Weibull vs. Poisson Distribution: The Poisson distribution is often used to model the number of occurrences of a rare event during a fixed period of time. It is characterized by a single parameter, λ, which is both the mean and the variance of the distribution, so it cannot represent counts whose variance differs from their mean. The discrete Weibull distribution is more flexible and can handle both over- and under-dispersion in count data. It has two parameters, q and β, which influence the shape and scale of the distribution. Whereas the Poisson distribution assumes that events occur independently at a constant rate, the discrete Weibull can adapt to different event occurrence patterns.
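The dispersion behaviour can be illustrated numerically. The sketch below assumes the survival function $\Pr(X \ge x) = q^{x^{\beta}}$ and approximates the mean and variance by truncated tail sums; the function names, the tolerance and the parameter values are illustrative choices, not part of any library.

```python
def dw_survival(x: int, q: float, beta: float) -> float:
    """P(X >= x) = q**(x**beta) for x = 0, 1, 2, ..."""
    return q ** (x ** beta)


def dw_mean_var(q: float, beta: float, tol: float = 1e-15,
                max_terms: int = 1_000_000) -> tuple[float, float]:
    """Approximate the mean and variance by truncated tail sums.

    Uses E[X] = sum_{x>=1} P(X >= x) and
    E[X^2] = sum_{x>=1} (2x - 1) P(X >= x).
    """
    mean = 0.0
    second_moment = 0.0
    for x in range(1, max_terms + 1):
        s = dw_survival(x, q, beta)
        mean += s
        second_moment += (2 * x - 1) * s
        # Heuristic stopping rule: stop once the current term is negligible.
        if (2 * x - 1) * s < tol:
            break
    return mean, second_moment - mean ** 2


def dispersion_index(q: float, beta: float) -> float:
    """Variance-to-mean ratio; a Poisson distribution always gives 1."""
    mean, var = dw_mean_var(q, beta)
    return var / mean


if __name__ == "__main__":
    for beta in (0.5, 1.0, 3.0):
        print(beta, dispersion_index(0.5, beta))
```

For q = 0.5, the ratio is above 1 (over-dispersion) at β = 0.5 and β = 1, and below 1 (under-dispersion) at β = 3.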

Discrete Weibull vs. Geometric Distribution: The geometric distribution models the number of Bernoulli trials needed to obtain the first success and is characterized by a single parameter, p, the probability of success on an individual trial, which is the same for every trial. It is the special case β = 1 of the discrete Weibull distribution. With its second parameter, the discrete Weibull distribution can model a broader range of data patterns, including those where the probability of success changes over trials.
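One way to see how the per-trial probability changes is through the discrete hazard $\Pr(X = x \mid X \ge x)$. The sketch below assumes the survival function $\Pr(X \ge x) = q^{x^{\beta}}$, from which the hazard is $1 - q^{(x+1)^{\beta} - x^{\beta}}$; the name `dw_hazard` is illustrative.

```python
def dw_hazard(x: int, q: float, beta: float) -> float:
    """P(X = x | X >= x) = 1 - q**((x + 1)**beta - x**beta)."""
    return 1.0 - q ** ((x + 1) ** beta - x ** beta)


if __name__ == "__main__":
    q = 0.5
    for beta in (0.5, 1.0, 2.0):
        # One row per beta: hazards at x = 0, 1, ..., 5.
        print(beta, [round(dw_hazard(x, q, beta), 4) for x in range(6)])
```

The hazard is constant and equal to 1 − q when β = 1 (the geometric case), increases with x when β > 1, and decreases with x when β < 1.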

Discrete Weibull vs. Negative Binomial Distribution: The negative binomial distribution is used to model the number of Bernoulli trials needed before a particular number of successes is achieved. It is characterized by the probability of success and the number of successes. The discrete Weibull distribution can be a better fit for data that do not conform to this specific scenario.

Overall, the discrete Weibull distribution is preferred over these alternatives when the data exhibit over- or under-dispersion, or when the data do not fit the specific scenarios for which the Poisson, geometric, or negative binomial distributions are best suited. Its adaptability in shape and scale makes it a versatile tool in the statistical modeling of discrete data. [4]

Applications

The discrete Weibull distribution has been applied well beyond its traditional role in reliability engineering. Peluso, Vinciotti and Yu use it within a generalized additive model for count data on fertility plans, capturing relationships with factors such as education and family background. Unlike the Poisson distribution, it accommodates both overdispersed and underdispersed data, which makes it useful in social science research. [5]

"On bivariate discrete Weibull distribution" extends the distribution to bivariate data. The paper covers maximum likelihood estimation and Bayesian inference for bivariate discrete data and presents practical analyses of football match scores and nasal drainage severity, illustrating the distribution's applicability across varied fields. [6]

"The exponentiated discrete Weibull distribution" introduces a generalization termed the exponentiated discrete Weibull (EDW) distribution. The generalization increases the model's flexibility, allowing increasing, decreasing, bathtub-shaped, and inverted bathtub-shaped hazard rate functions. Like the discrete Weibull, the EDW distribution can model data that are overdispersed or underdispersed relative to a Poisson distribution, and it is applicable in fields such as reliability engineering and failure time studies. [7]
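As a rough illustration of the exponentiation idea, the sketch below assumes the usual construction for exponentiated families, in which the base CDF is raised to a power $\gamma > 0$, applied to the discrete Weibull CDF $1 - q^{(x+1)^{\beta}}$. The symbol `gamma` and the function names are assumptions made for this example and may differ from the notation of the cited paper.

```python
def dw_cdf(x: int, q: float, beta: float) -> float:
    """Discrete Weibull CDF: 1 - q**((x + 1)**beta), and 0 for x < 0."""
    if x < 0:
        return 0.0
    return 1.0 - q ** ((x + 1) ** beta)


def edw_cdf(x: int, q: float, beta: float, gamma: float) -> float:
    """Assumed exponentiated form: the base CDF raised to the power gamma."""
    return dw_cdf(x, q, beta) ** gamma


def edw_pmf(x: int, q: float, beta: float, gamma: float) -> float:
    """PMF obtained by differencing the CDF at consecutive integers."""
    return edw_cdf(x, q, beta, gamma) - edw_cdf(x - 1, q, beta, gamma)


if __name__ == "__main__":
    # gamma = 1 should reproduce the plain discrete Weibull PMF.
    for x in range(5):
        print(x, edw_pmf(x, 0.5, 1.5, 1.0), edw_pmf(x, 0.5, 1.5, 2.5))
```

Under this assumption, setting `gamma = 1.0` recovers the discrete Weibull distribution, while other values reshape the PMF and therefore the hazard rate.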


References

  1. Nakagawa, Toshio; Osaki, Shunji (1975). "The discrete Weibull distribution". IEEE Transactions on Reliability. 24 (5): 300–301. doi:10.1109/TR.1975.5214915. S2CID 6149392.
  2. Endo, A.; Murayama, H.; Abbott, S.; et al. (2022). "Heavy-tailed sexual contact networks and monkeypox epidemiology in the global outbreak, 2022". Science. 378 (6615): 90–94. doi:10.1126/science.add4507. PMID 36137054.
  3. Scholz, Fritz (1996). "Maximum Likelihood Estimation for Type I Censored Weibull Data Including Covariates". ISSTECH-96-022, Boeing Information & Support Services. Retrieved 26 April 2016.
  4. PyMC Developers (n.d.). "Discrete distributions". PyMC3 3.11.5 Documentation. Retrieved from https://docs.pymc.io/en/v3/api/distributions/discrete.html
  5. Peluso, Alina; Vinciotti, Veronica; Yu, Keming (April 2019). "Discrete Weibull Generalized Additive Model: An Application to Count Fertility Data". Journal of the Royal Statistical Society Series C: Applied Statistics. 68 (3): 565–583. doi:10.1111/rssc.12311.
  6. Kundu, Debasis; Nekoukhou, Vahid (2019). "On bivariate discrete Weibull distribution". Communications in Statistics - Theory and Methods. 48 (14): 3464–3481. doi:10.1080/03610926.2018.1476712.
  7. Nekoukhou, Vahid; Bidram, Hamid (2015). "The exponentiated discrete Weibull Distribution". SORT-Statistics and Operations Research Transactions. 39 (1): 127–146. https://raco.cat/index.php/SORT/article/view/294381.