[Figure: probability mass function for n = 5, where n = b − a + 1]
[Figure: cumulative distribution function]

Notation | $\mathcal{U}\{a, b\}$ or $\operatorname{unif}\{a, b\}$
---|---
Parameters | integers $a, b$ with $b \ge a$; $n = b - a + 1$
Support | $k \in \{a, a + 1, \dots, b\}$
PMF | $\frac{1}{n}$
CDF | $\frac{\lfloor k \rfloor - a + 1}{n}$
Mean | $\frac{a + b}{2}$
Median | $\frac{a + b}{2}$
Mode | N/A
Variance | $\frac{n^2 - 1}{12}$
Skewness | $0$
Excess kurtosis | $-\frac{6(n^2 + 1)}{5(n^2 - 1)}$
Entropy | $\ln(n)$
MGF | $\frac{e^{at} - e^{(b+1)t}}{n(1 - e^t)}$
CF | $\frac{e^{iat} - e^{i(b+1)t}}{n(1 - e^{it})}$
PGF | $\frac{z^a - z^{b+1}}{n(1 - z)}$
In probability theory and statistics, the discrete uniform distribution is a symmetric probability distribution in which each of a finite number n of outcome values is equally likely to be observed. Thus every one of the n outcome values has equal probability 1/n. Intuitively, a discrete uniform distribution describes "a known, finite number of outcomes all equally likely to happen."
A simple example of the discrete uniform distribution comes from throwing a fair six-sided die. The possible values are 1, 2, 3, 4, 5, 6, and each time the die is thrown the probability of each given value is 1/6. If two dice were thrown and their values added, the possible sums would not have equal probability and so the distribution of sums of two dice rolls is not uniform.
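Both claims are easy to check by simulation; a minimal Python sketch (the trial count is arbitrary):

```python
import random
from collections import Counter

random.seed(0)
trials = 100_000

# One fair die: every face should appear with probability ~1/6.
one_die = Counter(random.randint(1, 6) for _ in range(trials))

# Sum of two fair dice: the sums 2..12 are *not* equally likely
# (7 has probability 6/36, while 2 and 12 have only 1/36 each).
two_dice = Counter(random.randint(1, 6) + random.randint(1, 6)
                   for _ in range(trials))

for face in range(1, 7):
    print(f"P(die = {face}) ~ {one_die[face] / trials:.3f}")  # all near 0.167
for s in range(2, 13):
    print(f"P(sum = {s}) ~ {two_dice[s] / trials:.3f}")       # peaked at 7
```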
Although it is common to consider discrete uniform distributions over a contiguous range of integers, such as in this six-sided die example, one can define discrete uniform distributions over any finite set. For instance, the six-sided die could have abstract symbols rather than numbers on each of its faces. For less simple examples, a random permutation is a permutation drawn uniformly at random from the permutations of a given set, and a uniform spanning tree of a graph is a spanning tree selected uniformly at random from the full set of spanning trees of the graph.
The discrete uniform distribution itself is non-parametric. However, in the common case that its possible outcome values are the integers in an interval $[a, b]$, then $a$ and $b$ are parameters of the distribution and $n = b - a + 1$. In these cases the cumulative distribution function (CDF) of the discrete uniform distribution can be expressed, for any $k$, as

$$F(k; a, b) = \frac{\max(\min(\lfloor k \rfloor, b), a - 1) - a + 1}{n},$$

or simply

$$F(k; a, b) = \frac{\lfloor k \rfloor - a + 1}{n}$$

on the distribution's support $a \le k \le b$.
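As a sketch, the PMF and CDF above translate directly into a few lines of Python (the parameters a and b are as defined here):

```python
import math

def discrete_uniform_pmf(x: int, a: int, b: int) -> float:
    """PMF of the discrete uniform distribution on the integers a..b."""
    n = b - a + 1
    return 1 / n if a <= x <= b else 0.0

def discrete_uniform_cdf(k: float, a: int, b: int) -> float:
    """CDF F(k; a, b) = (floor(k) - a + 1) / n, clamped to [0, 1]."""
    if k < a:
        return 0.0
    if k >= b:
        return 1.0
    return (math.floor(k) - a + 1) / (b - a + 1)

# Fair six-sided die: a = 1, b = 6.
assert discrete_uniform_pmf(3, 1, 6) == 1 / 6
assert discrete_uniform_cdf(3, 1, 6) == 0.5
```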
The problem of estimating the maximum $N$ of a discrete uniform distribution on the integer interval $[1, N]$ from a sample of $k$ observations is commonly known as the German tank problem, following the practical application of this maximum estimation problem, during World War II, by Allied forces seeking to estimate German tank production.
A uniformly minimum variance unbiased (UMVU) estimator for the distribution's maximum $N$ in terms of $m$, the sample maximum, and $k$, the sample size, is

$$\hat{N} = m + \frac{m}{k} - 1 = \frac{k + 1}{k} m - 1.$$ [1]
This can be seen as a very simple case of maximum spacing estimation.
This has a variance of

$$\frac{1}{k} \frac{(N - k)(N + 1)}{k + 2} \approx \frac{N^2}{k^2} \quad \text{for small samples } k \ll N,$$

so a standard deviation of approximately $N/k$, the population-average gap size between samples.
The sample maximum itself is the maximum likelihood estimator for the population maximum, but it is biased.
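A small simulation sketch (with hypothetical values N = 1000 and k = 10) illustrates both points: the sample maximum systematically underestimates the true maximum, while the UMVU correction $m + m/k - 1$ removes the bias:

```python
import random
from statistics import mean

random.seed(1)
N, k, reps = 1000, 10, 20_000  # true maximum, sample size, replications

mle_estimates, umvu_estimates = [], []
for _ in range(reps):
    # k serial numbers observed without replacement, as in the tank problem.
    sample = random.sample(range(1, N + 1), k)
    m = max(sample)
    mle_estimates.append(m)               # sample maximum: biased low
    umvu_estimates.append(m + m / k - 1)  # UMVU estimator

print(f"true N = {N}")
print(f"mean of sample-max estimator ~ {mean(mle_estimates):.1f}")   # below N
print(f"mean of UMVU estimator       ~ {mean(umvu_estimates):.1f}")  # near N
```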
If samples from a discrete uniform distribution are not numbered in order but are recognizable or markable, one can instead estimate population size via a mark and recapture method.
See rencontres numbers for an account of the probability distribution of the number of fixed points of a uniformly distributed random permutation.
The family of uniform discrete distributions over ranges of integers with one or both bounds unknown has a finite-dimensional sufficient statistic, namely the triple of the sample maximum, sample minimum, and sample size.
Uniform discrete distributions over bounded integer ranges do not constitute an exponential family of distributions because their support varies with their parameters.
For families of distributions in which their supports do not depend on their parameters, the Pitman–Koopman–Darmois theorem states that only exponential families have sufficient statistics of dimensions that are bounded as sample size increases. The uniform distribution is thus a simple example showing the necessity of the conditions for this theorem.
In probability theory and statistics, the binomial distribution with parameters n and p is the discrete probability distribution of the number of successes in a sequence of n independent experiments, each asking a yes–no question, and each with its own Boolean-valued outcome: success or failure. A single success/failure experiment is also called a Bernoulli trial or Bernoulli experiment, and a sequence of outcomes is called a Bernoulli process; for a single trial, i.e., n = 1, the binomial distribution is a Bernoulli distribution. The binomial distribution is the basis for the popular binomial test of statistical significance.
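As an illustrative sketch (the values n = 10 and p = 0.3 are arbitrary), the standard binomial PMF $\binom{n}{x} p^x (1-p)^{n-x}$ can be checked against a direct simulation of Bernoulli trials:

```python
from math import comb
import random

n, p = 10, 0.3

# Exact binomial PMF: P(X = x) = C(n, x) * p^x * (1 - p)^(n - x).
def binom_pmf(x: int) -> float:
    return comb(n, x) * p**x * (1 - p)**(n - x)

# One experiment: count successes in n independent Bernoulli trials.
random.seed(2)
def one_experiment() -> int:
    return sum(random.random() < p for _ in range(n))

trials = 50_000
freq3 = sum(one_experiment() == 3 for _ in range(trials)) / trials
print(f"P(X = 3): exact {binom_pmf(3):.4f}, simulated ~ {freq3:.4f}")
```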
The median of a set of numbers is the value separating the higher half from the lower half of a data sample, a population, or a probability distribution. For a data set, it may be thought of as the "middle" value. The basic feature of the median in describing data compared to the mean is that it is not skewed by a small proportion of extremely large or small values, and therefore provides a better representation of the center. Median income, for example, may be a better way to describe the center of the income distribution because increases in the largest incomes alone have no effect on the median. For this reason, the median is of central importance in robust statistics.
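A short sketch (with made-up income figures) demonstrates the robustness claim: inflating only the largest value moves the mean drastically but leaves the median untouched:

```python
from statistics import mean, median

incomes = [28, 30, 32, 35, 40, 45, 60]   # hypothetical incomes, in thousands
inflated = incomes[:-1] + [6000]          # the top income grows hugely

print(mean(incomes), median(incomes))     # ~38.6 and 35
print(mean(inflated), median(inflated))   # mean jumps to ~887; median stays 35
```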
In probability theory and statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable. The general form of its probability density function is

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}.$$

The parameter $\mu$ is the mean or expectation of the distribution, while the parameter $\sigma^2$ is the variance. The standard deviation of the distribution is $\sigma$ (sigma). A random variable with a Gaussian distribution is said to be normally distributed, and is called a normal deviate.
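The density transcribes directly into code; a minimal sketch:

```python
import math

def normal_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Gaussian density: exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi))."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

print(normal_pdf(0.0))        # ~0.3989, the standard normal peak
print(normal_pdf(1.0, 0, 1))  # ~0.2420, one standard deviation out
```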
In statistics, the standard deviation is a measure of the amount of variation of the values of a variable about its mean. A low standard deviation indicates that the values tend to be close to the mean of the set, while a high standard deviation indicates that the values are spread out over a wider range. The standard deviation is commonly used in the determination of what constitutes an outlier and what does not.
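As a sketch of the outlier use mentioned above (the 2-standard-deviation cutoff is one common rule of thumb, not a fixed standard):

```python
from statistics import mean, stdev

data = [9.8, 10.1, 10.0, 9.9, 10.2, 14.5]   # hypothetical measurements
mu, sd = mean(data), stdev(data)

# Flag points more than 2 standard deviations from the mean.
outliers = [x for x in data if abs(x - mu) > 2 * sd]
print(mu, sd, outliers)   # 14.5 is flagged
```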
A likelihood function measures how well a statistical model explains observed data by calculating the probability of seeing that data under different parameter values of the model. It is constructed from the joint probability distribution of the random variable that (presumably) generated the observations. When evaluated on the actual data points, it becomes a function solely of the model parameters.
In probability theory and statistics, the geometric distribution is either one of two discrete probability distributions: the distribution of the number $X$ of Bernoulli trials needed to get one success, supported on $\{1, 2, 3, \ldots\}$, or the distribution of the number $Y = X - 1$ of failures before the first success, supported on $\{0, 1, 2, \ldots\}$.
In statistics, maximum likelihood estimation (MLE) is a method of estimating the parameters of an assumed probability distribution, given some observed data. This is achieved by maximizing a likelihood function so that, under the assumed statistical model, the observed data is most probable. The point in the parameter space that maximizes the likelihood function is called the maximum likelihood estimate. The logic of maximum likelihood is both intuitive and flexible, and as such the method has become a dominant means of statistical inference.
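A minimal sketch, using hypothetical data of x = 18 successes in n = 50 Bernoulli trials: a crude grid search over the parameter space recovers the closed-form MLE $\hat{p} = x/n$:

```python
import math

# Hypothetical observed data: x successes in n Bernoulli trials.
n, x = 50, 18

def log_likelihood(p: float) -> float:
    # log L(p) = x log p + (n - x) log(1 - p), dropping the constant C(n, x).
    return x * math.log(p) + (n - x) * math.log(1 - p)

# Crude grid search over the open interval (0, 1).
grid = [i / 10000 for i in range(1, 10000)]
p_hat = max(grid, key=log_likelihood)

print(p_hat)   # ~0.36
print(x / n)   # closed-form MLE: 0.36, matching the grid search
```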
In probability theory and statistics, the gamma distribution is a versatile two-parameter family of continuous probability distributions. The exponential distribution, Erlang distribution, and chi-squared distribution are special cases of the gamma distribution. There are two equivalent parameterizations in common use: one with a shape parameter $k$ and a scale parameter $\theta$, and one with a shape parameter $\alpha = k$ and a rate parameter $\lambda = 1/\theta$, the reciprocal of the scale.
Quantization, in mathematics and digital signal processing, is the process of mapping input values from a large set to output values in a (countable) smaller set, often with a finite number of elements. Rounding and truncation are typical examples of quantization processes. Quantization is involved to some degree in nearly all digital signal processing, as the process of representing a signal in digital form ordinarily involves rounding. Quantization also forms the core of essentially all lossy compression algorithms.
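As a sketch, a uniform quantizer that rounds each input to the nearest multiple of a step size:

```python
def quantize(x: float, step: float) -> float:
    """Map x to the nearest multiple of `step` (uniform mid-tread quantizer)."""
    return step * round(x / step)

signal = [0.03, 0.49, 0.51, 1.24, -0.8]   # hypothetical sample values
print([quantize(s, 0.5) for s in signal])  # [0.0, 0.5, 0.5, 1.0, -1.0]
```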
Estimation theory is a branch of statistics that deals with estimating the values of parameters based on measured empirical data that has a random component. The parameters describe an underlying physical setting in such a way that their value affects the distribution of the measured data. An estimator attempts to approximate the unknown parameters using the measurements. In estimation theory, two approaches are generally considered: the probabilistic approach, which assumes that the measured data are random with a probability distribution dependent on the parameters of interest, and the set-membership approach, which assumes that the measured data vector belongs to a set that depends on the parameter vector.
In probability theory and statistics, the continuous uniform distributions or rectangular distributions are a family of symmetric probability distributions. Such a distribution describes an experiment where there is an arbitrary outcome that lies between certain bounds. The bounds are defined by the parameters $a$ and $b$, which are the minimum and maximum values. The interval can either be closed (i.e. $[a, b]$) or open (i.e. $(a, b)$). Therefore, the distribution is often abbreviated $U(a, b)$, where $U$ stands for uniform distribution. The difference between the bounds defines the interval length; all intervals of the same length on the distribution's support are equally probable. It is the maximum entropy probability distribution for a random variable under no constraint other than that it is contained in the distribution's support.
Probability theory and statistics have some commonly used conventions, in addition to standard mathematical notation and mathematical symbols.
In statistics, the method of moments is a method of estimation of population parameters: theoretical moments (such as the mean and variance), expressed as functions of the parameters, are equated with the corresponding sample moments, and the resulting equations are solved for the parameters. The same principle is used to derive higher moments like skewness and kurtosis.
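A sketch applying this to the discrete uniform distribution discussed above: the sample mean and population variance are equated with $(a+b)/2$ and $(n^2-1)/12$ and solved for $a$ and $b$ (the simulated bounds 4 and 19 are arbitrary):

```python
import random
from statistics import mean, pvariance

random.seed(3)
sample = [random.randint(4, 19) for _ in range(10_000)]  # true a = 4, b = 19

m1 = mean(sample)             # estimates (a + b) / 2
s2 = pvariance(sample)        # estimates (n^2 - 1) / 12, with n = b - a + 1

n_hat = (12 * s2 + 1) ** 0.5  # invert the variance formula for n
a_hat = m1 - (n_hat - 1) / 2
b_hat = m1 + (n_hat - 1) / 2
print(round(a_hat, 2), round(b_hat, 2))  # close to (4, 19)
```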
In probability theory, a member of the (a, b, 0) class of distributions is any distribution of a discrete random variable N whose values are nonnegative integers and whose probability mass function $p_k = P(N = k)$ satisfies the recurrence formula

$$\frac{p_k}{p_{k-1}} = a + \frac{b}{k}, \qquad k = 1, 2, 3, \ldots$$
In statistics, maximum spacing estimation (MSE or MSP), or maximum product of spacing estimation (MPS), is a method for estimating the parameters of a univariate statistical model. The method requires maximization of the geometric mean of spacings in the data, which are the differences between the values of the cumulative distribution function at neighbouring data points.
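A sketch for the continuous uniform model $U(0, \theta)$ (the true value $\theta = 10$, sample size, and grid resolution are all arbitrary): the spacings are the differences of CDF values at the ordered data, and the estimate is the $\theta$ maximizing their log-sum:

```python
import math
import random

random.seed(4)
theta_true = 10.0
data = sorted(random.uniform(0, theta_true) for _ in range(50))

def total_log_spacing(theta: float) -> float:
    # CDF values 0 = F(-inf) < F(x_(1)) < ... < F(x_(n)) < 1 = F(+inf)
    # under the uniform(0, theta) model; spacings are their differences.
    cdf = [0.0] + [x / theta for x in data] + [1.0]
    return sum(math.log(cdf[i + 1] - cdf[i]) for i in range(len(cdf) - 1))

# theta must exceed the sample maximum, or the last spacing vanishes.
grid = [max(data) + i * 0.001 for i in range(1, 5000)]
theta_hat = max(grid, key=total_log_spacing)
print(max(data), theta_hat)   # the MSE estimate exceeds the sample maximum
```

For this particular model the maximizer can be derived in closed form as $\hat{\theta} = \frac{n+1}{n} x_{(n)}$, slightly above the sample maximum, which the grid search reproduces.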
The generalized normal distribution (GND) or generalized Gaussian distribution (GGD) is either of two families of parametric continuous probability distributions on the real line. Both families add a shape parameter to the normal distribution. To distinguish the two families, they are referred to below as "symmetric" and "asymmetric"; however, this is not a standard nomenclature.
In probability theory and statistics, the Poisson distribution is a discrete probability distribution that expresses the probability of a given number of events occurring in a fixed interval of time if these events occur with a known constant mean rate and independently of the time since the last event. It can also be used for the number of events in other types of intervals than time, and in dimension greater than 1.
In probability theory and statistics, the Van Houtum distribution is a discrete probability distribution named after Prof. Geert-Jan van Houtum. It can be characterized by saying that all values of a finite set of possible values are equally probable, except for the smallest and largest elements of this set. Since the Van Houtum distribution is a generalization of the discrete uniform distribution, i.e. it is uniform except possibly at its boundaries, it is sometimes also referred to as quasi-uniform.
In probability theory and statistics, the Hermite distribution, named after Charles Hermite, is a discrete probability distribution used to model count data with more than one parameter. This distribution is flexible in terms of its ability to allow a moderate over-dispersion in the data.
In cryptography, a public key exchange algorithm is a cryptographic algorithm which allows two parties to create and share a secret key, which they can use to encrypt messages between themselves. The ring learning with errors key exchange (RLWE-KEX) is one of a new class of public key exchange algorithms that are designed to be secure against an adversary that possesses a quantum computer. This is important because some public key algorithms in use today will be easily broken by a quantum computer if such computers are implemented. RLWE-KEX is one of a set of post-quantum cryptographic algorithms which are based on the difficulty of solving certain mathematical problems involving lattices. Unlike older lattice based cryptographic algorithms, the RLWE-KEX is provably reducible to a known hard problem in lattices.