In statistics, maximum spacing estimation (MSE or MSP), or maximum product of spacing estimation (MPS), is a method for estimating the parameters of a univariate statistical model. [1] The method requires maximization of the geometric mean of spacings in the data, which are the differences between the values of the cumulative distribution function at neighbouring data points.
The concept underlying the method is based on the probability integral transform, in that a set of independent random samples derived from any random variable should on average be uniformly distributed with respect to the cumulative distribution function of the random variable. The MPS method chooses the parameter values that make the observed data as uniform as possible, according to a specific quantitative measure of uniformity.
One of the most common methods for estimating the parameters of a distribution from data, the method of maximum likelihood (MLE), can break down in various cases, such as those involving certain mixtures of continuous distributions. [2] In these cases the method of maximum spacing estimation may be successful.
Apart from its use in pure mathematics and statistics, trial applications of the method have been reported using data from fields such as hydrology, [3] econometrics, [4] magnetic resonance imaging, [5] and others. [6]
The MSE method was derived independently by Russell Cheng and Nik Amin at the University of Wales Institute of Science and Technology, and Bo Ranneby at the Swedish University of Agricultural Sciences. [2] The authors explained that due to the probability integral transform at the true parameter, the “spacings” between the observations should be uniformly distributed. This would imply that the differences between the values of the cumulative distribution function at consecutive observations should be equal. This is the case that maximizes the geometric mean of such spacings, so solving for the parameters that maximize the geometric mean would achieve the “best” fit as defined this way. Ranneby (1984) justified the method by demonstrating that it is an estimator of the Kullback–Leibler divergence, similar to maximum likelihood estimation, but with more robust properties for some classes of problems.
There are certain distributions, especially those with three or more parameters, whose likelihoods may become infinite along certain paths in the parameter space. Using maximum likelihood to estimate these parameters often breaks down, with one parameter tending to the specific value that causes the likelihood to be infinite, rendering the other parameters inconsistent. The method of maximum spacings, however, being dependent on the difference between points on the cumulative distribution function and not individual likelihood points, does not have this issue, and will return valid results over a much wider array of distributions. [1]
The distributions that tend to have likelihood issues are often those used to model physical phenomena. Hall et al. (2004) seek to analyze flood alleviation methods, which requires accurate models of river flood effects. The distributions that better model these effects are all three-parameter models, which suffer from the infinite likelihood issue described above, leading to Hall's investigation of the maximum spacing procedure. Wong & Li (2006), when comparing the method to maximum likelihood, use various data sets, ranging from a set on the oldest ages at death in Sweden between 1905 and 1958 to a set containing annual maximum wind speeds.
Given an iid random sample {x1, ..., xn} of size n from a univariate distribution with continuous cumulative distribution function F(x;θ0), where θ0 ∈ Θ is an unknown parameter to be estimated, let {x(1), ..., x(n)} be the corresponding ordered sample, that is the result of sorting of all observations from smallest to largest. For convenience also denote x(0) = −∞ and x(n+1) = +∞.
Define the spacings as the “gaps” between the values of the distribution function at adjacent ordered points: [7]
Di(θ) = F(x(i);θ) − F(x(i−1);θ),  i = 1, ..., n+1.
Then the maximum spacing estimator of θ0 is defined as a value that maximizes the logarithm of the geometric mean of sample spacings:
θ̂ = argmax over θ ∈ Θ of Sn(θ),  where  Sn(θ) = ln[(D1·D2·⋯·Dn+1)^(1⁄(n+1))] = 1⁄(n+1) · Σi=1..n+1 ln Di(θ).
By the inequality of arithmetic and geometric means, function Sn(θ) is bounded from above by −ln(n+1), and thus the maximum has to exist at least in the supremum sense.
Note that some authors define the function Sn(θ) somewhat differently. In particular, Ranneby (1984) multiplies each Di by a factor of (n+1), whereas Cheng & Stephens (1989) omit the 1⁄(n+1) factor in front of the sum and add the “−” sign in order to turn the maximization into minimization. As these are constants with respect to θ, the modifications do not alter the location of the maximum of the function Sn.
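The objective can be written directly in code. Below is a minimal sketch in Python, assuming NumPy and SciPy are available; the normal model, the helper name S_n, and the simulated data are illustrative choices, not part of the method's original presentation.

```python
# Minimal sketch of the maximum spacing objective Sn(theta), assuming NumPy/SciPy.
import numpy as np
from scipy import optimize, stats

def S_n(theta, x, cdf):
    """Mean log-spacing: Sn(theta) = (1/(n+1)) * sum of ln Di(theta)."""
    u = cdf(np.sort(x), theta)                       # F(x(1);theta), ..., F(x(n);theta)
    d = np.diff(np.concatenate(([0.0], u, [1.0])))   # spacings D1, ..., Dn+1
    return np.mean(np.log(d))

# Illustration: fit a normal model (mu, ln sigma) by maximizing Sn numerically.
rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=200)
cdf = lambda x, th: stats.norm.cdf(x, loc=th[0], scale=np.exp(th[1]))
res = optimize.minimize(lambda th: -S_n(th, x, cdf),
                        x0=[np.mean(x), np.log(np.std(x))], method="Nelder-Mead")
print(res.x[0], np.exp(res.x[1]))                    # should be close to (5, 2)
```

Maximizing Sn is done here with a generic optimizer; any method that handles the (typically smooth) objective will do, and the log-scale parameterization of σ merely keeps the scale positive during the search.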
This section presents two examples of calculating the maximum spacing estimator.
Suppose two values x(1) = 2, x(2) = 4 were sampled from the exponential distribution F(x;λ) = 1 − e^(−λx), x ≥ 0, with unknown parameter λ > 0. In order to construct the MSE we have to first find the spacings:
i | F(x(i)) | F(x(i−1)) | Di = F(x(i)) − F(x(i−1)) |
---|---|---|---|
1 | 1 − e^(−2λ) | 0 | 1 − e^(−2λ) |
2 | 1 − e^(−4λ) | 1 − e^(−2λ) | e^(−2λ) − e^(−4λ) |
3 | 1 | 1 − e^(−4λ) | e^(−4λ) |
The process continues by finding the λ that maximizes the geometric mean of the “difference” column. Using the convention that ignores taking the (n+1)st root, this turns into the maximization of the following product: (1 − e^(−2λ)) · (e^(−2λ) − e^(−4λ)) · e^(−4λ). Letting μ = e^(−2λ), the problem becomes finding the maximum of μ^5 − 2μ^4 + μ^3. Differentiating, μ has to satisfy 5μ^4 − 8μ^3 + 3μ^2 = 0. This equation has roots 0, 0.6, and 1. As μ is actually e^(−2λ), it has to be greater than zero but less than one. Therefore, the only acceptable solution is
λ̂ = −ln(0.6) ⁄ 2 ≈ 0.255,
which corresponds to an exponential distribution with a mean of 1⁄λ̂ ≈ 3.915. For comparison, the maximum likelihood estimate of λ is the inverse of the sample mean (here 3), so λMLE = 1⁄3 ≈ 0.333.
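The same answer can be checked numerically. The sketch below, assuming NumPy and SciPy, maximizes Sn(λ) for the two observations above and compares the result with the closed-form value −ln(0.6)⁄2.

```python
# Numerical check of the worked exponential example (observations 2 and 4).
import numpy as np
from scipy.optimize import minimize_scalar

x = np.array([2.0, 4.0])

def S_n(lam):
    u = 1.0 - np.exp(-lam * np.sort(x))              # F(x(i); lambda)
    d = np.diff(np.concatenate(([0.0], u, [1.0])))   # the three spacings from the table
    return np.mean(np.log(d))

res = minimize_scalar(lambda lam: -S_n(lam), bounds=(1e-6, 5.0), method="bounded")
print(res.x, -np.log(0.6) / 2)                       # both approximately 0.2554
```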
Suppose {x(1), ..., x(n)} is the ordered sample from a uniform distribution U(a,b) with unknown endpoints a and b. The cumulative distribution function is F(x;a,b) = (x−a)⁄(b−a) when x ∈ [a,b]. Therefore, individual spacings are given by
D1 = (x(1) − a) ⁄ (b − a),
Di = (x(i) − x(i−1)) ⁄ (b − a)  for  i = 2, ..., n,
Dn+1 = (b − x(n)) ⁄ (b − a).
Calculating the geometric mean and then taking the logarithm, statistic Sn will be equal to
Sn(a,b) = 1⁄(n+1) · [ ln(x(1) − a) + Σi=2..n ln(x(i) − x(i−1)) + ln(b − x(n)) ] − ln(b − a).
Here only three terms depend on the parameters a and b. Differentiating with respect to those parameters and solving the resulting linear system, the maximum spacing estimates will be
â = (n·x(1) − x(n)) ⁄ (n − 1)  and  b̂ = (n·x(n) − x(1)) ⁄ (n − 1).
These are known to be the uniformly minimum variance unbiased (UMVU) estimators for the continuous uniform distribution. [1] In comparison, the maximum likelihood estimates for this problem, â = x(1) and b̂ = x(n), are biased and have higher mean-squared error.
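A short numerical comparison, assuming NumPy, makes the contrast concrete; the sample size and endpoints below are arbitrary illustrative choices.

```python
# Maximum spacing estimates for U(a, b) versus the maximum likelihood estimates.
import numpy as np

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(3.0, 8.0, size=25))          # true endpoints a = 3, b = 8
n = len(x)

a_mse = (n * x[0] - x[-1]) / (n - 1)                 # maximum spacing estimate of a
b_mse = (n * x[-1] - x[0]) / (n - 1)                 # maximum spacing estimate of b
a_mle, b_mle = x[0], x[-1]                           # MLE: sample minimum and maximum

print(a_mse, b_mse)                                  # pushed slightly outside the sample range
print(a_mle, b_mle)                                  # always inside [a, b], hence biased inward
```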
The maximum spacing estimator is a consistent estimator in that it converges in probability to the true value of the parameter, θ0, as the sample size increases to infinity. [2] The consistency of maximum spacing estimation holds under much more general conditions than for maximum likelihood estimators. In particular, in cases where the underlying distribution is J-shaped, maximum likelihood will fail where MSE succeeds. [1] An example of a J-shaped density is the Weibull distribution, specifically a shifted Weibull, with a shape parameter less than 1. There the density tends to infinity as x approaches the location parameter, rendering estimates of the other parameters inconsistent.
Maximum spacing estimators are also at least as asymptotically efficient as maximum likelihood estimators, where the latter exist. However, MSEs may exist in cases where MLEs do not. [1]
Maximum spacing estimators are sensitive to closely spaced observations, and especially ties. [8] Given
x(i+k) = x(i+k−1) = ⋯ = x(i),
we get
Di+k(θ) = Di+k−1(θ) = ⋯ = Di+1(θ) = 0.
When the ties are due to multiple observations, the repeated spacings (those that would otherwise be zero) should be replaced by the corresponding likelihood. [1] That is, one should substitute the density f(x(i);θ) for each vanishing spacing Di(θ), since
lim as x(i−1) → x(i) of [F(x(i);θ) − F(x(i−1);θ)] ⁄ (x(i) − x(i−1)) = f(x(i);θ),
so the density is the natural limiting replacement when consecutive observations coincide.
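A sketch of this substitution, assuming NumPy and SciPy and a normal model chosen purely for illustration, is shown below: spacings that collapse to zero because of repeated observations are replaced by the model density at the tied value.

```python
# Tie-adjusted spacing objective: zero spacings are replaced by the density f(x(i); theta).
import numpy as np
from scipy import stats

def S_n_with_ties(theta, x):
    xs = np.sort(x)
    u = stats.norm.cdf(xs, loc=theta[0], scale=theta[1])
    d = np.diff(np.concatenate(([0.0], u, [1.0])))   # D1, ..., Dn+1
    tied = np.diff(xs) == 0                          # interior spacings that vanish
    d[1:-1][tied] = stats.norm.pdf(xs[1:][tied], loc=theta[0], scale=theta[1])
    return np.mean(np.log(d))

x = np.array([1.2, 3.4, 3.4, 3.4, 5.0, 6.1])         # three tied observations
print(S_n_with_ties([3.5, 1.8], x))                  # finite; no ln(0) terms
```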
When ties are due to rounding error, Cheng & Stephens (1989) suggest another method to remove their effects. [note 1] Given r tied observations from x(i) to x(i+r−1), let δ represent the round-off error. All of the true values should then fall in the range x ± δ, and the corresponding points on the distribution should fall between yL = F(x − δ;θ) and yU = F(x + δ;θ). Cheng and Stephens suggest assuming that the rounded values are uniformly spaced in this interval, by defining
Dj = (yU − yL) ⁄ r  for  j = i, ..., i + r − 1.
The MSE method is also sensitive to secondary clustering. [8] One example of this phenomenon is when a set of observations is thought to come from a single normal distribution, but in fact comes from a mixture of normals with different means. A second example is when the data is thought to come from an exponential distribution, but actually comes from a gamma distribution. In the latter case, smaller spacings may occur in the lower tail. A high value of M(θ) would indicate this secondary clustering effect, suggesting that a closer look at the data is required. [8]
The statistic Sn(θ) is also a form of Moran or Moran–Darling statistic, M(θ), which can be used to test goodness of fit. [note 2] It has been shown that the statistic, when defined as
M(θ) = −Σi=1..n+1 ln Di(θ),
is asymptotically normal, and that a chi-squared approximation exists for small samples. [8] In the case where we know the true parameter θ0, Cheng & Stephens (1989) show that the statistic M(θ0) has a normal distribution with
μM ≈ (n+1)[ln(n+1) + γ] − 1⁄2 − 1⁄(12(n+1)),
σM² ≈ (n+1)(π²⁄6 − 1) − 1⁄2 − 1⁄(6(n+1)),
where γ is the Euler–Mascheroni constant, which is approximately 0.57722. [note 3]
The distribution can also be approximated by that of A, where
A = C1 + C2 χ²n,
in which
C1 = μM − √(σM² n ⁄ 2)  and  C2 = √(σM² ⁄ (2n)),
and where χ²n follows a chi-squared distribution with n degrees of freedom. Therefore, to test the hypothesis H0 that a random sample of n values comes from the distribution F(x;θ), the statistic T(θ) = (M(θ) − C1) ⁄ C2 can be calculated. Then H0 should be rejected with significance α if the value is greater than the critical value of the appropriate chi-squared distribution. [8]
Where θ0 is being estimated by θ̂, Cheng & Stephens (1989) showed that M(θ̂) has the same asymptotic mean and variance as in the known-parameter case. However, the test statistic to be used requires the addition of a bias correction term and is:
T(θ̂) = (M(θ̂) + k⁄2 − C1) ⁄ C2,
where k is the number of parameters in the estimate.
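The test can be assembled directly from the quantities above. The sketch below, assuming NumPy and SciPy and using the moment approximations and bias-corrected statistic as given above, tests an exponential fit with one estimated parameter (k = 1); the data, the use of the maximum likelihood rate estimate, and the function name are illustrative choices only.

```python
# Moran-statistic goodness-of-fit test following the quantities defined above.
import numpy as np
from scipy import stats

def moran_test(x, cdf, theta_hat, k, alpha=0.05):
    """Return the test statistic T(theta_hat) and the chi-squared critical value."""
    n = len(x)
    u = cdf(np.sort(x), theta_hat)
    d = np.diff(np.concatenate(([0.0], u, [1.0])))
    M = -np.sum(np.log(d))                                      # Moran statistic M(theta_hat)
    gamma = 0.57722                                             # Euler–Mascheroni constant
    mu_M = (n + 1) * (np.log(n + 1) + gamma) - 0.5 - 1.0 / (12 * (n + 1))
    var_M = (n + 1) * (np.pi**2 / 6 - 1) - 0.5 - 1.0 / (6 * (n + 1))
    C1 = mu_M - np.sqrt(var_M * n / 2)
    C2 = np.sqrt(var_M / (2 * n))
    T = (M + k / 2 - C1) / C2                                   # bias-corrected statistic
    return T, stats.chi2.ppf(1 - alpha, df=n)

rng = np.random.default_rng(2)
x = rng.exponential(scale=2.0, size=50)
lam_hat = 1.0 / np.mean(x)                                      # rate estimated from the data
T, crit = moran_test(x, lambda x, lam: 1 - np.exp(-lam * x), lam_hat, k=1)
print(T, crit)                                                  # reject the fit if T > crit
```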
Ranneby & Ekström (1997) generalized the MSE method to approximate other measures besides the Kullback–Leibler measure. Ekström (1997) further expanded the method to investigate properties of estimators using higher order spacings, where an m-order spacing would be defined as .
Ranneby & al. (2005) discuss extended maximum spacing methods to the multivariate case. As there is no natural order for , they discuss two alternative approaches: a geometric approach based on Dirichlet cells and a probabilistic approach based on a “nearest neighbor ball” metric.