# Scale parameter

In probability theory and statistics, a scale parameter is a special kind of numerical parameter of a parametric family of probability distributions. The larger the scale parameter, the more spread out the distribution.

## Definition

If a family of probability distributions is such that there is a parameter s (and other parameters θ) for which the cumulative distribution function satisfies

${\displaystyle F(x;s,\theta )=F(x/s;1,\theta ),\!}$

then s is called a scale parameter, since its value determines the "scale" or statistical dispersion of the probability distribution. If s is large, the distribution is more spread out; if s is small, it is more concentrated.
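As a quick numerical check of this definition, the sketch below uses the zero-mean normal family as an assumed example (its standard deviation plays the role of s) and Python's standard-library `statistics.NormalDist`. The helper names are illustrative only.

```python
from statistics import NormalDist

def cdf_scaled_normal(x: float, s: float) -> float:
    """CDF of a zero-mean normal distribution with scale (standard deviation) s."""
    return NormalDist(mu=0.0, sigma=s).cdf(x)

def cdf_standard_normal(x: float) -> float:
    """CDF of the same family with s = 1."""
    return NormalDist(mu=0.0, sigma=1.0).cdf(x)

# Check F(x; s) == F(x/s; 1) on a small grid of points and scales.
for s in (0.5, 1.0, 2.0, 10.0):
    for x in (-3.0, -0.7, 0.0, 1.2, 4.5):
        assert abs(cdf_scaled_normal(x, s) - cdf_standard_normal(x / s)) < 1e-12
print("F(x; s) == F(x/s; 1) holds on the test grid")
```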

If the probability density exists for all values of the complete parameter set, then the density (as a function of the scale parameter only) satisfies

${\displaystyle f_{s}(x)=f(x/s)/s,\!}$

where f is the density of the standardized member of the family, i.e. ${\displaystyle f(x)\equiv f_{s=1}(x)}$.
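A minimal sketch of the relation ${\displaystyle f_{s}(x)=f(x/s)/s}$: the hypothetical helper `scale_density` builds the scaled density from a standardized one. The standard normal density is used here only as an assumed example, and the result is compared with the normal density of standard deviation 3 written out directly.

```python
import math
from typing import Callable

Density = Callable[[float], float]

def scale_density(f: Density, s: float) -> Density:
    """Return f_s(x) = f(x/s)/s, the density of s*X when X has density f."""
    if s <= 0:
        raise ValueError("scale parameter must be positive")
    return lambda x: f(x / s) / s

def std_normal_pdf(x: float) -> float:
    """Standardized density f = f_{s=1} (standard normal)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_pdf_sd3(x: float) -> float:
    """Zero-mean normal density with standard deviation 3, written out directly."""
    return math.exp(-0.5 * (x / 3.0) ** 2) / (3.0 * math.sqrt(2.0 * math.pi))

f3 = scale_density(std_normal_pdf, 3.0)
for x in (-5.0, 0.0, 2.5):
    assert math.isclose(f3(x), normal_pdf_sd3(x), rel_tol=1e-12)
```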

An estimator of a scale parameter is called an estimator of scale.

### Families with Location Parameters

In the case where a parametrized family has a location parameter, a slightly different definition is often used, as follows. If we denote the location parameter by ${\displaystyle m}$ and the scale parameter by ${\displaystyle s}$, then we require that ${\displaystyle F(x;s,m,\theta )=F((x-m)/s;1,0,\theta )}$, where ${\displaystyle F(x;s,m,\theta )}$ is the cumulative distribution function of the parametrized family. [1] This modification is necessary for the standard deviation of a non-central Gaussian to be a scale parameter, since otherwise the mean would change when we rescale ${\displaystyle x}$. However, this alternative definition is not used consistently. [2]
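For the location-scale version of the definition, a small sketch (again assuming the normal family as the example) evaluates ${\displaystyle F((x-m)/s;1,0)}$ from the standardized member and compares it with `statistics.NormalDist(m, s)`, which is parameterized directly by mean and standard deviation. The function name `location_scale_cdf` is illustrative.

```python
from statistics import NormalDist

STANDARD = NormalDist()  # standardized member: location 0, scale 1

def location_scale_cdf(x: float, m: float, s: float) -> float:
    """F(x; s, m) computed from the standardized member as F((x - m)/s; 1, 0)."""
    return STANDARD.cdf((x - m) / s)

# Compare against NormalDist's own (mean, standard deviation) parameterization.
for m, s in ((0.0, 1.0), (5.0, 2.0), (-3.0, 0.25)):
    for x in (-4.0, 0.0, 4.9, 7.5):
        assert abs(location_scale_cdf(x, m, s) - NormalDist(m, s).cdf(x)) < 1e-12
```

Subtracting m before dividing by s is what keeps the mean fixed under rescaling, which is the point made in the paragraph above.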

### Simple manipulations

We can write ${\displaystyle f_{s}}$ in terms of ${\displaystyle g(x)=x/s}$, as follows:

${\displaystyle f_{s}(x)=f\left({\frac {x}{s}}\right)\cdot {\frac {1}{s}}=f(g(x))g'(x).}$

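Because f is a probability density function, it integrates to unity. Since ${\displaystyle s>0}$, ${\displaystyle g}$ is increasing with ${\displaystyle g(\pm \infty )=\pm \infty }$, so the same integral can be written with limits ${\displaystyle g(-\infty )}$ and ${\displaystyle g(\infty )}$: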

${\displaystyle 1=\int _{-\infty }^{\infty }f(x)\,dx=\int _{g(-\infty )}^{g(\infty )}f(x)\,dx.}$

By the substitution rule of integral calculus, we then have

${\displaystyle 1=\int _{-\infty }^{\infty }f(g(x))g'(x)\,dx=\int _{-\infty }^{\infty }f_{s}(x)\,dx.}$

So ${\displaystyle f_{s}}$ is also properly normalized.
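The normalization argument can also be checked numerically. This sketch assumes the standardized exponential density ${\displaystyle f(x)=e^{-x}}$ for ${\displaystyle x\geq 0}$ as the example, so the integral starts at 0, and uses a plain composite trapezoidal rule (a hypothetical helper, not a library call). The truncation at 60s is an assumption that leaves out a tail of size about e^{-60}.

```python
import math

def f(x: float) -> float:
    """Standardized exponential density: e^{-x} for x >= 0, else 0."""
    return math.exp(-x) if x >= 0 else 0.0

def f_s(x: float, s: float) -> float:
    """Scaled density f_s(x) = f(x/s)/s."""
    return f(x / s) / s

def trapezoid(g, a: float, b: float, n: int = 100_000) -> float:
    """Composite trapezoidal rule for the integral of g over [a, b]."""
    h = (b - a) / n
    total = 0.5 * (g(a) + g(b))
    for i in range(1, n):
        total += g(a + i * h)
    return total * h

for s in (0.5, 1.0, 4.0):
    # The tail beyond 60*s contributes about e^{-60}, which is negligible.
    area = trapezoid(lambda x: f_s(x, s), 0.0, 60.0 * s)
    print(f"s = {s}: integral of f_s is approximately {area:.6f}")
```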

## Rate parameter

Some families of distributions use a rate parameter (or "inverse scale parameter"), which is simply the reciprocal of the scale parameter. For example, the exponential distribution with scale parameter β and probability density

${\displaystyle f(x;\beta )={\frac {1}{\beta }}e^{-x/\beta },\;x\geq 0}$

could equivalently be written with rate parameter λ as

${\displaystyle f(x;\lambda )=\lambda e^{-\lambda x},\;x\geq 0.}$
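The two parameterizations describe the same density when λ = 1/β. The sketch below checks this pointwise with two illustrative helper functions, and also notes that Python's `random.expovariate` takes the rate (its argument is named `lambd`), so drawing from the scale-β form requires passing 1/β.

```python
import math
import random

def exp_pdf_scale(x: float, beta: float) -> float:
    """Exponential density with scale parameter beta."""
    return math.exp(-x / beta) / beta if x >= 0 else 0.0

def exp_pdf_rate(x: float, lam: float) -> float:
    """Exponential density with rate parameter lam = 1/beta."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

beta = 2.5
lam = 1.0 / beta
for x in (0.0, 0.3, 1.0, 7.0):
    assert math.isclose(exp_pdf_scale(x, beta), exp_pdf_rate(x, lam), rel_tol=1e-12)

# random.expovariate is parameterized by the rate, so pass 1/beta for scale beta.
rng = random.Random(0)
sample = [rng.expovariate(1.0 / beta) for _ in range(100_000)]
print(sum(sample) / len(sample))  # sample mean; the exponential mean equals beta
```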

## Examples

• The uniform distribution can be parameterized with a location parameter of ${\displaystyle (a+b)/2}$ and a scale parameter ${\displaystyle |b-a|}$.
• The normal distribution has two parameters: a location parameter ${\displaystyle \mu }$ and a scale parameter ${\displaystyle \sigma }$. In practice the normal distribution is often parameterized in terms of the squared scale ${\displaystyle \sigma ^{2}}$, which corresponds to the variance of the distribution.
• The gamma distribution is usually parameterized in terms of a scale parameter ${\displaystyle \theta }$ or its inverse.
• Special cases of distributions where the scale parameter equals unity may be called "standard" under certain conditions. For example, if the location parameter equals zero and the scale parameter equals one, the normal distribution is known as the standard normal distribution, and the Cauchy distribution as the standard Cauchy distribution.
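To make the uniform example concrete, this sketch assumes the standardized member is the uniform distribution on ${\displaystyle [-1/2,1/2]}$, which is the choice that makes location ${\displaystyle (a+b)/2}$ and scale ${\displaystyle |b-a|}$ reproduce the distribution on ${\displaystyle [a,b]}$. All function names are hypothetical.

```python
from typing import Tuple

def uniform_cdf(x: float, a: float, b: float) -> float:
    """CDF of the uniform distribution on [min(a, b), max(a, b)]."""
    lo, hi = min(a, b), max(a, b)
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)

def standard_uniform_cdf(z: float) -> float:
    """CDF of the assumed standardized member: uniform on [-1/2, 1/2]."""
    return min(max(z + 0.5, 0.0), 1.0)

def location_scale(a: float, b: float) -> Tuple[float, float]:
    """Location (a+b)/2 and scale |b-a| of the uniform distribution on [a, b]."""
    return (a + b) / 2.0, abs(b - a)

a, b = 2.0, 7.0
m, s = location_scale(a, b)
for x in (0.0, 2.0, 3.3, 4.5, 6.9, 9.0):
    # F(x; a, b) should equal the standardized CDF evaluated at (x - m)/s.
    assert abs(uniform_cdf(x, a, b) - standard_uniform_cdf((x - m) / s)) < 1e-12
```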

## Estimation

A statistic can be used to estimate a scale parameter so long as it:

• Is location-invariant,
• Scales linearly with the scale parameter, and
• Converges as the sample size grows.
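The first two criteria can be checked directly on data. The sketch below, which assumes the sample standard deviation and a hypothetical `mad` helper (median absolute deviation from the median, without any correction factor) as example statistics, shifts and stretches a sample and confirms that each statistic ignores the shift and is multiplied by the stretch factor. The third criterion, convergence, is an asymptotic property and is not tested here.

```python
import random
import statistics

def mad(xs):
    """Median absolute deviation from the median (no consistency factor)."""
    med = statistics.median(xs)
    return statistics.median([abs(x - med) for x in xs])

rng = random.Random(42)
xs = [rng.gauss(0.0, 1.0) for _ in range(1001)]

shift, factor = 10.0, 3.0  # factor > 0; a negative factor would scale by |factor|
ys = [shift + factor * x for x in xs]

for stat in (statistics.stdev, mad):
    # Location-invariant and linear in scale: stat(shift + factor*x) == factor * stat(x)
    assert abs(stat(ys) - factor * stat(xs)) < 1e-9
```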

Various measures of statistical dispersion satisfy these criteria. To make such a statistic a consistent estimator of the scale parameter, one must in general multiply it by a constant scale factor. This scale factor is defined as the theoretical ratio of the scale parameter to the asymptotic value of the statistic. Note that the scale factor depends on the distribution in question.
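Since the scale factor is the scale parameter divided by the statistic's limiting value, it can be approximated by simulation: draw a large sample with scale 1 and take the reciprocal of the statistic. This is a Monte Carlo sketch, not an exact computation. It uses a hypothetical `mad` helper, and the Laplace distribution (sampled as the difference of two Exp(1) draws) is our own illustration of how the factor changes with the distribution; its theoretical value is ${\displaystyle 1/\ln 2\approx 1.4427}$, versus about 1.4826 for the normal.

```python
import math
import random
import statistics

def mad(xs):
    """Median absolute deviation from the median (no consistency factor)."""
    med = statistics.median(xs)
    return statistics.median([abs(x - med) for x in xs])

rng = random.Random(1)
n = 200_001

# Standard normal: scale parameter sigma = 1.
normal = [rng.gauss(0.0, 1.0) for _ in range(n)]
# Standard Laplace: scale parameter b = 1, as a difference of two Exp(1) draws.
laplace = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(n)]

# Scale factor = (scale parameter = 1) / (limiting value of the statistic).
k_normal = 1.0 / mad(normal)    # approximates 1 / Phi^{-1}(3/4)
k_laplace = 1.0 / mad(laplace)  # approximates 1 / ln 2
print(k_normal, k_laplace)
```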

For instance, in order to use the median absolute deviation (MAD) to estimate the standard deviation of the normal distribution, one must multiply it by the factor

${\displaystyle 1/\Phi ^{-1}(3/4)\approx 1.4826,}$

where ${\displaystyle \Phi ^{-1}}$ is the quantile function (inverse of the cumulative distribution function) for the standard normal distribution. (See MAD for details.) That is, the MAD is not a consistent estimator for the standard deviation of a normal distribution, but 1.4826 × MAD is. Similarly, the average absolute deviation needs to be multiplied by approximately 1.2533 (that is, ${\displaystyle {\sqrt {\pi /2}}}$) to be a consistent estimator for the standard deviation. Different factors would be required to estimate the standard deviation if the population did not follow a normal distribution.
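Applying these two factors is a one-line multiplication. The sketch below simulates a normal sample with an assumed mean of 50 and standard deviation of 4, computes the raw MAD and average absolute deviation with hypothetical helpers `mad` and `aad`, and rescales them with ${\displaystyle 1/\Phi ^{-1}(3/4)}$ (via `statistics.NormalDist().inv_cdf`) and ${\displaystyle {\sqrt {\pi /2}}}$.

```python
import math
import random
import statistics
from statistics import NormalDist

MAD_FACTOR = 1.0 / NormalDist().inv_cdf(0.75)  # about 1.4826
AAD_FACTOR = math.sqrt(math.pi / 2.0)          # about 1.2533

def mad(xs):
    """Raw median absolute deviation from the median."""
    med = statistics.median(xs)
    return statistics.median([abs(x - med) for x in xs])

def aad(xs):
    """Raw average absolute deviation from the mean."""
    mu = statistics.fmean(xs)
    return statistics.fmean([abs(x - mu) for x in xs])

rng = random.Random(7)
mu_true, sigma_true = 50.0, 4.0
xs = [rng.gauss(mu_true, sigma_true) for _ in range(100_000)]

sigma_from_mad = MAD_FACTOR * mad(xs)
sigma_from_aad = AAD_FACTOR * aad(xs)
print(sigma_from_mad, sigma_from_aad)  # both estimates of sigma_true
```

Without the factors, the raw MAD would settle near 2.70 and the raw average absolute deviation near 3.19 for this sample, both well below the true standard deviation.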

## References

1. Prokhorov, A.V. (7 February 2011). "Scale parameter". Encyclopedia of Mathematics. Springer. Retrieved 7 February 2019.
2. Koski, Timo. "Scale parameter". KTH Royal Institute of Technology. Retrieved 7 February 2019.
• Mood, A. M.; Graybill, F. A.; Boes, D. C. (1974). "VII.6.2 Scale invariance". Introduction to the theory of statistics (3rd ed.). New York: McGraw-Hill.