# Scale parameter

In probability theory and statistics, a scale parameter is a special kind of numerical parameter of a parametric family of probability distributions. The larger the scale parameter, the more spread out the distribution.

## Definition

If a family of probability distributions is such that there is a parameter s (and other parameters θ) for which the cumulative distribution function satisfies

${\displaystyle F(x;s,\theta )=F(x/s;1,\theta ),\!}$

then s is called a scale parameter, since its value determines the "scale" or statistical dispersion of the probability distribution. If s is large, then the distribution will be more spread out; if s is small then it will be more concentrated.
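
As a minimal sketch of this definition, the following checks the identity numerically for one family. The choice of the Weibull distribution, with its shape `k` standing in for the other parameters θ, is an assumption made for illustration, as are the function names.

```python
import math

def weibull_cdf(x, s=1.0, k=1.5):
    """CDF of a Weibull distribution with scale s and shape k (k plays the role of theta)."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-((x / s) ** k))

def satisfies_scale_identity(cdf, xs, s, k):
    """Check F(x; s, k) == F(x/s; 1, k) at each point in xs."""
    return all(
        math.isclose(cdf(x, s, k), cdf(x / s, 1.0, k), rel_tol=1e-12, abs_tol=1e-12)
        for x in xs
    )

points = [0.1, 0.5, 1.0, 2.0, 10.0]
identity_holds = satisfies_scale_identity(weibull_cdf, points, s=3.0, k=1.5)

# A larger scale spreads the mass out: less probability lies below x = 1.
mass_below_one_small_scale = weibull_cdf(1.0, s=0.5)
mass_below_one_large_scale = weibull_cdf(1.0, s=3.0)
```

With the larger scale, less of the probability mass lies below any fixed point, which is the sense in which the distribution is "more spread out".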

If the probability density exists for all values of the complete parameter set, then the density (as a function of the scale parameter only) satisfies

${\displaystyle f_{s}(x)=f(x/s)/s,\!}$

where f is the density of the standardized member of the family, i.e. the one with scale parameter equal to one: ${\displaystyle f(x)\equiv f_{s=1}(x)}$.
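
The density relation can be spot-checked with Python's `statistics.NormalDist`. This is only an illustrative sketch: the zero-mean normal family, the value of `s`, and the evaluation points are assumptions chosen for the example.

```python
from statistics import NormalDist

standard = NormalDist(mu=0.0, sigma=1.0)  # f, the s = 1 member of the family

def scaled_pdf(x, s):
    """f_s(x) = f(x/s) / s, built only from the standard density."""
    return standard.pdf(x / s) / s

s = 2.5
xs = [-4.0, -1.0, 0.0, 0.7, 3.0]

# Compare against the density of the zero-mean normal with sigma = s.
max_abs_diff = max(abs(scaled_pdf(x, s) - NormalDist(0.0, s).pdf(x)) for x in xs)
```

The factor 1/s is what keeps the rescaled curve a density: stretching the horizontal axis by s must be compensated by shrinking the height by s.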

An estimator of a scale parameter is called an estimator of scale.

### Families with location parameters

In the case where a parametrized family has a location parameter, a slightly different definition is often used. If we denote the location parameter by ${\displaystyle m}$ and the scale parameter by ${\displaystyle s}$, then we require that ${\displaystyle F(x;s,m,\theta )=F((x-m)/s;1,0,\theta )}$, where ${\displaystyle F(x;s,m,\theta )}$ is the cumulative distribution function of the parametrized family [1]. This modification is necessary in order for the standard deviation of a Gaussian with non-zero mean to be a scale parameter, since otherwise the mean would change when we rescale ${\displaystyle x}$. However, this alternative definition is not used consistently [2].
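
The difference between the two definitions can be seen in a small numerical sketch. The normal family and the particular values of `m` and `s` are assumptions chosen for illustration; `NormalDist` is from Python's standard library.

```python
import math
from statistics import NormalDist

def normal_cdf_via_standard(x, m, s):
    """F(x; s, m) computed as F((x - m)/s; 1, 0)."""
    return NormalDist(0.0, 1.0).cdf((x - m) / s)

m, s = 10.0, 2.0
xs = [4.0, 8.0, 10.0, 11.5, 16.0]

# Location-scale definition: shift by m, then divide by s.
location_scale_holds = all(
    math.isclose(NormalDist(m, s).cdf(x), normal_cdf_via_standard(x, m, s), abs_tol=1e-12)
    for x in xs
)

# Location-free definition F(x; s) = F(x/s; 1) applied to a normal with m != 0:
# rescaling x alone also moves the mean, so the identity fails.
plain_scaling_holds = all(
    math.isclose(NormalDist(m, s).cdf(x), NormalDist(m, 1.0).cdf(x / s), abs_tol=1e-12)
    for x in xs
)
```

Here the first check passes and the second does not, which is why the shifted form is needed for σ to count as a scale parameter when the mean is non-zero.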

### Simple manipulations

We can write ${\displaystyle f_{s}}$ in terms of ${\displaystyle g(x)=x/s}$, as follows:

${\displaystyle f_{s}(x)=f\left({\frac {x}{s}}\right)\cdot {\frac {1}{s}}=f(g(x))g'(x).}$

Because f is a probability density function, it integrates to unity:

${\displaystyle 1=\int _{-\infty }^{\infty }f(x)\,dx=\int _{g(-\infty )}^{g(\infty )}f(x)\,dx.}$

By the substitution rule of integral calculus, we then have

${\displaystyle 1=\int _{-\infty }^{\infty }f(g(x))g'(x)\,dx=\int _{-\infty }^{\infty }f_{s}(x)\,dx.}$

So ${\displaystyle f_{s}}$ is also properly normalized.
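
The normalization argument can also be checked numerically. The sketch below assumes the standard logistic density as f and uses a hand-written trapezoidal rule over a truncated range; the helper names, the truncation at ±40s, and the step count are all illustrative choices.

```python
import math

def logistic_pdf(x):
    """Standard logistic density, used here as the s = 1 density f."""
    return 1.0 / (4.0 * math.cosh(x / 2.0) ** 2)

def scaled_pdf(x, s):
    """f_s(x) = f(x/s) / s."""
    return logistic_pdf(x / s) / s

def trapezoid(func, a, b, n):
    """Composite trapezoidal rule on [a, b] with n equal subintervals."""
    h = (b - a) / n
    total = 0.5 * (func(a) + func(b))
    for i in range(1, n):
        total += func(a + i * h)
    return total * h

s = 2.5
# Truncate the infinite range at +/- 40 s, where the logistic tails are negligible.
area = trapezoid(lambda x: scaled_pdf(x, s), -40.0 * s, 40.0 * s, 100_000)
```

The computed `area` should agree with 1 up to the quadrature and truncation error, for any positive `s`.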

## Rate parameter

Some families of distributions use a rate parameter, which is simply the reciprocal of the scale parameter. For example, the exponential distribution with scale parameter β and probability density

${\displaystyle f(x;\beta )={\frac {1}{\beta }}e^{-x/\beta },\;x\geq 0}$

could equivalently be written with rate parameter λ = 1/β as

${\displaystyle f(x;\lambda )=\lambda e^{-\lambda x},\;x\geq 0.}$
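
A short sketch confirming that the two parameterizations describe the same density when λ = 1/β. The function names and test points are hypothetical and chosen only for this example.

```python
import math

def exp_pdf_scale(x, beta):
    """Exponential density parameterized by scale beta."""
    return math.exp(-x / beta) / beta if x >= 0 else 0.0

def exp_pdf_rate(x, lam):
    """Exponential density parameterized by rate lam."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

beta = 4.0
lam = 1.0 / beta  # the rate is the reciprocal of the scale
xs = [0.0, 0.5, 2.0, 9.0]

same_density = all(
    math.isclose(exp_pdf_scale(x, beta), exp_pdf_rate(x, lam)) for x in xs
)
```

Which convention a library uses matters in practice: Python's `random.expovariate`, for instance, takes the rate rather than the scale, so a mean of β is requested as `random.expovariate(1 / beta)`.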

## Examples

• The normal distribution has two parameters: a location parameter ${\displaystyle \mu }$ and a scale parameter ${\displaystyle \sigma }$. In practice the normal distribution is often parameterized in terms of the squared scale ${\displaystyle \sigma ^{2}}$, which corresponds to the variance of the distribution.
• The gamma distribution is usually parameterized in terms of a scale parameter ${\displaystyle \theta }$ or its inverse.
• Special cases of distributions where the scale parameter equals unity may be called "standard" under certain conditions. For example, if the location parameter equals zero and the scale parameter equals one, the normal distribution is known as the standard normal distribution, and the Cauchy distribution as the standard Cauchy distribution.

## Estimation

A statistic can be used to estimate a scale parameter so long as it:

• Is location-invariant,
• Scales linearly with the scale parameter, and
• Converges as the sample size grows.

Various measures of statistical dispersion satisfy these conditions. To make the statistic a consistent estimator for the scale parameter, one must in general multiply it by a constant scale factor. This scale factor is the ratio of the required scale parameter to the asymptotic value of the statistic. Note that the scale factor depends on the distribution in question.
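
As an illustration of these conditions, the sketch below takes the interquartile range (IQR) as the statistic and checks location invariance and linear scaling on a simulated sample. The choice of the IQR, the normal model, the seed, and the sample size are assumptions for this example. For normal data the IQR tends to 2Φ⁻¹(3/4)σ, so the scale factor for estimating σ is the reciprocal of 2Φ⁻¹(3/4).

```python
import math
import random
from statistics import NormalDist, quantiles

def iqr(data):
    """Interquartile range: third quartile minus first quartile."""
    q1, _, q3 = quantiles(data, n=4)
    return q3 - q1

rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(2001)]

shifted = [x + 7.0 for x in sample]    # change of location only
stretched = [3.0 * x for x in sample]  # change of scale only

location_invariant = math.isclose(iqr(shifted), iqr(sample), rel_tol=1e-9)
scales_linearly = math.isclose(iqr(stretched), 3.0 * iqr(sample), rel_tol=1e-9)

# Scale factor = scale parameter / asymptotic value of the statistic.
iqr_factor = 1.0 / (2.0 * NormalDist().inv_cdf(0.75))
```

The resulting factor is roughly 0.7413, and it applies only under the normal model; another distribution would need its own constant.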

For instance, in order to use the median absolute deviation (MAD) to estimate the standard deviation of the normal distribution, one must multiply it by the factor

${\displaystyle 1/\Phi ^{-1}(3/4)\approx 1.4826,}$

where ${\displaystyle \Phi ^{-1}}$ is the quantile function (inverse of the cumulative distribution function) for the standard normal distribution. (See MAD for details.) That is, the MAD itself is not a consistent estimator for the standard deviation of a normal distribution, but 1.4826 × MAD is. Similarly, the average absolute deviation needs to be multiplied by approximately 1.2533 to be a consistent estimator for the standard deviation. Different factors would be required to estimate the standard deviation if the population did not follow a normal distribution.
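
The two constants can be reproduced and tried on simulated data. This is a sketch under stated assumptions: the sample is drawn from a normal distribution with a fixed seed, the average absolute deviation is taken about the sample mean (for which the normal-theory factor is √(π/2)), and the helper names are hypothetical.

```python
import math
import random
from statistics import NormalDist, median

def mad(data):
    """Median absolute deviation from the median (unscaled)."""
    center = median(data)
    return median([abs(x - center) for x in data])

def mean_abs_dev(data):
    """Average absolute deviation from the sample mean (unscaled)."""
    center = sum(data) / len(data)
    return sum(abs(x - center) for x in data) / len(data)

MAD_FACTOR = 1.0 / NormalDist().inv_cdf(0.75)  # about 1.4826
AAD_FACTOR = math.sqrt(math.pi / 2.0)          # about 1.2533

rng = random.Random(12345)
sigma = 2.0
sample = [rng.gauss(5.0, sigma) for _ in range(20_000)]

sigma_from_mad = MAD_FACTOR * mad(sample)
sigma_from_aad = AAD_FACTOR * mean_abs_dev(sample)
```

Both rescaled statistics should land near the true σ = 2 for a sample of this size, while the unscaled MAD would settle near 0.6745σ instead.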

## References

1. Prokhorov, A.V. (7 February 2011). "Scale parameter". Encyclopedia of Mathematics. Springer. Retrieved 7 February 2019.
2. Koski, Timo. "Scale parameter". KTH Royal Institute of Technology. Retrieved 7 February 2019.
• Mood, A. M.; Graybill, F. A.; Boes, D. C. (1974). "VII.6.2 Scale invariance". Introduction to the theory of statistics (3rd ed.). New York: McGraw-Hill.