Parametric model

In statistics, a parametric model or parametric family or finite-dimensional model is a particular class of statistical models. Specifically, a parametric model is a family of probability distributions that has a finite number of parameters.

Definition

A statistical model is a collection of probability distributions on some sample space. We assume that the collection, 𝒫, is indexed by some set Θ. The set Θ is called the parameter set or, more commonly, the parameter space. For each θ ∈ Θ, let Pθ denote the corresponding member of the collection; so Pθ is a cumulative distribution function. Then a statistical model can be written as

𝒫 = { Pθ : θ ∈ Θ }.

The model is a parametric model if Θ ⊆ ℝᵏ for some positive integer k.

When the model consists of absolutely continuous distributions, it is often specified in terms of corresponding probability density functions:

𝒫 = { fθ : θ ∈ Θ }.
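
As a concrete illustration of the definition, a parametric family can be viewed as a mapping from a finite-dimensional parameter θ to a distribution Pθ. The following sketch (illustrative only, assuming SciPy is available) indexes the normal family by θ = (μ, σ) ∈ ℝ × (0, ∞):

```python
# A parametric model: a family {P_theta : theta in Theta} indexed by a
# finite-dimensional parameter.  Here Theta = R x (0, inf) and P_theta is
# the normal distribution N(mu, sigma^2).
from scipy import stats

def P(theta):
    """Return the member of the family indexed by theta = (mu, sigma)."""
    mu, sigma = theta
    if sigma <= 0:
        raise ValueError("theta must lie in the parameter space R x (0, inf)")
    return stats.norm(loc=mu, scale=sigma)

# Two members of the family: their CDFs and densities differ because the
# parameter values differ.
print(P((0.0, 1.0)).cdf(1.0))   # standard normal CDF at 1
print(P((2.0, 0.5)).pdf(2.0))   # density of N(2, 0.25) at its mean
```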

Examples

The Poisson family of distributions is parametrized by a single number λ > 0:

𝒫 = { pλ(j) = e^(−λ) λ^j / j!,  j = 0, 1, 2, … : λ > 0 },

where pλ is the probability mass function. This family is an exponential family.

The normal family is parametrized by θ = (μ, σ), where μ ∈ ℝ is a location parameter and σ > 0 is a scale parameter:

𝒫 = { fθ(x) = (1 / (√(2π) σ)) exp(−(x − μ)² / (2σ²)) : μ ∈ ℝ, σ > 0 }.

This parametrized family is both an exponential family and a location-scale family.

The binomial model is parametrized by θ = (n, p), where n is a non-negative integer and p is a probability (0 ≤ p ≤ 1):

𝒫 = { pθ(k) = C(n, k) p^k (1 − p)^(n−k),  k = 0, 1, …, n : n ∈ {0, 1, 2, …}, 0 ≤ p ≤ 1 }.

This example illustrates the definition for a model with some discrete parameters.
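
If SciPy is available, members of these example families can be written down directly; the sketch below merely shows how each family is indexed by its finite-dimensional (and, for the binomial, partly discrete) parameter.

```python
from scipy import stats

# Poisson family, indexed by lambda > 0 (a one-dimensional parameter).
poisson_member = stats.poisson(mu=3.0)
print(poisson_member.pmf(2))            # p_lambda(2) = e^-3 * 3^2 / 2!

# Normal (location-scale) family, indexed by theta = (mu, sigma).
normal_member = stats.norm(loc=1.0, scale=2.0)
print(normal_member.pdf(1.0))           # f_theta evaluated at the location parameter

# Binomial family, indexed by theta = (n, p) with n a *discrete* parameter.
binomial_member = stats.binom(n=10, p=0.3)
print(binomial_member.pmf(4))           # probability of 4 successes in 10 trials
```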

General remarks

A parametric model is called identifiable if the mapping θ ↦ Pθ is invertible, i.e. there are no two different parameter values θ₁ and θ₂ such that Pθ₁ = Pθ₂.
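
A minimal numerical sketch of non-identifiability, assuming NumPy and SciPy: parametrize a normal distribution redundantly by θ = (a, b) with mean a + b. Two different parameter values then index the same distribution, so the map θ ↦ Pθ is not invertible.

```python
import numpy as np
from scipy import stats

def P(theta):
    # Redundant parametrization: the distribution depends on theta = (a, b)
    # only through a + b, so the model is NOT identifiable.
    a, b = theta
    return stats.norm(loc=a + b, scale=1.0)

theta1, theta2 = (1.0, 2.0), (0.0, 3.0)        # two different parameter values
x = np.linspace(-5, 10, 7)
print(np.allclose(P(theta1).cdf(x), P(theta2).cdf(x)))  # True: P_theta1 == P_theta2
```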

Comparisons with other classes of models

Parametric models are contrasted with semi-parametric, semi-nonparametric, and non-parametric models, all of which consist of an infinite set of "parameters" for description. The distinction between these four classes is as follows:[citation needed]

  - in a "parametric" model all the parameters are in finite-dimensional parameter spaces;
  - a model is "non-parametric" if all the parameters are in infinite-dimensional parameter spaces;
  - a "semi-parametric" model contains finite-dimensional parameters of interest and infinite-dimensional nuisance parameters;
  - a "semi-nonparametric" model has both finite-dimensional and infinite-dimensional unknown parameters of interest.
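
The contrast can be made concrete with a small sketch, assuming NumPy and SciPy: a parametric fit estimates just the two numbers (μ, σ) of a normal model, while a non-parametric kernel density estimate has no fixed finite-dimensional parameter; in effect, its "parameter" is the whole data set.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.5, size=500)

# Parametric: two estimated numbers describe the entire fitted distribution.
mu_hat, sigma_hat = stats.norm.fit(data)          # maximum likelihood fit

# Non-parametric: a kernel density estimate keeps (in effect) all the data.
kde = stats.gaussian_kde(data)

x = 2.0
print("parametric density at x:", stats.norm(mu_hat, sigma_hat).pdf(x))
print("non-parametric density at x:", kde(x)[0])
```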

Some statisticians believe that the concepts "parametric", "non-parametric", and "semi-parametric" are ambiguous.[1] It can also be noted that the set of all probability measures has cardinality of continuum, and therefore it is possible to parametrize any model at all by a single number in the interval (0, 1).[2] This difficulty can be avoided by considering only "smooth" parametric models.

See also

Notes

Bibliography

Related Research Articles

In statistics, the likelihood-ratio test assesses the goodness of fit of two competing statistical models based on the ratio of their likelihoods, specifically one found by maximization over the entire parameter space and another found after imposing some constraint. If the constraint is supported by the observed data, the two likelihoods should not differ by more than sampling error. Thus the likelihood-ratio test tests whether this ratio is significantly different from one, or equivalently whether its natural logarithm is significantly different from zero.
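
A sketch of the computation, assuming NumPy and SciPy are available (not part of the article text): test the constraint H0: rate = 1 against an unrestricted rate in an exponential model, using Wilks' chi-squared approximation for the test statistic.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.exponential(scale=1 / 1.4, size=200)      # data with true rate 1.4

def loglik(rate, x):
    # Exponential log-likelihood: sum of log(rate) - rate * x_i.
    return np.sum(np.log(rate) - rate * x)

rate_null = 1.0                  # constrained model: rate fixed at 1
rate_mle = 1.0 / x.mean()        # unconstrained maximum likelihood estimate

# Likelihood-ratio statistic: twice the log of the ratio of maximized likelihoods.
lr_stat = 2.0 * (loglik(rate_mle, x) - loglik(rate_null, x))

# Under H0, lr_stat is approximately chi-squared with 1 degree of freedom.
p_value = stats.chi2.sf(lr_stat, df=1)
print(lr_stat, p_value)
```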

In statistics, maximum likelihood estimation (MLE) is a method of estimating the parameters of an assumed probability distribution, given some observed data. This is achieved by maximizing a likelihood function so that, under the assumed statistical model, the observed data is most probable. The point in the parameter space that maximizes the likelihood function is called the maximum likelihood estimate. The logic of maximum likelihood is both intuitive and flexible, and as such the method has become a dominant means of statistical inference.
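
A small sketch of maximum likelihood estimation under an assumed normal model (illustrative only, assuming NumPy and SciPy): the negative log-likelihood is minimized numerically and the result is compared with the closed-form estimates, the sample mean and the uncorrected sample standard deviation.

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(2)
data = rng.normal(loc=3.0, scale=2.0, size=1000)

def neg_loglik(params, x):
    # Parametrize sigma through its logarithm so the search is unconstrained.
    mu, log_sigma = params
    return -np.sum(stats.norm.logpdf(x, loc=mu, scale=np.exp(log_sigma)))

result = optimize.minimize(neg_loglik, x0=[0.0, 0.0], args=(data,))
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])

# Closed-form maximum likelihood estimates for the normal model.
print(mu_hat, data.mean())                  # both estimate mu
print(sigma_hat, data.std(ddof=0))          # both estimate sigma
```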

In mathematical statistics, the Fisher information is a way of measuring the amount of information that an observable random variable X carries about an unknown parameter θ of a distribution that models X. Formally, it is the variance of the score, or the expected value of the observed information.
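
A numerical check of the "variance of the score" characterization, assuming NumPy, for a Bernoulli(p) observation: the score is ∂/∂p log p(X; p) = X/p − (1 − X)/(1 − p), and its variance equals the Fisher information 1/(p(1 − p)).

```python
import numpy as np

rng = np.random.default_rng(3)
p = 0.3
x = rng.binomial(1, p, size=200_000)        # many Bernoulli(p) draws

# Score: derivative of the log-likelihood of a single observation w.r.t. p.
score = x / p - (1 - x) / (1 - p)

print(score.mean())                 # approximately 0 (the score has mean zero)
print(score.var())                  # approximately the Fisher information ...
print(1 / (p * (1 - p)))            # ... which equals 1 / (p (1 - p)) here
```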

In theoretical physics, a supermultiplet is a representation of a supersymmetry algebra.

In statistics, a mixture model is a probabilistic model for representing the presence of subpopulations within an overall population, without requiring that an observed data set should identify the sub-population to which an individual observation belongs. Formally a mixture model corresponds to the mixture distribution that represents the probability distribution of observations in the overall population. However, while problems associated with "mixture distributions" relate to deriving the properties of the overall population from those of the sub-populations, "mixture models" are used to make statistical inferences about the properties of the sub-populations given only observations on the pooled population, without sub-population identity information.
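
A brief sketch, assuming NumPy and scikit-learn are available: data are drawn from a two-component Gaussian mixture without keeping the component labels, and a mixture model is then fitted to recover the sub-population parameters from the pooled sample alone.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(4)

# Pooled sample from two sub-populations; the labels are discarded.
component = rng.random(1000) < 0.4
data = np.where(component,
                rng.normal(-2.0, 0.5, 1000),
                rng.normal(3.0, 1.0, 1000)).reshape(-1, 1)

# Fit a two-component Gaussian mixture to the unlabeled data.
gm = GaussianMixture(n_components=2, random_state=0).fit(data)
print(gm.weights_)   # estimated mixing proportions (about 0.4 and 0.6)
print(gm.means_)     # estimated component means (about -2 and 3)
```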

Variational Bayesian methods are a family of techniques for approximating intractable integrals arising in Bayesian inference and machine learning. They are typically used in complex statistical models consisting of observed variables as well as unknown parameters and latent variables, with various sorts of relationships among the three types of random variables, as might be described by a graphical model. As typical in Bayesian inference, the parameters and latent variables are grouped together as "unobserved variables". Variational Bayesian methods are primarily used for two purposes:

  1. To provide an analytical approximation to the posterior probability of the unobserved variables, in order to do statistical inference over these variables.
  2. To derive a lower bound for the marginal likelihood of the observed data. This is typically used for performing model selection, the general idea being that a higher marginal likelihood for a given model indicates a better fit of the data by that model and hence a greater probability that the model in question was the one that generated the data.
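
A minimal numerical sketch of point 2, assuming NumPy and SciPy: for the toy model z ~ N(0, 1), x | z ~ N(z, 1), the exact log marginal likelihood is known (x ~ N(0, 2)), so a Monte Carlo estimate of the evidence lower bound, ELBO = E_q[log p(x, z) − log q(z)], can be compared against it for a chosen Gaussian q.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x = 1.3                                     # a single observed data point

def elbo(m, s, n_samples=200_000):
    # Monte Carlo estimate of E_q[log p(x, z) - log q(z)] with q = N(m, s^2).
    z = rng.normal(m, s, size=n_samples)
    log_joint = stats.norm.logpdf(x, loc=z, scale=1.0) + stats.norm.logpdf(z, 0.0, 1.0)
    log_q = stats.norm.logpdf(z, m, s)
    return np.mean(log_joint - log_q)

log_evidence = stats.norm.logpdf(x, loc=0.0, scale=np.sqrt(2.0))  # exact log p(x)

print(log_evidence)
print(elbo(0.0, 1.0))            # a crude q: strictly below the log evidence
print(elbo(x / 2, np.sqrt(0.5))) # q equal to the exact posterior: the bound is tight
```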
<span class="mw-page-title-main">Consistent estimator</span> Statistical estimator converging in probability to a true parameter as sample size increases

In statistics, a consistent estimator or asymptotically consistent estimator is an estimator—a rule for computing estimates of a parameter θ0—having the property that as the number of data points used increases indefinitely, the resulting sequence of estimates converges in probability to θ0. This means that the distributions of the estimates become more and more concentrated near the true value of the parameter being estimated, so that the probability of the estimator being arbitrarily close to θ0 converges to one.
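
A short simulation sketch, assuming NumPy: the sample mean is a consistent estimator of the true mean, so its estimates concentrate around the true value as the sample size grows.

```python
import numpy as np

rng = np.random.default_rng(6)
true_mean = 0.5                              # exponential data with mean 0.5

for n in (10, 1_000, 100_000):
    estimate = rng.exponential(scale=true_mean, size=n).mean()
    print(n, estimate, abs(estimate - true_mean))   # the error shrinks as n grows
```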

In Bayesian probability, the Jeffreys prior, named after Sir Harold Jeffreys, is a non-informative (objective) prior distribution for a parameter space; its density function is proportional to the square root of the determinant of the Fisher information matrix:

p(θ) ∝ √(det I(θ)).
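
A quick illustration, assuming NumPy and SciPy: for a Bernoulli(p) observation the Fisher information is I(p) = 1/(p(1 − p)), so the Jeffreys prior density is proportional to p^(−1/2)(1 − p)^(−1/2), i.e. a Beta(1/2, 1/2) distribution; the ratio computed below is constant in p.

```python
import numpy as np
from scipy import stats

p = np.linspace(0.05, 0.95, 10)

jeffreys_unnormalized = np.sqrt(1.0 / (p * (1.0 - p)))   # sqrt of Fisher information
beta_half_half = stats.beta.pdf(p, 0.5, 0.5)             # Beta(1/2, 1/2) density

# Constant ratio: the Jeffreys prior for a Bernoulli parameter is Beta(1/2, 1/2).
print(jeffreys_unnormalized / beta_half_half)            # every entry equals pi
```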

In theoretical physics, there are many theories with supersymmetry (SUSY) which also have internal gauge symmetries. Supersymmetric gauge theory generalizes this notion.

Bayesian linear regression is a type of conditional modeling in which the mean of one variable is described by a linear combination of other variables, with the goal of obtaining the posterior probability of the regression coefficients and ultimately allowing the out-of-sample prediction of the regressand conditional on observed values of the regressors. The simplest and most widely used version of this model is the normal linear model, in which y given X is distributed Gaussian. In this model, and under a particular choice of prior probabilities for the parameters—so-called conjugate priors—the posterior can be found analytically. With more arbitrarily chosen priors, the posteriors generally have to be approximated.
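
A compact sketch of the conjugate case, assuming NumPy, with known noise variance σ² and a Gaussian prior N(m0, S0) on the coefficients; the posterior is then also Gaussian, with the closed-form mean and covariance computed below.

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulated data: y = X @ w_true + Gaussian noise with known variance.
n, sigma2 = 100, 0.25
X = np.column_stack([np.ones(n), rng.normal(size=n)])
w_true = np.array([1.0, -2.0])
y = X @ w_true + rng.normal(scale=np.sqrt(sigma2), size=n)

# Conjugate Gaussian prior on the coefficients.
m0 = np.zeros(2)
S0 = 10.0 * np.eye(2)

# Closed-form Gaussian posterior: covariance S_n and mean m_n.
S_n = np.linalg.inv(np.linalg.inv(S0) + X.T @ X / sigma2)
m_n = S_n @ (np.linalg.inv(S0) @ m0 + X.T @ y / sigma2)

print(m_n)                      # posterior mean, close to w_true
print(np.sqrt(np.diag(S_n)))    # posterior standard deviations
```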

Covariance matrix adaptation evolution strategy (CMA-ES) is a particular kind of strategy for numerical optimization. Evolution strategies (ES) are stochastic, derivative-free methods for numerical optimization of non-linear or non-convex continuous optimization problems. They belong to the class of evolutionary algorithms and evolutionary computation. An evolutionary algorithm is broadly based on the principle of biological evolution, namely the repeated interplay of variation and selection: in each generation (iteration) new individuals are generated by variation, usually in a stochastic way, of the current parental individuals. Then, some individuals are selected to become the parents in the next generation based on their fitness or objective function value f(x). In this way, over the generation sequence, individuals with better and better f-values are generated.

<span class="mw-page-title-main">Dirichlet process</span> Family of stochastic processes

In probability theory, Dirichlet processes are a family of stochastic processes whose realizations are probability distributions. In other words, a Dirichlet process is a probability distribution whose range is itself a set of probability distributions. It is often used in Bayesian inference to describe the prior knowledge about the distribution of random variables—how likely it is that the random variables are distributed according to one or another particular distribution.
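
A minimal sketch, assuming NumPy, of one standard way to draw a realization of a Dirichlet process: the (truncated) stick-breaking construction with concentration α and a standard normal base distribution. The realization is itself a discrete probability distribution over the sampled atoms.

```python
import numpy as np

rng = np.random.default_rng(8)

def stick_breaking(alpha, base_sampler, truncation=1000):
    # Truncated stick-breaking: weights w_k = v_k * prod_{j<k} (1 - v_j),
    # with v_k ~ Beta(1, alpha) and atoms drawn from the base distribution.
    v = rng.beta(1.0, alpha, size=truncation)
    weights = v * np.concatenate([[1.0], np.cumprod(1.0 - v)[:-1]])
    atoms = base_sampler(truncation)
    return atoms, weights

atoms, weights = stick_breaking(alpha=2.0, base_sampler=lambda k: rng.normal(size=k))
print(weights.sum())                       # close to 1 (truncation error is tiny)
print(atoms[np.argsort(weights)[-3:]])     # atom locations carrying the most mass
```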

In statistics, the bias of an estimator is the difference between this estimator's expected value and the true value of the parameter being estimated. An estimator or decision rule with zero bias is called unbiased. In statistics, "bias" is an objective property of an estimator. Bias is a distinct concept from consistency: consistent estimators converge in probability to the true value of the parameter, but may be biased or unbiased; see bias versus consistency for more.
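
A brief simulation, assuming NumPy, of a classic biased estimator: the sample variance with denominator n underestimates the true variance on average, while the denominator n − 1 version is unbiased.

```python
import numpy as np

rng = np.random.default_rng(9)
true_var = 4.0
n, replications = 5, 200_000

samples = rng.normal(0.0, np.sqrt(true_var), size=(replications, n))

biased = samples.var(axis=1, ddof=0).mean()     # denominator n
unbiased = samples.var(axis=1, ddof=1).mean()   # denominator n - 1

print(biased, (n - 1) / n * true_var)   # expected value is (n-1)/n * sigma^2 = 3.2
print(unbiased, true_var)               # expected value is sigma^2 = 4.0
```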

A ratio distribution is a probability distribution constructed as the distribution of the ratio of random variables having two other known distributions. Given two random variables X and Y, the distribution of the random variable Z that is formed as the ratio Z = X/Y is a ratio distribution.

In probability and statistics, the class of exponential dispersion models (EDM) is a set of probability distributions that represents a generalisation of the natural exponential family. Exponential dispersion models play an important role in statistical theory, in particular in generalized linear models because they have a special structure which enables deductions to be made about appropriate statistical inference.

In statistics, identifiability is a property which a model must satisfy for precise inference to be possible. A model is identifiable if it is theoretically possible to learn the true values of this model's underlying parameters after obtaining an infinite number of observations from it. Mathematically, this is equivalent to saying that different values of the parameters must generate different probability distributions of the observable variables. Usually the model is identifiable only under certain technical restrictions, in which case the set of these requirements is called the identification conditions.

In statistics, an adaptive estimator is an estimator in a parametric or semiparametric model with nuisance parameters such that the presence of these nuisance parameters does not affect efficiency of estimation.

In continuum mechanics, an Arruda–Boyce model is a hyperelastic constitutive model used to describe the mechanical behavior of rubber and other polymeric substances. This model is based on the statistical mechanics of a material with a cubic representative volume element containing eight chains along the diagonal directions. The material is assumed to be incompressible. The model is named after Ellen Arruda and Mary Cunningham Boyce, who published it in 1993.

In mathematics, a statistical manifold is a Riemannian manifold, each of whose points is a probability distribution. Statistical manifolds provide a setting for the field of information geometry. The Fisher information metric provides a metric on these manifolds. Following this definition, the log-likelihood function is a differentiable map and the score is an inclusion.

In statistics, the variance function is a smooth function which depicts the variance of a random quantity as a function of its mean. The variance function is a measure of heteroscedasticity and plays a large role in many settings of statistical modelling. It is a main ingredient in the generalized linear model framework and a tool used in non-parametric regression, semiparametric regression and functional data analysis. In parametric modeling, variance functions take on a parametric form and explicitly describe the relationship between the variance and the mean of a random quantity. In a non-parametric setting, the variance function is assumed to be a smooth function.
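
A short check, assuming NumPy, of the simplest parametric case: for Poisson data the variance function is V(μ) = μ, so the sample variance tracks the sample mean across a range of means.

```python
import numpy as np

rng = np.random.default_rng(10)

for mu in (0.5, 2.0, 10.0):
    x = rng.poisson(mu, size=200_000)
    # For the Poisson family the variance function is V(mu) = mu.
    print(mu, x.mean(), x.var())   # the three numbers agree closely
```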