Monotone likelihood ratio

Figure: a monotonic likelihood ratio in distributions ${\displaystyle f(x)}$ and ${\displaystyle g(x)}$. The ratio of the two density functions is increasing in the argument ${\displaystyle x}$, so ${\displaystyle f(x)/g(x)}$ satisfies the monotone likelihood ratio property.

In statistics, the monotone likelihood ratio property is a property of the ratio of two probability density functions (PDFs). Formally, distributions ƒ(x) and g(x) bear the property if

${\displaystyle {\text{for every }}x_{1}>x_{0},\quad {\frac {f(x_{1})}{g(x_{1})}}\geq {\frac {f(x_{0})}{g(x_{0})}}}$

that is, if the ratio is nondecreasing in the argument ${\displaystyle x}$.

If the functions are differentiable, the property can be stated as

${\displaystyle {\frac {\partial }{\partial x}}\left({\frac {f(x)}{g(x)}}\right)\geq 0}$

For two distributions that satisfy the definition with respect to some argument x, we say they "have the MLRP in x." For a family of distributions that all satisfy the definition with respect to some statistic T(X), we say they "have the MLR in T(X)."
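The definition can be checked numerically on a grid. The following is a minimal sketch, not a proof: the helper `has_mlrp`, the grid, and the choice of two unit-variance normal densities are all assumptions made for illustration.

```python
from statistics import NormalDist

def has_mlrp(f, g, grid, tol=1e-12):
    """Return True if f(x)/g(x) is nondecreasing along the sorted grid."""
    ratios = [f(x) / g(x) for x in grid]
    return all(b >= a - tol for a, b in zip(ratios, ratios[1:]))

# Illustrative densities (assumed, not from the text): unit-variance normals.
f = NormalDist(mu=1.0, sigma=1.0).pdf
g = NormalDist(mu=0.0, sigma=1.0).pdf
grid = [i / 10 for i in range(-50, 51)]

print(has_mlrp(f, g, grid))  # True: here f/g = exp(x - 1/2), increasing in x
print(has_mlrp(g, f, grid))  # False: the reversed ratio is decreasing
```

A grid check can only falsify the property at the sampled points; it does not establish monotonicity between them.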

Intuition

The MLRP is used to represent a data-generating process that enjoys a straightforward relationship between the magnitude of some observed variable and the distribution it draws from. If ${\displaystyle f(x)}$ satisfies the MLRP with respect to ${\displaystyle g(x)}$, the higher the observed value ${\displaystyle x}$, the more likely it was drawn from distribution ${\displaystyle f}$ rather than ${\displaystyle g}$. As usual for monotonic relationships, the likelihood ratio's monotonicity comes in handy in statistics, particularly when using maximum-likelihood estimation. Also, distribution families with MLR have a number of well-behaved stochastic properties, such as first-order stochastic dominance and monotone hazard rates. Unfortunately, as is also usual, the strength of this assumption comes at the price of realism. Many processes in the world do not exhibit a monotonic correspondence between input and output.

Example: Working hard or slacking off

Suppose you are working on a project, and you can either work hard or slack off. Call your choice of effort ${\displaystyle e}$ and the quality of the resulting project ${\displaystyle q}$. If the MLRP holds for the distribution of q conditional on your effort ${\displaystyle e}$, the higher the quality the more likely you worked hard. Conversely, the lower the quality the more likely you slacked off.

1. Choose effort ${\displaystyle e\in \{H,L\}}$ where H means high, L means low
2. Observe ${\displaystyle q}$ drawn from ${\displaystyle f(q\mid e)}$. By Bayes' law with a uniform prior,
${\displaystyle \Pr[e=H\mid q]={\frac {f(q\mid H)}{f(q\mid H)+f(q\mid L)}}}$
3. Suppose ${\displaystyle f(q\mid e)}$ satisfies the MLRP. Rearranging, the probability the worker worked hard is
${\displaystyle {\frac {1}{1+f(q\mid L)/f(q\mid H)}}}$
which, thanks to the MLRP, is monotonically increasing in ${\displaystyle q}$ (because ${\displaystyle f(q\mid L)/f(q\mid H)}$ is decreasing in ${\displaystyle q}$). Hence if some employer is doing a "performance review" he can infer his employee's behavior from the merits of his work.
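The posterior in step 3 can be evaluated directly. This is a sketch under assumed inputs: the two normal quality distributions `f_high` and `f_low` are hypothetical and chosen only because they satisfy the MLRP.

```python
from statistics import NormalDist

# Hypothetical quality distributions for each effort level (assumed).
f_high = NormalDist(mu=2.0, sigma=1.0)  # f(q | H)
f_low = NormalDist(mu=0.0, sigma=1.0)   # f(q | L)

def prob_worked_hard(q):
    """Posterior Pr[e = H | q] under a uniform prior over {H, L}."""
    return 1.0 / (1.0 + f_low.pdf(q) / f_high.pdf(q))

qualities = [-1.0, 0.0, 1.0, 2.0, 3.0]
posteriors = [prob_worked_hard(q) for q in qualities]
for q, p in zip(qualities, posteriors):
    print(f"q = {q:4.1f}  Pr[H | q] = {p:.3f}")
```

With these assumed densities the posterior rises with ${\displaystyle q}$ and equals one half at the midpoint between the two means, where the densities coincide.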

Families of distributions satisfying MLR

Statistical models often assume that data are generated by a distribution from some family of distributions and seek to determine that distribution. This task is simplified if the family has the monotone likelihood ratio property (MLRP).

A family of density functions ${\displaystyle \{f_{\theta }(x)\}_{\theta \in \Theta }}$ indexed by a parameter ${\displaystyle \theta }$ taking values in an ordered set ${\displaystyle \Theta }$ is said to have a monotone likelihood ratio (MLR) in the statistic ${\displaystyle T(X)}$ if for any ${\displaystyle \theta _{1}<\theta _{2}}$,

${\displaystyle {\frac {f_{\theta _{2}}(X=x_{1},x_{2},x_{3},\dots )}{f_{\theta _{1}}(X=x_{1},x_{2},x_{3},\dots )}}}$  is a non-decreasing function of ${\displaystyle T(X)}$.

Then we say the family of distributions "has MLR in ${\displaystyle T(X)}$".

List of families

| Family | ${\displaystyle T(X)}$ in which ${\displaystyle f_{\theta }(X)}$ has the MLR |
| --- | --- |
| Exponential${\displaystyle [\lambda ]}$ | ${\displaystyle \sum x_{i}}$ observations |
| Binomial${\displaystyle [n,p]}$ | ${\displaystyle \sum x_{i}}$ observations |
| Poisson${\displaystyle [\lambda ]}$ | ${\displaystyle \sum x_{i}}$ observations |
| Normal${\displaystyle [\mu ,\sigma ]}$ | if ${\displaystyle \sigma }$ known, ${\displaystyle \sum x_{i}}$ observations |
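As an illustration of the Poisson entry, the sketch below computes the joint log-likelihood ratio for a few small samples. The function names, the rates `lam_lo` and `lam_hi`, and the samples are assumptions chosen for the example. For i.i.d. Poisson data the ratio depends on the sample only through ${\displaystyle \sum x_{i}}$ and grows with it.

```python
import math

def poisson_log_likelihood(sample, lam):
    """Joint log-likelihood of i.i.d. Poisson(lam) observations."""
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in sample)

def log_likelihood_ratio(sample, lam_hi, lam_lo):
    """log of f_{lam_hi}(sample) / f_{lam_lo}(sample)."""
    return (poisson_log_likelihood(sample, lam_hi)
            - poisson_log_likelihood(sample, lam_lo))

lam_lo, lam_hi = 1.0, 2.5   # assumed rates with lam_lo < lam_hi
same_sum_a = [0, 3, 3]      # sum = 6
same_sum_b = [2, 2, 2]      # sum = 6
larger_sum = [1, 4, 4]      # sum = 9

r_a = log_likelihood_ratio(same_sum_a, lam_hi, lam_lo)
r_b = log_likelihood_ratio(same_sum_b, lam_hi, lam_lo)
r_c = log_likelihood_ratio(larger_sum, lam_hi, lam_lo)

print(math.isclose(r_a, r_b))  # equal sums give equal ratios
print(r_c > r_a)               # a larger sum gives a larger ratio
```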

Hypothesis testing

If the family of random variables has the MLRP in ${\displaystyle T(X)}$, a uniformly most powerful test can easily be determined for the hypothesis ${\displaystyle H_{0}:\theta \leq \theta _{0}}$ versus ${\displaystyle H_{1}:\theta >\theta _{0}}$.

Example: Effort and output

Let ${\displaystyle e}$ be an input into a stochastic technology (a worker's effort, for instance) and ${\displaystyle y}$ its output, the likelihood of which is described by a probability density function ${\displaystyle f(y;e).}$ Then the monotone likelihood ratio property (MLRP) of the family ${\displaystyle f}$ is expressed as follows: for any ${\displaystyle e_{1},e_{2}}$, the fact that ${\displaystyle e_{2}>e_{1}}$ implies that the ratio ${\displaystyle f(y;e_{2})/f(y;e_{1})}$ is increasing in ${\displaystyle y}$.

Relation to other statistical properties

Monotone likelihood ratios are used in several areas of statistical theory, including point estimation and hypothesis testing, as well as in probability models.

Exponential families

One-parameter exponential families have monotone likelihood ratios. In particular, the one-dimensional exponential family of probability density functions or probability mass functions with

${\displaystyle f_{\theta }(x)=c(\theta )h(x)\exp(\pi (\theta )T(x))}$

has a monotone non-decreasing likelihood ratio in the sufficient statistic T(x), provided that ${\displaystyle \pi (\theta )}$ is non-decreasing.
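A short worked step, given here as a sketch, shows where the condition on ${\displaystyle \pi (\theta )}$ enters. It assumes ${\displaystyle h(x)>0}$ on the common support so that the factor cancels. For any ${\displaystyle \theta _{1}<\theta _{2}}$,

${\displaystyle {\frac {f_{\theta _{2}}(x)}{f_{\theta _{1}}(x)}}={\frac {c(\theta _{2})}{c(\theta _{1})}}\exp {\big (}[\pi (\theta _{2})-\pi (\theta _{1})]\,T(x){\big )}}$

The leading factor does not depend on ${\displaystyle x}$, and ${\displaystyle \pi (\theta _{2})-\pi (\theta _{1})\geq 0}$ when ${\displaystyle \pi }$ is non-decreasing, so the ratio is a non-decreasing function of ${\displaystyle T(x)}$.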

Most powerful tests: The Karlin–Rubin theorem

Monotone likelihood ratios are used to construct uniformly most powerful tests, according to the Karlin–Rubin theorem. [1] Consider a scalar measurement having a probability density function parameterized by a scalar parameter θ, and define the likelihood ratio ${\displaystyle \ell (x)=f_{\theta _{1}}(x)/f_{\theta _{0}}(x)}$. If ${\displaystyle \ell (x)}$ is monotone non-decreasing in ${\displaystyle x}$ for any pair ${\displaystyle \theta _{1}\geq \theta _{0}}$ (meaning that the greater ${\displaystyle x}$ is, the more likely ${\displaystyle H_{1}}$ is), then the threshold test:

${\displaystyle \varphi (x)={\begin{cases}1&{\text{if }}x>x_{0}\\0&{\text{if }}x<x_{0}\end{cases}}}$
where ${\displaystyle x_{0}}$ is chosen so that ${\displaystyle \operatorname {E} _{\theta _{0}}\varphi (X)=\alpha }$

is the UMP test of size α for testing ${\displaystyle H_{0}:\theta \leq \theta _{0}{\text{ vs. }}H_{1}:\theta >\theta _{0}.}$

Note that exactly the same test is also UMP for testing ${\displaystyle H_{0}:\theta =\theta _{0}{\text{ vs. }}H_{1}:\theta >\theta _{0}.}$
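For a concrete case, the sketch below computes the threshold and the power function of the test for a single normal observation with known standard deviation. The function names `ump_threshold` and `power`, and the values of `theta0`, `sigma`, and `alpha`, are assumptions for illustration only.

```python
from statistics import NormalDist

def ump_threshold(theta0, sigma, alpha):
    """Cutoff x0 with P_{theta0}(X > x0) = alpha for X ~ Normal(theta0, sigma)."""
    return NormalDist(mu=theta0, sigma=sigma).inv_cdf(1.0 - alpha)

def power(theta, x0, sigma):
    """Rejection probability P_theta(X > x0) of the threshold test."""
    return 1.0 - NormalDist(mu=theta, sigma=sigma).cdf(x0)

theta0, sigma, alpha = 0.0, 1.0, 0.05  # assumed test settings
x0 = ump_threshold(theta0, sigma, alpha)

print(f"x0 = {x0:.3f}")  # x0 = 1.645
for theta in (-1.0, 0.0, 1.0, 2.0):
    print(f"theta = {theta:4.1f}  power = {power(theta, x0, sigma):.3f}")
```

The rejection probability equals α at ${\displaystyle \theta _{0}}$, stays below α for smaller θ, and increases with θ, which is why the same cutoff serves both the composite and the simple null hypothesis.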

Median unbiased estimation

Monotone likelihood ratios are used to construct median-unbiased estimators, using methods specified by Johann Pfanzagl and others. [2] [3] One such procedure is an analogue of the Rao–Blackwell procedure for mean-unbiased estimators: the procedure holds for a smaller class of probability distributions than does the Rao–Blackwell procedure for mean-unbiased estimation, but for a larger class of loss functions. [3] (p. 713)

Lifetime analysis: Survival analysis and reliability

If a family of distributions ${\displaystyle f_{\theta }(x)}$ has the monotone likelihood ratio property in ${\displaystyle T(X)}$,

1. the family has monotone decreasing hazard rates in ${\displaystyle \theta }$ (but not necessarily in ${\displaystyle T(X)}$)
2. the family exhibits the first-order (and hence second-order) stochastic dominance in ${\displaystyle x}$, and the best Bayesian update of ${\displaystyle \theta }$ is increasing in ${\displaystyle T(X)}$.

But not conversely: neither monotone hazard rates nor stochastic dominance imply the MLRP.

Proofs

Let distribution family ${\displaystyle f_{\theta }}$ satisfy MLR in x, so that for ${\displaystyle \theta _{1}>\theta _{0}}$ and ${\displaystyle x_{1}>x_{0}}$:

${\displaystyle {\frac {f_{\theta _{1}}(x_{1})}{f_{\theta _{0}}(x_{1})}}\geq {\frac {f_{\theta _{1}}(x_{0})}{f_{\theta _{0}}(x_{0})}},}$

or equivalently:

${\displaystyle f_{\theta _{1}}(x_{1})f_{\theta _{0}}(x_{0})\geq f_{\theta _{1}}(x_{0})f_{\theta _{0}}(x_{1}).\,}$

Integrating this expression twice, we obtain:

1. To ${\displaystyle x_{1}}$ with respect to ${\displaystyle x_{0}}$:
${\displaystyle {\begin{aligned}&\int _{\min _{x\in X}}^{x_{1}}f_{\theta _{1}}(x_{1})f_{\theta _{0}}(x_{0})\,dx_{0}\\[6pt]\geq {}&\int _{\min _{x\in X}}^{x_{1}}f_{\theta _{1}}(x_{0})f_{\theta _{0}}(x_{1})\,dx_{0}\end{aligned}}}$
integrate and rearrange to obtain
${\displaystyle {\frac {f_{\theta _{1}}}{f_{\theta _{0}}}}(x)\geq {\frac {F_{\theta _{1}}}{F_{\theta _{0}}}}(x)}$
2. From ${\displaystyle x_{0}}$ with respect to ${\displaystyle x_{1}}$:
${\displaystyle {\begin{aligned}&\int _{x_{0}}^{\max _{x\in X}}f_{\theta _{1}}(x_{1})f_{\theta _{0}}(x_{0})\,dx_{1}\\[6pt]\geq {}&\int _{x_{0}}^{\max _{x\in X}}f_{\theta _{1}}(x_{0})f_{\theta _{0}}(x_{1})\,dx_{1}\end{aligned}}}$
integrate and rearrange to obtain
${\displaystyle {\frac {1-F_{\theta _{1}}(x)}{1-F_{\theta _{0}}(x)}}\geq {\frac {f_{\theta _{1}}}{f_{\theta _{0}}}}(x)}$

First-order stochastic dominance

Combine the two inequalities above to get first-order dominance:

${\displaystyle F_{\theta _{1}}(x)\leq F_{\theta _{0}}(x)\ \forall x}$

Monotone hazard rate

Use only the second inequality above to get a monotone hazard rate:

${\displaystyle {\frac {f_{\theta _{1}}(x)}{1-F_{\theta _{1}}(x)}}\leq {\frac {f_{\theta _{0}}(x)}{1-F_{\theta _{0}}(x)}}\ \forall x}$
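Both conclusions can be spot-checked numerically. The sketch below uses a normal location family, which has the MLR in x; the two distributions, the grid, and the helper `hazard` are assumptions for illustration, and a grid check is not a substitute for the proof.

```python
from statistics import NormalDist

low = NormalDist(mu=0.0, sigma=1.0)   # plays the role of theta_0
high = NormalDist(mu=1.0, sigma=1.0)  # plays the role of theta_1 > theta_0
grid = [i / 10 for i in range(-40, 41)]

def hazard(dist, x):
    """Hazard rate f(x) / (1 - F(x))."""
    return dist.pdf(x) / (1.0 - dist.cdf(x))

# First-order stochastic dominance: F_{theta_1}(x) <= F_{theta_0}(x).
fosd = all(high.cdf(x) <= low.cdf(x) for x in grid)

# Hazard ordering: the hazard rate under theta_1 never exceeds that under theta_0.
hazard_ordered = all(hazard(high, x) <= hazard(low, x) for x in grid)

print(fosd, hazard_ordered)  # True True
```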

Uses

Economics

The MLR is an important condition on the type distribution of agents in mechanism design. [citation needed] Most solutions to mechanism design models assume that the type distribution satisfies the MLR, which allows a common solution method to be applied. [citation needed]

References

1. Casella, G.; Berger, R. L. (2008). Statistical Inference. Brooks/Cole. ISBN 0-495-39187-5. (Theorem 8.3.17)
2. Pfanzagl, Johann (1979). "On optimal median unbiased estimators in the presence of nuisance parameters". Annals of Statistics. 7 (1): 187–193.
3. Brown, L. D.; Cohen, Arthur; Strawderman, W. E. (1976). "A Complete Class Theorem for Strict Monotone Likelihood Ratio With Applications". Annals of Statistics. 4 (4): 712–722.