Monotone likelihood ratio

[Figure: MLRP-illustration.png. A monotone likelihood ratio in distributions $f(x)$ and $g(x)$: the ratio of the density functions is monotone in $x$, so $f(x)/g(x)$ satisfies the monotone likelihood ratio property.]


In statistics, the monotone likelihood ratio property is a property of the ratio of two probability density functions (PDFs). Formally, distributions $f(x)$ and $g(x)$ bear the property if

$$\frac{f(x_2)}{g(x_2)} \ge \frac{f(x_1)}{g(x_1)} \quad \text{for every } x_2 > x_1,$$

that is, if the ratio $f(x)/g(x)$ is nondecreasing in the argument $x$.

If the functions are first-differentiable, the property may sometimes be stated as

$$\frac{\partial}{\partial x}\left(\frac{f(x)}{g(x)}\right) \ge 0.$$

For two distributions that satisfy the definition with respect to some argument x, we say they "have the MLRP in x." For a family of distributions that all satisfy the definition with respect to some statistic T(X), we say they "have the MLR in T(X)."
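
As a quick numerical illustration of the definition (a minimal sketch, not from the original text, using NumPy/SciPy and an assumed pair of normal densities), the ratio $f(x)/g(x)$ can be evaluated on a grid and checked for monotonicity:

```python
import numpy as np
from scipy.stats import norm

# Illustrative pair of densities: f is N(1, 1) and g is N(0, 1).
# Analytically f(x)/g(x) = exp(x - 1/2), which is increasing in x,
# so the pair satisfies the MLRP in x.
f = norm(loc=1.0, scale=1.0).pdf
g = norm(loc=0.0, scale=1.0).pdf

x = np.linspace(-5.0, 5.0, 1001)    # evaluation grid (assumed range)
ratio = f(x) / g(x)                 # the likelihood ratio f(x)/g(x)

# The MLRP holds on this grid iff the ratio never decreases.
print(np.all(np.diff(ratio) >= 0))  # True
```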

Intuition

The MLRP is used to represent a data-generating process that enjoys a straightforward relationship between the magnitude of some observed variable and the distribution it draws from. If $f(x)$ satisfies the MLRP with respect to $g(x)$, then the higher the observed value $x$, the more likely it was drawn from distribution $f$ rather than $g$. As usual for monotonic relationships, the likelihood ratio's monotonicity comes in handy in statistics, particularly when using maximum-likelihood estimation. Distribution families with the MLR property also have a number of well-behaved stochastic properties, such as first-order stochastic dominance and increasing hazard ratios. Unfortunately, as is also usual, the strength of this assumption comes at the price of realism: many processes in the world do not exhibit a monotonic correspondence between input and output.

Example: Working hard or slacking off

Suppose you are working on a project, and you can either work hard or slack off. Call your choice of effort $e$ and the quality of the resulting project $q$. If the MLRP holds for the distribution of $q$ conditional on your effort $e$, the higher the quality, the more likely you worked hard. Conversely, the lower the quality, the more likely you slacked off.

  1. Choose effort $e \in \{H, L\}$, where $H$ means high and $L$ means low.
  2. Observe $q$ drawn from $f(q \mid e)$. By Bayes' law with a uniform prior,
     $$\Pr(H \mid q) = \frac{f(q \mid H)}{f(q \mid H) + f(q \mid L)}.$$
  3. Suppose $f(q \mid e)$ satisfies the MLRP. Rearranging, the probability the worker worked hard is
     $$\frac{1}{1 + f(q \mid L)/f(q \mid H)},$$
which, thanks to the MLRP, is monotonically increasing in $q$ (because $f(q \mid L)/f(q \mid H)$ is decreasing in $q$). Hence if some employer is doing a "performance review" he can infer his employee's behavior from the merits of his work.
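
A minimal numerical sketch of this inference, assuming (purely for illustration) that quality is exponentially distributed with a larger mean under high effort; the distributions, prior, and grid are assumptions, not part of the original example:

```python
import numpy as np
from scipy.stats import expon

# Assumed output distributions: quality q ~ Exponential with mean 2 under
# high effort (H) and mean 1 under low effort (L); this family has the MLRP in q.
f_H = expon(scale=2.0).pdf
f_L = expon(scale=1.0).pdf

def prob_worked_hard(q):
    """Posterior Pr(H | q) under a uniform prior over {H, L}."""
    return 1.0 / (1.0 + f_L(q) / f_H(q))

q = np.linspace(0.01, 10.0, 500)
posterior = prob_worked_hard(q)

# Because f(q|L)/f(q|H) is decreasing in q, the posterior is increasing in q.
print(np.all(np.diff(posterior) >= 0))  # True
```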

Families of distributions satisfying MLR

Statistical models often assume that data are generated by a distribution from some family of distributions and seek to determine that distribution. This task is simplified if the family has the monotone likelihood ratio property (MLRP).

A family of density functions $\{ f_\theta(x) \mid \theta \in \Theta \}$ indexed by a parameter $\theta$ taking values in an ordered set $\Theta$ is said to have a monotone likelihood ratio (MLR) in the statistic $T(X)$ if for any $\theta_1 < \theta_2$, the ratio

$$\frac{f_{\theta_2}(x_1, x_2, \dots, x_n)}{f_{\theta_1}(x_1, x_2, \dots, x_n)}$$

is a non-decreasing function of $T(X)$, where $X = (x_1, \dots, x_n)$ is the observed sample.

Then we say the family of distributions "has MLR in $T(X)$".

List of families

Family (i.i.d. observations)                  $T(X)$ in which the family has the MLR
Exponential$(\lambda)$                        $\sum_i x_i$
Binomial$(n, p)$                              $\sum_i x_i$
Poisson$(\lambda)$                            $\sum_i x_i$
Normal$(\mu, \sigma)$, $\sigma$ known         $\sum_i x_i$
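
The entries above can be checked numerically. The sketch below (an illustrative assumption: exponential observations parameterized by their mean, with arbitrary sample sizes and parameter values) verifies that the likelihood ratio between two parameter values is a non-decreasing function of $T(X) = \sum_i x_i$:

```python
import numpy as np
from scipy.stats import expon

# Assumed setup: n i.i.d. Exponential observations parameterized by their
# mean theta (scipy's `scale`), with theta_2 > theta_1.
theta_1, theta_2, n = 1.0, 3.0, 5

rng = np.random.default_rng(0)
samples = rng.exponential(scale=2.0, size=(200, n))   # arbitrary data sets

T = samples.sum(axis=1)                                # statistic T(X) = sum of x_i
log_ratio = (expon(scale=theta_2).logpdf(samples).sum(axis=1)
             - expon(scale=theta_1).logpdf(samples).sum(axis=1))

# MLR in T(X): sorting the data sets by T(X), the log likelihood ratio
# log f_{theta_2}(X) - log f_{theta_1}(X) should never decrease.
order = np.argsort(T)
print(np.all(np.diff(log_ratio[order]) >= 0))          # True
```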

Hypothesis testing

If the family of random variables has the MLRP in $T(X)$, a uniformly most powerful test can easily be determined for the hypothesis $H_0 : \theta \le \theta_0$ versus $H_1 : \theta > \theta_0$.

Example: Effort and output

Example: Let $e$ be an input into a stochastic technology (a worker's effort, for instance) and $y$ its output, the likelihood of which is described by a probability density function $f(y; e)$. Then the monotone likelihood ratio property (MLRP) of the family $f$ is expressed as follows: for any $e_1, e_2$, the fact that $e_2 > e_1$ implies that the ratio $f(y; e_2)/f(y; e_1)$ is increasing in $y$.

Relation to other statistical properties

Monotone likelihoods are used in several areas of statistical theory, including point estimation and hypothesis testing, as well as in probability models.

Exponential families

One-parameter exponential families have monotone likelihood functions. In particular, the one-dimensional exponential family of probability density functions or probability mass functions with

$$f_\theta(x) = c(\theta)\, h(x)\, \exp\big(\pi(\theta)\, T(x)\big)$$

has a monotone non-decreasing likelihood ratio in the sufficient statistic $T(x)$, provided that $\pi(\theta)$ is non-decreasing.
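
A one-line derivation makes this explicit under the stated form of the density: for $\theta_2 > \theta_1$,

$$\frac{f_{\theta_2}(x)}{f_{\theta_1}(x)} = \frac{c(\theta_2)\, h(x)\, \exp\big(\pi(\theta_2)\, T(x)\big)}{c(\theta_1)\, h(x)\, \exp\big(\pi(\theta_1)\, T(x)\big)} = \frac{c(\theta_2)}{c(\theta_1)} \exp\Big(\big(\pi(\theta_2) - \pi(\theta_1)\big)\, T(x)\Big),$$

which is non-decreasing in $T(x)$ because $\pi(\theta_2) - \pi(\theta_1) \ge 0$ when $\pi$ is non-decreasing.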

Uniformly most powerful tests: The Karlin–Rubin theorem

Monotone likelihood functions are used to construct uniformly most powerful tests, according to the Karlin–Rubin theorem. [1] Consider a scalar measurement $X$ having a probability density function $f(x; \theta)$ parameterized by a scalar parameter $\theta$, and define the likelihood ratio $\ell(x) = f(x; \theta_1)/f(x; \theta_0)$. If $\ell(x)$ is monotone non-decreasing in $x$ for any pair $\theta_1 \ge \theta_0$ (meaning that the greater $x$ is, the more likely $H_1$ is), then the threshold test

$$\varphi(x) = \begin{cases} 1 & \text{if } x > x_0 \\ 0 & \text{if } x < x_0 \end{cases}$$

where $x_0$ is chosen so that $\operatorname{E}_{\theta_0}\varphi(X) = \alpha$,

is the UMP test of size $\alpha$ for testing $H_0 : \theta \le \theta_0$ against $H_1 : \theta > \theta_0$.

Note that exactly the same test is also UMP for testing $H_0 : \theta = \theta_0$ against $H_1 : \theta > \theta_0$.
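
As a concrete sketch of the threshold test (the normal family, sample size, and level below are illustrative assumptions, not part of the theorem statement), for $n$ i.i.d. $N(\theta, \sigma^2)$ observations with $\sigma$ known, the family has MLR in the sample mean, so the size-$\alpha$ test that rejects for large $\bar{x}$ is UMP for $H_0 : \theta \le \theta_0$ versus $H_1 : \theta > \theta_0$:

```python
import numpy as np
from scipy.stats import norm

# Assumed setting: n i.i.d. N(theta, sigma^2) observations with sigma known.
theta_0, sigma, n, alpha = 0.0, 1.0, 25, 0.05

# Threshold chosen so that P_{theta_0}(sample mean > x_0) = alpha.
x_0 = theta_0 + norm.ppf(1.0 - alpha) * sigma / np.sqrt(n)

def ump_test(sample):
    """Return 1 (reject H0: theta <= theta_0) if the sample mean exceeds x_0, else 0."""
    return int(np.mean(sample) > x_0)

rng = np.random.default_rng(0)
sample = rng.normal(loc=0.5, scale=sigma, size=n)  # data generated with theta = 0.5 > theta_0
print(ump_test(sample))                            # typically 1 (reject)
```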

Median unbiased estimation

Monotone likelihood functions are used to construct median-unbiased estimators, using methods specified by Johann Pfanzagl and others. [2] [3] One such procedure is an analogue of the Rao–Blackwell procedure for mean-unbiased estimators: the procedure holds for a smaller class of probability distributions than does the Rao–Blackwell procedure for mean-unbiased estimation but for a larger class of loss functions. [3]:713

Lifetime analysis: Survival analysis and reliability

If a family of distributions $f_\theta(x)$ has the monotone likelihood ratio property in $x$,

  1. the family has monotone decreasing hazard rates in $\theta$ (but not necessarily in $x$);
  2. the family exhibits the first-order (and hence second-order) stochastic dominance in $\theta$, and the best Bayesian update of $\theta$ is increasing in $x$.

But not conversely: neither monotone hazard rates nor stochastic dominance implies the MLRP.

Proofs

Let the distribution family $f_\theta$ satisfy MLR in $x$, so that for $\theta_1 > \theta_0$ and $x_1 > x_0$:

$$\frac{f_{\theta_1}(x_1)}{f_{\theta_0}(x_1)} \ge \frac{f_{\theta_1}(x_0)}{f_{\theta_0}(x_0)},$$

or equivalently:

$$f_{\theta_1}(x_1)\, f_{\theta_0}(x_0) \ge f_{\theta_1}(x_0)\, f_{\theta_0}(x_1).$$

Integrating this expression twice, we obtain:

1. To $x_1$ with respect to $x_0$:

$$\int_{\min x}^{x_1} f_{\theta_1}(x_1)\, f_{\theta_0}(x_0)\, dx_0 \ge \int_{\min x}^{x_1} f_{\theta_1}(x_0)\, f_{\theta_0}(x_1)\, dx_0;$$

integrate and rearrange to obtain

$$\frac{f_{\theta_1}(x)}{f_{\theta_0}(x)} \ge \frac{F_{\theta_1}(x)}{F_{\theta_0}(x)}.$$

2. From $x_0$ with respect to $x_1$:

$$\int_{x_0}^{\max x} f_{\theta_1}(x_1)\, f_{\theta_0}(x_0)\, dx_1 \ge \int_{x_0}^{\max x} f_{\theta_1}(x_0)\, f_{\theta_0}(x_1)\, dx_1;$$

integrate and rearrange to obtain

$$\frac{f_{\theta_1}(x)}{f_{\theta_0}(x)} \le \frac{1 - F_{\theta_1}(x)}{1 - F_{\theta_0}(x)}.$$

First-order stochastic dominance

Combine the two inequalities above: for $\theta_1 > \theta_0$,

$$\frac{F_{\theta_1}(x)}{F_{\theta_0}(x)} \le \frac{f_{\theta_1}(x)}{f_{\theta_0}(x)} \le \frac{1 - F_{\theta_1}(x)}{1 - F_{\theta_0}(x)},$$

and cross-multiplying the outer terms gives first-order dominance:

$$F_{\theta_1}(x) \le F_{\theta_0}(x) \quad \text{for all } x.$$

Monotone hazard rate

Use only the second inequality above to get a monotone hazard rate:

$$\frac{f_{\theta_1}(x)}{1 - F_{\theta_1}(x)} \le \frac{f_{\theta_0}(x)}{1 - F_{\theta_0}(x)} \quad \text{for all } x,$$

so the hazard rate is decreasing in $\theta$.
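
Both consequences can be seen numerically in a simple assumed example (two normal distributions with equal variance, which have MLR in $x$); this is an illustration, not part of the proof:

```python
import numpy as np
from scipy.stats import norm

# Assumed example: N(theta_0, 1) and N(theta_1, 1) with theta_1 > theta_0
# have MLR in x (the log likelihood ratio is linear in x with positive slope).
f0 = norm(loc=0.0, scale=1.0)
f1 = norm(loc=1.0, scale=1.0)
x = np.linspace(-4.0, 4.0, 801)

# First-order stochastic dominance: F_{theta_1}(x) <= F_{theta_0}(x) everywhere.
print(np.all(f1.cdf(x) <= f0.cdf(x)))   # True

# Monotone hazard rate: the hazard rate under theta_1 is no larger than under theta_0.
hazard0 = f0.pdf(x) / (1.0 - f0.cdf(x))
hazard1 = f1.pdf(x) / (1.0 - f1.cdf(x))
print(np.all(hazard1 <= hazard0))        # True
```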

Uses

Economics

The MLR is an important condition on the type distribution of agents in mechanism design and economics of information, where Paul Milgrom defined "favorableness" of signals (in terms of stochastic dominance) as a consequence of MLR. [4] Most solutions to mechanism design models assume type distributions that satisfy the MLR to take advantage of solution methods that may be easier to apply and interpret.


References

  1. Casella, G.; Berger, R. L. (2008). Statistical Inference. Brooks/Cole. ISBN 0-495-39187-5 (Theorem 8.3.17).
  2. Pfanzagl, Johann (1979). "On optimal median unbiased estimators in the presence of nuisance parameters". Annals of Statistics. 7 (1): 187–193. doi:10.1214/aos/1176344563.
  3. Brown, L. D.; Cohen, Arthur; Strawderman, W. E. (1976). "A Complete Class Theorem for Strict Monotone Likelihood Ratio With Applications". Annals of Statistics. 4 (4): 712–722. doi:10.1214/aos/1176343543.
  4. Milgrom, P. R. (1981). "Good News and Bad News: Representation Theorems and Applications". The Bell Journal of Economics. 12 (2): 380–391. doi:10.2307/3003562.