In statistics, an L-estimator (or L-statistic) is an estimator which is a linear combination of order statistics of the measurements. This can be as few as a single point, as in the median (of an odd number of values), or as many as all points, as in the mean.
The main benefits of L-estimators are that they are often extremely simple and often robust: assuming sorted data, they are very easy to calculate and interpret, and are often resistant to outliers. They are thus useful in robust statistics, as descriptive statistics, in statistics education, and when computation is difficult. However, they are inefficient, and in modern robust statistics M-estimators are preferred, although these are much more difficult computationally. In many circumstances L-estimators are reasonably efficient, and thus adequate for initial estimation.
A basic example is the median. Given $n$ values with order statistics $x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$: if $n$ is odd, the median equals $x_{((n+1)/2)}$, the $\tfrac{n+1}{2}$-th order statistic; if $n$ is even, it is the average of two order statistics, $\tfrac{1}{2}\bigl(x_{(n/2)} + x_{(n/2+1)}\bigr)$. These are both linear combinations of order statistics, and the median is therefore a simple example of an L-estimator.
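As a minimal illustration (in Python with NumPy, which the article itself does not assume), the median can be computed directly from the order statistics:

```python
import numpy as np

def median_from_order_statistics(x):
    """Median as a linear combination of order statistics (a simple L-estimator)."""
    x = np.sort(np.asarray(x, dtype=float))  # x[0] <= x[1] <= ... <= x[n-1]
    n = len(x)
    if n % 2 == 1:
        return x[n // 2]                       # the ((n+1)/2)-th order statistic
    return 0.5 * (x[n // 2 - 1] + x[n // 2])   # average of the two middle order statistics

print(median_from_order_statistics([3, 1, 4, 1, 5]))     # 3.0
print(median_from_order_statistics([3, 1, 4, 1, 5, 9]))  # 3.5
```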
A more detailed list of examples includes: with a single point, the maximum, the minimum, or any single order statistic or quantile; with one or two points, the median; with two points, the mid-range, the range, the midsummary (trimmed mid-range, including the midhinge), and the trimmed range (including the interquartile range and interdecile range); with three points, the trimean; with a fixed fraction of the points, the trimmed mean (including interquartile mean) and the Winsorized mean; with all points, the mean.
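A short sketch of several of these statistics, computed with NumPy (quantile interpolation conventions vary between implementations, so values on small samples may differ slightly from hand calculations):

```python
import numpy as np

def l_estimators(x):
    """A few classical L-estimators, each a linear combination of order statistics."""
    x = np.sort(np.asarray(x, dtype=float))
    q1, q2, q3 = np.percentile(x, [25, 50, 75])
    return {
        "minimum": x[0],
        "maximum": x[-1],
        "mid-range": (x[0] + x[-1]) / 2,      # average of the extremes
        "range": x[-1] - x[0],
        "midhinge": (q1 + q3) / 2,            # average of the hinges (quartiles)
        "IQR": q3 - q1,
        "trimean": (q1 + 2 * q2 + q3) / 4,    # average of median and midhinge
        "mean": x.mean(),                     # equal weight on every order statistic
    }

print(l_estimators([2, 3, 5, 7, 11, 13]))
```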
Note that some of these (such as median, or mid-range) are measures of central tendency, and are used as estimators for a location parameter, such as the mean of a normal distribution, while others (such as range or trimmed range) are measures of statistical dispersion, and are used as estimators of a scale parameter, such as the standard deviation of a normal distribution.
L-estimators can also measure the shape of a distribution, beyond location and scale. For example, the midhinge minus the median is a 3-term L-estimator that measures skewness, and other differences of midsummaries give measures of asymmetry at different points in the tail.[1]
Sample L-moments are L-estimators for the population L-moments, and have rather complex expressions. L-moments are generally treated separately; see that article for details.
L-estimators are often statistically resistant, having a high breakdown point. This is defined as the fraction of the measurements which can be arbitrarily changed without causing the resulting estimate to tend to infinity (i.e., to "break down"). The breakdown point of an L-estimator is determined by the order statistic it uses that lies closest to the minimum or maximum: for instance, the median has a breakdown point of 50% (the highest possible), and an X% trimmed or Winsorized mean has a breakdown point of X%.
Not all L-estimators are robust; if an L-estimator includes the minimum or maximum, it has a breakdown point of 0. Such non-robust L-estimators include the minimum, maximum, mean, and mid-range. Their trimmed equivalents are robust, however.
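A small demonstration of breakdown in practice, assuming NumPy: corrupting a single observation destroys the mean but leaves the median untouched:

```python
import numpy as np

x = np.array([2.0, 3.0, 5.0, 7.0, 11.0])
x_bad = x.copy()
x_bad[-1] = 1e9  # corrupt a single observation

# The mean (0% breakdown point) is destroyed by one outlier;
# the median (50% breakdown point) is unchanged.
print(np.mean(x), np.mean(x_bad))      # 5.6 vs ~2e8
print(np.median(x), np.median(x_bad))  # 5.0 vs 5.0
```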
Robust L-estimators used to measure dispersion, such as the IQR, provide robust measures of scale.
In practical use in robust statistics, L-estimators have been replaced by M-estimators, which provide robust statistics that also have high relative efficiency, at the cost of being much more computationally complex and opaque.
However, the simplicity of L-estimators means that they are easily interpreted and visualized, and makes them suited for descriptive statistics and statistics education; many can even be computed mentally from a five-number summary or seven-number summary, or visualized from a box plot. L-estimators play a fundamental role in many approaches to non-parametric statistics.
Although L-estimators are non-parametric, they are frequently used for parameter estimation, as the name indicates, though they must often be adjusted to yield an unbiased, consistent estimator. The choice of L-estimator and adjustment depend on the distribution whose parameter is being estimated.
For example, when estimating a location parameter, for a symmetric distribution a symmetric L-estimator (such as the median or midhinge) will be unbiased. However, if the distribution has skew, symmetric L-estimators will generally be biased and require adjustment. For example, in a skewed distribution, the nonparametric skew (and Pearson's skewness coefficients) measure the bias of the median as an estimator of the mean.
When estimating a scale parameter, such as when using an L-estimator as a robust measure of scale (for example, to estimate the population variance or population standard deviation), one generally must multiply by a scale factor to obtain an unbiased, consistent estimator; see scale parameter: estimation.
For example, dividing the IQR by $2\sqrt{2}\,\operatorname{erf}^{-1}(1/2) \approx 1.349$ (using the error function) makes it an unbiased, consistent estimator for the population standard deviation if the data follow a normal distribution.
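A sketch of this normalization, assuming SciPy's scipy.special.erfinv for the inverse error function:

```python
import numpy as np
from scipy.special import erfinv

rng = np.random.default_rng(0)
x = rng.normal(loc=10.0, scale=2.0, size=100_000)

c = 2 * np.sqrt(2) * erfinv(0.5)   # ~1.349, the IQR of a standard normal
q1, q3 = np.percentile(x, [25, 75])
sigma_hat = (q3 - q1) / c          # robust estimate of the standard deviation

print(sigma_hat)   # ~2.0
print(x.std())     # ~2.0, the conventional (non-robust) estimate
```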
L-estimators can also be used as statistics in their own right – for example, the median is a measure of location, and the IQR is a measure of dispersion. In these cases, the sample statistics can act as estimators of their own expected value; for example, the sample median is an estimator of the population median.
Beyond their simplicity, L-estimators are frequently both easy to calculate and robust.
Assuming sorted data, L-estimators involving only a few points can be calculated with far fewer mathematical operations than efficient estimators.[2][3] Before the advent of electronic calculators and computers, these provided a useful way to extract much of the information from a sample with minimal labour. They remained in practical use through the early and mid 20th century, when automated sorting of punch-card data was possible but computation remained difficult,[2] and they are still of use today for estimates given a list of numerical values in non-machine-readable form, where data input is more costly than manual sorting. They also allow rapid estimation.
L-estimators are often much more robust than maximally efficient conventional methods – the median is maximally statistically resistant, having a 50% breakdown point, and the X% trimmed mid-range has an X% breakdown point, while the sample mean (which is maximally efficient) is minimally robust, breaking down for a single outlier.
While L-estimators are not as efficient as other statistics, they often have reasonably high relative efficiency, and show that a large fraction of the information used in estimation can be obtained using only a few points – as few as one, two, or three. Alternatively, they show that order statistics contain a significant amount of information.
For example, in terms of efficiency, given a sample from a normally distributed population, the population mean can be estimated with maximum efficiency by computing the sample mean: adding all the members of the sample and dividing by the number of members.
However, for a large data set (over 100 points) from a symmetric population, the mean can be estimated reasonably efficiently relative to the best estimate by L-estimators. Using a single point, this is done by taking the median of the sample, with no calculations required (other than sorting); this yields an efficiency of 64% or better (for all n). Using two points, a simple estimate is the midhinge (the 25% trimmed mid-range), but a more efficient estimate is the 29% trimmed mid-range, that is, averaging the two values 29% of the way in from the smallest and the largest values: the 29th and 71st percentiles; this has an efficiency of about 81%.[3] For three points, the trimean (average of median and midhinge) can be used, though the average of the 20th, 50th, and 80th percentiles yields 88% efficiency. Using further points yields higher efficiency, though it is notable that only 3 points are needed for very high efficiency.
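As an illustrative sketch (NumPy assumed; the efficiency figures are those quoted above, not computed here), these location estimators are one-liners on top of percentiles:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=100.0, scale=15.0, size=10_000)

median      = np.percentile(x, 50)                       # 1 point, ~64% efficiency
midrange_29 = np.percentile(x, [29, 71]).mean()          # 2 points, ~81% efficiency
trimean     = np.percentile(x, [25, 50, 50, 75]).mean()  # 3 points: (q1 + 2*q2 + q3)/4
three_pt    = np.percentile(x, [20, 50, 80]).mean()      # 3 points, ~88% efficiency

print(median, midrange_29, trimean, three_pt)  # all ~100
```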
For estimating the standard deviation of a normal distribution, the scaled interdecile range gives a reasonably efficient estimator, though instead taking the 7% trimmed range (the difference between the 7th and 93rd percentiles) and dividing by 3 (corresponding to 86% of the data of a normal distribution falling within 1.5 standard deviations of the mean) yields an estimate of about 65% efficiency. [3]
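A minimal sketch of this scale estimate, again assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=0.0, scale=3.0, size=10_000)

p7, p93 = np.percentile(x, [7, 93])
sigma_hat = (p93 - p7) / 3   # the 7% trimmed range spans roughly +/- 1.5 sigma

print(sigma_hat)  # ~3.0
```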
For small samples, L-estimators are also relatively efficient: the midsummary of the 3rd point from each end has an efficiency around 84% for samples of size about 10, and the range divided by $\sqrt{n}$ has reasonably good efficiency for sizes up to 20, though this drops with increasing n and the scale factor can be improved (efficiency 85% for 10 points). Other heuristic estimators for small samples include the range over n (for the standard error), and the range squared over the median (for the chi-squared of a Poisson distribution).[3]
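A rough simulation of the range-based heuristic, assuming NumPy and the $\sqrt{n}$ divisor described above:

```python
import numpy as np

rng = np.random.default_rng(3)
sigma = 5.0
estimates = []
for _ in range(10_000):
    x = rng.normal(scale=sigma, size=10)
    estimates.append((x.max() - x.min()) / np.sqrt(len(x)))  # range / sqrt(n)

print(np.mean(estimates))  # close to the true sigma of 5 for n around 10
```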
In statistics, a central tendency is a central or typical value for a probability distribution.
In statistics, an estimator is a rule for calculating an estimate of a given quantity based on observed data: thus the rule, the quantity of interest and its result are distinguished. For example, the sample mean is a commonly used estimator of the population mean.
The median of a set of numbers is the value separating the higher half from the lower half of a data sample, a population, or a probability distribution. For a data set, it may be thought of as the "middle" value. The basic feature of the median in describing data compared to the mean is that it is not skewed by a small proportion of extremely large or small values, and therefore provides a better representation of the center. Median income, for example, may be a better way to describe the center of the income distribution because increases in the largest incomes alone have no effect on the median. For this reason, the median is of central importance in robust statistics.
In probability theory and statistics, skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. The skewness value can be positive, zero, negative, or undefined.
In statistics, point estimation involves the use of sample data to calculate a single value which is to serve as a "best guess" or "best estimate" of an unknown population parameter. More formally, it is the application of a point estimator to the data to obtain a point estimate.
The average absolute deviation (AAD) of a data set is the average of the absolute deviations from a central point. It is a summary statistic of statistical dispersion or variability. In the general form, the central point can be a mean, median, mode, or the result of any other measure of central tendency or any reference value related to the given data set. AAD includes the mean absolute deviation and the median absolute deviation.
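A minimal sketch, assuming NumPy; the central point is a parameter, defaulting to the median:

```python
import numpy as np

def average_absolute_deviation(x, center=np.median):
    """AAD about a chosen central point (median by default)."""
    x = np.asarray(x, dtype=float)
    return np.mean(np.abs(x - center(x)))

print(average_absolute_deviation([2, 2, 3, 4, 14]))           # 2.8, about the median (3)
print(average_absolute_deviation([2, 2, 3, 4, 14], np.mean))  # 3.6, about the mean (5)
```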
A truncated mean or trimmed mean is a statistical measure of central tendency, much like the mean and median. It involves the calculation of the mean after discarding given parts of a probability distribution or sample at the high and low end, and typically discarding an equal amount of both. This number of points to be discarded is usually given as a percentage of the total number of points, but may also be given as a fixed number of points.
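For illustration, SciPy provides a trimmed mean as scipy.stats.trim_mean, where the proportion to cut is discarded from each end:

```python
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

# Discard 10% of the points from EACH end (here, one point per end),
# then average what remains.
print(stats.trim_mean(x, 0.10))  # 5.5: the outlier 100 is dropped
print(x.mean())                  # 14.5: the outlier dominates
```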
This glossary of statistics and probability is a list of definitions of terms and concepts used in the mathematical sciences of statistics and probability, their sub-disciplines, and related fields. For additional related terms, see Glossary of mathematics and Glossary of experimental design.
In robust statistics, robust regression seeks to overcome some limitations of traditional regression analysis. A regression analysis models the relationship between one or more independent variables and a dependent variable. Standard types of regression, such as ordinary least squares, have favourable properties if their underlying assumptions are true, but can give misleading results otherwise. Robust regression methods are designed to limit the effect that violations of assumptions by the underlying data-generating process have on regression estimates.
In statistics, the mid-range or mid-extreme is a measure of central tendency of a sample defined as the arithmetic mean of the maximum and minimum values of the data set: $M = \tfrac{1}{2}(\max x + \min x)$.
Robust statistics are statistics that maintain their properties even if the underlying distributional assumptions are incorrect. Robust statistical methods have been developed for many common problems, such as estimating location, scale, and regression parameters. One motivation is to produce statistical methods that are not unduly affected by outliers. Another motivation is to provide methods with good performance when there are small departures from a parametric distribution. For example, robust methods work well for mixtures of two normal distributions with different standard deviations; under this model, non-robust methods like a t-test work poorly.
In statistics, M-estimators are a broad class of extremum estimators for which the objective function is a sample average. Both non-linear least squares and maximum likelihood estimation are special cases of M-estimators. The definition of M-estimators was motivated by robust statistics, which contributed new types of M-estimators. However, M-estimators are not inherently robust, as is clear from the fact that they include maximum likelihood estimators, which are in general not robust. The statistical procedure of evaluating an M-estimator on a data set is called M-estimation.
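As an illustrative sketch of M-estimation (a hypothetical implementation, not an algorithm from this article), the Huber M-estimate of location can be computed by iteratively reweighted averaging; the tuning constant c = 1.345 and the MAD-based scale are conventional choices:

```python
import numpy as np

def huber_location(x, c=1.345, tol=1e-8, max_iter=100):
    """Huber M-estimate of location via iteratively reweighted averaging.
    A sketch: production code should guard against zero scale and non-convergence."""
    x = np.asarray(x, dtype=float)
    mu = np.median(x)                        # robust starting value
    s = 1.4826 * np.median(np.abs(x - mu))   # MAD, scaled for the normal
    for _ in range(max_iter):
        r = np.abs(x - mu) / s
        w = np.minimum(1.0, c / np.maximum(r, 1e-12))  # Huber weights: 1 inside, c/|r| outside
        mu_new = np.sum(w * x) / np.sum(w)
        if abs(mu_new - mu) < tol:
            break
        mu = mu_new
    return mu

print(huber_location([2.0, 3.0, 5.0, 7.0, 1000.0]))  # near the bulk of the data, not 203.4
```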
In estimation theory and decision theory, a Bayes estimator or a Bayes action is an estimator or decision rule that minimizes the posterior expected value of a loss function. Equivalently, it maximizes the posterior expectation of a utility function. An alternative way of formulating an estimator within Bayesian statistics is maximum a posteriori estimation.
In statistics, the Hodges–Lehmann estimator is a robust and nonparametric estimator of a population's location parameter. For populations that are symmetric about one median, such as the Gaussian or normal distribution or the Student t-distribution, the Hodges–Lehmann estimator is a consistent and median-unbiased estimate of the population median. For non-symmetric populations, the Hodges–Lehmann estimator estimates the "pseudo–median", which is closely related to the population median.
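A minimal sketch of the one-sample Hodges–Lehmann estimator, assuming NumPy: it is the median of all Walsh averages (pairwise means, conventionally including each point paired with itself):

```python
import numpy as np
from itertools import combinations

def hodges_lehmann(x):
    """Median of all Walsh averages, including each point with itself."""
    x = np.asarray(x, dtype=float)
    pairs = [(a + b) / 2 for a, b in combinations(x, 2)]
    walsh = np.concatenate([x, pairs])
    return np.median(walsh)

print(hodges_lehmann([1.0, 5.0, 2.0, 4.0, 3.0]))  # 3.0
```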
In statistics, a trimmed estimator is an estimator derived from another estimator by excluding some of the extreme values, a process called truncation. This is generally done to obtain a more robust statistic, and the extreme values are considered outliers. Trimmed estimators also often have higher efficiency for mixture distributions and heavy-tailed distributions than the corresponding untrimmed estimator, at the cost of lower efficiency for other distributions, such as the normal distribution.
In statistics, the interdecile range is the difference between the first and the ninth deciles. The interdecile range is a measure of statistical dispersion of the values in a set of data, similar to the range and the interquartile range, and can be computed from the (non-parametric) seven-number summary.
In statistics, robust measures of scale are methods that quantify the statistical dispersion in a sample of numerical data while resisting outliers. The most common such robust statistics are the interquartile range (IQR) and the median absolute deviation (MAD). These are contrasted with conventional or non-robust measures of scale, such as sample standard deviation, which are greatly influenced by outliers.
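A short sketch contrasting these measures on data with one gross outlier, assuming NumPy and SciPy (scipy.stats.median_abs_deviation; scale="normal" applies the ~1.4826 factor that makes the MAD consistent for normal data):

```python
import numpy as np
from scipy import stats

x = np.array([2.0, 3.0, 5.0, 7.0, 11.0, 1000.0])

q1, q3 = np.percentile(x, [25, 75])
print(q3 - q1)                                        # IQR: stays bounded despite the outlier
print(stats.median_abs_deviation(x, scale="normal"))  # MAD: likewise bounded
print(np.std(x))                                      # the non-robust standard deviation explodes
```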
In statistics, L-moments are a sequence of statistics used to summarize the shape of a probability distribution. They are linear combinations of order statistics (L-statistics) analogous to conventional moments, and can be used to calculate quantities analogous to standard deviation, skewness and kurtosis, termed the L-scale, L-skewness and L-kurtosis respectively. Standardised L-moments are called L-moment ratios and are analogous to standardized moments. Just as for conventional moments, a theoretical distribution has a set of population L-moments. Sample L-moments can be defined for a sample from the population, and can be used as estimators of the population L-moments.
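A minimal sketch of the first two sample L-moments via probability-weighted moments, assuming NumPy; for a standard normal the L-scale is $1/\sqrt{\pi} \approx 0.564$:

```python
import numpy as np

def sample_l_moments(x):
    """First two sample L-moments via probability-weighted moments:
    l1 = b0 (the sample mean) and l2 = 2*b1 - b0 (the L-scale)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    b0 = x.mean()
    b1 = np.sum(np.arange(n) * x) / (n * (n - 1))  # (1/n) * sum of ((i-1)/(n-1)) * x_(i)
    return b0, 2 * b1 - b0

l1, l2 = sample_l_moments(np.random.default_rng(4).normal(size=10_000))
print(l1, l2)  # ~0 and ~0.564 for a standard normal
```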
In statistics, efficiency is a measure of quality of an estimator, of an experimental design, or of a hypothesis-testing procedure. Essentially, a more efficient estimator needs fewer observations than a less efficient one to achieve a given performance. An efficient estimator is characterized by having the smallest possible variance (attaining the Cramér–Rao bound), indicating that there is little deviation between the estimated value and the "true" value in the L2-norm sense.
In non-parametric statistics, the Theil–Sen estimator is a method for robustly fitting a line to sample points in the plane by choosing the median of the slopes of all lines through pairs of points. It has also been called Sen's slope estimator, slope selection, the single median method, the Kendall robust line-fit method, and the Kendall–Theil robust line. It is named after Henri Theil and Pranab K. Sen, who published papers on this method in 1950 and 1968 respectively, and after Maurice Kendall because of its relation to the Kendall tau rank correlation coefficient.
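A minimal sketch of the estimator, assuming NumPy (SciPy also ships an implementation as scipy.stats.theilslopes):

```python
import numpy as np
from itertools import combinations

def theil_sen(x, y):
    """Slope = median of slopes over all pairs of points; intercept = median residual."""
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i, j in combinations(range(len(x)), 2)
              if x[j] != x[i]]
    slope = np.median(slopes)
    intercept = np.median(np.asarray(y) - slope * np.asarray(x))
    return slope, intercept

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2 * x + 1.0
y[2] = 50.0             # one gross outlier
print(theil_sen(x, y))  # ~(2.0, 1.0); ordinary least squares would be pulled far off
```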