Point estimation

Last updated May 19, 2024. From Wikipedia, the Free Encyclopedia.

In statistics, point estimation involves the use of sample data to calculate a single value (known as a point estimate since it identifies a point in some parameter space) which is to serve as a "best guess" or "best estimate" of an unknown population parameter (for example, the population mean). More formally, it is the application of a point estimator to the data to obtain a point estimate.

Properties of point estimates
Biasedness
Consistency
Efficiency
Sufficiency
Types of point estimation
Bayesian point estimation
Methods of finding point estimates
Method of maximum likelihood (MLE)
Method of moments (MOM)
Method of least squares
Minimum-variance mean-unbiased estimator (MVUE)
Median unbiased estimator
Best linear unbiased estimator (BLUE)
Point estimate vs. confidence interval estimate
See also
References
Further reading

Point estimation can be contrasted with interval estimation: such interval estimates are typically either confidence intervals, in the case of frequentist inference, or credible intervals, in the case of Bayesian inference. More generally, a point estimator can be contrasted with a set estimator. Examples are given by confidence sets or credible sets. A point estimator can also be contrasted with a distribution estimator. Examples are given by confidence distributions, randomized estimators, and Bayesian posteriors.

Properties of point estimates

Biasedness

The bias of an estimator is the difference between its expected value and the true value of the population parameter being estimated. The closer the expected value of the estimator is to the true parameter value, the smaller the bias. When the expected value equals the true value, the bias is zero and the estimator is called unbiased. An unbiased estimator is called a best unbiased estimator if it has minimum variance among all unbiased estimators. However, a biased estimator with a small variance may be more useful than an unbiased estimator with a large variance.^[1] For this reason, point estimators with the smallest mean squared error, which combines variance and squared bias, are generally preferred.

If we let T = h(X₁, X₂, …, X_n) be an estimator based on a random sample X₁, X₂, …, X_n, the estimator T is called an unbiased estimator for the parameter θ if E[T] = θ, irrespective of the value of θ.^[1] For example, for a random sample with sample mean x̄ and sample variance s² (computed with divisor n − 1), E[x̄] = μ and E[s²] = σ², so x̄ and s² are unbiased estimators of the population mean μ and the population variance σ². The difference E[T] − θ is called the bias of T; if this difference is nonzero, then T is called biased.
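The unbiasedness of x̄ and s² can be checked approximately by simulation. The following is a minimal sketch, assuming normally distributed data with illustrative values μ = 0, σ = 2 and n = 5; the function name and parameters are hypothetical and not taken from the cited sources. It also averages the variance estimator with divisor n, whose expectation is (n − 1)σ²/n rather than σ².

```python
import random
import statistics

def simulate_expectations(n=5, reps=20_000, mu=0.0, sigma=2.0, seed=0):
    """Approximate E[sample mean], E[s^2 with divisor n-1], E[s^2 with divisor n]."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    means, s2_unbiased, s2_biased = [], [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        m = sum(sample) / n
        ss = sum((x - m) ** 2 for x in sample)  # sum of squared deviations
        means.append(m)
        s2_unbiased.append(ss / (n - 1))  # unbiased for sigma^2
        s2_biased.append(ss / n)          # biased: expectation is (n-1)/n * sigma^2
    return (statistics.fmean(means),
            statistics.fmean(s2_unbiased),
            statistics.fmean(s2_biased))

avg_mean, avg_s2, avg_s2_biased = simulate_expectations()
print(avg_mean, avg_s2, avg_s2_biased)
```

With these assumed values the first two averages should lie near 0 and 4, while the third should lie near 3.2, which makes the bias of the divisor-n estimator visible.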

Consistency

Consistency concerns whether the point estimate stays close to the true value of the parameter as the sample size increases: the larger the sample, the more accurate the estimate tends to be. Formally, a point estimator is consistent if it converges in probability to the true value of the parameter as the sample size grows. In particular, an unbiased estimator T is consistent if its variance tends to zero as the sample size tends to infinity.
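As an illustration of consistency, the sketch below estimates the average absolute error of the sample mean for increasing sample sizes. It assumes normal data with illustrative values μ = 10 and σ = 3; the function name and the chosen sample sizes are hypothetical.

```python
import random
import statistics

def mean_abs_error(n, reps=500, mu=10.0, sigma=3.0, seed=1):
    """Average |sample mean - mu| over `reps` simulated samples of size n."""
    rng = random.Random(seed)
    errors = []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        errors.append(abs(statistics.fmean(sample) - mu))
    return statistics.fmean(errors)

# The error should shrink roughly like 1/sqrt(n) as the sample size grows.
errors_by_n = {n: mean_abs_error(n) for n in (10, 100, 1000)}
for n, err in errors_by_n.items():
    print(n, round(err, 4))
```

The sample mean is unbiased with variance σ²/n, so its variance tends to zero and the printed errors should decrease as n increases.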

Efficiency

Let T₁ and T₂ be two unbiased estimators for the same parameter θ. The estimator T₂ is called more efficient than T₁ if Var(T₂) < Var(T₁), irrespective of the value of θ.^[1] In other words, among unbiased estimators the most efficient are those whose values vary least from sample to sample. The notion of efficiency extends to estimators that may be biased: T₂ is more efficient than T₁ (for the same parameter of interest) if the mean squared error (MSE) of T₂ is smaller than the MSE of T₁.^[1]
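The MSE-based comparison can be illustrated with two estimators of a variance. This is a sketch under assumed conditions (normal data, σ = 1, n = 10), with hypothetical function names: it compares the unbiased estimator with divisor n − 1 against the biased estimator with divisor n, evaluated on the same simulated samples.

```python
import random
import statistics

def mse_of_variance_estimators(n=10, reps=20_000, sigma=1.0, seed=2):
    """Simulated MSE of the divisor-(n-1) and divisor-n variance estimators."""
    rng = random.Random(seed)
    true_var = sigma ** 2
    sq_err_unbiased, sq_err_biased = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        m = sum(sample) / n
        ss = sum((x - m) ** 2 for x in sample)
        sq_err_unbiased.append((ss / (n - 1) - true_var) ** 2)
        sq_err_biased.append((ss / n - true_var) ** 2)
    return statistics.fmean(sq_err_unbiased), statistics.fmean(sq_err_biased)

mse_unbiased, mse_biased = mse_of_variance_estimators()
print(mse_unbiased, mse_biased)
```

For normal data the theoretical values are 2σ⁴/(n − 1) for the unbiased estimator and (2n − 1)σ⁴/n² for the biased one, so under these assumptions the biased estimator has the smaller MSE and is the more efficient of the two in the extended sense.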

Generally, the distribution of the population must be taken into account when determining the efficiency of estimators. For example, for a normal distribution the sample mean is more efficient than the sample median, but this need not hold for asymmetrical (skewed) distributions.
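The comparison between mean and median for normal data can be checked by simulation. The sketch below assumes standard normal samples of size n = 25 (an illustrative choice) and estimates the sampling variance of each estimator; the function name is hypothetical.

```python
import random
import statistics

def sampling_variances(n=25, reps=10_000, seed=3):
    """Simulated variances of the sample mean and sample median for N(0, 1) data."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.pvariance(means), statistics.pvariance(medians)

var_mean, var_median = sampling_variances()
print(var_mean, var_median, var_mean / var_median)
```

For large normal samples the ratio Var(mean)/Var(median) approaches 2/π ≈ 0.64, so the mean should show the smaller variance here. Repeating the experiment with a heavy-tailed or skewed generator may reverse the ordering.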

Sufficiency

A statistician's task is to interpret collected data and to draw statistically valid conclusions about the population under investigation. In many cases the raw data are too numerous and too costly to store to be suitable for this purpose. The statistician would therefore like to condense the data into a few statistics and base the analysis on them without losing relevant information, that is, to choose statistics that exhaust all the information about the parameter contained in the sample. Sufficient statistics are defined as follows: let X = (X₁, X₂, …, X_n) be a random sample. A statistic T(X) is said to be sufficient for θ (or for the family of distributions) if the conditional distribution of X given T does not depend on θ.^[2]
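The definition can be verified exactly in a small case. The sketch below assumes a Bernoulli(θ) sample of size n = 3 (an illustrative choice not taken from the cited sources) with T(X) = X₁ + X₂ + X₃, and computes the conditional distribution of X given T for two different values of θ using exact rational arithmetic.

```python
from fractions import Fraction
from itertools import product

def conditional_given_sum(theta, n=3):
    """Return {t: {x: P(X = x | sum(X) = t)}} for n i.i.d. Bernoulli(theta) draws."""
    joint = {}
    for x in product((0, 1), repeat=n):
        k = sum(x)
        joint[x] = theta ** k * (1 - theta) ** (n - k)  # P(X = x)
    result = {}
    for t in range(n + 1):
        p_t = sum(p for x, p in joint.items() if sum(x) == t)  # P(T = t)
        result[t] = {x: p / p_t for x, p in joint.items() if sum(x) == t}
    return result

cond_a = conditional_given_sum(Fraction(1, 4))
cond_b = conditional_given_sum(Fraction(9, 10))
print(cond_a == cond_b)  # True
print(cond_a[1])
```

Both values of θ give the same conditional distribution: given T = t, every arrangement with t ones is equally likely. The conditional distribution is therefore free from θ, which is what sufficiency of the sum requires.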

Types of point estimation

Bayesian point estimation

Bayesian inference is typically based on the posterior distribution. Many Bayesian point estimators are the posterior distribution's statistics of central tendency, e.g., its mean, median, or mode:

Posterior mean, which minimizes the (posterior) risk (expected loss) for a squared-error loss function; in Bayesian estimation, the risk is defined in terms of the posterior distribution, as observed by Gauss.^[3]
Posterior median, which minimizes the posterior risk for the absolute-value loss function, as observed by Laplace.^[3]^[4]
Maximum a posteriori (MAP), which finds a maximum of the posterior distribution; for a uniform prior probability, the MAP estimator coincides with the maximum-likelihood estimator.
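These estimators can be written in closed form for a simple conjugate model. The sketch below assumes a binomial likelihood with a Beta(a, b) prior, an example chosen for illustration and not taken from the cited sources, so that the posterior is Beta(a + successes, b + failures). It computes the posterior mean and the MAP estimate and compares the latter with the maximum-likelihood estimate; the posterior median has no simple closed form and is omitted.

```python
from fractions import Fraction

def beta_binomial_estimates(successes, trials, a=1, b=1):
    """Posterior mean, posterior mode (MAP) and MLE for a binomial proportion.

    Assumes a Beta(a, b) prior; a = b = 1 is the uniform prior.
    """
    post_a = a + successes
    post_b = b + trials - successes
    posterior_mean = Fraction(post_a, post_a + post_b)
    # Mode of Beta(post_a, post_b); valid when both parameters exceed 1.
    posterior_mode = Fraction(post_a - 1, post_a + post_b - 2)
    mle = Fraction(successes, trials)
    return posterior_mean, posterior_mode, mle

mean_, map_, mle_ = beta_binomial_estimates(successes=7, trials=10)
print(mean_, map_, mle_)  # 2/3 7/10 7/10
```

With the uniform prior the MAP estimate equals the maximum-likelihood estimate 7/10, while the posterior mean 2/3 is pulled slightly toward the prior mean 1/2.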

The MAP estimator has good asymptotic properties, even for many difficult problems in which the maximum-likelihood estimator has difficulties. For regular problems, where the maximum-likelihood estimator is consistent, the maximum-likelihood estimator ultimately agrees with the MAP estimator.^[5]^[6]^[7] Bayesian estimators are admissible, by Wald's theorem.^[6]^[8]

The Minimum Message Length (MML) point estimator is based in Bayesian information theory and is not so directly related to the posterior distribution.

Special cases of Bayesian filters are important:

Several methods of computational statistics have close connections with Bayesian analysis:

particle filter
Markov chain Monte Carlo (MCMC)

Methods of finding point estimates

Below are some commonly used methods of estimating unknown parameters which are expected to provide estimators having some of these important properties. In general, depending on the situation and the purpose of our study we apply any one of the methods that may be suitable among the methods of point estimation.

Method of maximum likelihood (MLE)

The method of maximum likelihood, due to R.A. Fisher, is the most important general method of estimation. It seeks the values of the unknown parameters that maximize the likelihood function: given a known model (e.g., the normal distribution), the parameter values that maximize the likelihood are taken as the most suitable match for the data.^[9]

Let X = (X₁, X₂, …, X_n) denote a random sample with joint p.d.f. or p.m.f. f(x, θ) (θ may be a vector). The function f(x, θ), considered as a function of θ, is called the likelihood function and is denoted by L(θ). The principle of maximum likelihood consists of choosing, within the admissible range of θ, an estimate that maximizes the likelihood. This estimate is called the maximum likelihood estimate (MLE) of θ. To obtain the MLE of θ, we solve the likelihood equation

d log L(θ)/dθ = 0.

If θ = (θ₁, θ₂, …, θ_k) is a vector, partial derivatives are taken to obtain the likelihood equations ∂ log L(θ)/∂θ_i = 0, i = 1, 2, …, k.^[2]
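As a worked instance of the likelihood equation, consider an exponential model with rate λ, chosen here purely for illustration. The log-likelihood is log L(λ) = n log λ − λ Σx_i, and setting its derivative n/λ − Σx_i to zero gives λ̂ = n/Σx_i. The sketch below, with hypothetical function names and an assumed true rate of 2, computes this estimate and checks that the log-likelihood is lower at nearby rates.

```python
import math
import random

def exp_log_likelihood(rate, data):
    """log L(rate) = n*log(rate) - rate*sum(x) for i.i.d. exponential data."""
    return len(data) * math.log(rate) - rate * sum(data)

def exp_mle(data):
    """Root of d log L / d rate = n/rate - sum(x), i.e. rate = n / sum(x)."""
    return len(data) / sum(data)

rng = random.Random(4)
data = [rng.expovariate(2.0) for _ in range(5_000)]  # assumed true rate: 2.0

rate_hat = exp_mle(data)
print(rate_hat)
print(exp_log_likelihood(rate_hat, data) >= exp_log_likelihood(0.9 * rate_hat, data))
print(exp_log_likelihood(rate_hat, data) >= exp_log_likelihood(1.1 * rate_hat, data))
```

When the likelihood equations have no closed-form solution, the same log-likelihood function would typically be handed to a numerical optimizer instead.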

Method of moments (MOM)

The method of moments was introduced by K. Pearson and P. Chebyshev in 1887, and it is one of the oldest methods of estimation. The method is based on the law of large numbers: known facts about the population are applied to a sample by deriving equations that relate the population moments to the unknown parameters. These equations are then solved with the sample moments substituted for the population moments.^[10] Because of its simplicity, however, the method is not always accurate, and the resulting estimators can be biased.

Let (X₁, X₂, …, X_n) be a random sample from a population having p.d.f. (or p.m.f.) f(x, θ), θ = (θ₁, θ₂, …, θ_k). The objective is to estimate the parameters θ₁, θ₂, …, θ_k. Further, let the first k population moments about zero exist as explicit functions of θ, i.e. μ_r = μ_r(θ₁, θ₂, …, θ_k), r = 1, 2, …, k. In the method of moments, we equate k sample moments with the corresponding population moments. Generally, the first k moments are taken because the errors due to sampling increase with the order of the moment. Thus, we get k equations

μ_r(θ₁, θ₂, …, θ_k) = m_r, r = 1, 2, …, k,

where m_r = (1/n) Σ X_i^r is the r-th sample moment about zero. Solving these equations for θ₁, θ₂, …, θ_k gives the method of moments estimators (or estimates).^[2] See also generalized method of moments.
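The procedure can be sketched for a two-parameter case. The example below assumes a gamma distribution with shape α and scale β, a choice made for illustration and not taken from the cited sources. Its first two moments about zero are μ₁ = αβ and μ₂ = α(α + 1)β², so equating them to m₁ and m₂ gives β = (m₂ − m₁²)/m₁ and α = m₁²/(m₂ − m₁²).

```python
import random

def gamma_method_of_moments(data):
    """Method of moments estimates (shape, scale) for a gamma sample."""
    n = len(data)
    m1 = sum(data) / n                  # first sample moment about zero
    m2 = sum(x * x for x in data) / n   # second sample moment about zero
    central2 = m2 - m1 ** 2             # equals alpha * beta^2 at the solution
    shape = m1 ** 2 / central2
    scale = central2 / m1
    return shape, scale

rng = random.Random(5)
# random.gammavariate(alpha, beta) takes shape alpha and scale beta.
data = [rng.gammavariate(2.0, 3.0) for _ in range(20_000)]

shape_hat, scale_hat = gamma_method_of_moments(data)
print(shape_hat, scale_hat)
```

With the assumed true values α = 2 and β = 3, the estimates should land close to those values for a sample of this size.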

Method of least squares

In the method of least squares, we consider the estimation of parameters using some specified form of the expectation and second moment of the observations. For fitting a curve of the form y = f(x, β₀, β₁, …, β_p) to the data (x_i, y_i), i = 1, 2, …, n, we may use the method of least squares. This method consists of minimizing the sum of squares Σ (y_i − f(x_i, β₀, β₁, …, β_p))² with respect to the parameters.

When f(x, β₀, β₁, …, β_p) is a linear function of the parameters and the x-values are known, the least squares estimators are best linear unbiased estimators (BLUE). If, in addition, the errors are assumed to be independently and identically normally distributed, the least squares estimators are minimum-variance unbiased estimators (MVUE) within the entire class of unbiased estimators. See also minimum mean squared error (MMSE).^[2]
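For the simplest linear case, f(x, β₀, β₁) = β₀ + β₁x, the minimizing values have a well-known closed form. The sketch below uses made-up data and hypothetical function names; it computes the least squares line and checks that perturbing the fitted coefficients increases the sum of squares.

```python
def fit_line(xs, ys):
    """Least squares estimates (intercept, slope) for y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def sum_of_squares(intercept, slope, xs, ys):
    """Sum of squared residuals for the line with the given coefficients."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Illustrative data, roughly following y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]

b0, b1 = fit_line(xs, ys)
print(b0, b1)
print(sum_of_squares(b0, b1, xs, ys) <= sum_of_squares(b0 + 0.1, b1, xs, ys))
print(sum_of_squares(b0, b1, xs, ys) <= sum_of_squares(b0, b1 - 0.1, xs, ys))
```

For models with more parameters the same minimization is usually expressed through the normal equations in matrix form.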

Minimum-variance mean-unbiased estimator (MVUE)

The minimum-variance unbiased estimator minimizes the risk (expected loss) under the squared-error loss function among all unbiased estimators.

Median unbiased estimator

A median-unbiased estimator minimizes the risk (expected loss) under the absolute-error loss function.

Best linear unbiased estimator (BLUE)

The best linear unbiased estimator (BLUE) is characterized by the Gauss–Markov theorem, which states that the ordinary least squares (OLS) estimator has the lowest sampling variance within the class of linear unbiased estimators if the errors in the linear regression model are uncorrelated, have equal variances and have an expected value of zero.^[11]

Point estimate vs. confidence interval estimate

There are two major types of estimates: point estimates and confidence interval estimates. In point estimation we try to choose a unique point in the parameter space that can reasonably be considered the true value of the parameter. In interval estimation, instead of a unique estimate of the parameter, we construct a family of sets that contain the true (unknown) parameter value with a specified probability. In many problems of statistical inference we are interested not only in estimating the parameter or testing some hypothesis about it, but also in obtaining a lower bound, an upper bound, or both for a real-valued parameter. To do this, we construct a confidence interval.

A confidence interval describes how reliable an estimate is. Its upper and lower confidence limits are calculated from the observed data. Suppose a dataset x₁, …, x_n is given, modeled as a realization of random variables X₁, …, X_n. Let θ be the parameter of interest, and γ a number between 0 and 1. If there exist sample statistics L_n = g(X₁, …, X_n) and U_n = h(X₁, …, X_n) such that P(L_n < θ < U_n) = γ for every value of θ, then (l_n, u_n), where l_n = g(x₁, …, x_n) and u_n = h(x₁, …, x_n), is called a 100γ% confidence interval for θ. The number γ is called the confidence level.^[1]

In general, with a normally distributed sample mean X̄ and a known value for the standard deviation σ, a 100(1 − α)% confidence interval for the true μ is formed by taking X̄ ± e, with e = z_{1−α/2} · σ/√n, where z_{1−α/2} is the 100(1 − α/2)th percentile of the standard normal distribution and n is the number of data values in the sample. For example, z_{1−α/2} equals 1.96 for 95% confidence.^[12]

Here two limits, say l_n and u_n, are computed from the set of observations, and it is claimed with a certain degree of confidence (measured in probabilistic terms) that the true value of θ lies between l_n and u_n. Thus we get an interval (l_n, u_n) that we expect to include the true value of θ. This type of estimation is called confidence interval estimation.^[2] It provides a range of values within which the parameter is expected to lie. It generally gives more information than a point estimate and is preferred when making inferences. In this sense, interval estimation is the counterpart of point estimation.
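The known-σ interval for a mean can be computed directly from this formula. The sketch below uses made-up measurements and an assumed known standard deviation σ = 0.2; the function name is hypothetical. The normal quantile is obtained from NormalDist().inv_cdf in the Python standard library.

```python
import math
from statistics import NormalDist, fmean

def z_confidence_interval(sample, sigma, confidence=0.95):
    """Confidence interval for the mean when the standard deviation sigma is known."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # about 1.96 for 95% confidence
    x_bar = fmean(sample)
    e = z * sigma / math.sqrt(len(sample))       # margin of error
    return x_bar - e, x_bar + e

# Illustrative measurements; sigma = 0.2 is assumed known.
sample = [4.9, 5.3, 5.1, 4.7, 5.0, 5.2, 4.8, 5.0]
lower, upper = z_confidence_interval(sample, sigma=0.2)
print(lower, upper)
```

The interval is centered on the sample mean, here 5.0, and narrows as n grows or as the confidence level is lowered. When σ is unknown and estimated from the data, a Student t quantile is normally used in place of the normal one.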

References

1. Dekking, F. M.; Kraaikamp, C.; Lopuhaa, H. P.; Meester, L. E. (2005). A Modern Introduction to Probability and Statistics.
2. Sahu, Pradip Kumar; Pal, Santi Ranjan; Das, Ajit Kumar (2015). Estimation and Inferential Statistics.
3. Dodge, Yadolah, ed. (1987). Statistical data analysis based on the L1-norm and related methods: Papers from the First International Conference held at Neuchâtel, August 31–September 4, 1987. North-Holland Publishing.
4. Jaynes, E. T. (2007). Probability Theory: The logic of science (5. print. ed.). Cambridge University Press. p. 172. ISBN 978-0-521-59271-0.
5. Ferguson, Thomas S. (1996). A Course in Large Sample Theory. Chapman & Hall. ISBN 0-412-04371-8.
6. Le Cam, Lucien (1986). Asymptotic Methods in Statistical Decision Theory. Springer-Verlag. ISBN 0-387-96307-3.
7. Ferguson, Thomas S. (1982). "An inconsistent maximum likelihood estimate". Journal of the American Statistical Association. 77 (380): 831–834. doi:10.1080/01621459.1982.10477894. JSTOR 2287314.
8. Lehmann, E. L.; Casella, G. (1998). Theory of Point Estimation (2nd ed.). Springer. ISBN 0-387-98502-6.
9. Agresti, A. (1990). Categorical Data Analysis. New York: John Wiley and Sons.
10. Dodge, Y. (2008). The Concise Encyclopedia of Statistics. Springer.
11. Theil, Henri (1971). Best Linear Unbiased Estimation and Prediction. New York: John Wiley & Sons.
12. Berger, Paul D.; Maurer, Robert E.; Celli, Giovana B. (2019). Experimental Design – With Applications in Management, Engineering, and the Sciences. Springer.