Three-point estimation

Last updated May 22, 2024

The three-point estimation technique is used in management and information systems applications for the construction of an approximate probability distribution representing the outcome of future events, based on very limited information. While the distribution used for the approximation might be a normal distribution, this is not always so. For example, a triangular distribution might be used, depending on the application.

These are then combined to yield either a full probability distribution, for later combination with distributions obtained similarly for other variables, or summary descriptors of the distribution, such as the mean, standard deviation or percentage points of the distribution. The accuracy attributed to the results derived can be no better than the accuracy inherent in the three initial points, and there are clear dangers in using an assumed form for an underlying distribution that itself has little basis.

Estimation

Based on the assumption that a PERT distribution governs the data, several estimates are possible. These values are used to calculate an E value for the estimate and a standard deviation (SD) as L-estimators, where:

E = (a + 4m + b) / 6

SD = (b − a) / 6

E is a weighted average which takes into account both the most optimistic and most pessimistic estimates provided. SD measures the variability or uncertainty in the estimate. In Program Evaluation and Review Techniques (PERT) the three values are used to fit a PERT distribution for Monte Carlo simulations.

The triangular distribution is also commonly used. It differs from the double-triangular by its simple triangular shape and by the property that the mode does not have to coincide with the median. The mean (expected value) is then:

E = (a + m + b) / 3.

In some applications,^[1] the triangular distribution is used directly as an estimated probability distribution, rather than for the derivation of estimated statistics.

Project management

To produce a project estimate the project manager:

Decomposes the project into a list of estimable tasks, i.e. a work breakdown structure
Estimates the expected value E(task) and the standard deviation SD(task) of this estimate for each task time
Calculates the expected value for the total project work time as $\operatorname {E} ({\text{project}})=\sum {\operatorname {E} ({\text{task}})}$
Calculates the value SD(project) for the standard error of the estimated total project work time as: $\operatorname {SD} ({\text{project}})={\sqrt {\sum {\operatorname {SD} ({\text{task}})^{2}}}}$ under the assumption that the project work time estimates are uncorrelated

The E and SD values are then used to convert the project time estimates to confidence intervals as follows:

The 68% confidence interval for the true project work time is approximately E(project) ± SD(project)
The 90% confidence interval for the true project work time is approximately E(project) ± 1.645 × SD(project)
The 95% confidence interval for the true project work time is approximately E(project) ± 2 × SD(project)
The 99.7% confidence interval for the true project work time is approximately E(project) ± 3 × SD(project)
Information Systems typically uses the 95% confidence interval for all project and task estimates.^[2]

These confidence interval estimates assume that the data from all of the tasks combine to be approximately normal (see asymptotic normality). Typically, there would need to be 20–30 tasks for this to be reasonable, and each of the estimates E for the individual tasks would have to be unbiased.

Related Research Articles

The median of a set of numbers is the value separating the higher half from the lower half of a data sample, a population, or a probability distribution. For a data set, it may be thought of as the “middle" value. The basic feature of the median in describing data compared to the mean is that it is not skewed by a small proportion of extremely large or small values, and therefore provides a better representation of the center. Median income, for example, may be a better way to describe the center of the income distribution because increases in the largest incomes alone have no effect on the median. For this reason, the median is of central importance in robust statistics.

In statistics, the standard deviation is a measure of the amount of variation of a random variable expected about its mean. A low standard deviation indicates that the values tend to be close to the mean of the set, while a high standard deviation indicates that the values are spread out over a wider range. The standard deviation is commonly used in the determination of what constitutes an outlier and what does not.

In probability and statistics, Student's $t$ distribution $is a continuous probability distribution that generalizes the standard normal distribution. Like the latter, it is symmetric around zero and bell-shaped.$

In probability theory, Chebyshev's inequality provides an upper bound on the probability of deviation of a random variable from its mean. More specifically, the probability that a random variable deviates from its mean by more than $is at most, where is any positive constant and is the standard deviation.$

In medicine and health-related fields, a reference range or reference interval is the range or the interval of values that is deemed normal for a physiological measurement in healthy persons. It is a basis for comparison for a physician or other health professional to interpret a set of test results for a particular patient. Some important reference ranges in medicine are reference ranges for blood tests and reference ranges for urine tests.

Informally, in frequentist statistics, a confidence interval (CI) is an interval which is expected to typically contain the parameter being estimated. More specifically, given a confidence level $, a CI is a random interval which contains the parameter being estimated % of the time. The confidence level, degree of confidence or confidence coefficient represents the long-run proportion of CIs that theoretically contain the true value of the parameter; this is tantamount to the nominal coverage probability. For example, out of all intervals computed at the 95% level, 95% of them should contain the parameter's true value.$

In statistics, an effect size is a value measuring the strength of the relationship between two variables in a population, or a sample-based estimate of that quantity. It can refer to the value of a statistic calculated from a sample of data, the value of a parameter for a hypothetical population, or to the equation that operationalizes how statistics or parameters lead to the effect size value. Examples of effect sizes include the correlation between two variables, the regression coefficient in a regression, the mean difference, or the risk of a particular event happening. Effect sizes complement statistical hypothesis testing, and play an important role in power analyses, sample size planning, and in meta-analyses. The cluster of data-analysis methods concerning effect sizes is referred to as estimation statistics.

In statistical inference, specifically predictive inference, a prediction interval is an estimate of an interval in which a future observation will fall, with a certain probability, given what has already been observed. Prediction intervals are often used in regression analysis.

The standard error (SE) of a statistic is the standard deviation of its sampling distribution or an estimate of that standard deviation. If the statistic is the sample mean, it is called the standard error of the mean (SEM). The standard error is a key ingredient in producing confidence intervals.

In probability theory and statistics, the triangular distribution is a continuous probability distribution with lower limit a, upper limit b, and mode c, where a < b and a ≤ c ≤ b.

In statistics, a confidence region is a multi-dimensional generalization of a confidence interval. It is a set of points in an n-dimensional space, often represented as an ellipsoid around a point which is an estimated solution to a problem, although other shapes can occur.

Sample size determination or estimation is the act of choosing the number of observations or replicates to include in a statistical sample. The sample size is an important feature of any empirical study in which the goal is to make inferences about a population from a sample. In practice, the sample size used in a study is usually determined based on the cost, time, or convenience of collecting the data, and the need for it to offer sufficient statistical power. In complex studies, different sample sizes may be allocated, such as in stratified surveys or experimental designs with multiple treatment groups. In a census, data is sought for an entire population, hence the intended sample size is equal to the population. In experimental design, where a study may be divided into different treatment groups, there may be different sample sizes for each group.

This glossary of statistics and probability is a list of definitions of terms and concepts used in the mathematical sciences of statistics and probability, their sub-disciplines, and related fields. For additional related terms, see Glossary of mathematics and Glossary of experimental design.

In statistics, a binomial proportion confidence interval is a confidence interval for the probability of success calculated from the outcome of a series of success–failure experiments. In other words, a binomial proportion confidence interval is an interval estimate of a success probability $when only the number of experiments and the number of successes are known.$

In statistics, the 68–95–99.7 rule, also known as the empirical rule, is a shorthand used to remember the percentage of values that lie within an interval estimate in a normal distribution: 68%, 95%, and 99.7% of the values lie within one, two, and three standard deviations of the mean, respectively.

In statistics, the t-statistic is the ratio of the difference in a number’s estimated value from its assumed value to its standard error. It is used in hypothesis testing via Student's t-test. The t-statistic is used in a t-test to determine whether to support or reject the null hypothesis. It is very similar to the z-score but with the difference that t-statistic is used when the sample size is small or the population standard deviation is unknown. For example, the t-statistic is used in estimating the population mean from a sampling distribution of sample means if the population standard deviation is unknown. It is also used along with p-value when running hypothesis tests where the p-value tells us what the odds are of the results to have happened.

In statistics, robust measures of scale are methods that quantify the statistical dispersion in a sample of numerical data while resisting outliers. The most common such robust statistics are the interquartile range (IQR) and the median absolute deviation (MAD). These are contrasted with conventional or non-robust measures of scale, such as sample standard deviation, which are greatly influenced by outliers.

<span class="mw-page-title-main">Cumulative frequency analysis</span>

Cumulative frequency analysis is the analysis of the frequency of occurrence of values of a phenomenon less than a reference value. The phenomenon may be time- or space-dependent. Cumulative frequency is also called frequency of non-exceedance.

In probability and statistics, the PERT distributions are a family of continuous probability distributions defined by the minimum (a), most likely (b) and maximum (c) values that a variable can take. It is a transformation of the four-parameter beta distribution with an additional assumption that its expected value is

Additive disequilibrium (D) is a statistic that estimates the difference between observed genotypic frequencies and the genotypic frequencies that would be expected under Hardy–Weinberg equilibrium. At a biallelic locus with alleles 1 and 2, the additive disequilibrium exists according to the equations

References

↑ Ministry of Defence (2007) "Three point estimates and quantitative risk analysis" Policy, information and guidance on the Risk Management aspects of UK MOD Defence Acquisition
↑ 68–95–99.7 rule

External links

Risk and duration estimates: 3 point estimating from www.4pm.com

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[MOD2007-1] Ministry of Defence (2007) "Three point estimates and quantitative risk analysis" Policy, information and guidance on the Risk Management aspects of UK MOD Defence Acquisition

[2] 68–95–99.7 rule

[1]

[2]