# Three-point estimation

Last updated

The three-point estimation technique is used in management and information systems applications for the construction of an approximate probability distribution representing the outcome of future events, based on very limited information. While the distribution used for the approximation might be a normal distribution, this is not always so and, for example a triangular distribution might be used, depending on the application.

## Contents

In three-point estimation, three figures are produced initially for every distribution that is required, based on prior experience or best-guesses:

• a = the best-case estimate
• m = the most likely estimate
• b = the worst-case estimate

These are then combined to yield either a full probability distribution, for later combination with distributions obtained similarly for other variables, or summary descriptors of the distribution, such as the mean, standard deviation or percentage points of the distribution. The accuracy attributed to the results derived can be no better than the accuracy inherent in the three initial points, and there are clear dangers in using an assumed form for an underlying distribution that itself has little basis.

## Estimation

Based on the assumption that a PERT distribution governs the data, several estimates are possible. These values are used to calculate an E value for the estimate and a standard deviation (SD) as L-estimators, where:

E = (a + 4m + b) / 6
SD = (b  a) / 6

E is a weighted average which takes into account both the most optimistic and most pessimistic estimates provided. SD measures the variability or uncertainty in the estimate. In Project Evaluation and Review Techniques (PERT) the three values are used to fit a PERT distribution for Monte Carlo simulations.

The triangular distribution is also commonly used. It differs from the double-triangular by its simple triangular shape and by the property that the mode does not have to coincide with the median. The mean (expected value) is then:

E = (a + m + b) / 3.

In some applications, [1] the triangular distribution is used directly as an estimated probability distribution, rather than for the derivation of estimated statistics.

## Project management

To produce a project estimate the project manager:

• Decomposes the project into a list of estimable tasks, i.e. a work breakdown structure
• Estimates the expected value E(task) and the standard error SE(task) of this estimate for each task time
• Calculates the expected value for the total project work time as ${\displaystyle \operatorname {E} ({\text{project}})=\sum {\operatorname {E} ({\text{task}})}}$
• Calculates the value SE(project) for the standard error of the estimated total project work time as: ${\displaystyle \operatorname {SE} ({\text{project}})={\sqrt {\sum {\operatorname {SE} ({\text{task}})^{2}}}}}$ under the assumption that the project work time estimates are uncorrelated

The E and SE values are then used to convert the project time estimates to confidence intervals as follows:

• The 68% confidence interval for the true project work time is approximately E(project) ± SE(project)
• The 90% confidence interval for the true project work time is approximately E(project) ± 1.645 × SE(project)
• The 95% confidence interval for the true project work time is approximately E(project) ± 2 × SE(project)
• The 99.7% confidence interval for the true project work time is approximately E(project) ± 3 × SE(project)
• Information Systems typically uses the 95% confidence interval for all project and task estimates. [2]

These confidence interval estimates assume that the data from all of the tasks combine to be approximately normal (see asymptotic normality). Typically, there would need to be 2030 tasks for this to be reasonable, and each of the estimates E for the individual tasks would have to be unbiased.

## Related Research Articles

In statistics, the standard deviation is a measure of the amount of variation or dispersion of a set of values. A low standard deviation indicates that the values tend to be close to the mean of the set, while a high standard deviation indicates that the values are spread out over a wider range.

In probability and statistics, Student's t-distribution is any member of a family of continuous probability distributions that arises when estimating the mean of a normally distributed population in situations where the sample size is small and the population standard deviation is unknown. It was developed by William Sealy Gosset under the pseudonym Student.

In probability theory, Chebyshev's inequality guarantees that, for a wide class of probability distributions, no more than a certain fraction of values can be more than a certain distance from the mean. Specifically, no more than 1/k2 of the distribution's values can be more than k standard deviations away from the mean. The rule is often called Chebyshev's theorem, about the range of standard deviations around the mean, in statistics. The inequality has great utility because it can be applied to any probability distribution in which the mean and variance are defined. For example, it can be used to prove the weak law of large numbers.

In statistics, the Pearson correlation coefficient, also referred to as Pearson's r, the Pearson product-moment correlation coefficient (PPMCC), or the bivariate correlation, is a statistic that measures linear correlation between two variables X and Y. It has a value between +1 and −1. A value of +1 is total positive linear correlation, 0 is no linear correlation, and −1 is total negative linear correlation.

In health-related fields, a reference range or reference interval is the range of values that is deemed normal for a physiologic measurement in healthy persons. It is a basis for comparison for a physician or other health professional to interpret a set of test results for a particular patient. Some important reference ranges in medicine are reference ranges for blood tests and reference ranges for urine tests.

In statistics, a confidence interval (CI) is a type of estimate computed from the statistics of the observed data. This proposes a range of plausible values for an unknown parameter. The interval has an associated confidence level that the true parameter is in the proposed range. Given observations and a confidence level , a valid confidence interval has a probability of containing the true underlying parameter. The level of confidence can be chosen by the investigator. In general terms, a confidence interval for an unknown parameter is based on sampling the distribution of a corresponding estimator.

A Z-test is any statistical test for which the distribution of the test statistic under the null hypothesis can be approximated by a normal distribution. Z-test tests the mean of a distribution in which we already know the population variance σ2. Because of the central limit theorem, many test statistics are approximately normally distributed for large samples. For each significance level in the confidence interval, the Z-test has a single critical value which makes it more convenient than the Student's t-test which has separate and different critical values for each sample size. Therefore, many statistical tests can be conveniently performed as approximate Z-tests if the sample size is large or the population variance is known. If the population variance is unknown and the sample size is not large, the Student's t-test may be more appropriate.

In statistics, an effect size is a quantitative measure of the magnitude of a phenomenon. It can refer to the value of a statistic calculated from a sample of data, the value of a parameter of a hypothetical statistical population, or to the equation that operationalizes how statistics or parameters lead to the effect size value. Examples of effect sizes include the correlation between two variables, the regression coefficient in a regression, the mean difference, or the risk of a particular event happening. Effect sizes complement statistical hypothesis testing, and play an important role in power analyses, sample size planning, and in meta-analyses. The cluster of data-analysis methods concerning effect sizes is referred to as estimation statistics.

The standard error (SE) of a statistic is the standard deviation of its sampling distribution or an estimate of that standard deviation. If the parameter or the statistic is the mean, it is called the standard error of the mean (SEM).

In probability theory and statistics, the triangular distribution is a continuous probability distribution with lower limit a, upper limit b and mode c, where a < b and a ≤ c ≤ b.

In probability theory and statistics, the continuous uniform distribution or rectangular distribution is a family of symmetric probability distributions. The distribution describes an experiment where there is an arbitrary outcome that lies between certain bounds. The bounds are defined by the parameters, a and b, which are the minimum and maximum values. The interval can be either be closed or open. Therefore, the distribution is often abbreviated U, where U stands for uniform distribution. The difference between the bounds defines the interval length; all intervals of the same length on the distribution's support are equally probable. It is the maximum entropy probability distribution for a random variable X under no constraint other than that it is contained in the distribution's support.

Sample size determination is the act of choosing the number of observations or replicates to include in a statistical sample. The sample size is an important feature of any empirical study in which the goal is to make inferences about a population from a sample. In practice, the sample size used in a study is usually determined based on the cost, time, or convenience of collecting the data, and the need for it to offer sufficient statistical power. In complicated studies there may be several different sample sizes: for example, in a stratified survey there would be different sizes for each stratum. In a census, data is sought for an entire population, hence the intended sample size is equal to the population. In experimental design, where a study may be divided into different treatment groups, there may be different sample sizes for each group.

The following is a glossary of terms used in the mathematical sciences statistics and probability.

In statistics, a binomial proportion confidence interval is a confidence interval for the probability of success calculated from the outcome of a series of success–failure experiments. In other words, a binomial proportion confidence interval is an interval estimate of a success probability p when only the number of experiments n and the number of successes nS are known.

In statistics, the 68–95–99.7 rule, also known as the empirical rule, is a shorthand used to remember the percentage of values that lie within a band around the mean in a normal distribution with a width of two, four and six standard deviations, respectively; more precisely, 68.27%, 95.45% and 99.73% of the values lie within one, two and three standard deviations of the mean, respectively.

In statistics, the t-statistic is the ratio of the departure of the estimated value of a parameter from its hypothesized value to its standard error. It is used in hypothesis testing via Student's t-test. The T-statistic is used in a T test to determine if you should support or reject the null hypothesis. It is very similar to the Z-score but with the difference that T-statistic is used when the sample size is small or the population standard deviation is unknown. For example, the T-statistic is used in estimating the population mean from a sampling distribution of sample means if the population standard deviation is unknown. It is also used along with p-value when running hypothesis test where the p-value tells us what the odds are of the results to have happened.

The Datar–Mathews Method is a method for real options valuation. The method provides an easy way to determine the real option value of a project simply by using the average of positive outcomes for the project. The method can be understood as an extension of the net present value (NPV) multi-scenario Monte Carlo model with an adjustment for risk aversion and economic decision-making. The method uses information that arises naturally in a standard discounted cash flow (DCF), or NPV, project financial valuation. It was created in 2000 by Vinay Datar, professor at Seattle University; and Scott H. Mathews, Technical Fellow at The Boeing Company.

Cumulative frequency analysis is the analysis of the frequency of occurrence of values of a phenomenon less than a reference value. The phenomenon may be time- or space-dependent. Cumulative frequency is also called frequency of non-exceedance.

In probability and statistics, the PERT distribution is a family of continuous probability distributions defined by the minimum (a), most likely (b) and maximum (c) values that a variable can take. It is a transformation of the four-parameter Beta distribution with an additional assumption that its expected value is

Additive disequilibrium (D) is a statistic that estimates the difference between observed genotypic frequencies and the genotypic frequencies that would be expected under Hardy–Weinberg equilibrium. At a biallelic locus with alleles 1 and 2, the additive disequilibrium exists according to the equations