Polynomial least squares

In mathematical statistics, polynomial least squares comprises a broad range of statistical methods for estimating an underlying polynomial that describes observations. These methods include polynomial regression, curve fitting, linear regression, least squares, ordinary least squares, simple linear regression, linear least squares, approximation theory, and the method of moments. Polynomial least squares has applications in radar trackers, estimation theory, signal processing, statistics, and econometrics.

Two common applications of polynomial least squares methods are generating a low-degree polynomial that approximates a complicated function and estimating an assumed underlying polynomial from corrupted (also known as "noisy") observations. The former is commonly used in statistics and econometrics to fit a scatter plot with a first degree polynomial (that is, a linear expression). [1] [2] [3] The latter is commonly used in target tracking in the form of Kalman filtering, which is effectively a recursive implementation of polynomial least squares. [4] [5] [6] [7] Estimating an assumed underlying deterministic polynomial can be used in econometrics as well. [8] In effect, both applications produce average curves as generalizations of the common average of a set of numbers, which is equivalent to zero degree polynomial least squares. [1] [2] [9]
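As a brief worked example of the zero degree case (with illustrative notation), fitting a single constant $c$ to observations $y_1, \ldots, y_N$ by least squares minimizes $\sum_{n}(y_n - c)^2$; setting the derivative to zero recovers the common average:

$$\frac{d}{dc}\sum_{n=1}^{N}\left( y_n - c \right)^2 = -2\sum_{n=1}^{N}\left( y_n - c \right) = 0 \quad\Longrightarrow\quad c = \frac{1}{N}\sum_{n=1}^{N} y_n .$$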

In the above applications, the term "approximate" is used when no statistical measurement or observation errors are assumed, as when fitting a scatter plot. The term "estimate", derived from statistical estimation theory, is used when assuming that measurements or observations of a polynomial are corrupted.

Polynomial least squares estimate of a deterministic first degree polynomial corrupted with observation errors

Assume the deterministic first degree polynomial with unknown coefficients $\alpha$ and $\beta$ is written as

$$z(t) = \alpha + \beta\, t .$$

This is corrupted with an additive stochastic process $\varepsilon(t)$, described as an error (noise in tracking), resulting in

$$y(t) = z(t) + \varepsilon(t) = \alpha + \beta\, t + \varepsilon(t) .$$

Given $N$ observations $y_n = y(t_n)$ from a sample, where the subscript $n$ is the observation index, the problem is to apply polynomial least squares to estimate $z(t)$, and to determine the variance of that estimate along with its expected value.

Definitions and assumptions

(1) The term linearity in mathematics may be considered to take two forms that are sometimes confusing: a linear system or transformation (sometimes called an operator) [9] and a linear equation. The term "function" is often used to describe both a system and an equation, which may lead to confusion. A linear system or transformation $L$ is defined by

$$L[a\,x_1 + b\,x_2] = a\,L[x_1] + b\,L[x_2],$$

where $a$ and $b$ are constants, and where $x_1$ and $x_2$ are variables. In a linear system, $E[a\,x_1 + b\,x_2] = a\,E[x_1] + b\,E[x_2]$, where $E[\bullet]$ is the linear expectation operator. A linear equation is a straight line, as is the first degree polynomial described above.

(2) The error $\varepsilon$ is modeled as a zero mean stochastic process whose sample points $\varepsilon_n$ are uncorrelated random variables with identical probability distributions (specifically, the same mean and the same variance $\sigma^2$), but not necessarily Gaussian; these are treated as inputs to polynomial least squares. Stochastic processes and random variables are described only by probability distributions. [1] [2] [9]

(3) Polynomial least squares is modeled as a linear signal processing system which processes statistical inputs deterministically, the output being the linearly processed empirically determined statistical estimate, variance, and expected value. [6] [7] [8]

(4) Polynomial least squares processing produces deterministic moments (analogous to mechanical moments), which may be considered as moments of sample statistics, but not as statistical moments. [8]

Polynomial least squares and the orthogonality principle

Approximating a function $y(t)$, sampled at the $N$ points $t_n$, with a polynomial

$$\hat z(t) = \sum_{j=1}^{J} \hat a_j\, t^{\,j-1},$$

where the hat (^) denotes the estimate and $(J - 1)$ is the polynomial degree, can be performed by applying the orthogonality principle. The sum of squared residuals can be written as

$$E = \sum_{n=1}^{N} \left( y_n - \hat z_n \right)^2, \qquad \hat z_n \equiv \hat z(t_n).$$

According to the orthogonality principle, [4] [5] [6] [7] [8] [9] [10] [11] $E$ is at its minimum when the residual vector $(y_n - \hat z_n)$ is orthogonal to the estimate $\hat z_n$, that is

$$\sum_{n=1}^{N} \left( y_n - \hat z_n \right) \hat z_n = 0 .$$

This can be described as the orthogonal projection of the data values $\{y_n\}$ onto a solution in the form of the polynomial $\hat z(t)$. [4] [6] [7] For $N > J$, orthogonal projection yields the standard overdetermined system of equations (often called the normal equations) used to compute the coefficients $\hat a_j$ in the polynomial approximation. [1] [10] [11] The minimum sum of squared residuals is then

$$E_{\min} = \sum_{n=1}^{N} \left( y_n - \hat z_n \right) y_n = \sum_{n=1}^{N} y_n^2 - \sum_{n=1}^{N} \hat z_n\, y_n .$$

The advantage of using orthogonal projection is that $E_{\min}$ can be determined for use in the polynomial least squares processed statistical variance of the estimate. [8] [9] [11]
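As an illustrative sketch (the function and variable names here are chosen for the example, not taken from the cited sources), the coefficients $\hat a_j$ and the minimum residual sum $E_{\min}$ can be computed from the normal equations with NumPy:

```python
import numpy as np

def polynomial_least_squares(t, y, degree):
    """Fit a polynomial of the given degree to samples (t, y) by
    orthogonal projection (the normal equations)."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix with columns 1, t, t**2, ..., t**degree.
    A = np.vander(t, N=degree + 1, increasing=True)
    # Normal equations: (A^T A) a = A^T y.
    coeffs = np.linalg.solve(A.T @ A, A.T @ y)
    z_hat = A @ coeffs                      # fitted values
    e_min = np.sum((y - z_hat) ** 2)        # minimum sum of squared residuals
    return coeffs, z_hat, e_min

# Example: noisy samples of an assumed first degree polynomial z = 2 + 0.5 t.
rng = np.random.default_rng(0)
t = np.arange(10.0)
y = 2.0 + 0.5 * t + rng.normal(scale=0.3, size=t.size)
coeffs, z_hat, e_min = polynomial_least_squares(t, y, degree=1)
print(coeffs, e_min)
```

Dedicated routines such as numpy.polynomial.polynomial.polyfit solve the same projection with better numerical conditioning; forming the normal equations explicitly is shown here only to mirror the derivation above.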

The empirically determined polynomial least squares output of a first degree polynomial corrupted with observation errors

To fully determine the output of polynomial least squares, a weighting function describing the processing must first be structured and then the statistical moments can be computed.

The weighting function describing the linear polynomial least squares "system"

The weighting function can be formulated from polynomial least squares to estimate the unknown $z(\tau)$ as follows: [8]

$$\hat z(\tau) = \sum_{n=1}^{N} w_n(\tau)\, y_n ,$$

where $N$ is the number of samples, the $y_n$ are random variables as samples of the stochastic $y$ (the noisy signal), and the first degree polynomial data weights are

$$w_n(\tau) = \frac{1}{N} + \frac{\left( t_n - \bar t \right)\left( \tau - \bar t \right)}{\sum_{i=1}^{N}\left( t_i - \bar t \right)^2},$$

which represent the linear polynomial least squares "system" and describe its processing. [8] The Greek letter $\tau$ is the independent variable when estimating the dependent variable $z(\tau)$ after data fitting has been performed. (The letter $\tau$ is used to avoid confusion with $t$ before and during sampling in polynomial least squares processing.) The overbar ( ¯ ) defines the deterministic centroid of the $t_n$ as processed by polynomial least squares [8] – i.e., it defines the deterministic first order moment, which may be considered a sample average, but does not here approximate a first order statistical moment:

$$\bar t = \frac{1}{N}\sum_{n=1}^{N} t_n .$$
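A companion sketch in the same illustrative notation, confirming that summing the weighted samples reproduces the coefficient form of the fit:

```python
import numpy as np

def first_degree_weights(t, tau):
    """First degree polynomial least squares data weights w_n(tau)."""
    t = np.asarray(t, dtype=float)
    t_bar = t.mean()                       # deterministic centroid of the t_n
    s_tt = np.sum((t - t_bar) ** 2)
    return 1.0 / t.size + (t - t_bar) * (tau - t_bar) / s_tt

rng = np.random.default_rng(1)
t = np.arange(10.0)
y = 2.0 + 0.5 * t + rng.normal(scale=0.3, size=t.size)

tau = 12.0                                 # query point (extrapolation here)
w = first_degree_weights(t, tau)
z_hat_tau = np.sum(w * y)                  # estimate from the weighting "system"

# Cross-check against the coefficient form z_hat(tau) = alpha_hat + beta_hat * tau.
beta_hat = np.sum((t - t.mean()) * y) / np.sum((t - t.mean()) ** 2)
alpha_hat = y.mean() - beta_hat * t.mean()
assert np.isclose(z_hat_tau, alpha_hat + beta_hat * tau)
```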

Empirically determined statistical moments

Applying the weighting function to the samples yields

$$\hat z(\tau) = \hat\alpha + \hat\beta\,\tau ,$$

where

$$\hat\beta = \frac{\sum_{n=1}^{N}\left( t_n - \bar t \right) y_n}{\sum_{n=1}^{N}\left( t_n - \bar t \right)^2}$$

and

$$\hat\alpha = \bar y - \hat\beta\,\bar t, \qquad \bar y = \frac{1}{N}\sum_{n=1}^{N} y_n .$$

As linear functions of the random variables $y_n$, both coefficient estimates $\hat\alpha$ and $\hat\beta$ are random variables. [8] In the absence of the errors $\varepsilon_n$, $\hat\alpha = \alpha$ and $\hat\beta = \beta$, as they should to meet that boundary condition.

Because the statistical expectation operator E[•] is a linear function and the sampled stochastic process errors $\varepsilon_n$ are zero mean, the expected value of the estimate is the first order statistical moment as follows: [1] [2] [3] [8]

$$E[\hat z(\tau)] = E[\hat\alpha] + E[\hat\beta]\,\tau = \alpha + \beta\,\tau = z(\tau) .$$

Written out through the weighting function, this follows from $\sum_{n=1}^{N} w_n(\tau) = 1$ and $\sum_{n=1}^{N} w_n(\tau)\, t_n = \tau$:

$$E[\hat z(\tau)] = \sum_{n=1}^{N} w_n(\tau)\, E[\alpha + \beta\, t_n + \varepsilon_n] = \alpha \sum_{n=1}^{N} w_n(\tau) + \beta \sum_{n=1}^{N} w_n(\tau)\, t_n = \alpha + \beta\,\tau .$$

The statistical variance in $\hat z(\tau)$ is given by the second order statistical central moment as follows: [1] [2] [3] [8]

$$\operatorname{Var}[\hat z(\tau)] = E\!\left[ \big( \hat z(\tau) - E[\hat z(\tau)] \big)^2 \right] = E\!\left[ \Big( \sum_{n=1}^{N} w_n(\tau)\,\varepsilon_n \Big)^{2} \right] = \sigma^2 \sum_{n=1}^{N} w_n(\tau)^2 ,$$

because

$$E[\varepsilon_i\,\varepsilon_n] = \begin{cases} \sigma^2, & i = n \\ 0, & i \neq n, \end{cases}$$

where $\sigma^2$ is the statistical variance of the random variables $\varepsilon_n$; the cross terms vanish because the $\varepsilon_n$ are uncorrelated. [8]

Carrying out the multiplications and summations in $\sigma^2 \sum_{n=1}^{N} w_n(\tau)^2$ yields [8]

$$\operatorname{Var}[\hat z(\tau)] = \sigma^2 \left[ \frac{1}{N} + \frac{\left( \tau - \bar t \right)^2}{\sum_{n=1}^{N}\left( t_n - \bar t \right)^2} \right].$$
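As an illustrative check (a Monte Carlo sketch with assumed parameter values, not part of the cited derivation), the empirical spread of $\hat z(\tau)$ over repeated noisy samples should track this formula:

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, beta, sigma = 2.0, 0.5, 0.3          # assumed true polynomial and noise level
t = np.arange(10.0)
tau = 12.0
t_bar = t.mean()
s_tt = np.sum((t - t_bar) ** 2)

estimates = []
for _ in range(20000):                      # repeated independent experiments
    y = alpha + beta * t + rng.normal(scale=sigma, size=t.size)
    beta_hat = np.sum((t - t_bar) * y) / s_tt
    alpha_hat = y.mean() - beta_hat * t_bar
    estimates.append(alpha_hat + beta_hat * tau)

empirical_var = np.var(estimates)
predicted_var = sigma**2 * (1.0 / t.size + (tau - t_bar) ** 2 / s_tt)
print(empirical_var, predicted_var)         # the two should agree closely
```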

Measuring or approximating the statistical variance of the random errors

In a hardware system, such as a tracking radar, the measurement noise variance can be determined from measurements when there is no target return – i.e., by just taking measurements of the noise alone.

However, if polynomial least squares is used when the variance $\sigma^2$ is not measurable (such as in econometrics or statistics), it can be estimated with observations in $E_{\min}$ from orthogonal projection as follows: [8]

$$\hat\sigma^2 \approx \frac{E_{\min}}{N} = \frac{1}{N}\sum_{n=1}^{N}\left( y_n - \hat z_n \right)^2 .$$

As a result, to the first order approximation, written in terms of the estimates $\hat\alpha$ and $\hat\beta$ as functions of the sampled $y_n$ and $t_n$,

$$\hat\sigma^2 \approx \frac{1}{N}\sum_{n=1}^{N}\left( y_n - \hat\alpha - \hat\beta\, t_n \right)^2 ,$$

which goes to zero in the absence of the errors $\varepsilon_n$, as it should to meet that boundary condition. [8]

As a result, the samples $y_n$ (the noisy signal) are considered to be the input to the linear polynomial least squares "system", which transforms the samples into the empirically determined statistical estimate $\hat z(\tau)$, the expected value $E[\hat z(\tau)]$, and the variance $\operatorname{Var}[\hat z(\tau)]$. [8]
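Pulling these steps together in one sketch (same assumed notation; the divisor $N$ follows the first order approximation above, whereas many texts use $N - 2$ for an unbiased estimate):

```python
import numpy as np

def first_degree_pls(t, y, tau):
    """First degree polynomial least squares: estimate z(tau), the noise
    variance, and the empirically determined estimation variance."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    n = t.size
    t_bar = t.mean()
    s_tt = np.sum((t - t_bar) ** 2)

    beta_hat = np.sum((t - t_bar) * y) / s_tt
    alpha_hat = y.mean() - beta_hat * t_bar
    z_hat_tau = alpha_hat + beta_hat * tau

    residuals = y - (alpha_hat + beta_hat * t)
    sigma2_hat = np.sum(residuals ** 2) / n           # first order approximation
    var_z_hat = sigma2_hat * (1.0 / n + (tau - t_bar) ** 2 / s_tt)
    return z_hat_tau, sigma2_hat, var_z_hat

rng = np.random.default_rng(3)
t = np.arange(10.0)
y = 2.0 + 0.5 * t + rng.normal(scale=0.3, size=t.size)
print(first_degree_pls(t, y, tau=12.0))
```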

Properties of polynomial least squares modeled as a linear "system"

(1) The empirical statistical variance $\operatorname{Var}[\hat z(\tau)]$ is a function of $\tau$, $N$ and $\sigma^2$. Setting the derivative of $\operatorname{Var}[\hat z(\tau)]$ with respect to $\tau$ equal to zero shows the minimum to occur at $\tau = \bar t$; i.e., at the centroid (sample average) of the samples $t_n$. The minimum statistical variance thus becomes $\sigma^2 / N$. This is equivalent to the statistical variance from polynomial least squares of a zero degree polynomial – i.e., of the centroid (sample average) of the $y_n$. [1] [2] [8] [9]
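Explicitly, differentiating the variance formula above with respect to $\tau$ gives

$$\frac{d}{d\tau}\operatorname{Var}[\hat z(\tau)] = \frac{2\,\sigma^2\left( \tau - \bar t \right)}{\sum_{n=1}^{N}\left( t_n - \bar t \right)^2} = 0 \quad\Longrightarrow\quad \tau = \bar t, \qquad \operatorname{Var}[\hat z(\bar t)] = \frac{\sigma^2}{N} .$$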

(2) The empirical statistical variance $\operatorname{Var}[\hat z(\tau)]$ is a function of the quadratic $(\tau - \bar t)^2$. Moreover, the further $\tau$ deviates from $\bar t$ (even within the data window), the larger is the variance due to the random variable errors $\varepsilon_n$. The independent variable $\tau$ can take any value on the $t$ axis; it is not limited to the data window. It can extend beyond the data window – and likely will at times, depending on the application. If it is within the data window, estimation is described as interpolation; if it is outside the data window, estimation is described as extrapolation. It is both intuitive and well known that the further the extrapolation, the larger the error. [8]

(3) The empirical statistical variance $\operatorname{Var}[\hat z(\tau)]$ due to the random variable errors $\varepsilon_n$ is inversely proportional to $N$. As $N$ increases, the statistical variance decreases. This is well known and is what filtering out the errors is all about. [1] [2] [8] [12] The underlying purpose of polynomial least squares is to filter out the errors to improve estimation accuracy by reducing the empirical statistical estimation variance. In reality, only two data points are required to estimate $\hat\alpha$ and $\hat\beta$; albeit, the more data points with zero mean statistical errors included, the smaller the empirical statistical estimation variance established by the $N$ samples.

(4) There is an additional issue to be considered when the noise variance is not measurable: independent of the polynomial least squares estimation, any new observation would be described by the variance $\sigma^2$, approximated empirically by $\hat\sigma^2$. [8] [9]

Thus, the polynomial least squares statistical estimation variance $\operatorname{Var}[\hat z(\tau)]$ and the statistical variance $\sigma^2$ of any new sample of $y$ would both contribute to the uncertainty of any future observation. Both variances are clearly determined by polynomial least squares in advance.
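Combining the two contributions (a standard result for uncorrelated errors, written here in the notation used above), the variance describing a future observation at $\tau$ is

$$\operatorname{Var}[\hat z(\tau)] + \sigma^2 = \sigma^2 \left[ 1 + \frac{1}{N} + \frac{\left( \tau - \bar t \right)^2}{\sum_{n=1}^{N}\left( t_n - \bar t \right)^2} \right],$$

with $\sigma^2$ replaced by $\hat\sigma^2$ when the noise variance must be estimated from the data.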

(5) This concept also applies to higher degree polynomials. However, the weighting function is more complicated. In addition, the estimation variances increase exponentially as polynomial degrees increase linearly (i.e., in unit steps). There are, however, ways of dealing with this, as described in [6] and [7].

The synergy of integrating polynomial least squares with statistical estimation theory

Modeling polynomial least squares as a linear signal processing "system" creates the synergy of integrating polynomial least squares with statistical estimation theory to deterministically process samples of an assumed polynomial corrupted with a statistically described stochastic error ε. In the absence of the error ε, statistical estimation theory is irrelevant and polynomial least squares reverts to the conventional approximation of complicated functions and scatter plots.

References

  1. Gujarati, Damodar N.; Porter, Dawn C. (2008). Basic Econometrics (5th ed.). McGraw-Hill Education. ISBN 978-0073375779.
  2. Hansen, Bruce E. (January 16, 2015). Econometrics.
  3. Copeland, Thomas E.; Weston, John Fred; Shastri, Kuldeep (2004). Financial Theory and Corporate Policy (4th ed.). Prentice Hall. ISBN 978-0321127211.
  4. Kálmán, Rudolf E. (March 1960). "A New Approach to Linear Filtering and Prediction Problems". Journal of Basic Engineering. 82: 35. doi:10.1115/1.3662552.
  5. Sorenson, H. W. (July 1970). "Least-squares estimation: Gauss to Kalman". IEEE Spectrum.
  6. Bell, J. W. (October 2012). "Simple Disambiguation of Orthogonal Projection in Kalman's Filter Derivation". Proceedings of the International Conference on Radar Systems, Glasgow, UK.
  7. Bell, J. W. (October 2013). "A Simple Kalman Filter Alternative: The Multi-Fractional Order Estimator". IET Radar, Sonar & Navigation. 7 (8).
  8. Bell, Jeff. "Ordinary Least Squares Revolutionized: Establishing the Vital Missing Empirically Determined Statistical Prediction Variance". SSRN. doi:10.2139/ssrn.2573840. Retrieved 2019-02-27.
  9. Papoulis, A. (1965). Probability, Random Variables, and Stochastic Processes. McGraw-Hill, New York.
  10. Wylie, C. R., Jr. (1960). Advanced Engineering Mathematics. McGraw-Hill, New York.
  11. Scheid, F. (1968). Numerical Analysis. Schaum's Outline Series. McGraw-Hill, New York.
  12. Ordinary least squares.