Extrapolation

In mathematics, extrapolation is a type of estimation, beyond the original observation range, of the value of a variable on the basis of its relationship with another variable. It is similar to interpolation, which produces estimates between known observations, but extrapolation is subject to greater uncertainty and a higher risk of producing meaningless results. Extrapolation may also mean extending a method, on the assumption that similar methods will be applicable. Extrapolation may also apply to human experience, projecting, extending, or expanding known experience into an area not known or previously experienced so as to arrive at a (usually conjectural) knowledge of the unknown [1] (for example, a driver extrapolating road conditions beyond what they can see while driving). Extrapolation methods can also be applied to the interior reconstruction problem.

[Figure: Example illustration of the extrapolation problem, consisting of assigning a meaningful value at the blue box, at $x = 7$, given the red data points.]

Method

A sound choice of which extrapolation method to apply relies on prior knowledge of the process that created the existing data points. Some experts have proposed the use of causal forces in the evaluation of extrapolation methods. [2] Crucial questions are, for example, whether the data can be assumed to be continuous, smooth, possibly periodic, and so on.

Linear

Linear extrapolation means creating a tangent line at the end of the known data and extending it beyond that limit. Linear extrapolation will only provide good results when used to extend the graph of an approximately linear function or not too far beyond the known data.

If the two data points nearest the point $x_*$ to be extrapolated are $(x_{k-1}, y_{k-1})$ and $(x_k, y_k)$, linear extrapolation gives the function

$$y(x_*) = y_{k-1} + \frac{x_* - x_{k-1}}{x_k - x_{k-1}} (y_k - y_{k-1})$$

(which is identical to linear interpolation if $x_{k-1} < x_* < x_k$). It is possible to include more than two points, averaging the slope of the linear interpolant by regression-like techniques on the data points chosen to be included. This is similar to linear prediction.
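As a concrete illustration, the following Python sketch implements the two-point formula above; the function name and the sample points are made up for the example.

    def linear_extrapolate(x_prev, y_prev, x_last, y_last, x):
        # Line through (x_prev, y_prev) and (x_last, y_last), evaluated at x.
        slope = (y_last - y_prev) / (x_last - x_prev)
        return y_last + slope * (x - x_last)

    # The line through (4, 9) and (5, 11) has slope 2, so at x = 7 it gives 15.
    print(linear_extrapolate(4.0, 9.0, 5.0, 11.0, 7.0))  # 15.0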

Polynomial

[Figure: Lagrange extrapolations of the sequence 1, 2, 3. Extrapolating by 4 leads to a polynomial of minimal degree (cyan line).]

A polynomial curve can be created through the entire known data or just near the end (two points for linear extrapolation, three points for quadratic extrapolation, etc.). The resulting curve can then be extended beyond the end of the known data. Polynomial extrapolation is typically done by means of Lagrange interpolation or using Newton's method of finite differences to create a Newton series that fits the data. The resulting polynomial may be used to extrapolate the data.
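As a hedged sketch of this idea in Python: numpy.polyfit performs a least-squares fit, which coincides with exact (Lagrange) interpolation when the number of points equals the degree plus one. The data values below are invented for the example.

    import numpy as np

    # Invented data lying exactly on y = x**2.
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([1.0, 4.0, 9.0, 16.0, 25.0])

    # Quadratic fit through the data; with exactly 3 points this would be
    # Lagrange interpolation, with 5 points it is a least-squares fit.
    coeffs = np.polyfit(x, y, deg=2)
    print(np.polyval(coeffs, 7.0))  # ~49.0, since the data are exactly quadratic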

High-order polynomial extrapolation must be used with due care. For the example data set and problem in the figure above, anything above order 1 (linear extrapolation) will possibly yield unusable values; an error estimate of the extrapolated value will grow with the degree of the polynomial extrapolation. This is related to Runge's phenomenon.


Conic

A conic section can be created using five points near the end of the known data. If the conic section created is an ellipse or circle, when extrapolated it will loop back and rejoin itself. An extrapolated parabola or hyperbola will not rejoin itself, but may curve back relative to the X-axis. This type of extrapolation could be done with a conic sections template (on paper) or with a computer.
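A sketch of the computer variant, under the standard formulation: five points in general position determine the general conic A x^2 + B x y + C y^2 + D x + E y + F = 0 up to scale, so the coefficients can be recovered as the null space of a 5-by-6 design matrix. The point values below are invented.

    import numpy as np

    # Five invented points near the end of the known data.
    pts = np.array([(0.0, 1.0), (1.0, 2.0), (2.0, 2.5), (3.0, 2.0), (4.0, 1.0)])
    x, y = pts[:, 0], pts[:, 1]

    # Each point gives one linear equation in (A, B, C, D, E, F); the
    # coefficient vector spans the null space of the design matrix,
    # read off from the last row of V^T in the SVD.
    M = np.column_stack([x**2, x * y, y**2, x, y, np.ones_like(x)])
    conic = np.linalg.svd(M)[2][-1]
    print(conic / np.abs(conic).max())  # conic coefficients, up to scale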

French curve

French curve extrapolation is a method suitable for any distribution that has a tendency to be exponential, but with accelerating or decelerating factors. [3] This method has been used successfully in providing forecast projections of the growth of HIV/AIDS in the UK since 1987 and variant CJD in the UK for a number of years. Another study has shown that extrapolation can produce the same quality of forecasting results as more complex forecasting strategies. [4]

Geometric extrapolation with error prediction

A geometric extrapolation can be created from three points of a sequence and the "moment" or "index"; this type of extrapolation achieves 100% prediction accuracy for a large percentage of the sequences in a known series database (OEIS). [5]

Example of extrapolation with error prediction:
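The worked example from the reference is not reproduced here. As a hedged illustration only, the sketch below extrapolates from three points by following the ratio of consecutive terms, and uses the drift between the two observed ratios as a crude error indicator; the names and the error heuristic are assumptions, not the Probnet algorithm from [5].

    def geometric_extrapolate(a, b, c):
        # Predict the next term of a roughly geometric sequence a, b, c.
        r1, r2 = b / a, c / b
        prediction = c * r2
        # Crude error hint: grows as the sequence departs from a pure
        # geometric progression (i.e. as the two ratios disagree).
        error_hint = abs(r2 - r1) * abs(c)
        return prediction, error_hint

    print(geometric_extrapolate(2, 4, 8))  # (16.0, 0.0): exactly geometric
    print(geometric_extrapolate(1, 2, 3))  # (4.5, 1.5): ratios 2 and 1.5 disagree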

Quality

Typically, the quality of a particular method of extrapolation is limited by the assumptions about the function made by the method. If the method assumes the data are smooth, then a non-smooth function will be poorly extrapolated.

In terms of complex time series, some experts have discovered that extrapolation is more accurate when performed through the decomposition of causal forces. [6]

Even for proper assumptions about the function, the extrapolation can diverge severely from the function. The classic example is truncated power series representations of sin(x) and related trigonometric functions. For instance, using only data near x = 0, we may estimate that the function behaves as sin(x) ~ x. In the neighborhood of x = 0, this is an excellent estimate. Away from x = 0, however, the extrapolation moves arbitrarily away from the x-axis while sin(x) remains in the interval [−1, 1]; that is, the error increases without bound.

Taking more terms in the power series of sin(x) around x = 0 will produce better agreement over a larger interval near x = 0, but will produce extrapolations that eventually diverge away from the x-axis even faster than the linear approximation.
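This behaviour is easy to reproduce numerically. The short Python sketch below compares truncated Maclaurin sums of sin(x) against the true value; the choice of evaluation points and truncation orders is arbitrary.

    import math

    def taylor_sin(x, n_terms):
        # Partial sum of the Maclaurin series: sum of (-1)^k x^(2k+1) / (2k+1)!
        return sum((-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1)
                   for k in range(n_terms))

    for x in (0.5, 2.0, 6.0):
        errors = [abs(taylor_sin(x, n) - math.sin(x)) for n in (1, 3, 5)]
        print(x, errors)  # for a fixed truncation, the error grows rapidly with x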

This divergence is a specific property of extrapolation methods and is only circumvented when the functional forms assumed by the extrapolation method (inadvertently or intentionally, due to additional information) accurately represent the nature of the function being extrapolated. For particular problems, this additional information may be available, but in the general case, it is impossible to satisfy all possible function behaviors with a workably small set of potential behaviors.

In the complex plane

In complex analysis, a problem of extrapolation may be converted into an interpolation problem by the change of variable $w = 1/z$. This transform exchanges the part of the complex plane inside the unit circle with the part of the complex plane outside of the unit circle. In particular, the compactification point at infinity is mapped to the origin and vice versa. Care must be taken with this transform however, since the original function may have had "features", for example poles and other singularities, at infinity that were not evident from the sampled data.
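A minimal numerical sketch of this idea, with an invented function: samples of f at large |z| become samples of g(w) = f(1/w) near w = 0, where ordinary polynomial interpolation applies.

    import numpy as np

    def f(z):
        # Invented function, analytic for |z| > 0.5.
        return 1.0 / (z - 0.5)

    z_samples = np.array([2.0, 3.0, 4.0, 5.0])

    # Change of variable w = 1/z: the samples move inside the unit circle,
    # and estimating f at still larger z becomes interpolating g near w = 0.
    w_samples = 1.0 / z_samples
    coeffs = np.polyfit(w_samples, f(z_samples), deg=3)  # cubic through 4 points
    print(np.polyval(coeffs, 1.0 / 10.0), f(10.0))       # both close to 1/9.5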

Another problem of extrapolation is loosely related to the problem of analytic continuation, where (typically) a power series representation of a function is expanded at one of its points of convergence to produce a power series with a larger radius of convergence. In effect, a set of data from a small region is used to extrapolate a function onto a larger region.

Again, analytic continuation can be thwarted by function features that were not evident from the initial data.

Also, one may use sequence transformations like Padé approximants and Levin-type sequence transformations as extrapolation methods that lead to a summation of power series that are divergent outside the original radius of convergence. In this case, one often obtains rational approximants.
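As a hedged example with SciPy (whose scipy.interpolate.pade builds the approximant from Taylor coefficients): the Maclaurin series of log(1 + x) converges only for |x| ≤ 1, yet its diagonal Padé approximant remains accurate well outside that radius.

    import numpy as np
    from scipy.interpolate import pade

    # Maclaurin coefficients of log(1 + x): 0, 1, -1/2, 1/3, ...
    an = [0.0, 1.0, -1.0 / 2, 1.0 / 3, -1.0 / 4, 1.0 / 5, -1.0 / 6]
    p, q = pade(an, 3)  # [3/3] Padé approximant as a ratio of poly1d objects

    x = 2.0  # outside the series' radius of convergence
    print(p(x) / q(x), np.log(1.0 + x))  # both close to log(3) ≈ 1.0986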

Fast

Extrapolated data are often convolved with a kernel function. After extrapolation, the size of the data is increased N times, where N is approximately 2–3. If these data need to be convolved with a known kernel function, the numerical cost grows by roughly a factor of N log(N) even with the fast Fourier transform (FFT). There exists an algorithm that analytically calculates the contribution from the extrapolated part of the data, so that its calculation time can be neglected compared with the original convolution calculation. Hence, with this algorithm, the cost of a convolution using the extrapolated data is nearly unchanged. This is referred to as fast extrapolation. Fast extrapolation has been applied to CT image reconstruction. [7]
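The algorithm from the reference is not reproduced here. As a baseline-only sketch, the snippet below shows the standard FFT convolution whose cost the fast method avoids re-paying on the enlarged array; the signal, padding factor, and kernel are invented.

    import numpy as np
    from scipy.signal import fftconvolve

    rng = np.random.default_rng(0)
    data = rng.standard_normal(1024)             # measured signal
    kernel = np.exp(-np.linspace(0.0, 4.0, 65))  # invented kernel

    # Stand-in for extrapolated data: the signal padded to ~3x its length
    # (a real method would append extrapolated values, not zeros).
    enlarged = np.concatenate([data, np.zeros(2 * data.size)])

    # FFT convolution costs O(M log M) in the array length M, so running it
    # on the enlarged array is roughly 3x the work of the original.
    result = fftconvolve(enlarged, kernel, mode="full")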

Extrapolation arguments

Extrapolation arguments are informal and unquantified arguments which assert that something is probably true beyond the range of values for which it is known to be true. For example, we believe in the reality of what we see through magnifying glasses because it agrees with what we see with the naked eye but extends beyond it; we believe in what we see through light microscopes because it agrees with what we see through magnifying glasses but extends beyond it; and similarly for electron microscopes. Such arguments are widely used in biology in extrapolating from animal studies to humans and from pilot studies to a broader population. [8]

Like slippery slope arguments, extrapolation arguments may be strong or weak depending on such factors as how far the extrapolation goes beyond the known range. [9]

Notes

  1. Extrapolation, entry at Merriam–Webster.
  2. J. Scott Armstrong; Fred Collopy (1993). "Causal Forces: Structuring Knowledge for Time-series Extrapolation". Journal of Forecasting. 12 (2): 103–115. CiteSeerX 10.1.1.42.40. doi:10.1002/for.3980120205. S2CID 3233162. Retrieved 2012-01-10.
  3. AIDSCJDUK.info Main Index.
  4. J. Scott Armstrong (1984). "Forecasting by Extrapolation: Conclusions from Twenty-Five Years of Research". Interfaces. 14 (6): 52–66. CiteSeerX 10.1.1.715.6481. doi:10.1287/inte.14.6.52. S2CID 5805521. Retrieved 2012-01-10.
  5. V. Nos (2021). "Probnet: Geometric Extrapolation of Integer Sequences with error prediction". Retrieved 2023-03-14.
  6. J. Scott Armstrong; Fred Collopy; J. Thomas Yokum (2004). "Decomposition by Causal Forces: A Procedure for Forecasting Complex Time Series". International Journal of Forecasting. 21: 25–36. doi:10.1016/j.ijforecast.2004.05.001. S2CID 8816023.
  7. Shuangren Zhao; Kang Yang; Xintie Yang (2011). "Reconstruction from truncated projections using mixed extrapolations of exponential and quadratic functions" (PDF). Journal of X-Ray Science and Technology. 19 (2): 155–72. doi:10.3233/XST-2011-0284. PMID 21606580. Archived from the original (PDF) on 2017-09-29. Retrieved 2014-06-03.
  8. Steel, Daniel (2007). Across the Boundaries: Extrapolation in Biology and Social Science. Oxford: Oxford University Press. ISBN 9780195331448.
  9. Franklin, James (2013). "Arguments whose strength depends on continuous variation". Informal Logic. 33 (1): 33–56. doi:10.22329/il.v33i1.3610. Retrieved 29 June 2021.
