Shanks transformation

Last updated November 29, 2023

In numerical analysis, the Shanks transformation is a non-linear series acceleration method to increase the rate of convergence of a sequence. This method is named after Daniel Shanks, who rediscovered this sequence transformation in 1955. It was first derived and published by R. Schmidt in 1941.^[1]

One can calculate only a few terms of a perturbation expansion, usually no more than two or three, and almost never more than seven. The resulting series is often slowly convergent, or even divergent. Yet those few terms contain a remarkable amount of information, which the investigator should do his best to extract.
This viewpoint has been persuasively set forth in a delightful paper by Shanks (1955), who displays a number of amazing examples, including several from fluid mechanics.

Milton D. Van Dyke (1975) Perturbation methods in fluid mechanics, p. 202.

Formulation

For a sequence $\left\{a_{m}\right\}_{m\in \mathbb {N} }$ the series

A=\sum _{m=0}^{\infty }a_{m}\,

is to be determined. First, the partial sum $A_{n}$ is defined as:

A_{n}=\sum _{m=0}^{n}a_{m}\,

and forms a new sequence $\left\{A_{n}\right\}_{n\in \mathbb {N} }$ . Provided the series converges, $A_{n}$ will also approach the limit $A$ as $n\to \infty .$ The Shanks transformation $S(A_{n})$ of the sequence $A_{n}$ is the new sequence defined by^[2]^[3]

S(A_{n})={\frac {A_{n+1}\,A_{n-1}\,-\,A_{n}^{2}}{A_{n+1}-2A_{n}+A_{n-1}}}=A_{n+1}-{\frac {(A_{n+1}-A_{n})^{2}}{(A_{n+1}-A_{n})-(A_{n}-A_{n-1})}}

where this sequence $S(A_{n})$ often converges more rapidly than the sequence $A_{n}.$ Further speed-up may be obtained by repeated use of the Shanks transformation, by computing $S^{2}(A_{n})=S(S(A_{n})),$ $S^{3}(A_{n})=S(S(S(A_{n}))),$ etc.

Note that the non-linear transformation used in the Shanks transformation is essentially the same as the one used in Aitken's delta-squared process. As with Aitken's method, the right-most expression in the definition of $S(A_{n})$, i.e. $S(A_{n})=A_{n+1}-{\frac {(A_{n+1}-A_{n})^{2}}{(A_{n+1}-A_{n})-(A_{n}-A_{n-1})}}$, is more numerically stable than the expression to its left, i.e. $S(A_{n})={\frac {A_{n+1}\,A_{n-1}\,-\,A_{n}^{2}}{A_{n+1}-2A_{n}+A_{n-1}}}$. The left form subtracts the two nearly equal products $A_{n+1}A_{n-1}$ and $A_{n}^{2}$, so it loses significant digits to cancellation once the partial sums are close to each other. Both Aitken's method and the Shanks transformation operate on a sequence, but the sequence the Shanks transformation operates on is usually thought of as a sequence of partial sums, although any sequence may be viewed as a sequence of partial sums.
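A minimal Python sketch of the transformation, assuming the partial sums are already available as a list of floats. It uses the right-most form of $S(A_{n})$, which is the more stable one, and obtains $S^{k}$ by applying the transformation repeatedly. The function names `shanks` and `iterated_shanks` are illustrative, not taken from any library. As a check, it is applied to the geometric series $\sum_{m}(-1/2)^{m}=2/3$, whose partial sums are exactly $A_{n}=\tfrac{2}{3}+\tfrac{1}{3}(-\tfrac{1}{2})^{n}$, so a single step should reproduce the limit up to rounding.

```python
def shanks(seq):
    """Apply one Shanks step to [A_0, A_1, ...].

    Element i of the result is S(A_{i+1}); the output is two entries shorter.
    Uses the right-most, more numerically stable form of the definition.
    """
    out = []
    for n in range(1, len(seq) - 1):
        d_prev = seq[n] - seq[n - 1]   # A_n - A_{n-1}
        d_next = seq[n + 1] - seq[n]   # A_{n+1} - A_n
        denom = d_next - d_prev        # A_{n+1} - 2 A_n + A_{n-1}
        if denom == 0:
            # Degenerate case: left to the caller rather than guessed here.
            raise ZeroDivisionError(f"zero denominator at n = {n}")
        out.append(seq[n + 1] - d_next ** 2 / denom)
    return out


def iterated_shanks(seq, k):
    """Return S^k applied to seq; element i of the result is S^k(A_{i+k})."""
    for _ in range(k):
        seq = shanks(seq)
    return seq


# Partial sums A_0 .. A_7 of the geometric series sum_m (-1/2)^m, limit 2/3.
partial_sums = []
total = 0.0
for m in range(8):
    total += (-0.5) ** m
    partial_sums.append(total)

print(partial_sums[-1])       # A_7 still differs from 2/3 by 1/384
print(shanks(partial_sums))   # every entry equals 2/3 up to rounding
```

Because one step is already exact for a geometric series, iterating it on that column can hit a zero or tiny denominator; repeated application is meant for slowly convergent sequences whose transformed columns still vary.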

Example

As an example, consider the slowly convergent series^[3]

4\sum _{k=0}^{\infty }(-1)^{k}{\frac {1}{2k+1}}=4\left(1-{\frac {1}{3}}+{\frac {1}{5}}-{\frac {1}{7}}+\cdots \right)

which has the exact sum π ≈ 3.14159265. The partial sum $A_{6}$ has only one-digit accuracy, while six-figure accuracy requires summing about 400,000 terms.

In the table below, the partial sums $A_{n}$ , the Shanks transformation $S(A_{n})$ on them, as well as the repeated Shanks transformations $S^{2}(A_{n})$ and $S^{3}(A_{n})$ are given for $n$ up to 12. The figure to the right shows the absolute error for the partial sums and Shanks transformation results, clearly showing the improved accuracy and convergence rate.

$n$	$A_{n}$	$S(A_{n})$	$S^{2}(A_{n})$	$S^{3}(A_{n})$
0	4.00000000	—	—	—
1	2.66666667	3.16666667	—	—
2	3.46666667	3.13333333	3.14210526	—
3	2.89523810	3.14523810	3.14145022	3.14159936
4	3.33968254	3.13968254	3.14164332	3.14159086
5	2.97604618	3.14271284	3.14157129	3.14159323
6	3.28373848	3.14088134	3.14160284	3.14159244
7	3.01707182	3.14207182	3.14158732	3.14159274
8	3.25236593	3.14125482	3.14159566	3.14159261
9	3.04183962	3.14183962	3.14159086	3.14159267
10	3.23231581	3.14140672	3.14159377	3.14159264
11	3.05840277	3.14173610	3.14159192	3.14159266
12	3.21840277	3.14147969	3.14159314	3.14159265

The Shanks transformation $S(A_{1})$ already has two-digit accuracy, while the original partial sums only reach the same accuracy at $A_{24}.$ Remarkably, $S^{3}(A_{3})$ has six-digit accuracy, obtained from repeated Shanks transformations applied to the first seven partial sums $A_{0},\ldots ,A_{6}$, that is, to the first seven terms of the series. As mentioned before, $A_{n}$ only reaches six-digit accuracy after summing about 400,000 terms.
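The table can be regenerated with a short Python sketch that recomputes the partial sums of the series for π and applies the stable form of $S$ three times. The helper name `shanks` is illustrative. Printing to eight decimals should reproduce the table's entries. The partial sums must run up to $A_{15}$, because $S^{3}(A_{12})$ draws on $A_{9},\ldots ,A_{15}$.

```python
import math


def shanks(seq):
    """One Shanks step (right-most form): element i of the result is S(A_{i+1})."""
    return [
        seq[n + 1] - (seq[n + 1] - seq[n]) ** 2
        / ((seq[n + 1] - seq[n]) - (seq[n] - seq[n - 1]))
        for n in range(1, len(seq) - 1)
    ]


# Partial sums A_0 .. A_15 of 4 * sum_k (-1)^k / (2k + 1).
A = []
total = 0.0
for k in range(16):
    total += 4.0 * (-1) ** k / (2 * k + 1)
    A.append(total)

# levels[j][i] holds S^j(A_{i+j}), so S^j(A_n) is levels[j][n - j].
levels = [A]
for _ in range(3):
    levels.append(shanks(levels[-1]))

for n in range(13):
    # "-" marks entries that would need partial sums before A_0.
    cells = [f"{levels[j][n - j]:.8f}" if n >= j else "-" for j in range(4)]
    print(n, *cells, sep="\t")

print(f"error of S^3(A_3): {abs(levels[3][0] - math.pi):.1e}")
```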

Motivation

The Shanks transformation is motivated by the observation that, for larger $n$, the partial sum $A_{n}$ quite often behaves approximately as^[2]

A_{n}=A+\alpha q^{n},\,

with $|q|<1$, so that the transient term $\alpha q^{n}$ dies out and the sequence converges to the series result $A$ as $n\to \infty .$ So for $n-1,$ $n$ and $n+1$ the respective partial sums are:

A_{n-1}=A+\alpha q^{n-1}\quad ,\qquad A_{n}=A+\alpha q^{n}\qquad {\text{and}}\qquad A_{n+1}=A+\alpha q^{n+1}.

These three equations contain three unknowns: $A,$ $\alpha$ and $q.$ Solving for $A$ gives^[2]

A={\frac {A_{n+1}\,A_{n-1}\,-\,A_{n}^{2}}{A_{n+1}-2A_{n}+A_{n-1}}}.

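In the (exceptional) case that the denominator is zero, the model gives $A_{n+1}-2A_{n}+A_{n-1}=\alpha q^{n-1}(q-1)^{2}=0$. Since $|q|<1$ excludes $q=1$, this forces $\alpha=0$ (taking $q\neq 0$), and then $A_{n}=A$ for all $n.$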
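One way to carry out the elimination explicitly: subtracting consecutive equations removes $A$ and gives $q$ as a ratio of differences, and substituting back yields the right-most form of $S(A_{n})$ from the Formulation section.

A_{n}-A_{n-1}=\alpha q^{n-1}(q-1),\qquad A_{n+1}-A_{n}=\alpha q^{n}(q-1)\quad \Rightarrow \quad q={\frac {A_{n+1}-A_{n}}{A_{n}-A_{n-1}}},

A_{n+1}-A=\alpha q^{n+1}={\frac {q\,(A_{n+1}-A_{n})}{q-1}}={\frac {(A_{n+1}-A_{n})^{2}}{(A_{n+1}-A_{n})-(A_{n}-A_{n-1})}}.

Moving $A_{n+1}$ to the other side gives $A=A_{n+1}-{\frac {(A_{n+1}-A_{n})^{2}}{(A_{n+1}-A_{n})-(A_{n}-A_{n-1})}}$; bringing both terms over the common denominator $A_{n+1}-2A_{n}+A_{n-1}$ turns the numerator into $A_{n+1}A_{n-1}-A_{n}^{2}$, which is the quotient above.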

Generalized Shanks transformation

The generalized $k$th-order Shanks transformation is given as the ratio of two determinants:^[4]

S_{k}(A_{n})={\frac {\begin{vmatrix}A_{n-k}&\cdots &A_{n-1}&A_{n}\\\Delta A_{n-k}&\cdots &\Delta A_{n-1}&\Delta A_{n}\\\Delta A_{n-k+1}&\cdots &\Delta A_{n}&\Delta A_{n+1}\\\vdots &&\vdots &\vdots \\\Delta A_{n-1}&\cdots &\Delta A_{n+k-2}&\Delta A_{n+k-1}\\\end{vmatrix}}{\begin{vmatrix}1&\cdots &1&1\\\Delta A_{n-k}&\cdots &\Delta A_{n-1}&\Delta A_{n}\\\Delta A_{n-k+1}&\cdots &\Delta A_{n}&\Delta A_{n+1}\\\vdots &&\vdots &\vdots \\\Delta A_{n-1}&\cdots &\Delta A_{n+k-2}&\Delta A_{n+k-1}\\\end{vmatrix}}},

with $\Delta A_{p}=A_{p+1}-A_{p}.$ It is the solution of a model for the convergence behaviour of the partial sums $A_{n}$ with $k$ distinct transients:

A_{n}=A+\sum _{p=1}^{k}\alpha _{p}q_{p}^{n}.

This model for the convergence behaviour contains $2k+1$ unknowns: $A$, $\alpha _{1},\ldots ,\alpha _{k}$ and $q_{1},\ldots ,q_{k}$. Evaluating the above equation at the $2k+1$ partial sums $A_{n-k},A_{n-k+1},\ldots ,A_{n+k}$ and solving for $A$ gives the above expression for the $k$th-order Shanks transformation. The first-order generalized Shanks transformation is equal to the ordinary Shanks transformation: $S_{1}(A_{n})=S(A_{n}).$
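A direct, stdlib-only Python sketch of the ratio-of-determinants formula, assuming the partial sums are given as a list `A` with `A[p]` equal to $A_{p}$. The names `det` and `generalized_shanks` are hypothetical helpers written for this illustration; determinants are evaluated by Gaussian elimination with partial pivoting, which costs $O(k^{3})$ operations per entry and is only meant for small $k$. For $k=1$ it reduces to the ordinary transformation, and on a sequence that follows the two-transient model exactly, $S_{2}$ recovers $A$ up to rounding.

```python
def det(m):
    """Determinant of a square matrix (list of rows) by Gaussian elimination."""
    m = [list(row) for row in m]
    size = len(m)
    result = 1.0
    for c in range(size):
        # Partial pivoting: bring the largest entry in column c to the diagonal.
        pivot = max(range(c, size), key=lambda r: abs(m[r][c]))
        if m[pivot][c] == 0:
            return 0.0
        if pivot != c:
            m[c], m[pivot] = m[pivot], m[c]
            result = -result
        result *= m[c][c]
        for r in range(c + 1, size):
            factor = m[r][c] / m[c][c]
            for j in range(c, size):
                m[r][j] -= factor * m[c][j]
    return result


def generalized_shanks(A, n, k):
    """S_k(A_n) as a ratio of (k+1) x (k+1) determinants; uses A[n-k] .. A[n+k]."""
    if n - k < 0 or n + k >= len(A):
        raise ValueError("S_k(A_n) needs the partial sums A_{n-k} .. A_{n+k}")
    dA = [A[p + 1] - A[p] for p in range(len(A) - 1)]  # dA[p] = Delta A_p
    cols = range(n - k, n + 1)
    # Row i (i = 0 .. k-1) holds Delta A_{n-k+i} .. Delta A_{n+i}.
    diff_rows = [[dA[p + i] for p in cols] for i in range(k)]
    numerator = det([[A[p] for p in cols]] + diff_rows)
    denominator = det([[1.0] * (k + 1)] + diff_rows)
    return numerator / denominator


# Two transients: A_n = 1 + 2 (0.5)^n - (-0.3)^n, so S_2(A_2) should be 1.
B = [1 + 2 * 0.5 ** n - (-0.3) ** n for n in range(5)]
print(generalized_shanks(B, 2, 2))
```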

The generalized Shanks transformation is closely related to Padé approximants and Padé tables.^[4]

Note: Evaluating the determinants directly requires many arithmetic operations. However, Peter Wynn discovered a recursive evaluation procedure, the epsilon algorithm, which avoids calculating the determinants.^[5]^[6]
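A sketch of Wynn's epsilon algorithm in Python, assuming the usual recursion $\varepsilon _{-1}^{(n)}=0$, $\varepsilon _{0}^{(n)}=A_{n}$ and $\varepsilon _{j+1}^{(n)}=\varepsilon _{j-1}^{(n+1)}+1/(\varepsilon _{j}^{(n+1)}-\varepsilon _{j}^{(n)})$. With this indexing the even column $\varepsilon _{2k}^{(n)}$ equals $S_{k}(A_{n+k})$, while the odd columns are only intermediate quantities. The function name `wynn_epsilon` is illustrative, and the code does not guard against zero differences, which would raise `ZeroDivisionError`.

```python
def wynn_epsilon(A, k):
    """Return [eps_{2k}^{(n)} for n = 0, 1, ...], i.e. [S_k(A_k), S_k(A_{k+1}), ...].

    Needs at least 2k + 1 partial sums; each step shortens the column by one.
    """
    prev = [0.0] * (len(A) + 1)     # column eps_{-1}: all zeros
    curr = [float(a) for a in A]    # column eps_0: the partial sums themselves
    for _ in range(2 * k):
        # eps_{j+1}^{(n)} = eps_{j-1}^{(n+1)} + 1 / (eps_j^{(n+1)} - eps_j^{(n)})
        nxt = [prev[n + 1] + 1.0 / (curr[n + 1] - curr[n])
               for n in range(len(curr) - 1)]
        prev, curr = curr, nxt
    return curr


# Partial sums A_0 .. A_8 of 4 * sum_k (-1)^k / (2k + 1), which converges to pi.
A = []
total = 0.0
for j in range(9):
    total += 4.0 * (-1) ** j / (2 * j + 1)
    A.append(total)

for k in (1, 2, 3, 4):
    print(k, wynn_epsilon(A, k)[0])   # S_k(A_k), built from A_0 .. A_{2k}
```

Only two columns are kept in memory at a time, and no determinant is ever formed; the generalized transformation $S_{k}$ computed this way is in general different from the iterated $S^{k}$ used in the example table.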

Notes

[1] Weniger (2003).
[2] Bender & Orszag (1999), pp. 368–375.
[3] Van Dyke (1975), pp. 202–205.
[4] Bender & Orszag (1999), pp. 389–392.
[5] Wynn (1956).
[6] Wynn (1962).


References

Shanks, D. (1955), "Non-linear transformation of divergent and slowly convergent sequences", Journal of Mathematics and Physics, 34: 1–42, doi:10.1002/sapm19553411
Schmidt, R.J. (1941), "On the numerical solution of linear simultaneous equations by an iterative method", Philosophical Magazine, 32 (214): 369–383, doi:10.1080/14786444108520797
Van Dyke, M.D. (1975), Perturbation methods in fluid mechanics (annotated ed.), Parabolic Press, ISBN 0-915760-01-0
Bender, C.M.; Orszag, S.A. (1999), Advanced mathematical methods for scientists and engineers, Springer, ISBN 0-387-98931-5
Weniger, E.J. (1989), "Nonlinear sequence transformations for the acceleration of convergence and the summation of divergent series", Computer Physics Reports, 10 (5–6): 189–371, arXiv:math.NA/0306302, Bibcode:1989CoPhR..10..189W, doi:10.1016/0167-7977(89)90011-7
Brezinski, C.; Redivo-Zaglia, M.; Saad, Y. (2018), "Shanks sequence transformations and Anderson acceleration", SIAM Review, 60 (3): 646–669, doi:10.1137/17M1120725, hdl:11577/3270110
Senhadji, M.N. (2001), "On condition numbers of the Shanks transformation", J. Comput. Appl. Math., 135: 41–61
Wynn, P. (1956), "On a device for computing the e_m(S_n) transformation", Mathematical Tables and Other Aids to Computation, 10 (54): 91–96, doi:10.2307/2002183
Wynn, P. (1962), "Acceleration techniques for iterated vector and matrix problems", Math. Comp., 16: 301–322

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
