Steffensen's method

Last updated

In numerical analysis, Steffensen's method is an iterative method for root-finding named after Johan Frederik Steffensen which is similar to Newton's method, but with certain situational advantages. In particular, Steffensen's method achieves similar quadratic convergence, but without using derivatives as Newton's method does.

Contents

Simple description

The simplest form of the formula for Steffensen's method occurs when it is used to find a zero of a real function that is, to find the real value that satisfies Near the solution the function is supposed to approximately satisfy this condition makes adequate as a correction-function for for finding its own solution, although it is not required to work efficiently. For some functions, Steffensen's method can work even if this condition is not met, but in such a case, the starting value must be very close to the actual solution and convergence to the solution may be slow. Adjustments of the method's step size, mentioned later, can improve convergence in some of these cases.

Given an adequate starting value a sequence of values can be generated using the formula below. When it works, each value in the sequence is much closer to the solution than the prior value. The value from the current step generates the value for the next step, via this formula: [1]

for where the slope function is a composite of the original function given by the following formula:

or perhaps more clearly,

where is a step-size between the last iteration point, and an auxiliary point located at

Technically, the function is called the first-order divided difference of between those two points (either forward type or backward type, depending on the sign of ). Practically, it is the averaged value of the slope of the function between the last sequence point and the auxiliary point at with step of size (and direction)

Because the value of is an approximation for its value can optionally be checked to see if it meets the condition which is required of to guarantee convergence of Steffensen's algorithm. Although slight non-conformance may not necessarily be dire, any large departure from the condition warns that Steffensen's method is liable to fail, and temporary use of some fallback algorithm is warranted (e.g. the more robust Illinois algorithm, or plain regula falsi).

It is only for the purpose of finding for this auxiliary point that the value of the function must be an adequate correction to get closer to its own solution, and for that reason fulfill the requirement that For all other parts of the calculation, Steffensen's method only requires the function to be continuous and to actually have a nearby solution. [1] Several modest modifications of the step in the formula for the slope exist, such as multiplying it by  1 /2 or  3 /4, to accommodate functions that do not quite meet the requirement.

Advantages and drawbacks

The main advantage of Steffensen's method is that it has quadratic convergence [1] like Newton's method that is, both methods find roots to an equation just as 'quickly'. In this case quickly means that for both methods, the number of correct digits in the answer doubles with each step. But the formula for Newton's method requires evaluation of the function's derivative as well as the function while Steffensen's method only requires itself. This is important when the derivative is not easily or efficiently available.

The price for the quick convergence is the double function evaluation: Both and must be calculated, which might be time-consuming if is a complicated function. For comparison, the secant method needs only one function evaluation per step. The secant method increases the number of correct digits by "only" a factor of roughly 1.6 per step, but one can do twice as many steps of the secant method within a given time. Since the secant method can carry out twice as many steps in the same time as Steffensen's method, [lower-alpha 1] in practical use the secant method actually converges faster than Steffensen's method, when both algorithms succeed: The secant method achieves a factor of about (1.6)2 ≈ 2.6 times as many digits for every two steps (two function evaluations), compared to Steffensen's factor of 2 for every one step (two function evaluations).

Similar to most other iterative root-finding algorithms, the crucial weakness in Steffensen's method is choosing an 'adequate' starting value If the value of is not 'close enough' to the actual solution the method may fail and the sequence of values may either erratically flip-flop between two extremes, or diverge to infinity, or both.

Derivation using Aitken's delta-squared process

The version of Steffensen's method implemented in the MATLAB code shown below can be found using the Aitken's delta-squared process for accelerating convergence of a sequence. To compare the following formulae to the formulae in the section above, notice that This method assumes starting with a linearly convergent sequence and increases the rate of convergence of that sequence. If the signs of agree and is 'sufficiently close' to the desired limit of the sequence we can assume the following:

then

so

and hence

Solving for the desired limit of the sequence gives:

which results in the more rapidly convergent sequence:

Code example

In Matlab

Here is the source for an implementation of Steffensen's Method in MATLAB.

functionSteffensen(f, p0, tol)% This function takes as inputs: a fixed point iteration function, f, % and initial guess to the fixed point, p0, and a tolerance, tol.% The fixed point iteration function is assumed to be input as an% inline function. % This function will calculate and return the fixed point, p, % that makes the expression f(x) = p true to within the desired % tolerance, tol.formatcompact% This shortens the output.formatlong% This prints more decimal places.fori=1:1000% get ready to do a large, but finite, number of iterations.% This is so that if the method fails to converge, we won't% be stuck in an infinite loop.p1=f(p0);% calculate the next two guesses for the fixed point.p2=f(p1);p=p0-(p1-p0)^2/(p2-2*p1+p0)% use Aitken's delta squared method to% find a better approximation to p0.ifabs(p-p0)<tol% test to see if we are within tolerance.break% if we are, stop the iterations, we have our answer.endp0=p;% update p0 for the next iteration.endifabs(p-p0)>tol% If we fail to meet the tolerance, we output a% message of failure.'failed to converge in 1000 iterations.'end

In Python

Here is the source for an implementation of Steffensen's Method in Python.

fromtypingimportCallable,IteratorFunc=Callable[[float],float]defg(f:Func,x:float,fx:float)->Func:"""First-order divided difference function.    Arguments:        f: Function input to g        x: Point at which to evaluate g        fx: Function f evaluated at x     """returnlambdax:f(x+fx)/fx-1defsteff(f:Func,x:float)->Iterator[float]:"""Steffenson algorithm for finding roots.    This recursive generator yields the x_{n+1} value first then, when the generator iterates,    it yields x_{n+2} from the next level of recursion.    Arguments:        f: Function whose root we are searching for        x: Starting value upon first call, each level n that the function recurses x is x_n    """whileTrue:fx=f(x)gx=g(f,x,fx)(x)ifgx==0:breakelse:x=x-fx/gx# Update to x_{n+1}yieldx# Yield value

Generalization to Banach space

Steffensen's method can also be used to find an input for a different kind of function that produces output the same as its input: for the special value Solutions like are called fixed points . Many of these functions can be used to find their own solutions by repeatedly recycling the result back as input, but the rate of convergence can be slow, or the function can fail to converge at all, depending on the individual function. Steffensen's method accelerates this convergence, to make it quadratic.

For orientation, the root function and the fixed-point functions are simply related by where is some scalar constant small enough in magnitude to make stable under iteration, but large enough for the non-linearity of the function to be appreciable. All issues of a more general Banach space vs. basic real numbers being momentarily ignored for the sake of the comparison.

This method for finding fixed points of a real-valued function has been generalised for functions that map a Banach space onto itself or even more generally that map from one Banach space into another Banach space The generalized method assumes that a family of bounded linear operators associated with and can be devised that (locally) satisfies this condition: [2]

eqn. (𝄋)

If division is possible in the Banach space, the linear operator can be obtained from

which may provide some insight: Expressed in this way, the linear operator can be more easily seen to be an elaborate version of the divided difference discussed in the first section, above. The quotient form is shown here for orientation only; it is not required per se. Note also that division within the Banach space is not necessary for the elaborated Steffensen's method to be viable; the only requirement is that the operator satisfy the equation marked with the segno, (𝄋).

For the basic real number function , given in the first section, the function simply takes in and puts out real numbers. There, the function is a divided difference . In the generalized form here, the operator is the analogue of a divided difference for use in the Banach space. The operator is roughly equivalent to a matrix whose entries are all functions of vector arguments and .

Steffensen's method is then very similar to the Newton's method, except that it uses the divided difference instead of the derivative Note that for arguments close to some fixed point , fixed point functions and their linear operators meeting the marked (𝄋) condition, where is the identity operator.

In the case that division is possible in the Banach space, the generalized iteration formula is given by

for In the more general case in which division may not be possible, the iteration formula requires finding a solution close to for which

Equivalently one may seek the solution to the somewhat reduced form

with all the values inside square brackets being independent of they depend only on Be aware, however, that the second form may not be as numerically stable as the first: Because the first involves finding a value for a (hopefully) small difference it may be numerically more likely to only cause modest changes to the iterated value

If the linear operator satisfies

for some positive real constant then the method converges quadratically to a fixed point of if the initial approximation is 'sufficiently close' to the desired solution that satisfies

Notes

  1. Because requires the prior calculation of the two evaluations must be done sequentially – the algorithm per se cannot be made faster by running the function evaluations in parallel. This is yet another disadvantage of Steffensen's method.

Related Research Articles

In mathematics, the Euler–Maclaurin formula is a formula for the difference between an integral and a closely related sum. It can be used to approximate integrals by finite sums, or conversely to evaluate finite sums and infinite series using integrals and the machinery of calculus. For example, many asymptotic expansions are derived from the formula, and Faulhaber's formula for the sum of powers is an immediate consequence.

In integral calculus, an elliptic integral is one of a number of related functions defined as the value of certain integrals, which were first studied by Giulio Fagnano and Leonhard Euler. Their name originates from their originally arising in connection with the problem of finding the arc length of an ellipse.

<span class="mw-page-title-main">Newton's method</span> Algorithm for finding zeros of functions

In numerical analysis, Newton's method, also known as the Newton–Raphson method, named after Isaac Newton and Joseph Raphson, is a root-finding algorithm which produces successively better approximations to the roots of a real-valued function. The most basic version starts with a real-valued function f, its derivative f, and an initial guess x0 for a root of f. If f satisfies certain assumptions and the initial guess is close, then

<span class="mw-page-title-main">Fourier transform</span> Mathematical transform that expresses a function of time as a function of frequency

In physics, engineering and mathematics, the Fourier transform (FT) is an integral transform that takes as input a function and outputs another function that describes the extent to which various frequencies are present in the original function. The output of the transform is a complex-valued function of frequency. The term Fourier transform refers to both this complex-valued function and the mathematical operation. When a distinction needs to be made the Fourier transform is sometimes called the frequency domain representation of the original function. The Fourier transform is analogous to decomposing the sound of a musical chord into the intensities of its constituent pitches.

In statistics, maximum likelihood estimation (MLE) is a method of estimating the parameters of an assumed probability distribution, given some observed data. This is achieved by maximizing a likelihood function so that, under the assumed statistical model, the observed data is most probable. The point in the parameter space that maximizes the likelihood function is called the maximum likelihood estimate. The logic of maximum likelihood is both intuitive and flexible, and as such the method has become a dominant means of statistical inference.

In mathematical analysis, Hölder's inequality, named after Otto Hölder, is a fundamental inequality between integrals and an indispensable tool for the study of Lp spaces.

Spectral methods are a class of techniques used in applied mathematics and scientific computing to numerically solve certain differential equations. The idea is to write the solution of the differential equation as a sum of certain "basis functions" and then to choose the coefficients in the sum in order to satisfy the differential equation as well as possible.

In mathematical analysis, Fubini's theorem is a result that gives conditions under which it is possible to compute a double integral by using an iterated integral, introduced by Guido Fubini in 1907. It states that if a function is integrable on a rectangle , then one can evaluate the double integral as an iterated integral:

In mathematics and its applications, a Sturm–Liouville problem is a second-order linear ordinary differential equation of the form:

In mathematics and signal processing, the Hilbert transform is a specific singular integral that takes a function, u(t) of a real variable and produces another function of a real variable H(u)(t). The Hilbert transform is given by the Cauchy principal value of the convolution with the function (see § Definition). The Hilbert transform has a particularly simple representation in the frequency domain: It imparts a phase shift of ±90° (π/2 radians) to every frequency component of a function, the sign of the shift depending on the sign of the frequency (see § Relationship with the Fourier transform). The Hilbert transform is important in signal processing, where it is a component of the analytic representation of a real-valued signal u(t). The Hilbert transform was first introduced by David Hilbert in this setting, to solve a special case of the Riemann–Hilbert problem for analytic functions.

Verlet integration is a numerical method used to integrate Newton's equations of motion. It is frequently used to calculate trajectories of particles in molecular dynamics simulations and computer graphics. The algorithm was first used in 1791 by Jean Baptiste Delambre and has been rediscovered many times since then, most recently by Loup Verlet in the 1960s for use in molecular dynamics. It was also used by P. H. Cowell and A. C. C. Crommelin in 1909 to compute the orbit of Halley's Comet, and by Carl Størmer in 1907 to study the trajectories of electrical particles in a magnetic field . The Verlet integrator provides good numerical stability, as well as other properties that are important in physical systems such as time reversibility and preservation of the symplectic form on phase space, at no significant additional computational cost over the simple Euler method.

<span class="mw-page-title-main">Conjugate gradient method</span> Mathematical optimization algorithm

In mathematics, the conjugate gradient method is an algorithm for the numerical solution of particular systems of linear equations, namely those whose matrix is positive-semidefinite. The conjugate gradient method is often implemented as an iterative algorithm, applicable to sparse systems that are too large to be handled by a direct implementation or other direct methods such as the Cholesky decomposition. Large sparse systems often arise when numerically solving partial differential equations or optimization problems.

In probability theory and statistics, the generalized extreme value (GEV) distribution is a family of continuous probability distributions developed within extreme value theory to combine the Gumbel, Fréchet and Weibull families also known as type I, II and III extreme value distributions. By the extreme value theorem the GEV distribution is the only possible limit distribution of properly normalized maxima of a sequence of independent and identically distributed random variables. Note that a limit distribution needs to exist, which requires regularity conditions on the tail of the distribution. Despite this, the GEV distribution is often used as an approximation to model the maxima of long (finite) sequences of random variables.

In mathematics, a stiff equation is a differential equation for which certain numerical methods for solving the equation are numerically unstable, unless the step size is taken to be extremely small. It has proven difficult to formulate a precise definition of stiffness, but the main idea is that the equation includes some terms that can lead to rapid variation in the solution.

In the mathematical field of analysis, the Nash–Moser theorem, discovered by mathematician John Forbes Nash and named for him and Jürgen Moser, is a generalization of the inverse function theorem on Banach spaces to settings when the required solution mapping for the linearized problem is not bounded.

<span class="mw-page-title-main">Euler method</span> Approach to finding numerical solutions of ordinary differential equations

In mathematics and computational science, the Euler method is a first-order numerical procedure for solving ordinary differential equations (ODEs) with a given initial value. It is the most basic explicit method for numerical integration of ordinary differential equations and is the simplest Runge–Kutta method. The Euler method is named after Leonhard Euler, who first proposed it in his book Institutionum calculi integralis.

In numerical analysis, fixed-point iteration is a method of computing fixed points of a function.

In numerical analysis, Aitken's delta-squared process or Aitken extrapolation is a series acceleration method, used for accelerating the rate of convergence of a sequence. It is named after Alexander Aitken, who introduced this method in 1926. Its early form was known to Seki Kōwa and was found for rectification of the circle, i.e. the calculation of π. It is most useful for accelerating the convergence of a sequence that is converging linearly.

In statistics, the Fisher–Tippett–Gnedenko theorem is a general result in extreme value theory regarding asymptotic distribution of extreme order statistics. The maximum of a sample of iid random variables after proper renormalization can only converge in distribution to one of only 3 possible distribution families: the Gumbel distribution, the Fréchet distribution, or the Weibull distribution. Credit for the extreme value theorem and its convergence details are given to Fréchet (1927), Fisher and Tippett (1928), Mises (1936), and Gnedenko (1943).

(Stochastic) variance reduction is an algorithmic approach to minimizing functions that can be decomposed into finite sums. By exploiting the finite sum structure, variance reduction techniques are able to achieve convergence rates that are impossible to achieve with methods that treat the objective as an infinite sum, as in the classical Stochastic approximation setting.

References

  1. 1 2 3 Dahlquist, Germund; Björck, Åke (1974). Numerical Methods . Translated by Anderson, Ned. Englewood Cliffs, NJ: Prentice Hall. pp.  230–231.
  2. Johnson, L.W.; Scholz, D.R. (June 1968). "On Steffensen's method". SIAM Journal on Numerical Analysis. 5 (2): 296–302. doi:10.1137/0705026. JSTOR   2949443.