Alternating-direction implicit method

In numerical linear algebra, the alternating-direction implicit (ADI) method is an iterative method used to solve Sylvester matrix equations. It is a popular method for solving the large matrix equations that arise in systems theory and control, [1] and can be formulated to construct solutions in a memory-efficient, factored form. [2] [3] It is also used to numerically solve parabolic and elliptic partial differential equations, and is a classic method used for modeling heat conduction and solving the diffusion equation in two or more dimensions. [4] It is an example of an operator splitting method. [5]

ADI for matrix equations

The method

The ADI method is a two-step iteration process that alternately updates the column and row spaces of an approximate solution to $AX - XB = C$. One ADI iteration consists of the following steps: [6]

1. Solve for $X^{(j+1/2)}$, where $\left(A - \beta_{j+1} I\right) X^{(j+1/2)} = X^{(j)}\left(B - \beta_{j+1} I\right) + C$.

2. Solve for $X^{(j+1)}$, where $X^{(j+1)}\left(B - \alpha_{j+1} I\right) = \left(A - \alpha_{j+1} I\right) X^{(j+1/2)} - C$.

The numbers $\left(\alpha_{j+1}, \beta_{j+1}\right)$ are called shift parameters, and convergence depends strongly on the choice of these parameters. [7] [8] To perform $K$ iterations of ADI, an initial guess $X^{(0)}$ is required, as well as $K$ shift parameters, $\left\{\left(\alpha_j, \beta_j\right)\right\}_{j=1}^{K}$.
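For illustration, the two-step iteration above can be sketched in a few lines of NumPy. This is a minimal sketch rather than a reference implementation; the function name, the diagonal test matrices, and the shift choices are made up for the example:

```python
import numpy as np

def adi_sylvester(A, B, C, shifts, X0=None):
    """Run ADI iterations for the Sylvester equation AX - XB = C.

    shifts: iterable of (alpha_j, beta_j) shift-parameter pairs.
    """
    m, n = C.shape
    X = np.zeros((m, n)) if X0 is None else np.array(X0, dtype=float)
    Im, In = np.eye(m), np.eye(n)
    for alpha, beta in shifts:
        # Step 1: (A - beta I) X_half = X (B - beta I) + C
        X = np.linalg.solve(A - beta * Im, X @ (B - beta * In) + C)
        # Step 2: X_new (B - alpha I) = (A - alpha I) X_half - C,
        # solved through the transposed system
        X = np.linalg.solve((B - alpha * In).T, ((A - alpha * Im) @ X - C).T).T
    return X

# Toy problem with well-separated normal (diagonal) spectra
A = np.diag([1.0, 2.0, 3.0])
B = np.diag([-1.0, -2.0, -3.0])
C = np.arange(9.0).reshape(3, 3)
# For diagonal A and B the exact solution is elementwise C_ij / (a_i - b_j)
X_true = C / (np.diag(A)[:, None] - np.diag(B)[None, :])
# Shift pairs that exhaust sigma(A) and sigma(B)
X = adi_sylvester(A, B, C, shifts=[(1.0, -1.0), (2.0, -2.0), (3.0, -3.0)])
```

Because the shifts place the zeros of the error's rational factor exactly on the spectrum of $A$, the error product vanishes and this toy problem is solved exactly after three sweeps.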

When to use ADI

If $A \in \mathbb{C}^{m \times m}$ and $B \in \mathbb{C}^{n \times n}$, then $AX - XB = C$ can be solved directly in $\mathcal{O}(m^3 + n^3)$ operations using the Bartels–Stewart method. [9] It is therefore only beneficial to use ADI when matrix-vector multiplication and linear solves involving $A$ and $B$ can be applied cheaply.
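For example, SciPy exposes a dense Schur-based solver in the Bartels–Stewart family; a minimal sketch, where the random test matrices are shifted only so that the two spectra stay disjoint:

```python
import numpy as np
from scipy.linalg import solve_sylvester

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((3, 3)) - 5.0 * np.eye(3)  # push sigma(B) away from sigma(A)
C = rng.standard_normal((4, 3))

# SciPy solves AX + XB = Q, so pass -B to solve AX - XB = C
X = solve_sylvester(A, -B, C)
residual = np.linalg.norm(A @ X - X @ B - C)
```

This direct route is the sensible default whenever $m$ and $n$ are small enough that dense $\mathcal{O}(m^3 + n^3)$ work is acceptable.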

The equation $AX - XB = C$ has a unique solution if and only if $\sigma(A) \cap \sigma(B) = \emptyset$, where $\sigma(M)$ is the spectrum of $M$. [1] However, the ADI method performs especially well when $\sigma(A)$ and $\sigma(B)$ are well-separated, and $A$ and $B$ are normal matrices. These assumptions are met, for example, by the Lyapunov equation $AX + XA^* = C$ (that is, $B = -A^*$) when $A$ is positive definite. Under these assumptions, near-optimal shift parameters are known for several choices of $A$ and $B$. [7] [8] Additionally, a priori error bounds can be computed, thereby eliminating the need to monitor the residual error in implementation.

The ADI method can still be applied when the above assumptions are not met. The use of suboptimal shift parameters may adversely affect convergence, [1] and convergence is also affected by the non-normality of or (sometimes advantageously). [10] Krylov subspace methods, such as the Rational Krylov Subspace Method, [11] are observed to typically converge more rapidly than ADI in this setting, [1] [3] and this has led to the development of hybrid ADI-projection methods. [3]

Shift-parameter selection and the ADI error equation

The problem of finding good shift parameters is nontrivial. This problem can be understood by examining the ADI error equation. After $K$ iterations, the error is given by

$$X - X^{(K)} = \left[ \prod_{j=1}^{K} \left(A - \alpha_j I\right)\left(A - \beta_j I\right)^{-1} \right] \left( X - X^{(0)} \right) \left[ \prod_{j=1}^{K} \left(B - \beta_j I\right)\left(B - \alpha_j I\right)^{-1} \right].$$

Choosing $X^{(0)} = 0$ results in the following bound on the relative error:

$$\frac{\left\| X - X^{(K)} \right\|_2}{\left\| X \right\|_2} \leq \left\| r_K(A) \right\|_2 \left\| r_K(B)^{-1} \right\|_2, \qquad r_K(M) = \prod_{j=1}^{K} \left(M - \alpha_j I\right)\left(M - \beta_j I\right)^{-1},$$

where $\| \cdot \|_2$ is the operator norm. The ideal set of shift parameters $\left\{\left(\alpha_j, \beta_j\right)\right\}_{j=1}^{K}$ defines a rational function $r_K(x) = \prod_{j=1}^{K} \frac{x - \alpha_j}{x - \beta_j}$ that minimizes the quantity $\left\| r_K(A) \right\|_2 \left\| r_K(B)^{-1} \right\|_2$. If $A$ and $B$ are normal matrices and have eigendecompositions $A = V_A \Lambda_A V_A^*$ and $B = V_B \Lambda_B V_B^*$, then

$$\left\| r_K(A) \right\|_2 \left\| r_K(B)^{-1} \right\|_2 = \frac{\max_{\lambda \in \sigma(A)} \left| r_K(\lambda) \right|}{\min_{\mu \in \sigma(B)} \left| r_K(\mu) \right|}.$$

Near-optimal shift parameters

Near-optimal shift parameters are known in certain cases, such as when $\sigma(A) \subset [a, b]$ and $\sigma(B) \subset [-b, -a]$, where $[a, b]$ and $[-b, -a]$ are disjoint intervals on the real line. [7] [8] The Lyapunov equation $AX + XA^* = C$, for example, satisfies these assumptions when $A$ is positive definite. In this case, the shift parameters can be expressed in closed form using elliptic integrals, and can easily be computed numerically.

More generally, if closed, disjoint sets $E$ and $F$, where $\sigma(A) \subset E$ and $\sigma(B) \subset F$, are known, the optimal shift parameter selection problem is approximately solved by finding an extremal rational function that attains the value

$$Z_K(E, F) := \inf_{r} \frac{\sup_{z \in E} |r(z)|}{\inf_{z \in F} |r(z)|},$$

where the infimum is taken over all rational functions $r$ of degree $(K, K)$. [8] This approximation problem is related to several results in potential theory, [12] [13] and was solved by Zolotarev in 1877 for $E = [a, b]$ and $F = -E$. [14] The solution is also known when $E$ and $F$ are disjoint disks in the complex plane. [15]

Heuristic shift-parameter strategies

When less is known about $\sigma(A)$ and $\sigma(B)$, or when $A$ or $B$ are non-normal matrices, it may not be possible to find near-optimal shift parameters. In this setting, a variety of strategies for generating good shift parameters can be used. These include strategies based on asymptotic results in potential theory, [16] using the Ritz values of the matrices $A$, $A^{-1}$, $B$, and $B^{-1}$ to formulate a greedy approach, [17] and cyclic methods, where the same small collection of shift parameters is reused until a convergence tolerance is met. [17] [10] When the same shift parameter is used at every iteration, ADI is equivalent to an algorithm called Smith's method. [18]
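As a sketch of the cyclic extreme, the snippet below reuses a single shift pair at every iteration (i.e., Smith's method) on an illustrative, well-separated normal problem; the error then contracts by a fixed rational factor per sweep:

```python
import numpy as np

# Illustrative diagonal problem: sigma(A) = {1, 2, 4}, sigma(B) = {-1, -2, -4}
A = np.diag([1.0, 2.0, 4.0])
B = -A
C = np.ones((3, 3))
X_true = C / (np.diag(A)[:, None] - np.diag(B)[None, :])

alpha, beta = 2.0, -2.0  # one shift pair, reused every iteration (Smith's method)
I = np.eye(3)
X = np.zeros((3, 3))
for _ in range(30):
    # Standard two-step ADI sweep with fixed shifts
    X_half = np.linalg.solve(A - beta * I, X @ (B - beta * I) + C)
    X = np.linalg.solve((B - alpha * I).T, ((A - alpha * I) @ X_half - C).T).T
```

Here each sweep multiplies the $(i, j)$ error entry by $\frac{(\lambda_i - 2)(\mu_j + 2)}{(\lambda_i + 2)(\mu_j - 2)}$, which is at most $1/9$ in magnitude for these spectra, so the fixed-shift iteration converges geometrically.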

Factored ADI

In many applications, $A$ and $B$ are very large, sparse matrices, and $C$ can be factored as $C = C_1 C_2^*$, where $C_1 \in \mathbb{C}^{m \times r}$ and $C_2 \in \mathbb{C}^{n \times r}$, with $r \ll m, n$. [1] In such a setting, it may not be feasible to store the potentially dense matrix $X$ explicitly. A variant of ADI, called factored ADI, [3] [2] can be used to compute factors $Z$ and $W$ with $ZW^* \approx X$, so that $X$ never has to be formed explicitly. The effectiveness of factored ADI depends on whether $X$ is well-approximated by a low rank matrix. This is known to be true under various assumptions about $A$ and $B$. [10] [8]
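A minimal sketch of the low-rank recurrences, following the factored formulation in the references; the function name, test problem, and shift choices are illustrative, and a practical implementation would also recompress the growing factors:

```python
import numpy as np

def fadi_sylvester(A, B, C1, C2, alphas, betas):
    """Factored ADI for AX - XB = C1 @ C2.conj().T with X0 = 0.

    Returns Z, D, W such that the ADI iterate is X_K = Z @ D @ W.conj().T,
    so X_K never has to be stored densely.
    """
    m, n = A.shape[0], B.shape[0]
    Im, In = np.eye(m), np.eye(n)
    # First block columns: Z_1 = (A - beta_1 I)^{-1} C1,
    #                      W_1 = (B - alpha_1 I)^{-*} C2
    V = np.linalg.solve(A - betas[0] * Im, C1)
    S = np.linalg.solve((B - alphas[0] * In).conj().T, C2)
    Zs, Ws = [V], [S]
    for j in range(1, len(alphas)):
        # Z_{j+1} = (A - beta_{j+1} I)^{-1} (A - alpha_j I) Z_j
        V = np.linalg.solve(A - betas[j] * Im, (A - alphas[j - 1] * Im) @ V)
        # W_{j+1} = (B - alpha_{j+1} I)^{-*} (B - beta_j I)^* W_j
        S = np.linalg.solve((B - alphas[j] * In).conj().T,
                            (B - betas[j - 1] * In).conj().T @ S)
        Zs.append(V)
        Ws.append(S)
    Z, W = np.hstack(Zs), np.hstack(Ws)
    r = C1.shape[1]
    D = np.kron(np.diag(np.asarray(betas) - np.asarray(alphas)), np.eye(r))
    return Z, D, W
```

Each iteration touches only $r$ new columns per factor, which is the point of the method when $m$ and $n$ are large.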

ADI for parabolic equations

Historically, the ADI method was developed to solve the 2D diffusion equation on a square domain using finite differences. [4] Unlike ADI for matrix equations, ADI for parabolic equations does not require the selection of shift parameters, since the shift appearing in each iteration is determined by parameters such as the timestep, diffusion coefficient, and grid spacing. The connection to ADI on matrix equations can be observed when one considers the action of the ADI iteration on the system at steady state.

Example: 2D diffusion equation

[Figure: Stencil for the alternating direction implicit method in finite difference equations (ADI-stencil.svg)]

The traditional method for solving the heat conduction equation numerically is the Crank–Nicolson method. This method results in a very complicated set of equations in multiple dimensions, which are costly to solve. The advantage of the ADI method is that the equations that have to be solved in each step have a simpler structure and can be solved efficiently with the tridiagonal matrix algorithm.

Consider the linear diffusion equation in two dimensions,

$$\frac{\partial u}{\partial t} = a \left( \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} \right) = a \left( u_{xx} + u_{yy} \right).$$

The implicit Crank–Nicolson method produces the following finite difference equation:

$$\frac{u_{ij}^{n+1} - u_{ij}^{n}}{\Delta t} = \frac{a}{2 (\Delta x)^2} \left( \delta_x^2 + \delta_y^2 \right) \left( u_{ij}^{n+1} + u_{ij}^{n} \right),$$

where $\Delta x = \Delta y$ and $\delta_p^2$ is the central second difference operator for the $p$-th coordinate,

$$\delta_x^2 u_{ij} = u_{i+1,j} - 2 u_{ij} + u_{i-1,j} \qquad \text{or} \qquad \delta_y^2 u_{ij} = u_{i,j+1} - 2 u_{ij} + u_{i,j-1}$$

for $p = x$ or $p = y$ respectively (with $u_{ij}$ a shorthand for the lattice point $u(x_i, y_j)$).

After performing a stability analysis, it can be shown that this method will be stable for any $\Delta t$.

A disadvantage of the Crank–Nicolson method is that the matrix in the above equation is banded with a band width that is generally quite large. This makes direct solution of the system of linear equations quite costly (although efficient approximate solutions exist, for example use of the conjugate gradient method preconditioned with incomplete Cholesky factorization).

The idea behind the ADI method is to split the finite difference equations into two, one with the $x$-derivative taken implicitly and the next with the $y$-derivative taken implicitly,

$$\frac{u_{ij}^{n+1/2} - u_{ij}^{n}}{\Delta t / 2} = \frac{a}{(\Delta x)^2} \left( \delta_x^2 u_{ij}^{n+1/2} + \delta_y^2 u_{ij}^{n} \right),$$

$$\frac{u_{ij}^{n+1} - u_{ij}^{n+1/2}}{\Delta t / 2} = \frac{a}{(\Delta x)^2} \left( \delta_x^2 u_{ij}^{n+1/2} + \delta_y^2 u_{ij}^{n+1} \right).$$

The system of equations involved in each half step is symmetric and tridiagonal (banded with bandwidth 3), and is typically solved using the tridiagonal matrix algorithm.
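A sketch of one such ADI (Peaceman–Rachford) time step on a uniform square grid with homogeneous Dirichlet boundaries, solving each half step with banded (tridiagonal) solves; the function name and grid parameters are illustrative:

```python
import numpy as np
from scipy.linalg import solve_banded

def adi_diffusion_step(U, a, dt, dx):
    """One Peaceman-Rachford ADI step for u_t = a (u_xx + u_yy).

    U holds the interior grid values (rows index x, columns index y);
    homogeneous Dirichlet boundary values are assumed.
    """
    n = U.shape[0]
    r = a * dt / (2.0 * dx * dx)
    # 1D second-difference operator on interior points (Dirichlet boundary)
    L = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
         + np.diag(np.ones(n - 1), -1))
    # Banded storage of the tridiagonal matrix I - r L for solve_banded
    ab = np.zeros((3, n))
    ab[0, 1:] = -r            # superdiagonal
    ab[1, :] = 1.0 + 2.0 * r  # main diagonal
    ab[2, :-1] = -r           # subdiagonal
    # Half step 1, x implicit / y explicit: (I - rL) U_half = U (I + rL)
    U_half = solve_banded((1, 1), ab, U @ (np.eye(n) + r * L))
    # Half step 2, y implicit / x explicit: U_new (I - rL) = (I + rL) U_half
    U_new = solve_banded((1, 1), ab, ((np.eye(n) + r * L) @ U_half).T).T
    return U_new
```

Each half step costs one tridiagonal solve per grid line, so a full time step is $O(n^2)$ work for an $n \times n$ interior grid.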

It can be shown that this method is unconditionally stable and second order in time and space. [19] There are more refined ADI methods, such as the methods of Douglas [20] or the f-factor method, [21] which can be used for three or more dimensions.

Generalizations

The usage of the ADI method as an operator splitting scheme can be generalized. That is, we may consider general evolution equations

$$\dot{u} = F_1 u + F_2 u,$$

where $F_1$ and $F_2$ are (possibly nonlinear) operators defined on a Banach space. [22] [23] In the diffusion example above we have $F_1 = a \frac{\partial^2}{\partial x^2}$ and $F_2 = a \frac{\partial^2}{\partial y^2}$.

Fundamental ADI (FADI)

Simplification of ADI to FADI

The conventional ADI method can be simplified into the fundamental ADI (FADI) method, which retains operators only on the left-hand sides of the update equations while the right-hand sides are operator-free. This may be regarded as the fundamental (basic) scheme of the ADI method, [24] [25] with no operators left to reduce on the right-hand sides, unlike most traditional implicit methods, which have operators on both sides of their equations. The FADI method leads to simpler, more concise, and more efficient update equations without degrading the accuracy of the conventional ADI method.

Relations to other implicit methods

Many classical implicit methods, including those of Peaceman–Rachford, Douglas–Gunn, D'Yakonov, Beam–Warming, and Crank–Nicolson, may be simplified into fundamental implicit schemes with operator-free right-hand sides. [25] In their fundamental forms, the FADI method of second-order temporal accuracy is closely related to the fundamental locally one-dimensional (FLOD) method, which can be upgraded to second-order temporal accuracy, for example for three-dimensional Maxwell's equations [26] [27] in computational electromagnetics. For two- and three-dimensional heat conduction and diffusion equations, both FADI and FLOD methods may be implemented in a simpler, more efficient, and more stable manner than their conventional counterparts. [28] [29]


References

  1. 1 2 3 4 5 Simoncini, V. (2016). "Computational Methods for Linear Matrix Equations". SIAM Review. 58 (3): 377–441. doi:10.1137/130912839. hdl: 11585/586011 . ISSN   0036-1445. S2CID   17271167.
  2. 1 2 Li, Jing-Rebecca; White, Jacob (2002). "Low Rank Solution of Lyapunov Equations". SIAM Journal on Matrix Analysis and Applications. 24 (1): 260–280. doi:10.1137/s0895479801384937. ISSN   0895-4798.
  3. 1 2 3 4 Benner, Peter; Li, Ren-Cang; Truhar, Ninoslav (2009). "On the ADI method for Sylvester equations". Journal of Computational and Applied Mathematics. 233 (4): 1035–1045. Bibcode:2009JCoAM.233.1035B. doi: 10.1016/j.cam.2009.08.108 . ISSN   0377-0427.
  4. 1 2 Peaceman, D. W.; Rachford Jr., H. H. (1955), "The numerical solution of parabolic and elliptic differential equations", Journal of the Society for Industrial and Applied Mathematics, 3 (1): 28–41, doi:10.1137/0103003, hdl: 10338.dmlcz/135399 , MR   0071874 .
  5. Wachspress, Eugene L. (2008). "Trail to a Lyapunov equation solver". Computers & Mathematics with Applications. 55 (8): 1653–1659. doi: 10.1016/j.camwa.2007.04.048 . ISSN   0898-1221.
  6. 1 2 3 Lu, An; Wachspress, E.L. (1991). "Solution of Lyapunov equations by alternating direction implicit iteration". Computers & Mathematics with Applications. 21 (9): 43–58. doi:10.1016/0898-1221(91)90124-m. ISSN   0898-1221.
  7. 1 2 3 4 5 Beckermann, Bernhard; Townsend, Alex (2017). "On the Singular Values of Matrices with Displacement Structure". SIAM Journal on Matrix Analysis and Applications. 38 (4): 1227–1248. arXiv: 1609.09494 . doi:10.1137/16m1096426. ISSN   0895-4798. S2CID   3828461.
  8. Golub, G.; Van Loan, C (1989). Matrix computations (Fourth ed.). Baltimore: Johns Hopkins University. ISBN   1421407949. OCLC   824733531.
  9. 1 2 3 Sabino, J (2007). Solution of large-scale Lyapunov equations via the block modified Smith method. PhD Diss., Rice Univ. (Thesis). hdl:1911/20641.
  10. Druskin, V.; Simoncini, V. (2011). "Adaptive rational Krylov subspaces for large-scale dynamical systems". Systems & Control Letters. 60 (8): 546–560. doi:10.1016/j.sysconle.2011.04.013. ISSN   0167-6911.
  11. Saff, E.B.; Totik, V. (2013-11-11). Logarithmic potentials with external fields. Berlin. ISBN   9783662033296. OCLC   883382758.{{cite book}}: CS1 maint: location missing publisher (link)
  12. Gonchar, A.A. (1969). "Zolotarev problems connected with rational functions". Mathematics of the USSR-Sbornik. 7 (4): 623–635. Bibcode:1969SbMat...7..623G. doi:10.1070/SM1969v007n04ABEH001107.
  13. Zolotarev, D.I. (1877). "Application of elliptic functions to questions of functions deviating least and most from zero". Zap. Imp. Akad. Nauk. St. Petersburg. 30: 1–59.
  14. Starke, Gerhard (July 1992). "Near-circularity for the rational Zolotarev problem in the complex plane". Journal of Approximation Theory. 70 (1): 115–130. doi: 10.1016/0021-9045(92)90059-w . ISSN   0021-9045.
  15. Starke, Gerhard (June 1993). "Fejér-Walsh points for rational functions and their use in the ADI iterative method". Journal of Computational and Applied Mathematics. 46 (1–2): 129–141. doi: 10.1016/0377-0427(93)90291-i . ISSN   0377-0427.
  16. 1 2 Penzl, Thilo (January 1999). "A Cyclic Low-Rank Smith Method for Large Sparse Lyapunov Equations". SIAM Journal on Scientific Computing. 21 (4): 1401–1418. Bibcode:1999SJSC...21.1401P. doi:10.1137/s1064827598347666. ISSN   1064-8275.
  17. Smith, R. A. (January 1968). "Matrix Equation XA + BX = C". SIAM Journal on Applied Mathematics. 16 (1): 198–201. doi:10.1137/0116017. ISSN   0036-1399.
  18. Douglas, J. Jr. (1955), "On the numerical integration of uxx+ uyy= ut by implicit methods", Journal of the Society for Industrial and Applied Mathematics, 3: 42–65, MR   0071875 .
  19. Douglas, Jim Jr. (1962), "Alternating direction methods for three space variables", Numerische Mathematik, 4 (1): 41–63, doi:10.1007/BF01386295, ISSN   0029-599X, S2CID   121455963 .
  20. Chang, M. J.; Chow, L. C.; Chang, W. S. (1991), "Improved alternating-direction implicit method for solving transient three-dimensional heat diffusion problems", Numerical Heat Transfer, Part B: Fundamentals, 19 (1): 69–84, Bibcode:1991NHTB...19...69C, doi:10.1080/10407799108944957, ISSN   1040-7790 .
  21. Hundsdorfer, Willem; Verwer, Jan (2003). Numerical Solution of Time-Dependent Advection-Diffusion-Reaction Equations. Berlin, Heidelberg: Springer Berlin Heidelberg. ISBN   978-3-662-09017-6.
  22. Lions, P. L.; Mercier, B. (December 1979). "Splitting Algorithms for the Sum of Two Nonlinear Operators". SIAM Journal on Numerical Analysis. 16 (6): 964–979. Bibcode:1979SJNA...16..964L. doi:10.1137/0716071.
  23. Tan, E. L. (2007). "Efficient Algorithm for the Unconditionally Stable 3-D ADI-FDTD Method" (PDF). IEEE Microwave and Wireless Components Letters. 17 (1): 7–9. doi:10.1109/LMWC.2006.887239. hdl:10356/138245. S2CID   29025478.
  24. 1 2 Tan, E. L. (2008). "Fundamental Schemes for Efficient Unconditionally Stable Implicit Finite-Difference Time-Domain Methods" (PDF). IEEE Transactions on Antennas and Propagation. 56 (1): 170–177. arXiv: 2011.14043 . Bibcode:2008ITAP...56..170T. doi:10.1109/TAP.2007.913089. hdl:10356/138249. S2CID   37135325.
  25. Tan, E. L. (2007). "Unconditionally Stable LOD-FDTD Method for 3-D Maxwell's Equations" (PDF). IEEE Microwave and Wireless Components Letters. 17 (2): 85–87. doi:10.1109/LMWC.2006.890166. hdl:10356/138296. S2CID   22940993.
  26. Gan, T. H.; Tan, E. L. (2013). "Unconditionally Stable Fundamental LOD-FDTD Method with Second-Order Temporal Accuracy and Complying Divergence" (PDF). IEEE Transactions on Antennas and Propagation. 61 (5): 2630–2638. Bibcode:2013ITAP...61.2630G. doi:10.1109/TAP.2013.2242036. S2CID   7578037.
  27. Tay, W. C.; Tan, E. L.; Heh, D. Y. (2014). "Fundamental Locally One-Dimensional Method for 3-D Thermal Simulation". IEICE Transactions on Electronics. E-97-C (7): 636–644. Bibcode:2014IEITE..97..636T. doi:10.1587/transele.E97.C.636. hdl: 10220/20410 .
  28. Heh, D. Y.; Tan, E. L.; Tay, W. C. (2016). "Fast Alternating Direction Implicit Method for Efficient Transient Thermal Simulation of Integrated Circuits". International Journal of Numerical Modelling: Electronic Networks, Devices and Fields. 29 (1): 93–108. doi:10.1002/jnm.2049. hdl: 10356/137201 . S2CID   61039449.