Iterative Stencil Loops

The shape of a 7-point 3D von Neumann-style stencil.

Iterative Stencil Loops (ISLs), or stencil computations, are a class of numerical data processing solutions [1] which update array elements according to some fixed pattern, called a stencil. [2] They are most commonly found in computer simulations, e.g. for computational fluid dynamics in the context of scientific and engineering applications. Other notable examples include solving partial differential equations, [1] the Jacobi kernel, the Gauss–Seidel method, [2] image processing [1] and cellular automata. [3] The regular structure of the arrays sets stencil techniques apart from other modeling methods such as the finite element method. Most finite difference codes which operate on regular grids can be formulated as ISLs.


Definition

ISLs perform a sequence of sweeps (called timesteps) through a given array. [2] Generally this is a 2- or 3-dimensional regular grid. [3] The elements of the arrays are often referred to as cells. In each timestep, all array elements are updated: [2] each cell's new value is computed from neighboring array elements in a fixed pattern (the stencil). In most cases boundary values are left unchanged, but in some cases (e.g. LBM codes) those need to be adjusted during the computation as well. Since the stencil is the same for each element, the pattern of data accesses is repeated. [4]
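As a minimal sketch of this sweep structure (the variable names and the 3-point averaging stencil are illustrative, not taken from any particular code), a 1D ISL with fixed boundaries might look as follows in C++. Double buffering ensures that every update within a timestep reads only values from the previous timestep:

```cpp
#include <array>
#include <utility>

int main() {
    constexpr int N = 16;          // number of cells, including the two boundary cells
    constexpr int TIMESTEPS = 100; // number of sweeps

    std::array<double, N> cur{}, next{};
    cur[0] = 1.0;      // fixed boundary values
    cur[N - 1] = 1.0;
    next = cur;        // copy the boundaries into the write buffer once

    for (int t = 0; t < TIMESTEPS; ++t) {
        // one sweep: every interior cell is updated from the same fixed pattern
        for (int i = 1; i < N - 1; ++i) {
            // the 3-point stencil {-1, 0, +1}: average of the cell and its neighbors
            next[i] = (cur[i - 1] + cur[i] + cur[i + 1]) / 3.0;
        }
        std::swap(cur, next); // the new state becomes the input of the next sweep
    }
    return 0;
}
```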

More formally, we may define ISLs as a 5-tuple $(I, S, S_0, s, T)$ with the following meaning: [3]

- $I = \prod_{i=1}^{k} [0, \ldots, n_i]$ is the index set; it defines the topology of the array.
- $S$ is the (finite) set of states which each cell may take on any given timestep.
- $S_0 \colon \mathbb{Z}^k \to S$ defines the initial state of the system at time 0.
- $s = (x_1, \ldots, x_l) \in (\mathbb{Z}^k)^l$ is the stencil itself and describes the shape of the neighborhood.
- $T \colon S^l \to S$ is the transition function, which determines a cell's new state from the states of its neighbors.

Since $I$ is a $k$-dimensional integer interval, the array will always have the topology of a finite regular grid. The array is also called the simulation space and individual cells are identified by their index $c \in I$. The stencil is an ordered set of $l$ relative coordinates. We can now obtain for each cell $c$ the tuple of its neighbors' indices:

$$I_c = (c + x_1, \ldots, c + x_l)$$

Their states are given by mapping the tuple $I_c$ to the corresponding tuple of states $N_i(c)$, where $N_i \colon I \to S^l$ is defined as follows:

$$N_i(c) = (S_i(c + x_1), \ldots, S_i(c + x_l))$$

This is all we need to define the system's state for the following time steps $S_{i+1} \colon \mathbb{Z}^k \to S$ with $i \in \mathbb{N}$:

$$S_{i+1}(c) = \begin{cases} T(N_i(c)), & c \in I \\ S_i(c), & c \in \mathbb{Z}^k \setminus I \end{cases}$$

Note that $S_i$ is defined on $\mathbb{Z}^k$ and not just on $I$, since the boundary conditions need to be set, too. Sometimes the elements of $I_c$ may be defined by a vector addition modulo the simulation space's dimensions $(n_1, \ldots, n_k)$ to realize toroidal topologies:

$$I_c = \bigl((c + x_1) \bmod (n_1, \ldots, n_k), \ \ldots, \ (c + x_l) \bmod (n_1, \ldots, n_k)\bigr)$$

This may be useful for implementing periodic boundary conditions, which simplifies certain physical models.
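To make the toroidal variant concrete, here is a small C++ sketch (the function names are illustrative) that computes neighbor indices modulo the grid size, so boundary cells take their missing neighbors from the opposite edge:

```cpp
#include <vector>

// Wrap an index into [0, n) so that neighbors of boundary cells
// are taken from the opposite edge (toroidal topology).
inline int wrap(int j, int n) {
    return ((j % n) + n) % n; // also handles negative j, e.g. c + x = -1
}

// One sweep of a 3-point stencil with periodic boundary conditions.
std::vector<double> sweepPeriodic(const std::vector<double>& cur) {
    const int n = static_cast<int>(cur.size());
    std::vector<double> next(n);
    const int stencil[] = {-1, 0, 1}; // the relative coordinates s
    for (int c = 0; c < n; ++c) {
        double sum = 0.0;
        for (int x : stencil) {
            sum += cur[wrap(c + x, n)]; // elements of I_c computed modulo n
        }
        next[c] = sum / 3.0;
    }
    return next;
}
```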

Example: 2D Jacobi iteration

Data dependencies of a selected cell in the 2D array.

To illustrate the formal definition, consider how a two-dimensional Jacobi iteration can be defined. The update function computes the arithmetic mean of a cell's four neighbors. In this case we start with an initial solution of 0. The left and right boundaries are fixed at 1, while the upper and lower boundaries are set to 0. After a sufficient number of iterations, the system converges to a saddle shape.

2D Jacobi iteration at timesteps t = 0, 200, 400, 600, 800 and 1000.
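A compact C++ sketch of this iteration follows; the grid size and sweep count are arbitrary illustrative choices, and the boundaries are initialized exactly as described above:

```cpp
#include <cstdio>
#include <vector>

int main() {
    const int n = 32;        // grid edge length (illustrative choice)
    const int steps = 1000;  // number of Jacobi sweeps

    // Initial solution: 0 everywhere; left/right boundaries fixed at 1,
    // upper/lower boundaries fixed at 0.
    std::vector<std::vector<double>> cur(n, std::vector<double>(n, 0.0));
    for (int y = 0; y < n; ++y) {
        cur[y][0] = 1.0;
        cur[y][n - 1] = 1.0;
    }
    std::vector<std::vector<double>> next = cur; // boundaries copied once

    for (int t = 0; t < steps; ++t) {
        for (int y = 1; y < n - 1; ++y) {
            for (int x = 1; x < n - 1; ++x) {
                // update function: arithmetic mean of the four von Neumann neighbors
                next[y][x] = 0.25 * (cur[y - 1][x] + cur[y + 1][x] +
                                     cur[y][x - 1] + cur[y][x + 1]);
            }
        }
        cur.swap(next); // double buffering, as in the 1D sketch above
    }

    std::printf("center value after %d sweeps: %f\n", steps, cur[n / 2][n / 2]);
    return 0;
}
```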

Stencils

The shape of the neighborhood used during the updates depends on the application itself. The most common stencils are the 2D or 3D versions of the von Neumann neighborhood and Moore neighborhood. The example above uses a 2D von Neumann stencil while LBM codes generally use its 3D variant. Conway's Game of Life uses the 2D Moore neighborhood. That said, other stencils such as a 25-point stencil for seismic wave propagation [5] can be found, too.

A selection of stencils used in various scientific applications:

- 9-point 2D stencil (Moore neighborhood)
- 5-point 2D stencil (von Neumann neighborhood)
- 7-point 3D stencil (3D von Neumann neighborhood)
- 25-point 3D stencil (earth sciences)
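In code, a stencil is simply an ordered list of relative coordinates, matching the formal definition above. Here is a C++ sketch of the 5-point von Neumann and 9-point Moore stencils (type and variable names are illustrative):

```cpp
#include <array>
#include <utility>

using Offset2D = std::pair<int, int>; // (dx, dy) relative coordinates

// 5-point 2D von Neumann stencil: the cell itself plus its four edge neighbors.
constexpr std::array<Offset2D, 5> vonNeumann2D = {{
    {0, 0}, {-1, 0}, {1, 0}, {0, -1}, {0, 1}
}};

// 9-point 2D Moore stencil: the cell itself plus all eight surrounding cells,
// as used e.g. by Conway's Game of Life.
constexpr std::array<Offset2D, 9> moore2D = {{
    {-1, -1}, {0, -1}, {1, -1},
    {-1,  0}, {0,  0}, {1,  0},
    {-1,  1}, {0,  1}, {1,  1}
}};
```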

Implementation issues

Many simulation codes may be formulated naturally as ISLs. Since computing time and memory consumption grow linearly with the number of array elements, parallel implementations of ISLs are of paramount importance to research. [6] This is challenging since the computations are tightly coupled (because of the cell updates depending on neighboring cells) and most ISLs are memory bound (i.e. the ratio of memory accesses to calculations is high). [7] Virtually all current parallel architectures have been explored for executing ISLs efficiently; [8] at the moment GPGPUs have proven to be most efficient. [9]
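As a minimal illustration of such a parallel implementation (this sketch assumes OpenMP and the double-buffered Jacobi layout from the example above), the cell updates within one sweep are independent and can be distributed across cores, while the sweep-to-sweep dependency remains sequential:

```cpp
#include <vector>

// One parallel Jacobi sweep: all reads go to `cur`, all writes go to `next`,
// so the row updates are independent and may run concurrently.
// (Compile with e.g. -fopenmp; `n`, `cur`, `next` as in the Jacobi sketch above.)
void parallelSweep(int n, const std::vector<std::vector<double>>& cur,
                   std::vector<std::vector<double>>& next) {
    #pragma omp parallel for
    for (int y = 1; y < n - 1; ++y) {
        for (int x = 1; x < n - 1; ++x) {
            next[y][x] = 0.25 * (cur[y - 1][x] + cur[y + 1][x] +
                                 cur[y][x - 1] + cur[y][x + 1]);
        }
    }
}
```

Even in parallel, such a kernel performs only a few floating-point operations per cell while touching five memory locations, which is exactly the high ratio of memory accesses to calculations that makes ISLs memory bound.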

Libraries

Due to both the importance of ISLs to computer simulations and their high computational requirements, there are a number of efforts which aim at creating reusable libraries to support scientists in performing stencil-based computations. The libraries are mostly concerned with parallelization, but may also tackle other challenges, such as IO, steering and checkpointing. They may be classified by their API.

Patch-based libraries

This is a traditional design. The library manages a set of n-dimensional scalar arrays, which the user program may access to perform updates. The library handles the synchronization of the boundaries (dubbed the ghost zone or halo). The advantage of this interface is that the user program may loop over the arrays, which makes it easy to integrate legacy code. [10] The disadvantage is that the library cannot handle cache blocking (as this has to be done within the loops [11]) or wrapping of API calls for accelerators (e.g. via CUDA or OpenCL). Implementations include Cactus, a physics problem solving environment, and waLBerla.
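The following C++ sketch illustrates the general shape of a patch-based interface; the types and methods are hypothetical and do not correspond to the actual APIs of Cactus or waLBerla:

```cpp
#include <vector>

// Hypothetical patch-based interface: the library hands the user program an
// array view that already includes a one-cell ghost zone (halo).
struct Patch {
    int width, height;        // interior size, excluding the halo
    std::vector<double> data; // (width + 2) x (height + 2) cells, halo included
    double& at(int x, int y) {  // valid for (x, y) in [-1, width] x [-1, height]
        return data[(y + 1) * (width + 2) + (x + 1)];
    }
};

// The user supplies a plain loop nest over the patch; the library is assumed
// to have filled the halo cells with the neighbors' boundary values beforehand.
void userUpdate(Patch& in, Patch& out) {
    for (int y = 0; y < in.height; ++y)
        for (int x = 0; x < in.width; ++x)
            out.at(x, y) = 0.25 * (in.at(x - 1, y) + in.at(x + 1, y) +
                                   in.at(x, y - 1) + in.at(x, y + 1));
}
```

Because the loops live in user code, legacy loop nests port over easily, but the library never sees them and thus cannot restructure them for cache blocking or offload them to an accelerator.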

Cell-based libraries

These libraries move the interface to updating single simulation cells: only the current cell and its neighbors are exposed to the user code, e.g. via getter/setter methods. The advantage of this approach is that the library can tightly control which cells are updated in which order, which is useful not only to implement cache blocking [9] but also to run the same code on multi-cores and GPUs. [12] This approach requires the user to recompile the source code together with the library, since otherwise a function call would be required for every cell update, which would seriously impair performance. This is only feasible with techniques such as class templates or metaprogramming, which is also the reason why this design is only found in newer libraries. Examples are Physis and LibGeoDecomp.
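The following C++ sketch illustrates the idea; the types are hypothetical and do not correspond to the actual APIs of Physis or LibGeoDecomp:

```cpp
#include <vector>

// Hypothetical cell-based interface: the user writes an update functor that
// sees only the current cell and its neighbors, via a getter.
struct Neighborhood {
    const std::vector<double>* grid;
    int n, x, y;
    double operator()(int dx, int dy) const { // getter for relative coordinates
        return (*grid)[(y + dy) * n + (x + dx)];
    }
};

struct JacobiCell {
    static double update(const Neighborhood& hood) {
        return 0.25 * (hood(-1, 0) + hood(1, 0) + hood(0, -1) + hood(0, 1));
    }
};

// Because the user's functor is compiled together with the library, the call
// to CELL::update() can be inlined: there is no per-cell function-call
// overhead, and the library stays free to reorder the loops, e.g. for cache
// blocking or for a GPU back end.
template<typename CELL>
void sweep(const std::vector<double>& cur, std::vector<double>& next, int n) {
    for (int y = 1; y < n - 1; ++y)
        for (int x = 1; x < n - 1; ++x)
            next[y * n + x] = CELL::update(Neighborhood{&cur, n, x, y});
}
```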


References

  1. Roth, Gerald et al. (1997) "Compiling Stencils in High Performance Fortran." Proceedings of SC'97: High Performance Networking and Computing.
  2. Sloot, Peter M. A. et al. (May 28, 2002) Computational Science – ICCS 2002: International Conference, Amsterdam, The Netherlands, April 21–24, 2002. Proceedings, Part I. p. 843. Springer. ISBN 3-540-43591-3.
  3. Fey, Dietmar et al. (2010) Grid-Computing: Eine Basistechnologie für Computational Science [Grid Computing: A Base Technology for Computational Science]. p. 439. Springer. ISBN 3-540-79746-7.
  4. Yang, Laurence T.; Guo, Minyi (August 12, 2005) High-Performance Computing: Paradigm and Infrastructure. p. 221. Wiley-Interscience. ISBN 0-471-65471-X.
  5. Micikevicius, Paulius et al. (2009) "3D Finite Difference Computation on GPUs Using CUDA." Proceedings of the 2nd Workshop on General Purpose Processing on Graphics Processing Units. ISBN 978-1-60558-517-8.
  6. Datta, Kaushik (2009) Auto-tuning Stencil Codes for Cache-Based Multicore Platforms. Ph.D. Thesis. Archived 2012-10-08 at the Wayback Machine.
  7. Wellein, G. et al. (2009) "Efficient Temporal Blocking for Stencil Computations by Multicore-Aware Wavefront Parallelization." 33rd Annual IEEE International Computer Software and Applications Conference, COMPSAC 2009.
  8. Datta, Kaushik et al. (2008) "Stencil Computation Optimization and Auto-tuning on State-of-the-Art Multicore Architectures." SC '08: Proceedings of the 2008 ACM/IEEE Conference on Supercomputing.
  9. Schäfer, Andreas; Fey, Dietmar (2011) "High Performance Stencil Code Algorithms for GPGPUs." Proceedings of the International Conference on Computational Science, ICCS 2011.
  10. Donath, S.; Götz, J.; Feichtinger, C.; Iglberger, K.; Rüde, U. (2010) "waLBerla: Optimization for Itanium-based Systems with Thousands of Processors." High Performance Computing in Science and Engineering, Garching/Munich 2009.
  11. Nguyen, Anthony et al. (2010) "3.5-D Blocking Optimization for Stencil Computations on Modern CPUs and GPUs." SC '10: Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
  12. Maruyama, Naoya; Nomura, Tatsuo; Sato, Kento; Matsuoka, Satoshi (2011) "Physis: An Implicitly Parallel Programming Model for Stencil Computations on Large-Scale GPU-Accelerated Supercomputers." SC '11: Proceedings of the 2011 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.