Frameworks supporting the polyhedral model

Use of the polyhedral model (also called the polytope model) within a compiler requires software to represent the objects of this framework (sets of integer-valued points in regions of various spaces) and perform operations upon them (e.g., testing whether the set is empty).

For more detail about the objects and operations in this model, and an example relating the model to the programs being compiled, see the polyhedral model page.
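As a concrete illustration of such objects and operations, the following minimal sketch uses the C interface of isl (one of the libraries discussed below) to build a set of integer points and test it for emptiness. The particular set and the omitted error handling are illustrative only:

    /* Build a set of integer points and test whether it is empty (isl C API). */
    #include <isl/ctx.h>
    #include <isl/set.h>
    #include <stdio.h>

    int main(void) {
        isl_ctx *ctx = isl_ctx_alloc();
        /* The integer points i with 0 <= i <= N and i = N - i, for a symbolic parameter N. */
        isl_set *set = isl_set_read_from_str(ctx,
            "[N] -> { [i] : 0 <= i <= N and i = N - i }");
        /* Non-empty: for example, N = 0 and i = 0 satisfy the constraints. */
        isl_bool empty = isl_set_is_empty(set);
        printf("empty? %d\n", empty);
        isl_set_free(set);
        isl_ctx_free(ctx);
        return 0;
    }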

There are many frameworks supporting the polyhedral model. Some of these frameworks use one or more libraries for performing polyhedral operations. Others, notably Omega, combine everything in a single package. Some commonly used libraries are the Omega Library [1] (and a more recent fork), [2] piplib, [3] [4] PolyLib, [5] [6] PPL, [7] isl, [8] the CLooG polyhedral code generator, [9] [10] and the barvinok library for counting integer solutions. [11] Of these libraries, PolyLib and PPL focus mostly on rational values, while the others focus on integer values. The polyhedral framework of GCC is called Graphite. [12] Polly [13] provides polyhedral optimizations for LLVM, and R-Stream [14] has had a polyhedral mapper since ca. 2006.

Common strengths

Polyhedral frameworks are designed to support compiler techniques for the analysis and transformation of code with nested loops, producing exact results for loop nests with affine loop bounds and subscripts ("Static Control Parts" of programs). They can be used to represent and reason about executions (iterations) of statements, rather than treating a statement as a single object representing properties of all executions of that statement. Polyhedral frameworks typically also allow the use of symbolic expressions.
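For example, the executions of the statement in the following triangular loop nest (a static control part with the symbolic bound n) correspond to the integer points of the set written below it, here in the kind of notation used by isl and the Omega Library (a sketch of the representation, not the verbatim output of any tool):

    for (int i = 0; i < n; i++)
        for (int j = 0; j <= i; j++)
            S(i, j);        /* one "execution" of S per integer point (i, j) */

    [n] -> { S[i, j] : 0 <= i < n and 0 <= j <= i }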

Polyhedral frameworks can be used for dependence analysis for arrays, including both traditional alias analysis and more advanced techniques such as the analysis of data flow in arrays or identification of conditional dependencies. They can also be used to represent code transformation, and provide features to generate the transformed code in a high-level language. The transformation and generation systems can typically handle imperfectly nested loops.
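As a sketch of this generation step, recent versions of isl can scan the executions of a statement under a given schedule and print loop code; CLooG provides similar functionality. The schedule string below is a hypothetical example, the API details are simplified, and error handling is omitted; check against the installed isl version:

    /* Generate and print a C-like loop that enumerates S[i], 0 <= i <= N, in order. */
    #include <isl/ctx.h>
    #include <isl/printer.h>
    #include <isl/union_map.h>
    #include <isl/ast.h>
    #include <isl/ast_build.h>
    #include <stdio.h>

    int main(void) {
        isl_ctx *ctx = isl_ctx_alloc();
        isl_union_map *schedule = isl_union_map_read_from_str(ctx,
            "[N] -> { S[i] -> [i] : 0 <= i <= N }");
        isl_ast_build *build = isl_ast_build_alloc(ctx);
        isl_ast_node *tree = isl_ast_build_node_from_schedule_map(build, schedule);
        isl_printer *p = isl_printer_to_file(ctx, stdout);
        p = isl_printer_set_output_format(p, ISL_FORMAT_C);
        /* Prints something like: for (int c0 = 0; c0 <= N; c0 += 1) S(c0); */
        p = isl_printer_print_ast_node(p, tree);
        isl_printer_free(p);
        isl_ast_node_free(tree);
        isl_ast_build_free(build);
        isl_ctx_free(ctx);
        return 0;
    }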

An example to contrast polyhedral frameworks with prior work

To compare the constraint-based polyhedral model to prior approaches, such as individual loop transformations and the unimodular approach, consider the question of whether we can parallelize (execute simultaneously) the iterations of the following contrived but simple loop:

for i := 0 to N do
    A(i) := (A(i) + A(N-i))/2

Approaches that cannot represent symbolic terms (such as the loop-invariant quantity N in the loop bound and subscript) cannot reason about dependencies in this loop. They will either conservatively refuse to run it in parallel, or in some cases speculatively run it completely in parallel, determine that this was invalid, and re-execute it sequentially.

Approaches that handle symbolic terms but represent dependencies via direction vectors or distance vectors will determine that the i loop carries a dependence (of unknown distance), since, for example, when N=10, iteration 0 of the loop writes an array element (A(0)) that will be read in iteration 10 (as A(10-10)) and reads an array element (A(10-0)) that will later be overwritten in iteration 10 (as A(10)). If all we know is that the i loop carries a dependence, we once again cannot safely run it in parallel.

In reality, there are only dependencies from the first N/2 iterations into the last N/2, so we can execute this loop as a sequence of two fully parallel loops (from 0...N/2 and from N/2+1...N). The characterization of this dependence, the analysis of parallelism, and the transformation of the code can be done in terms of the instance-wise information provided by any polyhedral framework.
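A polyhedral framework captures this dependence exactly as a relation on iterations; as a sketch, in the kind of set notation used by the Omega Library and isl (not the verbatim output of either):

    { [i] -> [N-i] : 0 <= i and 2i < N }    // iteration i writes A(i) and reads A(N-i);
                                            // iteration N-i later reads A(i) and overwrites A(N-i)

Index set splitting at i = N/2 then yields, in the style of the original loop:

    for i := 0 to N/2 do       A(i) := (A(i) + A(N-i))/2    // iterations mutually independent
    for i := N/2+1 to N do     A(i) := (A(i) + A(N-i))/2    // depends only on the loop above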

Instance-wise analysis and transformation allows the polyhedral model to unify additional transformations (such as index set splitting, loop peeling, tiling, loop fusion or fission, and transformation of imperfectly nested loops) with those already unified by the unimodular framework (such as loop interchange, skewing, and reversal of perfectly nested loops). It has also stimulated the development of new transformations, such as Pugh and Rosser's iteration-space slicing (an instance-wise version of program slicing; note that the code was never released with the Omega Library).

A more interesting example

Authors of polyhedral frameworks have explored the simple 1-dimensional finite difference heat equation stencil calculation expressed by the following pseudocode:

for t := 0 to T do
    for i := 1 to N-1 do
        new(i) := (A(i-1) + A(i) + A(i) + A(i+1)) * .25   // explicit forward-difference with R = 0.25
    end
    for i := 1 to N-1 do
        A(i) := new(i)
    end
end

This code confounds many of the transformation systems of the 20th century, due to the need to optimize an imperfect loop nest. Polyhedral frameworks can analyze the flow of information among different executions of statements in the loop nest, and transform this code to simultaneously exploit scalable parallelism and scalable locality.
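For instance, writing S1 for the statement that computes new(i) and S2 for the copy A(i) := new(i), the value-based flow of information can be sketched as relations between statement executions (boundary constraints omitted; the exact form varies across frameworks):

    S1[t, i] -> S2[t, i]                                   // new(i) is read back within the same time step
    S2[t, i'] -> S1[t+1, i]  where i-1 <= i' <= i+1        // step t's A values feed the 3-point stencil at step t+1

Any schedule that respects these relations is legal; skewing the i loop by t, as in time skewing, can make rectangular tiling of the skewed iteration space legal and so exposes both parallelism and locality.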

For a recap of how the two approaches handle this example, see the individual papers of Wonnacott [15] [16] and Sadayappan et al., [17] as well as others who have studied this code using different frameworks, such as Song and Li. [18]

Differences in presentation or vocabulary

Comparison of work using different frameworks is complicated by both technical differences (discussed later) and differences in vocabulary and presentation. Examples are provided below to aid in translation:

Classification of dependences

Polyhedral frameworks support dependence analysis in a variety of ways, helping to capture the impact of symbolic terms, identify conditional dependences, and separate out the effects of memory aliasing. The effects of memory aliasing, in particular, have been described in two ways. Many authors distinguish "true" data dependences (corresponding to actual flow of information) from false dependences arising from memory aliasing or from the limited precision of dependence analysis.

The Omega Project publications use specific terms to identify specific effects on analysis. They maintain the traditional distinction of flow-, output-, and anti-dependences, based on the types of array access (write to read, write to write, or read to write, respectively). Dependences can independently be classified as memory-based or value-based --- the former corresponds to memory aliasing, and the latter does not include dependences interrupted by intervening writes. A dependence test may produce information that is exact or approximate, depending on the nature of the program being analyzed and the algorithms used in the test. Finally, the results of dependence analysis will be reported in a dependence abstraction that provides a certain degree of precision.
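The memory-based/value-based distinction matters in cases like the following contrived sketch, in which every iteration writes the same location (f stands for any side-effect-free computation):

    for (int i = 0; i < N; i++)
        A[0] = f(i);      /* S1: writes A[0] in every iteration  */
    x = A[0];             /* S2: reads A[0] once, after the loop */

A memory-based analysis reports a flow dependence from every execution of S1 to S2, since they touch the same location; a value-based analysis reports only the dependence from the final execution (S1 with i = N-1), because all earlier writes are killed by intervening writes to A[0].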

For example, the "dependence relations" produced by the Omega Test, and the "quasts" produced by the algorithms of Feautrier or Maydan and Lam, contain precise information (though in different forms) about the loop iterations involved in a dependence. The results of either test can be converted into the more traditional "dependence vector" form, but since this abstraction provides less precision, much of the information about the dependence will be lost. Both techniques produce exact information for programs with affine control and subscript expressions, and must approximate for many programs outside this domain (i.e., in the presence of non-affine subscripts such as index arrays). The original work of Feautrier focused on describing true dependences, which would be referred to as exact value-based flow dependences by the Omega Project. The Omega Project also described the use of their algorithms for value-based output- and anti-dependences, though Feautrier's quasts could presumably be easily adapted to this as well.

Visualization of transformations and tiling

There are many ways to produce a visual depiction of the process of transforming and tiling an iteration space. Some authors depict transformations by changing the location of points on the page, essentially aligning the picture with the coordinate axes of the transformed space; in such diagrams, tiles appear as axis-aligned rectangles/rectangular solids containing iterations. Examples of this approach can be found in the publications and transformation-visualization software of Michelle Mills Strout. [19]

Other authors depict different transformations as different wavefronts of execution that move across the points of the original coordinate system at different angles. In such diagrams, tiles appear as parallelograms/parallelepipeds. Examples of this approach can be found in the time-skewing publications of David G. Wonnacott. [20]

Differences in approach or implementation status

Some of the other libraries have seen more extensive development than the Omega Library has since the early 2000s, and in many places offer much more sophisticated algorithms. In particular, users have reported good results with the CLooG code generator (both in terms of the code generated and in terms of the ability to control trade-offs when generating code), and with the algorithms for counting integer solutions (Alexander Barvinok's [21] work requires a vertex description of the polytope, which is not supported in the Omega Library).

There are several other points on which the frameworks differ, specifically:

Precision and speed

Integer programming is NP-complete, and Maydan showed that the problem of checking array aliasing in nested loops with affine bounds and subscripts is equivalent to integer programming; other operations, such as array dataflow analysis, are even more complex (the algorithms of the Omega Library handle the full language of Presburger arithmetic, which is O(2^(2^(2^n)))). Thus it is clearly unrealistic to expect exact, fast results for arbitrary problems of array aliasing or array dataflow, even over the affine domain. Fortunately, many problems fall into a subset of this domain where general algorithms can produce an exact answer in polynomial time. [22] [23]

Outside of this domain, the Omega Library, piplib, and isl emphasize the production of an exact result (except for certain uses of uninterpreted function symbols in the Omega Library), despite the high complexity. For some operations, such as variable elimination ("projection"), PolyLib and PPL primarily use algorithms for the rational domain, and thus produce an approximation of the result for integer variables. This may reduce the behavior commonly experienced with the Omega Library, in which a minor change to one coefficient can cause a dramatic shift in the response of the library's algorithms.

PolyLib has some operations to produce exact results for Z-polyhedra (integer points bounded by polyhedra), but at the time of this writing, significant bugs have been reported. [24] Note that bugs also exist in the Omega Library, including reliance on hardware-supplied integer types and cases of the full Presburger arithmetic algorithms that were never implemented in the library. Users who need exact results for integer variables may need to be wary with either library.

Barvinok's techniques for counting integer solutions require a description of the vertices (and bounding rays) of the polyhedron, but produce an exact answer in a way that can be far more efficient than the techniques described by Pugh. Barvinok's algorithm is always polynomial in the input size, for fixed dimension of the polytope and fixed degree of weights, whereas the "splintering" in Pugh's algorithm can grow with the coefficient values [25] (and thus exponentially in terms of input size, despite fixed dimension, unless there is some limit on coefficient sizes).
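As a sketch of the counting interface, the barvinok library extends isl with a cardinality operation; the header and function named below reflect the library's C interface as the author understands it and should be checked against the installed version (error handling omitted):

    /* Count the integer points in a parametric set, producing a piecewise
       quasi-polynomial in the parameter N (barvinok on top of isl). */
    #include <isl/ctx.h>
    #include <isl/set.h>
    #include <isl/printer.h>
    #include <isl/polynomial.h>
    #include <barvinok/isl.h>
    #include <stdio.h>

    int main(void) {
        isl_ctx *ctx = isl_ctx_alloc();
        isl_set *set = isl_set_read_from_str(ctx,
            "[N] -> { [i] : 0 <= i and 2i < N }");
        /* Cardinality as a function of N: ceil(N/2) for N >= 0. */
        isl_pw_qpolynomial *count = isl_set_card(set);
        isl_printer *p = isl_printer_to_file(ctx, stdout);
        p = isl_printer_print_pw_qpolynomial(p, count);
        isl_printer_free(p);
        isl_pw_qpolynomial_free(count);
        isl_ctx_free(ctx);
        return 0;
    }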

Vertex enumeration

Polyhedral libraries such as PolyLib and PPL exploit the double description of polyhedra and therefore naturally support vertex enumeration on (non-parametric) polytopes. The Omega Library internally performs vertex enumeration during the computation of the convex hull. PolyLib and isl provide vertex enumeration on parametric polytopes, which is essential for applying Barvinok's algorithm to parametric polytopes.

Indication of an approximate result

In some parts of a compiler, an approximate result is acceptable. For example, when dependence analysis is used to guide loop transformation, it is generally acceptable to use a superset of the true dependencies: this may inhibit an optimization, but never permits an illegal code transformation. When the Omega Library produces an approximate answer, the answer is appropriately marked as an upper bound (e.g., via "and UNKNOWN") or a lower bound (e.g., via "or UNKNOWN"). Answers not marked this way are exact descriptions of sets of integer-valued points (except in cases of bugs in the software).

Handling nonlinear terms

When code contains a mixture of affine and non-affine terms, polyhedral libraries can, in principle, be used to produce approximate results, for example by simply omitting such terms when it is safe to do so. In addition to providing a way to flag such approximate results, the Omega Library allows restricted uses of "Uninterpreted Function Symbols" to stand in for any nonlinear term. This slightly improves the results of dependence analysis and, probably more significantly, provides a language for communicating about these terms (to drive other analyses or communication with the programmer). Pugh and Wonnacott discussed a slightly less restricted domain than that allowed in the library, but this was never implemented (a description exists in Wonnacott's dissertation).

Transitive closure operation

Some kinds of analysis, such as Pugh and Rosser's iteration space slicing, can be most easily stated in terms of the transitive closure of the dependence information. Both the Omega Library and isl provide a transitive closure operation that is exact for many cases that arise in programs with simple dependence patterns. In other cases, the Omega Library produces a subset of the transitive closure, while isl produces a superset. In the case of the Omega Library, the subset itself may be approximate, resulting in an upper bound (tagged) of a lower bound (not tagged) of the transitive closure. Note that the computation of an exact transitive closure is undecidable. [26]
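A hedged sketch of isl's interface for this operation follows; the exactness flag reports whether the result is exact or an overapproximation (in some isl versions the flag is an int rather than an isl_bool):

    /* Transitive closure of a dependence-like relation with isl. */
    #include <isl/ctx.h>
    #include <isl/map.h>
    #include <stdio.h>

    int main(void) {
        isl_ctx *ctx = isl_ctx_alloc();
        isl_map *dep = isl_map_read_from_str(ctx,
            "{ [i] -> [i + 2] : 0 <= i <= 8 }");
        isl_bool exact;
        isl_map *closure = isl_map_transitive_closure(dep, &exact);
        /* For this simple uniform dependence the closure is exact:
           roughly { [i] -> [i + 2k] : k >= 1 and 0 <= i and i + 2k <= 10 }. */
        printf("exact? %d\n", exact);
        isl_map_free(closure);
        isl_ctx_free(ctx);
        return 0;
    }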

See also

- Optimizing compiler
- Loop nest optimization
- Loop optimization
- Loop interchange
- Loop dependence analysis
- Automatic parallelization
- Automatic vectorization
- Convex cone
- Polyhedral model
- Manifest expression
- GCD test
- Loop-level parallelism
- Scalable parallelism
- Scalable locality
- polymake
- Integer points in convex polyhedra
- Integral polytope
- Algorithmic skeleton
- isl (Integer Set Library)

References

  1. "Frameworks and Algorithms for the Analysis and Transformation of Scientific Programs". Cs.umd.edu. Retrieved 2012-08-20.
  2. "Tools". Compiler Technology to Optimize Performance. University of Utah . Retrieved 2012-08-20.
  3. Cédric Bastoul. "www.PipLib.org the Parametric Integer Programming home". www.piplib.org. Retrieved 2014-06-04.
  4. Paul Feautrier. Parametric Integer Programming. 1988
  5. "Polylib". Icps.u-strasbg.fr. Retrieved 2012-08-20.
  6. Wilde, Doran K. (1993). "A Library for Doing Polyhedral Operations". Tech report. ftp.irisa.fr.
  7. "PPL". Bugseng. Retrieved 2012-08-20.
  8. "isl – Freecode". Freshmeat.net. Retrieved 2012-08-20.
  9. Cédric Bastoul. "www.CLooG.org the Chunky Loop Generator home". www.cloog.org. Retrieved 2014-06-04.
  10. Cedric Bastoul. Code Generation in the Polyhedral Model Is Easier Than You Think. PACT'13 IEEE International Conference on Parallel Architecture and Compilation Techniques (2004)
  11. gvy (2007-04-28). "barvinok – Freecode". Freshmeat.net. Retrieved 2012-08-20.
  12. Sebastian Pop, Albert Cohen, Cedric Bastoul, Sylvain Girbal, Pierre Jouvelot, Georges-André Silber et Nicolas Vasilache. Graphite: Loop optimizations based on the polyhedral model for GCC. 4th GCC Developer's Summit. Ottawa, Canada, June 2006.
  13. "Polly - Polyhedral optimizations for LLVM". Polly.llvm.org. Retrieved 2014-06-04.
  14. Benoit Meister, Nicolas Vasilache, David Wohlford, Muthu Baskaran, Allen Leung and Richard Lethin. R-Stream Compiler. In Encyclopedia of Parallel Computing, David Padua Ed., pp 1756-1765, Springer, 2011.
  15. David Wonnacott. Achieving Scalable Locality with Time Skewing. International Journal of Parallel Programming 30.3 (2002)
  16. Wonnacott, D. (2000). "Using time skewing to eliminate idle time due to memory bandwidth and network limitations". Proceedings 14th International Parallel and Distributed Processing Symposium. IPDPS 2000. pp. 171–180. doi:10.1109/IPDPS.2000.845979. ISBN 0-7695-0574-0. S2CID 9949169.
  17. Uday Bondhugula, Muthu Manikandan Baskaran, Sriram Krishnamoorthy, J. Ramanujam, Atanas Rountev, P. Sadayappan. Automatic Transformations for Communication-Minimized Parallelization and Locality Optimization in the Polyhedral Model. CC 2008 - International Conference on Compiler Construction
  18. Yonghong Song, Zhiyuan Li. New Tiling Techniques to Improve Cache Temporal Locality. Proceedings of the 1999 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)
  19. "Michelle Mills Strout". Cs.colostate.edu. Retrieved 2012-08-20.
  20. "David G. Wonnacott". Cs.haverford.edu. Retrieved 2012-08-20.
  21. "Alexander Barvinok". Math.lsa.umich.edu. 2012-06-16. Retrieved 2012-08-20.
  22. Pugh, William (1991). "The Omega test: A fast and practical integer programming algorithm for dependence analysis". Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91). pp. 4–13. doi:10.1145/125826.125848. ISBN 0897914597. S2CID 3174094.
  23. Seater, Robert; Wonnacott, David (2003). "Polynomial Time Array Dataflow Analysis". Languages and Compilers for Parallel Computing. Lecture Notes in Computer Science. Vol. 2624. pp. 411–426. doi:10.1007/3-540-35767-X_27. ISBN 978-3-540-04029-3.
  24. "Frameworks supporting the polyhedral model". lipforge.ens-lyon.fr. (permanent dead link)
  25. Verdoolaege, Sven; Seghir, Rachid; Beyls, Kristof; Loechner, Vincent; Bruynooghe, Maurice. "Counting Integer Points in Parametric Polytopes Using Barvinok's Rational Functions" (PDF). Lirias.kuleuven.be. Section 6.1 discusses Pugh's method and splintering.
  26. Wayne Kelly, William Pugh, Evan Rosser, Tatiana Shpeisman. Transitive Closure of Infinite Graphs and Its Applications. Languages and Compilers for Parallel Computing, 8th International Workshop (LCPC 1995)
  27. Jean-Francois Collard. Reasoning About Program Transformations. Springer-Verlag, 2003.
  28. Bastoul, Cedric. Improving Data Locality in Static Control Programs (PDF). icps.u-strasbg.fr (Thesis).
  29. "Encyclopedia of Parallel Computing". Springer.com. Retrieved 2012-08-20.
  30. Wonnacott, David G. "A Retrospective of the Omega Project" (PDF). Haverford Computer Science Tech Report 2010-01. Haverford College.
  31. "Reservoir Labs, Inc". Reservoir.com. Retrieved 2014-06-04.