Loop splitting

Last updated October 18, 2021

Loop splitting is a compiler optimization technique. It attempts to simplify a loop or eliminate dependencies by breaking it into multiple loops which have the same bodies but iterate over different contiguous portions of the index range.
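A common case of this definition, often called index-set splitting, is a loop whose body branches on the index. The loop is split at the point where the branch outcome changes, so each resulting loop covers one contiguous part of the range and no longer needs the test. The sketch below is a minimal C illustration, not a compiler's actual output. The array `a`, its length `n`, the split point `m` and both function names are hypothetical:

```c
#include <stddef.h>

/* Original loop: the branch on i < m is evaluated on every iteration. */
void scale_original(double *a, size_t n, size_t m)
{
    for (size_t i = 0; i < n; ++i) {
        if (i < m)
            a[i] *= 2.0;
        else
            a[i] *= 0.5;
    }
}

/* Split loop: the index range [0, n) is cut at m into two contiguous
   parts. In each part the branch outcome is fixed, so the test is gone.
   m is clamped to n so both ranges stay inside the array. */
void scale_split(double *a, size_t n, size_t m)
{
    size_t split = m < n ? m : n;
    for (size_t i = 0; i < split; ++i)   /* here i < m always holds */
        a[i] *= 2.0;
    for (size_t i = split; i < n; ++i)   /* here i < m never holds */
        a[i] *= 0.5;
}
```

Both functions leave the array in the same state for any `n` and `m`. GCC exposes an automatic transformation of this kind as the `-fsplit-loops` option.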

Loop peeling

Loop peeling is a special case of loop splitting which splits any problematic first (or last) few iterations from the loop and performs them outside of the loop body.
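Peeling from the end works the same way as peeling from the front. In the minimal sketch below, the last iteration is the special one because the final element has no right-hand neighbour. The example, including the function names and parameters, is hypothetical and only illustrates the transformation:

```c
#include <stddef.h>

/* Each output is an element plus its right neighbour; the last element
   has no neighbour, so the body tests for the final iteration. */
void pair_sums_original(const int *a, int *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        if (i == n - 1)
            out[i] = a[i];
        else
            out[i] = a[i] + a[i + 1];
    }
}

/* Last iteration peeled: the loop runs over [0, n - 1) without the
   test, and the final element is handled after the loop. */
void pair_sums_peeled(const int *a, int *out, size_t n)
{
    if (n == 0)
        return;
    for (size_t i = 0; i + 1 < n; ++i)
        out[i] = a[i] + a[i + 1];
    out[n - 1] = a[n - 1];
}
```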

Suppose a loop was written like this:

    int p = 10;
    for (int i = 0; i < 10; ++i) {
        y[i] = x[i] + x[p];
        p = i;
    }

Notice that p = 10 only for the first iteration, and for all other iterations, p = i - 1. A compiler can take advantage of this by unwinding (or "peeling") the first iteration from the loop.

After peeling the first iteration, the code would look like this:

    y[0] = x[0] + x[10];
    for (int i = 1; i < 10; ++i) {
        y[i] = x[i] + x[i - 1];
    }

This equivalent form eliminates the need for the variable p inside the loop body.
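To make the two forms comparable, the sketch below wraps the original and peeled loops in functions. The loop length is generalised to a hypothetical `n`, and `p` starts at `n - 1` instead of 10 so that every access stays in bounds. In the example above, `x` must hold at least 11 elements because `x[10]` is read. The function names are illustrative only:

```c
#include <stddef.h>

/* Original form: p holds the previous index, except on the first
   iteration, where it is n - 1 (a wrap-around neighbour). When n == 0
   the initial value wraps around but is never used. */
void prev_sums_original(const int *x, int *y, size_t n)
{
    size_t p = n - 1;
    for (size_t i = 0; i < n; ++i) {
        y[i] = x[i] + x[p];
        p = i;
    }
}

/* First iteration peeled: inside the loop p would always equal i - 1,
   so the variable is no longer needed. */
void prev_sums_peeled(const int *x, int *y, size_t n)
{
    if (n == 0)
        return;
    y[0] = x[0] + x[n - 1];
    for (size_t i = 1; i < n; ++i)
        y[i] = x[i] + x[i - 1];
}
```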

Loop peeling was introduced in GCC 3.4. More generalised loop splitting was added in GCC 7.^[1]

Brief history of the term

The term "peeling" appears to have been used first by Cannings, Thompson and Skolnick^[2] in their 1976 paper on computational models for (human) inheritance, where it denoted a method for collapsing phenotypic information onto parents. The authors used the term again in later papers, including their seminal paper on probability functions on complex pedigrees.^[3]

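In compiler technology, the term first turned up in papers from the late 1980s and early 1990s, including Callahan and Kennedy's work on compiling for distributed-memory multiprocessors^[4] and Mahlke et al.'s work on predicated execution for VLIW and superscalar compilation.^[5]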


References

1. GCC 7 Release Series — Changes, New Features, and Fixes. GNU Project.
2. Cannings, C.; Thompson, E. A.; Skolnick, M. H. (1976). "The recursive derivation of likelihoods on complex pedigrees". Advances in Applied Probability. 8 (4): 622–625. doi:10.2307/1425918. JSTOR 1425918.
3. Cannings, C.; Thompson, E. A.; Skolnick, M. H. (1978). "Probability functions on complex pedigrees". Advances in Applied Probability. 10 (1): 26–61. doi:10.2307/1426718. JSTOR 1426718.
4. Callahan, D.; Kennedy, K. (1988). "Compiling Programs for Distributed-memory Multiprocessors". The Journal of Supercomputing. 2 (2): 151–169. doi:10.1007/BF00128175. S2CID 10214341.
5. Mahlke, S. A.; Lin, D. C.; Chen, W. Y.; Hank, R. E.; Bringmann, R. A. (1992). "Effective compiler support for predicated execution using the hyperblock". 25th Annual International Symposium on Microarchitecture. pp. 45–54.

Compiler optimizations
Basic block	Peephole optimization
Loop optimization	Induction variable Strength reduction Loop fusion Loop inversion Loop interchange Loop-invariant code motion Loop nest optimization Loop unrolling Loop splitting Loop unswitching Software pipelining Automatic parallelization
Data-flow analysis	Common subexpression elimination Constant folding Induction variable recognition and elimination Dead store elimination Use-define chain Live variable analysis Available expression
SSA-based	Global value numbering Sparse conditional constant propagation
Code generation	Register allocation Instruction selection Instruction scheduling Rematerialization
Functional	Tail call elimination Deforestation
Global	Interprocedural optimization
Other	Bounds-checking elimination Compile-time function execution Dead code elimination Inline expansion Jump threading Profile-guided optimization
Static analysis	Alias analysis Pointer analysis Shape analysis Escape analysis Array access analysis Dependence analysis Control flow analysis Data-flow analysis
