Normalized loop

In computer science, a normalized loop (sometimes called a well-behaved loop) is a loop in which the loop variable starts at 0 (or any constant) and is incremented by one at every iteration until the exit condition is met. Normalized loops are very important in compiler theory and loop dependence analysis, as they simplify data dependence analysis.[1]

Well-behaved loops

A well-behaved loop is normally of the form:

for (i = 0; i < MAX; i++)
    a[i] = b[i] + 5;

Because the increment is unitary and constant, it is very easy to see that, if both a and b are at least MAX elements long, this loop will never access memory outside the allocated range.
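
As a minimal sketch (the declarations are an assumption, not part of the original snippet), the bounds argument is direct: i ranges over 0..MAX-1, so every access lands inside the arrays:

int a[MAX], b[MAX];            /* at least MAX elements each */
for (i = 0; i < MAX; i++)      /* i never reaches MAX */
    a[i] = b[i] + 5;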

Non-normalized loops

A non-normalized loop may begin at an index other than zero, increment by a non-unit amount, and have exit conditions that are complicated to define. Such loops are hard to optimize, vectorize, and even traverse, especially if functions are called in any part of the loop condition.

A simple example of a loop that does not start at the beginning and increments by more than one:

// Example 1
for (i = 7; i < MAX; i += 3)
    a[i] = b[i] + 5;

A more complicated example, with an additional exit condition:

// Example 2
for (i = 7; i < MAX || i > MIN; i += 3)
    a[i] = b[i] + 5;

Loops can also have behaviour that is unpredictable at compile time, where the exit condition depends on the contents of the data being modified:

// Example 3
for (i = 7; i < MAX && a[i]; i += 3)
    a[i] = b[i] + 5;

Or even dynamic calculations by means of function calls:

// Example 4
for (i = start(); i < max(); i += increment())
    a[i] = b[i] + 5;

Reverse loops are also very simple, and can be easily normalized:

// Example 5
for (i = MAX; i > 0; i--)
    a[i] = b[i] + 5;

Converting to a normalized loop

If the non-normalized loop does not have dynamic behaviour, it is normally very easy to transform into a normalized one. For instance, Example 1 above can easily be converted to:

// Example 1 -> normalized
// The bound must round up: ceil((MAX - 7) / 3) iterations, hence the "+ 2"
for (i = 0; i < (MAX - 7 + 2) / 3; i++)
    a[i*3 + 7] = b[i*3 + 7] + 5;
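
A quick way to convince oneself of the transformation (a hypothetical test harness, not part of the original article) is to print the indices visited by both forms and compare:

#include <stdio.h>
#define MAX 20

int main(void) {
    int i;
    for (i = 7; i < MAX; i += 3)
        printf("original:   %d\n", i);          /* 7, 10, 13, 16, 19 */
    for (i = 0; i < (MAX - 7 + 2) / 3; i++)
        printf("normalized: %d\n", i*3 + 7);    /* 7, 10, 13, 16, 19 */
    return 0;
}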

The third example can be partially normalized, allowing some parallelization, but the loop span (how many iterations there will be) still cannot be computed, making it harder to vectorize using multimedia hardware.

Starting at 7 is not much of a problem, as long as the increment is regular, preferably one. When multiple statements inside the loop use the index, some private temporary variables may be created to cope with the different iteration paces, as in the sketch below.
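
A sketch of such a partial normalization of Example 3 (hypothetical, assuming MAX > 7): the index is normalized and a private temporary j keeps the original pace, while the data-dependent exit test remains and keeps the trip count unknown:

// Example 3 -> partially normalized (a sketch, not from the original article)
for (i = 0; i < (MAX - 7 + 2) / 3; i++) {
    int j = i*3 + 7;       /* private temporary: the original index */
    if (!a[j])             /* data-dependent exit: span still unknown */
        break;
    a[j] = b[j] + 5;
}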

The reverse loop (Example 5) is also easy to normalize:

// Example 5 -> normalized
for (i = 0; i < MAX; i++)
    a[MAX - i] = b[MAX - i] + 5;

Note that the access is still backwards. In this case there is no data dependence, so leaving it backwards serves no purpose; but where dependences exist, caution must be taken to preserve the access order as well, since reversing it could disrupt the order of assignments.
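
As a hypothetical illustration of such a dependence, consider overwriting each element with its predecessor; the two directions compute different results:

// Backwards: copies a[0..MAX-1] up by one position
for (i = MAX; i > 0; i--)
    a[i] = a[i - 1];

// Forwards: every element from a[1] on ends up equal to a[0]
for (i = 1; i <= MAX; i++)
    a[i] = a[i - 1];

Reversing the traversal changes which value each read observes, so a normalization must preserve the original access order.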

Impossible conversions

Example 4 above makes it impossible to predict anything about the loop. Unless the functions themselves are trivial (constant), there is no way to know where the loop will start, where it will stop, or by how much it will increment each iteration. Such loops are not only hard to parallelize, they also perform horribly.

On each iteration, the loop evaluates two functions (max() and increment()). Even if the functions are inlined, the condition becomes too complex to be worth optimizing. The programmer should take extra care not to create such loops unless strictly necessary (if ever).
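
If the functions can be proven pure (a hypothetical assumption; nothing in Example 4 guarantees it), the calls can be hoisted out of the loop so that the condition becomes a plain comparison:

// Hypothetical: start(), max() and increment() assumed pure,
// so each is evaluated once, before the loop
int lo = start(), hi = max(), step = increment();
for (i = lo; i < hi; i += step)
    a[i] = b[i] + 5;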

Another danger of such loops appears when the evaluation depends on the data being modified. For instance, a common error when using iterators is to remove items from a list while iterating over it, or to rely on sizes (for the exit condition) that are no longer true.
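
A sketch of the stale-size pitfall in C terms, where remove_at() is a hypothetical helper that deletes element i, shifts the remaining elements down, and decrements the count:

// Hypothetical illustration: the loop body shrinks the array it traverses
for (i = 0; i < count; i++) {       /* count changes while the loop runs */
    if (a[i] == 0) {
        remove_at(a, i, &count);    /* hypothetical helper */
        i--;                        /* re-examine the element now in slot i */
    }
}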

See also

Loop dependence analysis


References

  1. "Normalized hysteresis loops".