Rematerialization

In computer science, rematerialization or remat is a compiler optimization which saves time by recomputing a value instead of loading it from memory. It is typically tightly integrated with register allocation, where it is used as an alternative to spilling registers to memory. It was conceived by Gregory Chaitin, Marc Auslander, Ashok Chandra, John Cocke, Martin Hopkins and Peter Markstein and implemented in the PL.8 compiler for the 801 Minicomputer in the late 1970s. Later improvements were made by Preston Briggs, Keith D. Cooper, and Linda Torczon in 1992.

Traditional optimizations such as common subexpression elimination and loop-invariant hoisting often focus on eliminating redundant computation. Since computation requires CPU cycles, this is usually a good thing, but it has the potentially devastating side effect that it can increase the live ranges of variables and create many new variables, resulting in spills during register allocation. Rematerialization is nearly the opposite: it decreases register pressure by increasing the amount of CPU computation. To avoid adding more computation time than necessary, rematerialization is performed only when the compiler can be confident that it will be of benefit; that is, when a register spill to memory would otherwise occur.
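
As a minimal sketch (not any particular compiler's heuristic), the decision can be framed as a cost test at the point where the allocator would otherwise spill; the Candidate type, the cost values, and the function names below are hypothetical:

```python
from dataclasses import dataclass

SPILL_COST = 2.0  # assumed combined cost of a store plus a later reload

@dataclass
class Candidate:
    name: str
    recompute_cost: float     # estimated cycles to re-execute the defining expression
    operands_available: bool  # do the defining operands still hold their values?
    has_side_effects: bool    # e.g. the defining expression contains a call

def should_rematerialize(c: Candidate) -> bool:
    """Prefer recomputation over a spill/reload pair when it is safe and cheap."""
    if c.has_side_effects or not c.operands_available:
        return False
    return c.recompute_cost < SPILL_COST

# A cheap address computation or constant is the classic winner:
print(should_rematerialize(Candidate("addr", 1.0, True, False)))   # True
# Recomputing a slow division is worse than reloading from the stack:
print(should_rematerialize(Candidate("quot", 20.0, True, False)))  # False
```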

Rematerialization works by keeping track of the expression used to compute each variable, using the concept of available expressions. Sometimes the variables used to compute a value are modified, and so can no longer be used to rematerialize it; the expression is then said to be no longer available. Other criteria must also be fulfilled, for example a maximum complexity of the expression used to rematerialize the value: it would do no good to rematerialize a value using a complex computation that takes more time than a load. Usually the expression must also have no side effects.
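
The bookkeeping might be sketched as follows; the instruction format, the MAX_REMAT_COST cap, and all names are illustrative assumptions rather than a real compiler's API:

```python
# Sketch of available-expression bookkeeping for rematerialization.
MAX_REMAT_COST = 2  # assumed cap on the cost of a rematerializing expression

# Map each variable to (expression text, operand set, recompute cost).
# An entry is "killed" as soon as any of its operands is redefined.
available: dict[str, tuple[str, set[str], int]] = {}

def record_def(dest: str, expr: str, operands: set[str], cost: int) -> None:
    """Remember how `dest` was computed, then kill every expression that
    depended on the value `dest` held before this definition."""
    stale = [v for v, (_, ops, _) in available.items() if dest in ops]
    for v in stale:
        del available[v]  # an operand was overwritten: no longer available
    available[dest] = (expr, operands, cost)

def can_rematerialize(var: str) -> bool:
    """Available, and cheap enough that recomputing beats a load."""
    entry = available.get(var)
    return entry is not None and entry[2] <= MAX_REMAT_COST

record_def("a", "16", set(), 1)      # a = 16
record_def("x", "a + 4", {"a"}, 1)   # x = a + 4
print(can_rematerialize("x"))        # True: a still holds the value used
record_def("a", "a + 1", {"a"}, 1)   # a is redefined
print(can_rematerialize("x"))        # False: x's defining expression was killed
```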

Related Research Articles

In computing, a compiler is a computer program that translates computer code written in one programming language into another language. The name "compiler" is primarily used for programs that translate source code from a high-level programming language to a lower-level language (such as assembly or machine code) in order to create an executable program.

C is a general-purpose computer programming language. It was created in the 1970s by Dennis Ritchie, and remains very widely used and influential. By design, C's features cleanly reflect the capabilities of the targeted CPUs. It has found lasting use in operating systems, device drivers, and protocol stacks, though decreasingly for application software. C is commonly used on computer architectures that range from the largest supercomputers to the smallest microcontrollers and embedded systems.

PL/I is a procedural, imperative computer programming language developed and published by IBM. It is designed for scientific, engineering, business and system programming. It has been used by academic, commercial and industrial organizations since it was introduced in the 1960s, and is still used.

In computing, an optimizing compiler is a compiler that tries to minimize or maximize some attributes of an executable computer program. Common requirements are to minimize a program's execution time, memory footprint, storage size, and power consumption.

In computer science, an instruction set architecture (ISA), also called computer architecture, is an abstract model of a computer. A device that executes instructions described by that ISA, such as a central processing unit (CPU), is called an implementation.

The 801 was an experimental central processing unit (CPU) design developed by IBM during the 1970s. It is considered to be the first modern RISC design, relying on processor registers for all computations and eliminating the many variant addressing modes found in CISC designs. Originally developed as the processor for a telephone switch, it was later used as the basis for a minicomputer and a number of products for their mainframe line. The initial design was a 24-bit processor; that was soon replaced by 32-bit implementations of the same concepts, and the original 24-bit 801 was used only into the early 1980s.

In computing, just-in-time (JIT) compilation is a way of executing computer code that involves compilation during execution of a program rather than before execution. This may consist of source code translation but is more commonly bytecode translation to machine code, which is then executed directly. A system implementing a JIT compiler typically continuously analyses the code being executed and identifies parts of the code where the speedup gained from compilation or recompilation would outweigh the overhead of compiling that code.

SSE2 is one of the Intel SIMD processor supplementary instruction sets first introduced by Intel with the initial version of the Pentium 4 in 2000. It extends the earlier SSE instruction set, and is intended to fully replace MMX. Intel extended SSE2 to create SSE3 in 2004. SSE2 added 144 new instructions to SSE, which has 70 instructions. Competing chip-maker AMD added support for SSE2 with the introduction of their Opteron and Athlon 64 ranges of AMD64 64-bit CPUs in 2003.

In compiler design, static single assignment (SSA) form is a property of an intermediate representation (IR) that requires each variable to be assigned exactly once and defined before it is used. Existing variables in the original IR are split into versions; the new variables are typically indicated by the original name with a subscript, so that every definition gets its own version. In SSA form, use-def chains are explicit and each contains a single element.
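
A minimal illustration of the renaming on straight-line code (the variable names and subscripts are arbitrary):

```python
# Before SSA renaming: `x` is assigned twice, so a use of `x` could
# refer to either definition.
#
#     x = 1
#     x = x + 2
#     y = x * 3
#
# After renaming, every definition gets its own version, and every use
# names exactly one definition (the use-def chain has a single element):
x1 = 1
x2 = x1 + 2
y1 = x2 * 3
print(y1)  # 9
```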

In compiler theory, dead-code elimination is a compiler optimization to remove code which does not affect the program results. Removing such code has several benefits: it shrinks program size, an important consideration in some contexts, and it allows the running program to avoid executing irrelevant operations, which reduces its running time. It can also enable further optimizations by simplifying program structure. Dead code includes code that can never be executed, and code that only affects dead variables, that is, variables that are irrelevant to the program.
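
An illustrative before/after pair, with hypothetical function names:

```python
# Before dead-code elimination:
def before(a: int, b: int) -> int:
    unused = a * b + 42  # dead: the result is never read
    if False:            # unreachable: this branch can never execute
        return -1
    return a + b

# After: only the code that affects the observable result remains.
def after(a: int, b: int) -> int:
    return a + b

assert before(2, 3) == after(2, 3)
```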

In compiler optimization, register allocation is the process of assigning local automatic variables and expression results to a limited number of processor registers.

In computer science, computer engineering and programming language implementations, a stack machine is a computer processor or a virtual machine in which the primary interaction is moving short-lived temporary values to and from a push-down stack. In the case of a hardware processor, a hardware stack is used. The use of a stack significantly reduces the required number of processor registers. Stack machines extend push-down automata with additional load/store operations or multiple stacks and hence are Turing-complete.
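
A toy sketch of the idea, written as a postfix-expression evaluator standing in for a real stack machine's instruction set:

```python
# Toy stack machine: a postfix (RPN) evaluator in which every operation
# moves short-lived temporaries through a single push-down stack.
def eval_rpn(tokens: list[str]) -> int:
    stack: list[int] = []
    for tok in tokens:
        if tok in ("+", "-", "*"):
            b, a = stack.pop(), stack.pop()  # operands come off the stack
            stack.append({"+": a + b, "-": a - b, "*": a * b}[tok])
        else:
            stack.append(int(tok))           # a "load" pushes onto the stack
    return stack.pop()

print(eval_rpn(["2", "3", "4", "*", "+"]))   # 2 + 3*4 = 14
```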

In compiler theory, common subexpression elimination (CSE) is a compiler optimization that searches for instances of identical expressions, and analyzes whether it is worthwhile replacing them with a single variable holding the computed value.
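
A small before/after illustration, assuming the operands are unchanged between the two uses:

```python
# Before CSE: the subexpression a * b is computed twice.
def before(a: int, b: int) -> tuple[int, int]:
    x = a * b + 1
    y = a * b - 1
    return x, y

# After CSE: it is computed once and reused. This is only valid because
# neither a nor b changes between the two uses and a * b has no side
# effects; note how the temporary t extends a live range, which is the
# register-pressure cost discussed above.
def after(a: int, b: int) -> tuple[int, int]:
    t = a * b
    return t + 1, t - 1

assert before(3, 4) == after(3, 4)
```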

In computing, a memory barrier, also known as a membar, memory fence or fence instruction, is a type of barrier instruction that causes a central processing unit (CPU) or compiler to enforce an ordering constraint on memory operations issued before and after the barrier instruction. This typically means that operations issued prior to the barrier are guaranteed to be performed before operations issued after the barrier.

In computer programming, an inline assembler is a feature of some compilers that allows low-level code written in assembly language to be embedded within a program, among code that otherwise has been compiled from a higher-level language such as C or Ada.

In computer science, instruction scheduling is a compiler optimization used to improve instruction-level parallelism, which improves performance on machines with instruction pipelines. Put more simply, it tries to avoid pipeline stalls and illegal or semantically ambiguous operations by rearranging the order of instructions, without changing the meaning of the code.
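
A toy greedy list-scheduling sketch under these goals; every detail below (the instructions, latencies, and dependence sets) is invented for illustration:

```python
# Toy list scheduler: issue any instruction whose dependencies are done,
# preferring long-latency operations first so dependent work can overlap.
insts = {
    "load r1": (set(), 3),                   # (dependencies, latency)
    "load r2": (set(), 3),
    "add r3":  ({"load r1", "load r2"}, 1),  # r3 = r1 + r2
    "mul r4":  ({"add r3"}, 1),              # r4 = r3 * r3
}
done: set[str] = set()
order: list[str] = []
while len(order) < len(insts):
    ready = [i for i in insts if i not in done and insts[i][0] <= done]
    nxt = max(ready, key=lambda i: insts[i][1])  # longest latency first
    order.append(nxt)
    done.add(nxt)
print(order)  # both loads are issued before the dependent arithmetic
```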

Chaitin's algorithm is a bottom-up, graph coloring register allocation algorithm that uses cost/degree as its spill metric. It is named after its designer, Gregory Chaitin. Chaitin's algorithm was the first register allocation algorithm that made use of coloring of the interference graph for both register allocation and spilling.
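
A greatly simplified sketch of the simplify and select phases using the cost/degree metric; real implementations also coalesce, insert spill code, rebuild the graph, and iterate, and the graph, costs, and register count below are made up:

```python
K = 2  # assumed number of machine registers

graph = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
cost = {"a": 4.0, "b": 1.0, "c": 3.0, "d": 1.0}  # estimated spill costs

def color(graph, cost, k):
    g = {v: set(n) for v, n in graph.items()}
    stack, spilled = [], []
    while g:
        # Simplify: remove some node with degree < k; if none exists,
        # pick the cheapest spill candidate by cost/degree.
        v = next((v for v in g if len(g[v]) < k), None)
        if v is None:
            v = min(g, key=lambda v: cost[v] / len(g[v]))
            spilled.append(v)
        else:
            stack.append(v)
        for n in g.pop(v):
            g[n].discard(v)
    # Select: pop nodes and give each the lowest color unused by neighbors.
    colors = {}
    for v in reversed(stack):
        taken = {colors[n] for n in graph[v] if n in colors}
        colors[v] = min(c for c in range(k) if c not in taken)
    return colors, spilled

print(color(graph, cost, K))  # ({'c': 0, 'a': 1, 'd': 0}, ['b'])
```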

Alias analysis is a technique in compiler theory, used to determine if a storage location may be accessed in more than one way. Two pointers are said to be aliased if they point to the same location.
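
A minimal illustration, using Python names as stand-ins for pointers:

```python
# Two names aliasing the same storage: a write through one name is
# visible through the other. This is exactly the hazard alias analysis
# must rule out before reordering or eliminating memory operations.
buf = [0, 0, 0]
view = buf        # `view` and `buf` alias the same list
view[1] = 42
print(buf[1])     # 42: the write through `view` reached `buf`

copy = list(buf)  # `copy` does NOT alias `buf`
copy[1] = 7
print(buf[1])     # still 42: the write through `copy` is invisible via `buf`
```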

Memory ordering describes the order of accesses to computer memory by a CPU. The term can refer either to the memory ordering generated by the compiler during compile time, or to the memory ordering generated by a CPU during runtime.

Value numbering is a technique of determining when two computations in a program are equivalent and eliminating one of them with a semantics-preserving optimization.
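
A toy local value-numbering sketch; the key scheme (sorting the operands of commutative operators) is a simplification, not a full algorithm:

```python
# Two computations receive the same value number exactly when they must
# compute the same result, so the later one is redundant.
value_numbers: dict = {}
next_vn = 0

def vn(key):
    """Assign (or look up) the value number for an operand or expression."""
    global next_vn
    if key not in value_numbers:
        value_numbers[key] = next_vn
        next_vn += 1
    return value_numbers[key]

# x = a + b and y = b + a get the same value number, so the second
# computation can be replaced by a reuse of x.
x = vn(("+",) + tuple(sorted((vn("a"), vn("b")))))
y = vn(("+",) + tuple(sorted((vn("b"), vn("a")))))
print(x == y)  # True
```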