Code motion

Last updated

In computer science, code motion, also known as code hoisting, code sinking, loop-invariant code motion, or code factoring, is a blanket term for any process that moves code within a program for performance or size benefits, and is a common optimization performed in most optimizing compilers. It can be difficult to differentiate between different types of code motion, due to the inconsistent meaning of the terms surrounding it.

Contents

Uses

Code motion has a variety of uses and benefits, many of which overlap each other in their implementation.

Removing unused/useless operations

A diagram depicting an optimizing compiler removing a potentially useless call to assembly instruction "b" by sinking it to its point of use. Removing useless calls in compiled code with code sinking.jpg
A diagram depicting an optimizing compiler removing a potentially useless call to assembly instruction "b" by sinking it to its point of use.

Code Sinking, also known as lazy code motion, is a term for a technique that reduces wasted instructions by moving instructions to branches in which they are used: [1] If an operation is executed before a branch, and only one of the branch paths use the result of that operation, then code sinking entails moving that operation into the branch where it will be used.

This technique is a form of dead code elimination in the sense that it removes code when its results are discarded or unused, but in contrast to dead code elimination, it can remove pointless instructions even if there is a possible use of that instruction’s results in an execution code path.

Reducing the size of the program

A diagram that demonstrates optimizing for size using code factoring, assuming all operations are not dependent on other operations executing before it. Diagram of code factoring.jpg
A diagram that demonstrates optimizing for size using code factoring, assuming all operations are not dependent on other operations executing before it.

Code Factoring is a term for a size-optimization technique that merges common dependencies in branches into the branch above it. [2] Just like factorizing integers decomposes a number into its smallest possible forms (as factors), code factorization transforms the code into the smallest possible form, by merging common "factors" until no duplicates remain.

Reducing dependency stalls

An example of how a compiler might prevent dependency stalls in assembled code with code movement, by observing a dependency graph. Due to Out-of-order execution advancements, optimization may not have any benefit on modern CPUs. Preventing dependency stalls in assembled code with code movement.jpg
An example of how a compiler might prevent dependency stalls in assembled code with code movement, by observing a dependency graph. Due to Out-of-order execution advancements, optimization may not have any benefit on modern CPUs.

Global code motion, local code motion, code scheduling, Instruction scheduling and code hoisting/sinking are all terms for a technique where instructions are rearranged (or "scheduled") to improve the efficiency of execution within the CPU. [3] [4] Modern CPUs are able to schedule five or more instructions per clock cycle. However, a CPU cannot schedule an instruction that relies on data from a currently (or not yet executed) instruction. Compilers will interleave dependencies in a manner that maximizes the amount of instructions a CPU can process at any point in time. [5]

On the defunct Intel Itanium architecture, the branch predict (BRP) instruction is manually hoisted above branches by the compiler to enable the branch to be immediately taken by the CPU. Itanium relies on additional code scheduling from the CPU to maximize efficiency in the processor. [6]

Loop-invariant code motion

A diagram depicting loop-invariant code motion over an execution graph. This assumes that D is invariant between loop executions. Loop invariant code motion.jpg
A diagram depicting loop-invariant code motion over an execution graph. This assumes that D is invariant between loop executions.

Loop-invariant code motion is the process of moving loop-invariant code to a position outside the loop, which may reduce the execution time of the loop by preventing some computations from being done twice for the same result.

Compiler examples

LLVM

LLVM has a sinking pass in its single static assignment form. LLVM 15.0 will not sink an operation if any of its code paths include a store instruction, or if it may throw an error. [7] Additionally, LLVM will not sink an instruction into a loop.

GCC

The GNU Compiler Collection implements code motion under the name "code factoring", with the purpose of reducing the size of a compiled program. [8] GCC will move any code upwards or downwards if it "[does not] invalidate any existing dependences nor introduce new ones". [9]

LuaJIT

LuaJIT uses code sinking under the name "Allocation sinking", to reduce the amount of time compiled code spends allocating and collecting temporary objects within a loop. [10] Allocation sinking moves allocations to execution paths where the allocated object may escape the executing code, and will thus require heap allocation. All removed allocations are filled in with load-to-store forwarding over their fields. [11]

See also

Related Research Articles

<span class="mw-page-title-main">GNU Compiler Collection</span> Free and open-source compiler for various programming languages

The GNU Compiler Collection (GCC) is an optimizing compiler produced by the GNU Project supporting various programming languages, hardware architectures and operating systems. The Free Software Foundation (FSF) distributes GCC as free software under the GNU General Public License. GCC is a key component of the GNU toolchain and the standard compiler for most projects related to GNU and the Linux kernel. With roughly 15 million lines of code in 2019, GCC is one of the biggest free programs in existence. It has played an important role in the growth of free software, as both a tool and an example.

In computing, an optimizing compiler is a compiler that tries to minimize or maximize some attributes of an executable computer program. Common requirements are to minimize a program's execution time, memory footprint, storage size, and power consumption.

Very long instruction word (VLIW) refers to instruction set architectures designed to exploit instruction level parallelism (ILP). Whereas conventional central processing units mostly allow programs to specify instructions to execute in sequence only, a VLIW processor allows programs to explicitly specify instructions to execute in parallel. This design is intended to allow higher performance without the complexity inherent in some other designs.

<span class="mw-page-title-main">Single instruction, multiple data</span> Type of parallel processing

Single instruction, multiple data (SIMD) is a type of parallel processing in Flynn's taxonomy. SIMD can be internal and it can be directly accessible through an instruction set architecture (ISA), but it should not be confused with an ISA. SIMD describes computers with multiple processing elements that perform the same operation on multiple data points simultaneously.

In computer architecture, predication is a feature that provides an alternative to conditional transfer of control, as implemented by conditional branch machine instructions. Predication works by having conditional (predicated) non-branch instructions associated with a predicate, a Boolean value used by the instruction to control whether the instruction is allowed to modify the architectural state or not. If the predicate specified in the instruction is true, the instruction modifies the architectural state; otherwise, the architectural state is unchanged. For example, a predicated move instruction will only modify the destination if the predicate is true. Thus, instead of using a conditional branch to select an instruction or a sequence of instructions to execute based on the predicate that controls whether the branch occurs, the instructions to be executed are associated with that predicate, so that they will be executed, or not executed, based on whether that predicate is true or false.

In computing, inline expansion, or inlining, is a manual or compiler optimization that replaces a function call site with the body of the called function. Inline expansion is similar to macro expansion, but occurs during compilation, without changing the source code, while macro expansion occurs prior to compilation, and results in different text that is then processed by the compiler.

In computer science, program optimization, code optimization, or software optimization is the process of modifying a software system to make some aspect of it work more efficiently or use fewer resources. In general, a computer program may be optimized so that it executes more rapidly, or to make it capable of operating with less memory storage or other resources, or draw less power.

<span class="mw-page-title-main">Instruction-level parallelism</span> Ability of computer instructions to be executed simultaneously with correct results

Instruction-level parallelism (ILP) is the parallel or simultaneous execution of a sequence of instructions in a computer program. More specifically ILP refers to the average number of instructions run per step of this parallel execution.

In computer programming, the word trampoline has a number of meanings, and is generally associated with jump instructions.

<span class="mw-page-title-main">LLVM</span> Compiler backend for multiple programming languages

LLVM is a set of compiler and toolchain technologies that can be used to develop a frontend for any programming language and a backend for any instruction set architecture. LLVM is designed around a language-independent intermediate representation (IR) that serves as a portable, high-level assembly language that can be optimized with a variety of transformations over multiple passes. The name LLVM originally stood for Low Level Virtual Machine, though the project has expanded and the name is no longer officially an initialism.

Explicitly parallel instruction computing (EPIC) is a term coined in 1997 by the HP–Intel alliance to describe a computing paradigm that researchers had been investigating since the early 1980s. This paradigm is also called Independence architectures. It was the basis for Intel and HP development of the Intel Itanium architecture, and HP later asserted that "EPIC" was merely an old term for the Itanium architecture. EPIC permits microprocessors to execute software instructions in parallel by using the compiler, rather than complex on-die circuitry, to control parallel instruction execution. This was intended to allow simple performance scaling without resorting to higher clock frequencies.

In computer programming, loop-invariant code consists of statements or expressions that can be moved outside the body of a loop without affecting the semantics of the program. Loop-invariant code motion is a compiler optimization that performs this movement automatically.

Open64 is a free, open-source, optimizing compiler for the Itanium and x86-64 microprocessor architectures. It derives from the SGI compilers for the MIPS R10000 processor, called MIPSPro. It was initially released in 2000 as GNU GPL software under the name Pro64. The following year, University of Delaware adopted the project and renamed the compiler to Open64. It now mostly serves as a research platform for compiler and computer architecture research groups. Open64 supports Fortran 77/95 and C/C++, as well as the shared memory programming model OpenMP. It can conduct high-quality interprocedural analysis, data-flow analysis, data dependence analysis, and array region analysis. Development has ceased, although other projects can use the project's source.

In computer science, instruction scheduling is a compiler optimization used to improve instruction-level parallelism, which improves performance on machines with instruction pipelines. Put more simply, it tries to do the following without changing the meaning of the code:

An intermediate representation (IR) is the data structure or code used internally by a compiler or virtual machine to represent source code. An IR is designed to be conducive to further processing, such as optimization and translation. A "good" IR must be accurate – capable of representing the source code without loss of information – and independent of any particular source or target language. An IR may take one of several forms: an in-memory data structure, or a special tuple- or stack-based code readable by the program. In the latter case it is also called an intermediate language.

In software engineering, retargeting is an attribute of software development tools that have been specifically designed to generate code for more than one computing platform.

In compiler theory, loop optimization is the process of increasing execution speed and reducing the overheads associated with loops. It plays an important role in improving cache performance and making effective use of parallel processing capabilities. Most execution time of a scientific program is spent on loops; as such, many compiler optimization techniques have been developed to make them faster.

Automatic vectorization, in parallel computing, is a special case of automatic parallelization, where a computer program is converted from a scalar implementation, which processes a single pair of operands at a time, to a vector implementation, which processes one operation on multiple pairs of operands at once. For example, modern conventional computers, including specialized supercomputers, typically have vector operations that simultaneously perform operations such as the following four additions :

Trace scheduling is an optimization technique developed by Josh Fisher used in compilers for computer programs.

Interprocedural optimization (IPO) is a collection of compiler techniques used in computer programming to improve performance in programs containing many frequently used functions of small or medium length. IPO differs from other compiler optimizations by analyzing the entire program as opposed to a single function or block of code.

References

  1. Craft, Michael; Offut, Jefferson (1994). "Using compiler optimization techniques to detect equivalent mutants". Software Testing, Verification & Reliability. 4 (3): 131–154. doi:10.1002/stvr.4370040303. S2CID   35717348 . Retrieved 25 February 2022.
  2. Lóki, Gábor, et al. "Code factoring in GCC." Proceedings of the 2004 GCC Developers’ Summit. 2004.
  3. Fasse, Justus, et al. Code Transformations to Increase Prepass Scheduling Opportunities in CompCert. Diss. Master Thesis of Science. Université Grenoble Alpes. https://www-verimag. imag. fr/~ boulme/CPP_2022/FASSE-Justus-MSc-Thesis_2021. pdf, 2021.
  4. Gupta, Rajiv (1998). "A code motion framework for global instruction scheduling". Compiler Construction. Lecture Notes in Computer Science. 1383: 219–233. doi: 10.1007/BFb0026434 . ISBN   978-3-540-64304-3.
  5. Chang, Pohua P., et al. "The importance of prepass code scheduling for superscalar and superpipelined processors." IEEE Transactions on Computers 44.3 (1995): 353-370.
  6. Sharangpani, H.; Arora, H. (September 2000). "Itanium processor microarchitecture". IEEE Micro. 20 (5): 24–43. doi:10.1109/40.877948. ISSN   1937-4143.
  7. "LLVM: lib/Transforms/Scalar/Sink.cpp Source File". llvm.org. Retrieved 25 February 2022.
  8. "Code Factoring Optimizations - GNU Project". gcc.gnu.org. Retrieved 25 February 2022.
  9. "GCC Developer's Summit 2004 - Code Factoring.pdf" (PDF). gnu.org. Retrieved 25 February 2022.
  10. "Allocation sinking in git HEAD - luajit - FreeLists". www.freelists.org. Retrieved 25 February 2022.
  11. "Allocation Sinking Optimization". wiki.luajit.org.