In computer science, code motion, also known as code hoisting, code sinking, loop-invariant code motion, or code factoring, is a blanket term for any process that moves code within a program for performance or size benefits, and is a common optimization performed in most optimizing compilers. It can be difficult to differentiate between the different types of code motion because the surrounding terminology is used inconsistently.
Code motion has a variety of uses and benefits, many of which overlap each other in their implementation.
Code sinking, also known as lazy code motion, is a technique that reduces wasted instructions by moving instructions into the branches in which their results are actually used.[1] If an operation is executed before a branch and only one of the branch paths uses its result, code sinking moves that operation into the branch where the result is needed.
This technique is a form of dead code elimination in the sense that it removes code whose results are discarded or unused, but in contrast to dead code elimination, it can remove wasted work even when the instruction's result is used on some execution paths, by moving the instruction onto only those paths.
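The effect of code sinking can be illustrated with a small C sketch. The function and variable names below are invented for illustration; a compiler would normally perform this transformation on its intermediate representation rather than on source code.

    int compute_before(int a, int b, int flag) {
        int t = a * b;        /* computed on every path */
        if (flag) {
            return t + 1;     /* only use of t */
        }
        return 0;             /* t is wasted work on this path */
    }

    int compute_after(int a, int b, int flag) {
        if (flag) {
            int t = a * b;    /* sunk into the branch that uses it */
            return t + 1;
        }
        return 0;
    }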
Code factoring is a size-optimization technique that merges computations common to several branches into the block above them.[2] Just as factorizing an integer decomposes it into its smallest possible factors, code factoring transforms the code into its smallest possible form by merging common "factors" until no duplicates remain.
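A minimal C sketch of the idea, with invented names; the duplicated subexpression is the common "factor" that is merged out of the branches.

    int before_factoring(int a, int b, int flag) {
        if (flag) {
            return a * b + 1;   /* a * b duplicated in both branches */
        } else {
            return a * b - 1;
        }
    }

    int after_factoring(int a, int b, int flag) {
        int t = a * b;          /* the common "factor", merged above the branch */
        if (flag) {
            return t + 1;
        } else {
            return t - 1;
        }
    }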
Global code motion, local code motion, code scheduling, instruction scheduling and code hoisting/sinking are all terms for a technique in which instructions are rearranged (or "scheduled") to improve the efficiency of execution within the CPU.[3][4] Modern CPUs are able to schedule five or more instructions per clock cycle; however, a CPU cannot schedule an instruction that depends on the result of an instruction that is still executing (or has not yet executed). Compilers therefore interleave independent instructions between dependent ones, maximizing the number of instructions the CPU can process at any point in time.[5]
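The principle can be sketched at the source level, although real schedulers work on machine instructions and an out-of-order CPU may reorder these operations regardless; the names below are illustrative only.

    void unscheduled(double *out, const double *x, const double *y) {
        double a = x[0] / y[0];
        out[0] = a + 1.0;          /* uses a immediately, so execution may stall */
        double b = x[1] / y[1];
        out[1] = b + 1.0;          /* uses b immediately, stalling again */
    }

    void scheduled(double *out, const double *x, const double *y) {
        double a = x[0] / y[0];
        double b = x[1] / y[1];    /* independent work that can overlap with a */
        out[0] = a + 1.0;
        out[1] = b + 1.0;
    }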
On the defunct Intel Itanium architecture, the compiler manually hoists the branch predict (BRP) instruction above branches so that the CPU can take the branch immediately. Itanium relies on such additional code scheduling by the compiler to maximize efficiency in the processor.[6]
Loop-invariant code motion is the process of moving loop-invariant code (code whose result does not change between iterations) to a position outside the loop, which may reduce the execution time of the loop by preventing the same computation from being repeated on every iteration.
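A minimal C sketch with invented names; the expression limit * 2 is invariant, so it can be hoisted out of the loop.

    void before_licm(int *a, int n, int limit) {
        for (int i = 0; i < n; i++) {
            a[i] = i * (limit * 2);   /* limit * 2 recomputed on every iteration */
        }
    }

    void after_licm(int *a, int n, int limit) {
        int t = limit * 2;            /* hoisted: computed once, outside the loop */
        for (int i = 0; i < n; i++) {
            a[i] = i * t;
        }
    }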
LLVM has a sinking pass that operates on its static single assignment (SSA) form. As of LLVM 15.0, the pass will not sink an operation if any of its code paths include a store instruction, or if the operation may throw an exception.[7] Additionally, LLVM will not sink an instruction into a loop.
The GNU Compiler Collection implements code motion under the name "code factoring", with the purpose of reducing the size of a compiled program.[8] GCC will move code upwards or downwards only if doing so "[does not] invalidate any existing dependences nor introduce new ones".[9]
LuaJIT uses code sinking under the name "allocation sinking" to reduce the amount of time compiled code spends allocating and collecting temporary objects within a loop.[10] Allocation sinking moves allocations onto the execution paths where the allocated object may escape the executing code and will therefore genuinely require heap allocation. References to the fields of a removed allocation are resolved by forwarding the values from the corresponding stores (store-to-load forwarding).[11]
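LuaJIT performs this optimization on its compiled traces rather than on source code, but the idea can be sketched with a C analogy. The publish() function below is a hypothetical escape point, and the whole fragment is illustrative rather than a description of LuaJIT's implementation.

    #include <stdlib.h>

    struct point { int x, y; };

    /* Hypothetical escape point: assumed to retain the pointer it is given. */
    void publish(struct point *p);

    void example(int x, int y, int escapes) {
        int px = x + 1;                    /* values kept in plain locals */
        int py = y + 1;

        if (escapes) {
            /* Allocation sunk onto the only path where the object escapes. */
            struct point *p = malloc(sizeof *p);
            if (p != NULL) {
                p->x = px;                 /* fields filled from the forwarded values */
                p->y = py;
                publish(p);
            }
        }
        /* The common, non-escaping path never allocates or triggers collection. */
    }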
The GNU Compiler Collection (GCC) is a collection of compilers from the GNU Project that support various programming languages, hardware architectures and operating systems. The Free Software Foundation (FSF) distributes GCC as free software under the GNU General Public License. GCC is a key component of the GNU toolchain which is used for most projects related to GNU and the Linux kernel. With roughly 15 million lines of code in 2019, GCC is one of the largest free programs in existence. It has played an important role in the growth of free software, as both a tool and an example.
An optimizing compiler is a compiler designed to generate code that is optimized in aspects such as minimizing program execution time, memory use, storage size, and power consumption. Optimization is generally implemented as a sequence of optimizing transformations, algorithms that transform code to produce semantically equivalent code optimized for some aspect. It is typically CPU and memory intensive. In practice, factors such as available memory and a programmer's willingness to wait for compilation limit the optimizations that a compiler might provide. Research indicates that some optimization problems are NP-complete, or even undecidable.
Very long instruction word (VLIW) refers to instruction set architectures that are designed to exploit instruction-level parallelism (ILP). A VLIW processor allows programs to explicitly specify instructions to execute in parallel, whereas conventional central processing units (CPUs) mostly allow programs to specify instructions to execute in sequence only. VLIW is intended to allow higher performance without the complexity inherent in some other designs.
Single instruction, multiple data (SIMD) is a type of parallel processing in Flynn's taxonomy. SIMD can be internal and it can be directly accessible through an instruction set architecture (ISA), but it should not be confused with an ISA. SIMD describes computers with multiple processing elements that perform the same operation on multiple data points simultaneously.
In computer architecture, predication is a feature that provides an alternative to conditional transfer of control, as implemented by conditional branch machine instructions. Predication works by having conditional (predicated) non-branch instructions associated with a predicate, a Boolean value used by the instruction to control whether the instruction is allowed to modify the architectural state or not. If the predicate specified in the instruction is true, the instruction modifies the architectural state; otherwise, the architectural state is unchanged. For example, a predicated move instruction will only modify the destination if the predicate is true. Thus, instead of using a conditional branch to select an instruction or a sequence of instructions to execute based on the predicate that controls whether the branch occurs, the instructions to be executed are associated with that predicate, so that they will be executed, or not executed, based on whether that predicate is true or false.
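Predication is an instruction-set feature rather than something expressed directly in C, but the contrast can be sketched as follows; whether the second form actually becomes a predicated or conditional-move instruction depends on the target and the compiler.

    int with_branch(int cond, int a, int b) {
        int r;
        if (cond) {                /* conditional transfer of control */
            r = a;
        } else {
            r = b;
        }
        return r;
    }

    int predicated_style(int cond, int a, int b) {
        return cond ? a : b;       /* may compile to a predicated or conditional move */
    }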
In computing, inline expansion, or inlining, is a manual or compiler optimization that replaces a function call site with the body of the called function. Inline expansion is similar to macro expansion, but occurs during compilation, without changing the source code, while macro expansion occurs prior to compilation, and results in different text that is then processed by the compiler.
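A minimal C sketch of the transformation; the second function shows, by hand, code a compiler might generate for the first after inlining.

    static int square(int x) {
        return x * x;
    }

    int caller(int n) {
        return square(n) + 1;      /* call site */
    }

    /* After inline expansion, the call is replaced by the callee's body: */
    int caller_inlined(int n) {
        return (n * n) + 1;        /* no call overhead; exposes further optimization */
    }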
In computing, binary translation is a form of binary recompilation where sequences of instructions are translated from a source instruction set to the target instruction set. In some cases such as instruction set simulation, the target instruction set may be the same as the source instruction set, providing testing and debugging features such as instruction trace, conditional breakpoints and hot spot detection.
In computer science, program optimization, code optimization, or software optimization is the process of modifying a software system to make some aspect of it work more efficiently or use fewer resources. In general, a computer program may be optimized so that it executes more rapidly, or to make it capable of operating with less memory storage or other resources, or draw less power.
Instruction-level parallelism (ILP) is the parallel or simultaneous execution of a sequence of instructions in a computer program. More specifically ILP refers to the average number of instructions run per step of this parallel execution.
OpenMP is an application programming interface (API) that supports multi-platform shared-memory multiprocessing programming in C, C++, and Fortran, on many platforms, instruction-set architectures and operating systems, including Solaris, AIX, FreeBSD, HP-UX, Linux, macOS, and Windows. It consists of a set of compiler directives, library routines, and environment variables that influence run-time behavior.
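A minimal C example of an OpenMP compiler directive; it must be built with OpenMP support enabled (for example, a flag such as -fopenmp on GCC and Clang), and the function and variable names are illustrative.

    #include <omp.h>

    /* The directive asks the compiler to divide the loop iterations among threads. */
    void scale(double *a, int n, double k) {
        #pragma omp parallel for
        for (int i = 0; i < n; i++) {
            a[i] = a[i] * k;
        }
    }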
LLVM is a set of compiler and toolchain technologies that can be used to develop a frontend for any programming language and a backend for any instruction set architecture. LLVM is designed around a language-independent intermediate representation (IR) that serves as a portable, high-level assembly language that can be optimized with a variety of transformations over multiple passes. The name LLVM originally stood for Low Level Virtual Machine, though the project has expanded and the name is no longer officially an initialism.
Explicitly parallel instruction computing (EPIC) is a term coined in 1997 by the HP–Intel alliance to describe a computing paradigm that researchers had been investigating since the early 1980s. This paradigm is also called Independence architectures. It was the basis for Intel and HP development of the Intel Itanium architecture, and HP later asserted that "EPIC" was merely an old term for the Itanium architecture. EPIC permits microprocessors to execute software instructions in parallel by using the compiler, rather than complex on-die circuitry, to control parallel instruction execution. This was intended to allow simple performance scaling without resorting to higher clock frequencies.
Open64 is a free, open-source, optimizing compiler for the Itanium and x86-64 microprocessor architectures. It derives from the SGI compilers for the MIPS R10000 processor, called MIPSPro. It was initially released in 2000 as GNU GPL software under the name Pro64. The following year, the University of Delaware adopted the project and renamed the compiler to Open64. It now mostly serves as a research platform for compiler and computer architecture research groups. Open64 supports Fortran 77/95 and C/C++, as well as the shared memory programming model OpenMP. It can conduct high-quality interprocedural analysis, data-flow analysis, data dependence analysis, and array region analysis. Development has ceased, although other projects can use the project's source.
In computer programming, an inline assembler is a feature of some compilers that allows low-level code written in assembly language to be embedded within a program, among code that otherwise has been compiled from a higher-level language such as C or Ada.
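As a small illustration, the fragment below uses the GCC/Clang extended inline-assembly syntax to add two integers with an x86 instruction; it is only valid on x86 targets with a compiler that supports that syntax, and other compilers use different mechanisms.

    int add_with_asm(int a, int b) {
        int result = a;
        __asm__("addl %1, %0"          /* x86 "add" in AT&T syntax */
                : "+r"(result)         /* output: read-write register operand */
                : "r"(b));             /* input: b in a register */
        return result;
    }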
In computer science, instruction scheduling is a compiler optimization used to improve instruction-level parallelism, which improves performance on machines with instruction pipelines. Put more simply, it tries to rearrange the order of instructions to avoid pipeline stalls, without changing the meaning of the code.
An intermediate representation (IR) is the data structure or code used internally by a compiler or virtual machine to represent source code. An IR is designed to be conducive to further processing, such as optimization and translation. A "good" IR must be accurate – capable of representing the source code without loss of information – and independent of any particular source or target language. An IR may take one of several forms: an in-memory data structure, or a special tuple- or stack-based code readable by the program. In the latter case it is also called an intermediate language.
In software engineering, retargeting is an attribute of software development tools that have been specifically designed to generate code for more than one computing platform.
In compiler theory, loop optimization is the process of increasing execution speed and reducing the overheads associated with loops. It plays an important role in improving cache performance and making effective use of parallel processing capabilities. Most execution time of a scientific program is spent on loops; as such, many compiler optimization techniques have been developed to make them faster.
Automatic vectorization, in parallel computing, is a special case of automatic parallelization, where a computer program is converted from a scalar implementation, which processes a single pair of operands at a time, to a vector implementation, which processes one operation on multiple pairs of operands at once. For example, modern conventional computers, including specialized supercomputers, typically have vector operations that simultaneously perform operations such as the following four additions:
c1 = a1 + b1
c2 = a2 + b2
c3 = a3 + b3
c4 = a4 + b4
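In C, such additions typically arise from a loop like the following sketch; whether a given compiler vectorizes it depends on the target and on what it can prove about aliasing, and the names are illustrative.

    /* Each iteration is independent, so a compiler may replace groups of
       scalar additions with single SIMD additions. */
    void add_arrays(float *c, const float *a, const float *b, int n) {
        for (int i = 0; i < n; i++) {
            c[i] = a[i] + b[i];
        }
    }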
Interprocedural optimization (IPO) is a collection of compiler techniques used in computer programming to improve performance in programs containing many frequently used functions of small or medium length. IPO differs from other compiler optimizations by analyzing the entire program as opposed to a single function or block of code.