Instruction selection

In computer science, instruction selection is the stage of a compiler backend that transforms the middle-level intermediate representation (IR) into a low-level IR. In a typical compiler, instruction selection precedes both instruction scheduling and register allocation; hence its output IR draws on an unbounded set of pseudo-registers (often known as temporaries) and may still be – and typically is – subject to peephole optimization. Otherwise, it closely resembles the target machine code, bytecode, or assembly language.

For example, for the following sequence of middle-level IR code

t1 = a
t2 = b
t3 = t1 + t2
a = t3
b = t1

a good instruction sequence for the x86 architecture is

MOV  EAX, a
XCHG EAX, b
ADD  a, EAX

For a comprehensive survey of instruction selection, see [1] [2].

Macro expansion

The simplest approach to instruction selection is known as macro expansion [3] or interpretative code generation. [4] [5] [6] A macro-expanding instruction selector operates by matching templates over the middle-level IR. When a template matches, the corresponding macro is executed with the matched portion of the IR as input, emitting the appropriate target instructions. Macro expansion can be done either directly on the textual representation of the middle-level IR, [7] [8] or the IR can first be transformed into a graph representation that is then traversed depth-first. [9] In the latter case, a template matches one or more adjacent nodes in the graph.
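
To make this concrete, here is a minimal macro-expansion sketch in Python over a tree-shaped IR. The node shapes, the macro chosen for each node kind, and the emitted mnemonics are illustrative assumptions, not the design of any particular compiler:

    # Macro-expansion sketch: each IR node kind maps to one macro that
    # emits target instructions for that node in isolation.
    def expand(node, emit, fresh_temp):
        """Expand an IR tree; return the temporary holding its value."""
        kind = node[0]
        if kind == "const":                    # ("const", value)
            t = fresh_temp()
            emit(f"MOV {t}, {node[1]}")
            return t
        if kind == "var":                      # ("var", name)
            t = fresh_temp()
            emit(f"MOV {t}, [{node[1]}]")
            return t
        if kind == "add":                      # ("add", lhs, rhs)
            a = expand(node[1], emit, fresh_temp)
            b = expand(node[2], emit, fresh_temp)
            emit(f"ADD {a}, {b}")              # the macro for '+' is one ADD
            return a
        raise ValueError(f"no macro for {kind}")

    # Usage: expand a + (b + 4) and print the emitted instructions.
    code, counter = [], iter(range(1000))
    expand(("add", ("var", "a"), ("add", ("var", "b"), ("const", 4))),
           code.append, lambda: f"t{next(counter)}")
    print("\n".join(code))

Because each node is expanded in isolation, even this tiny example materializes the constant 4 with a separate MOV where a single ADD with an immediate operand would do, which is exactly the kind of inefficiency discussed next.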

Unless the target machine is very simple, macro expansion in isolation typically generates inefficient code. To mitigate this limitation, compilers that apply this approach combine it with peephole optimization, which replaces combinations of simple instructions with more complex equivalents that increase performance and reduce code size. This is known as the Davidson–Fraser approach and is currently applied in GCC. [10]
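
The following is a hedged sketch of one such peephole rule, applied over a two-instruction window. The tuple instruction format and the rule itself are illustrative assumptions; a real pass would also have to verify that the eliminated temporary is dead afterwards:

    # Peephole sketch: fold "MOV t, x; ADD r, t" into "ADD r, x" whenever
    # the ADD reads the temporary the MOV just defined. This sketch
    # assumes t is not used later; a real pass must check liveness.
    def peephole(instrs):
        out, i = [], 0
        while i < len(instrs):
            if (i + 1 < len(instrs)
                    and instrs[i][0] == "MOV" and instrs[i + 1][0] == "ADD"
                    and instrs[i + 1][2] == instrs[i][1]):
                out.append(("ADD", instrs[i + 1][1], instrs[i][2]))
                i += 2                         # consumed both instructions
            else:
                out.append(instrs[i])
                i += 1
        return out

    print(peephole([("MOV", "t2", "4"), ("ADD", "t1", "t2")]))
    # [('ADD', 't1', '4')]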

Graph covering

Another approach is to first transform the middle-level IR into a graph and then cover the graph using patterns. A pattern is a template that matches a portion of the graph and can be implemented with a single instruction provided by the target machine. The goal is to cover the graph such that the total cost of the selected patterns is minimized, where the cost typically represents the number of cycles it takes to execute the instruction. For tree-shaped graphs, the least-cost cover can be found in linear time using dynamic programming, [11] but for DAGs and full-fledged graphs the problem becomes NP-complete and thus is most often solved using either greedy algorithms or methods from combinatorial optimization. [12] [13] [14]
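
As an illustration of least-cost covering of a tree in the style of [11], the following dynamic-programming sketch picks, for every subtree, the cheapest matching pattern. The pattern set and its unit cycle costs are invented for illustration:

    # Tree-covering sketch: the cost of covering a node with a pattern is
    # the pattern's own cost plus the optimal cost of each subtree the
    # pattern leaves uncovered.
    PATTERNS = [
        # (name, matches?, cost in cycles, subtrees left uncovered)
        ("ADD",     lambda n: n[0] == "add",
                    1, lambda n: [n[1], n[2]]),
        ("ADD_IMM", lambda n: n[0] == "add" and n[2][0] == "const",
                    1, lambda n: [n[1]]),      # folds the constant operand
        ("LOAD",    lambda n: n[0] == "var",   1, lambda n: []),
        ("MOV_IMM", lambda n: n[0] == "const", 1, lambda n: []),
    ]

    def best_cover(node, memo=None):
        """Return (cost, pattern names) of a least-cost cover of the tree."""
        memo = {} if memo is None else memo
        if id(node) in memo:
            return memo[id(node)]
        best = None
        for name, matches, cost, uncovered in PATTERNS:
            if matches(node):
                total, names = cost, [name]
                for child in uncovered(node):
                    c, ns = best_cover(child, memo)
                    total, names = total + c, names + ns
                if best is None or total < best[0]:
                    best = (total, names)
        memo[id(node)] = best
        return best

    print(best_cover(("add", ("var", "a"), ("add", ("var", "b"), ("const", 4)))))
    # (4, ['ADD', 'LOAD', 'ADD_IMM', 'LOAD'])

Note how the ADD_IMM tile absorbs the constant that macro expansion had to materialize with a separate MOV, so the same tree is covered with four instructions instead of five.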

References

  1. Blindell, Gabriel S. Hjort (2013). Survey on Instruction Selection: An Extensive and Modern Literature Review (Report). arXiv:1306.4898. ISBN 978-91-7501-898-0.
  2. Blindell, Gabriel S. Hjort (2016). Instruction Selection: Principles, Methods, & Applications. Springer. doi:10.1007/978-3-319-34019-7. ISBN 978-3-319-34017-3. S2CID 13390131.
  3. Brown, P. (1969). "A Survey of Macro Processors". Annual Review in Automatic Programming. 6 (2): 37–88. doi:10.1016/0066-4138(69)90001-9. ISSN 0066-4138.
  4. Cattell, R. G. G. (1979). "A Survey and Critique of Some Models of Code Generation" (PDF). School of Computer Science, Carnegie Mellon University (Technical report). Archived (PDF) from the original on May 23, 2019.
  5. Ganapathi, M.; Fischer, C. N.; Hennessy, J. L. (1982). "Retargetable Compiler Code Generation". Computing Surveys. 14 (4): 573–592. doi:10.1145/356893.356897. ISSN 0360-0300. S2CID 2361347.
  6. Lunell, H. (1983). Code Generator Writing Systems (Doctoral thesis). Linköping, Sweden: Linköping University.
  7. Ammann, U.; Nori, K. V.; Jensen, K.; Nägeli, H. (1974). "The PASCAL (P) Compiler Implementation Notes". Institut für Informatik (Technical report).
  8. Orgass, R. J.; Waite, W. M. (1969). "A Base for a Mobile Programming System". Communications of the ACM. 12 (9): 507–510. doi:10.1145/363219.363226. S2CID 8164996.
  9. Wilcox, T. R. (1971). Generating Machine Code for High-Level Programming Languages (Doctoral thesis). Ithaca, New York, USA: Cornell University.
  10. Davidson, J. W.; Fraser, C. W. (1984). "Code Selection Through Object Code Optimization". ACM Transactions on Programming Languages and Systems. 6 (4): 505–526. CiteSeerX 10.1.1.76.3796. doi:10.1145/1780.1783. ISSN 0164-0925. S2CID 10315537.
  11. Aho, A. V.; Ganapathi, M.; Tjiang, S. W. K. (1989). "Code Generation Using Tree Matching and Dynamic Programming". ACM Transactions on Programming Languages and Systems. 11 (4): 491–516. CiteSeerX 10.1.1.456.9102. doi:10.1145/69558.75700. S2CID 1165995.
  12. Wilson, T.; Grewal, G.; Halley, B.; Banerji, D. (1994). "An Integrated Approach to Retargetable Code Generation". Proceedings of the 7th International Symposium on High-Level Synthesis. pp. 70–75. CiteSeerX 10.1.1.521.8288. doi:10.1109/ISHLS.1994.302339. ISBN 978-0-8186-5785-6. S2CID 14384424.
  13. Bashford, Steven; Leupers, Rainer (1999). "Constraint Driven Code Selection for Fixed-Point DSPs". Proceedings of the 36th ACM/IEEE Design Automation Conference (DAC '99). pp. 817–822. CiteSeerX 10.1.1.331.390. doi:10.1145/309847.310076. ISBN 978-1581331097. S2CID 5513238.
  14. Floch, A.; Wolinski, C.; Kuchcinski, K. (2010). "Combined Scheduling and Instruction Selection for Processors with Reconfigurable Cell Fabric". Proceedings of the 21st International Conference on Application-Specific Architectures and Processors (ASAP '10). pp. 167–174.