Superoptimization


Superoptimization is the process where a compiler automatically finds the optimal instruction sequence for a loop-free fragment of code. Real-world compilers generally cannot produce genuinely optimal code: whereas standard compiler optimizations only improve code partially, a superoptimizer's goal is to find the single optimal sequence, the canonical form. Superoptimizers can also be used to improve conventional optimizers by highlighting missed opportunities, so that a human can write additional optimization rules.
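
For instance, the obvious implementation of the signum function branches twice, while exhaustive search over short instruction sequences turns up well-known branch-free forms. The Python sketch below models 32-bit two's-complement registers to contrast the two; it illustrates the kind of result superoptimizers find and is not the output of any particular tool.

```python
MASK = 0xFFFFFFFF  # model 32-bit two's-complement registers

def sign_naive(x: int) -> int:
    """The straightforward, branching definition of signum."""
    if x < 0:
        return -1
    return 1 if x > 0 else 0

def sign_branch_free(x: int) -> int:
    """Branch-free signum, (x >>arith 31) | ((-x) >>logical 31):
    the kind of compare-free sequence exhaustive search discovers."""
    u = x & MASK                            # bits of x as an unsigned word
    arith = MASK if u >> 31 else 0          # arithmetic shift right by 31
    logical = ((-x) & MASK) >> 31           # logical shift right of -x by 31
    r = arith | logical
    return r - (1 << 32) if r >> 31 else r  # reinterpret result as signed

assert all(sign_naive(v) == sign_branch_free(v)
           for v in (-2**31, -7, -1, 0, 1, 42, 2**31 - 1))
```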


History

The term superoptimization was coined by Alexia Massalin in the 1987 paper Superoptimizer: A Look at the Smallest Program. [1] The label "program optimization" had been given to a field that does not aspire to optimize but only to improve; this misnomer led Massalin to call her system a superoptimizer: an optimizer that actually finds an optimal program. [2]

In 1992, the GNU Superoptimizer (GSO) was developed and integrated into the GNU Compiler Collection (GCC). [3] [4] Later work developed and extended these ideas.

Techniques

Traditionally, superoptimization is performed by exhaustive brute-force search in the space of valid instruction sequences. This is costly, and therefore impractical for general-purpose compilers, but it has been shown to be useful in optimizing performance-critical inner loops. It is also possible to use an SMT solver to approach the problem, vastly improving search efficiency (although inputs more complex than a basic block remain out of reach). [5]
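
As a minimal illustration of the brute-force approach, the Python sketch below enumerates programs over a toy accumulator machine, shortest first, and returns the first one that agrees with a specification on a set of test vectors. The instruction set, the `superoptimize` helper, and the test vectors are all invented for this example; a real tool would follow a match with full verification (exhaustive testing or a solver query), in the spirit of Massalin's probabilistic test. [1]

```python
from itertools import product

MASK = 0xFFFFFFFF  # model a 32-bit two's-complement accumulator machine

# Toy instruction set (hypothetical): each op maps (accumulator, input x)
# to a new accumulator value.
OPS = {
    "neg":   lambda a, x: (-a) & MASK,     # two's-complement negate
    "not":   lambda a, x: (~a) & MASK,     # bitwise complement
    "inc":   lambda a, x: (a + 1) & MASK,  # add 1
    "add_x": lambda a, x: (a + x) & MASK,  # add the original input
    "xor_x": lambda a, x: a ^ x,           # xor with the original input
    "shr31": lambda a, x: a >> 31,         # logical shift right by 31
}

def run(program, x):
    """Execute a sequence of op names; the accumulator starts at x."""
    acc = x
    for op in program:
        acc = OPS[op](acc, x)
    return acc

def superoptimize(target, tests, max_len=3):
    """Shortest-first exhaustive search: return the first program that
    matches `target` on every test vector (a probabilistic filter)."""
    for length in range(1, max_len + 1):
        for program in product(OPS, repeat=length):
            if all(run(program, t) == (target(t) & MASK) for t in tests):
                return program  # a real tool would now verify fully
    return None

tests = [0, 1, 2, 7, 12345, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF]
# Search for "x - 1" even though the machine has no subtract or decrement:
print(superoptimize(lambda x: x - 1, tests))  # -> ('neg', 'not')
```

The search discovers that negating and then complementing the accumulator computes x - 1, since ~(-x) equals x - 1 in two's complement — exactly the sort of non-obvious bit-fiddling sequence described in Massalin's paper.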

In 2001, goal-directed superoptimization was demonstrated in the Denali project at the Compaq Systems Research Center. [6] In 2006, answer set programming, a form of declarative programming, was applied to superoptimization in the Total Optimisation using Answer Set Technology (TOAST) project [7] at the University of Bath. [8] [9]
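
In the solver-aided style, the correctness of a candidate sequence is established as a validity query over bit-vectors rather than by testing. The sketch below uses the Z3 Python bindings to prove a branch-free signum equivalent to its branching specification for all 2^32 inputs; the encoding is a simplified illustration, not Denali's actual matching algorithm. [5] [6]

```python
from z3 import BitVec, BitVecVal, If, LShR, prove

x = BitVec("x", 32)

# Specification: the branching definition of signum.
spec = If(x < 0, BitVecVal(-1, 32),
          If(x > 0, BitVecVal(1, 32), BitVecVal(0, 32)))

# Candidate sequence from a search: (x >>arith 31) | ((-x) >>logical 31).
# On Z3 bit-vectors, ">>" is arithmetic shift and LShR is logical shift.
candidate = (x >> 31) | LShR(-x, 31)

# prove() checks that the negation is unsatisfiable and prints "proved",
# i.e. the replacement is correct for every 32-bit input.
prove(spec == candidate)
```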

Superoptimization can be used to automatically generate general-purpose peephole optimizers. [10]
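
In that scheme, optimal replacements for short windows of instructions are computed offline and stored in a lookup table that an ordinary compiler pass consults at compile time. The Python sketch below shows only the table-driven application step; the table entries and the textual instruction syntax are hand-written stand-ins for rules a peephole superoptimizer would learn. [10]

```python
# A peephole table mapping instruction windows to cheaper equivalents.
# Real peephole superoptimizers learn such entries offline; these are
# hand-written stand-ins using an invented instruction syntax.
PEEPHOLE_TABLE = {
    ("mov r, 0",): ["xor r, r"],    # shorter encoding on x86
    ("mul r, 2",): ["add r, r"],    # strength reduction
    ("neg r", "not r"): ["dec r"],  # ~(-r) == r - 1
}

def peephole(code, table, max_window=2):
    """Slide a window over `code`, greedily applying the longest
    replacement the table offers at each position."""
    out, i = [], 0
    while i < len(code):
        for w in range(max_window, 0, -1):
            key = tuple(code[i:i + w])
            if key in table:
                out.extend(table[key])
                i += w
                break
        else:
            out.append(code[i])  # no rule matched; keep the instruction
            i += 1
    return out

print(peephole(["mov r, 0", "neg r", "not r"], PEEPHOLE_TABLE))
# -> ['xor r, r', 'dec r']
```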

Publicly available superoptimizers

Several superoptimizers are available for free download:

GNU Superoptimizer (GSO) [4] [11]
STOKE [12]
The Stanford peephole superoptimizer used for binary translation [13]
A superoptimizer for Microchip's PIC microcontrollers [14] [15]
Embecosm's superoptimizer feasibility study and accompanying source code [16] [17]
A Clojure program that exhaustively searches for optimal Java programs [18]
Souper, a synthesizing superoptimizer for LLVM IR [19]
A superoptimizer for WebAssembly bytecode [20]

See also

Compiler · Optimizing compiler · Binary translation · Static single assignment form · LLVM · Tail call · Instruction scheduling · Retargeting · Instruction selection · Peephole optimization · Comparison of assemblers · Tiny C Compiler · long double · Ahead-of-time compilation · Alexia Massalin · Clojure · CompCert · Advanced Matrix Extensions · Cranelift · AlphaDev

References

  1. Massalin, Henry (1987). "Superoptimizer: A look at the smallest program" (PDF). ACM SIGARCH Computer Architecture News. 15 (5): 122–126. doi:10.1145/36177.36194. Retrieved 2023-05-01. "Given an instruction set, the superoptimizer finds the shortest program to compute a function. Startling programs have been generated, many of them engaging in convoluted bit-fiddling bearing little resemblance to the source programs which defined the functions. The key idea in the superoptimizer is a probabilistic test that makes exhaustive searches practical for programs of useful size."
  2. Joshi, Rajeev; Nelson, Greg; Randall, Keith (2002). "Denali: A Goal-directed Superoptimizer". ACM SIGPLAN Notices. 37 (5): 304–314. doi:10.1145/543552.512566.
  3. Granlund, Torbjörn; Kenner, Richard (1992). "Eliminating branches using a superoptimizer and the GNU C compiler". Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation - PLDI '92. CiteSeerX 10.1.1.58.3509. doi:10.1145/143095.143146. ISBN 978-0-89791-475-8. S2CID 8825539.
  4. "Index of /gnu/superopt". GNU Operating System. Free Software Foundation, Inc. 1995-06-14. Retrieved 2023-05-01.
  5. Jangda, Abhinav; Yorsh, Greta (2017-10-25). Unbounded superoptimization. Onward! '17, October 25–27, 2017, Vancouver, Canada. pp. 78–88. doi:10.1145/3133850.3133856.
  6. Joshi, Rajeev; Nelson, Greg; Randall, Keith (2001-07-30). "Denali: a goal-directed superoptimizer". Compaq Systems Research Center. HP Labs. Hewlett-Packard Co. Retrieved 2023-05-01.
  7. "TOAST – KRRwiki". Department of Computer Science, Mathematical Foundations Group. Knowledge Representation and Reasoning (KRR) group. University of Bath. 2007-08-07. Archived from the original on 2012-11-28. Retrieved 2016-09-03.
  8. Brain, Martin; Crick, Tom; De Vos, Marina; Fitch, John (2006-08-17). "TOAST: Applying Answer Set Programming to Superoptimisation". In Etalle, Sandro; Truszczyński, Mirosław (eds.). Logic Programming. Springer-Verlag. pp. 270–284. doi:10.1007/11799573_21. ISBN 978-3-540-36636-2.
  9. Crick, Tom (2009). Superoptimisation: Provably Optimal Code Generation using Answer Set Programming (PhD thesis). University of Bath. Retrieved 2024-11-15.
  10. Bansal, Sorav; Aiken, Alex (2006). "Automatic Generation of Peephole Superoptimizers" (PDF). Retrieved 2023-05-01.
  11. "GNU Superoptimizer".
  12. StanfordPL. "STOKE". GitHub.
  13. Bansal, Sorav; Aiken, Alex (2008). "Binary Translation Using Peephole Superoptimizers" (PDF). Retrieved 2023-05-01.
  14. Serpell, Daniel (2003). "SuperOptimizer for Microchip's PIC microcontrollers". Google Sites. Archived from the original on 2016-10-11. Retrieved 2016-09-06.
  15. Serpell, Daniel (2003-06-21). "PIC Microcontroller SuperOptimizer". Freecode. Slashdot Media. Archived from the original on 2016-09-17. Retrieved 2016-09-06.
  16. "A feasibility study by Embecosm".
  17. Superoptimization – Embecosm Source Code
  18. Hume, Tom (2012-08-21). "Clojure program to exhaustively search for optimal Java programs". GitHub. Archived from the original on 2018-06-10. Retrieved 2016-09-06.
  19. Sasnauskas, Raimondas; Chen, Yang; Collingbourne, Peter; Ketema, Jeroen; Lup, Gratian; Taneja, Jubi; Regehr, John (2017). "Souper: A Synthesizing Superoptimizer". arXiv:1711.04422 [cs.PL]. GitHub source code.
  20. Cabrera Arteaga, Javier; Donde, Shrinish; Gu, Jian; Floros, Orestis; Satabin, Lucas; Baudry, Benoit; Monperrus, Martin (2020-03-23). Superoptimization of WebAssembly bytecode. MoreVMs: Workshop on Modern Language Runtimes, Ecosystems, and VMs. pp. 36–40. arXiv:2002.10213. doi:10.1145/3397537.3397567. GitHub source code.