Superoptimization is the process where a compiler automatically finds the optimal instruction sequence for a loop-free fragment of code. Real-world compilers generally cannot produce genuinely optimal code: while most standard compiler optimizations only improve code partially, a superoptimizer's goal is to find the optimal sequence, the canonical form. Superoptimizers can also be used to improve conventional optimizers by highlighting missed opportunities, so that a human can write additional optimization rules.
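To give a flavor of the results, consider selecting the minimum of two integers. A compiler typically emits the comparison as written, while a superoptimizer searching over instruction sequences can discover a branch-free equivalent. A minimal illustration (the branchless form is a well-known bit trick, shown here on nonnegative 32-bit values; it is not an example from Massalin's paper):

```python
MASK = 0xFFFFFFFF  # simulate 32-bit arithmetic with Python integers

def min_obvious(x, y):
    # The straightforward version a programmer or compiler would produce.
    return x if x < y else y

def min_branchless(x, y):
    # The kind of short branch-free sequence a superoptimizer can find:
    # -(x < y) is all ones when x < y, so the mask selects x or y.
    mask = (-(x < y)) & MASK
    return (y ^ ((x ^ y) & mask)) & MASK

assert all(min_obvious(a, b) == min_branchless(a, b)
           for a in range(64) for b in range(64))
```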
The term superoptimization was coined by Alexia Massalin in the 1987 paper "Superoptimizer: A Look at the Smallest Program". [1] Because the label "program optimization" had already been given to a field that does not aspire to optimize but only to improve, Massalin called her system a superoptimizer: an optimizer that actually finds an optimal program. [2]
In 1992, the GNU Superoptimizer (GSO) was developed to integrate into the GNU Compiler Collection (GCC). [3] [4] Later work further developed and extended these ideas.
Traditionally, superoptimization is performed via exhaustive brute-force search in the space of valid instruction sequences. This is costly and largely impractical for general-purpose compilers, yet it has been shown to be useful for optimizing performance-critical inner loops. It is also possible to use an SMT solver to approach the problem, vastly improving the search efficiency (although inputs more complex than a basic block remain out of reach). [5]
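A minimal sketch of the brute-force approach, under heavy simplifying assumptions: a toy three-instruction accumulator machine, candidate programs enumerated in order of increasing length, and equivalence checked exhaustively over all 8-bit inputs (a real superoptimizer would screen candidates probabilistically, or discharge the equivalence check to an SMT solver):

```python
from itertools import product

# A toy accumulator machine: each instruction maps the accumulator
# to a new 8-bit value.
INSTRS = {
    "inc": lambda a: (a + 1) & 0xFF,
    "dbl": lambda a: (a * 2) & 0xFF,
    "neg": lambda a: (-a) & 0xFF,
}

def run(program, acc):
    for op in program:
        acc = INSTRS[op](acc)
    return acc

def superoptimize(target, max_len=4):
    """Return the shortest program computing `target`, found by
    enumerating all instruction sequences of increasing length."""
    for length in range(max_len + 1):
        for program in product(INSTRS, repeat=length):
            if all(run(program, x) == target(x) for x in range(256)):
                return program
    return None

# Shortest sequence computing f(x) = 2x + 2 on 8-bit values:
print(superoptimize(lambda x: (2 * x + 2) & 0xFF))
# ('inc', 'dbl'), i.e. dbl(inc(x)) = 2(x + 1)
```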
In 2001, goal-directed superoptimization was demonstrated in the Denali project at Compaq's research center. [6] In 2006, declarative programming in the form of answer set programming was applied to superoptimization in the Total Optimisation using Answer Set Technology (TOAST) project [7] at the University of Bath. [8] [9]
Superoptimization can be used to automatically generate general-purpose peephole optimizers. [10]
Several superoptimizers are available for free download.
In computing, a compiler is a computer program that translates computer code written in one programming language into another language. The name "compiler" is primarily used for programs that translate source code from a high-level programming language to a low-level programming language to create an executable program.
An optimizing compiler is a compiler designed to generate code that is optimized in aspects such as minimizing program execution time, memory use, storage size, and power consumption. Optimization is generally implemented as a sequence of optimizing transformations, algorithms that transform code to produce semantically equivalent code optimized for some aspect.
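As a concrete illustration of an optimizing transformation, the sketch below folds constant arithmetic at compile time. It uses Python's ast module and is a simplification, not any particular compiler's implementation:

```python
import ast
import operator

# Fold only operators whose semantics are safe to evaluate at compile time.
FOLDABLE = {ast.Add: operator.add, ast.Sub: operator.sub,
            ast.Mult: operator.mul, ast.FloorDiv: operator.floordiv}

class ConstantFolder(ast.NodeTransformer):
    """Replace binary operations on constants with their result,
    e.g. (2 * 3) + x  ->  6 + x."""
    def visit_BinOp(self, node):
        self.generic_visit(node)           # fold children first (bottom-up)
        if (isinstance(node.left, ast.Constant)
                and isinstance(node.right, ast.Constant)
                and type(node.op) in FOLDABLE):
            try:
                value = FOLDABLE[type(node.op)](node.left.value,
                                                node.right.value)
            except ZeroDivisionError:
                return node                # leave the error for run time
            return ast.copy_location(ast.Constant(value), node)
        return node

tree = ast.parse("y = (2 * 3) + x")
folded = ast.fix_missing_locations(ConstantFolder().visit(tree))
print(ast.unparse(folded))                 # y = 6 + x
```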
In computing, binary translation is a form of binary recompilation where sequences of instructions are translated from a source instruction set to the target instruction set. In some cases such as instruction set simulation, the target instruction set may be the same as the source instruction set, providing testing and debugging features such as instruction trace, conditional breakpoints and hot spot detection.
In compiler design, static single assignment (SSA) form is a type of intermediate representation (IR) where each variable is assigned exactly once. SSA is used in most high-quality optimizing compilers for imperative languages, including LLVM, the GNU Compiler Collection, and many commercial compilers.
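A minimal sketch of SSA construction for straight-line code, with a simplified renaming scheme (real compilers must also insert φ-functions where control-flow paths join, which is omitted here):

```python
def to_ssa(instructions):
    """Rename variables so each is assigned exactly once.
    `instructions` are (dest, op, arg1, arg2) tuples."""
    version = {}                       # variable -> current version number

    def use(v):                        # read the latest version of v
        return f"{v}{version[v]}" if v in version else v

    ssa = []
    for dest, op, a, b in instructions:
        a, b = use(a), use(b)
        version[dest] = version.get(dest, 0) + 1
        ssa.append((f"{dest}{version[dest]}", op, a, b))
    return ssa

# x = a + b; x = x * c; y = x + 1   becomes
# x1 = a + b; x2 = x1 * c; y1 = x2 + 1
for line in to_ssa([("x", "+", "a", "b"),
                    ("x", "*", "x", "c"),
                    ("y", "+", "x", "1")]):
    print(line)
```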
LLVM is a set of compiler and toolchain technologies that can be used to develop a frontend for any programming language and a backend for any instruction set architecture. LLVM is designed around a language-independent intermediate representation (IR) that serves as a portable, high-level assembly language that can be optimized with a variety of transformations over multiple passes. The name LLVM originally stood for Low Level Virtual Machine, though the project has expanded and the name is no longer officially an initialism.
In computer science, a tail call is a subroutine call performed as the final action of a procedure. If the target of a tail call is the same subroutine, the subroutine is said to be tail recursive, which is a special case of direct recursion. Tail recursion is particularly useful, and is often easy to optimize in implementations.
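A minimal illustration: the tail-recursive function below can be mechanically rewritten as the loop that follows it, which is in effect what tail-call optimization does (CPython, notably, does not perform this optimization itself):

```python
def factorial_tail(n, acc=1):
    # The recursive call is the final action: a tail call.
    if n <= 1:
        return acc
    return factorial_tail(n - 1, acc * n)

def factorial_loop(n, acc=1):
    # The loop an optimizing implementation would effectively produce:
    # reuse the current frame instead of pushing a new one.
    while n > 1:
        n, acc = n - 1, acc * n
    return acc

assert factorial_tail(10) == factorial_loop(10) == 3628800
```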
In computer science, instruction scheduling is a compiler optimization used to improve instruction-level parallelism, which improves performance on machines with instruction pipelines. Put more simply, it tries to reorder instructions to avoid pipeline stalls and illegal or semantically ambiguous operations, without changing the meaning of the code, as sketched below.
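A greedy list-scheduling sketch under assumed simplifications (explicit dependence lists and a single heuristic): each step issues a ready instruction whose inputs were produced longest ago, pulling dependent pairs apart to avoid stalls:

```python
def schedule(instrs, deps):
    """Greedy list scheduling. `deps[i]` lists the instructions that
    must run before instruction i (its data dependences)."""
    scheduled, order = set(), []
    issue = {}                                 # instr index -> issue cycle
    while len(order) < len(instrs):
        ready = [i for i in range(len(instrs))
                 if i not in scheduled and set(deps[i]) <= scheduled]
        # Heuristic: prefer the ready instruction whose newest input
        # was produced earliest (or that has no inputs at all).
        pick = min(ready, key=lambda i: max((issue[d] for d in deps[i]),
                                            default=-1))
        issue[pick] = len(order)
        order.append(instrs[pick])
        scheduled.add(pick)
    return order

instrs = ["r1 = load a", "r2 = r1 + 1", "r3 = load b", "r4 = r3 + r2"]
deps   = [[],            [0],           [],            [2, 1]]
print(schedule(instrs, deps))
# ['r1 = load a', 'r3 = load b', 'r2 = r1 + 1', 'r4 = r3 + r2']
# The independent load is moved between the first load and its use.
```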
In software engineering, retargeting is an attribute of software development tools that have been specifically designed to generate code for more than one computing platform.
In computer science, instruction selection is the stage of a compiler backend that transforms its middle-level intermediate representation (IR) into a low-level IR. In a typical compiler, instruction selection precedes both instruction scheduling and register allocation; hence its output IR uses an unbounded set of pseudo-registers and may still be – and typically is – subject to peephole optimization. Otherwise, it closely resembles the target machine code, bytecode, or assembly language.
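A minimal sketch of tree-pattern instruction selection in the "maximal munch" style, assuming a toy tuple-based IR and a hypothetical target with a fused multiply-add: the largest pattern is tried before falling back to one instruction per operator:

```python
# Toy IR: nested tuples such as ("add", ("mul", "a", "b"), "c").
def select(tree):
    """Tile an IR tree with target instructions, trying the largest
    pattern (a fused multiply-add) before single-operation tiles.
    Returns the emitted instruction list."""
    code = []

    def emit(node):
        if isinstance(node, str):              # leaf: a register or value
            return node
        op, lhs, rhs = node
        # Largest tile first: add(mul(x, y), z) -> one MADD instruction.
        if op == "add" and isinstance(lhs, tuple) and lhs[0] == "mul":
            x, y, z = emit(lhs[1]), emit(lhs[2]), emit(rhs)
            reg = f"r{len(code)}"
            code.append(f"MADD {reg}, {x}, {y}, {z}")
            return reg
        # Fallback: one instruction per IR operator.
        x, y = emit(lhs), emit(rhs)
        reg = f"r{len(code)}"
        code.append(f"{op.upper()} {reg}, {x}, {y}")
        return reg

    emit(tree)
    return code

print(select(("add", ("mul", "a", "b"), "c")))   # ['MADD r0, a, b, c']
print(select(("add", "a", "b")))                 # ['ADD r0, a, b']
```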
Peephole optimization is an optimization technique performed on a small set of compiler-generated instructions, known as a peephole or window, that involves replacing the instructions with a logically equivalent set that has better performance.
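A minimal sketch of such a pass, assuming a two-instruction window over symbolic assembly and a small hand-written pattern table (the kind of table that, per the section above, a superoptimizer can generate automatically):

```python
def peephole(code):
    """Slide a two-instruction window over `code`, replacing known
    wasteful patterns with cheaper equivalents."""
    def rewrite(a, b):
        # push x ; pop x  ->  nothing (the value is immediately restored)
        if a[0] == "push" and b[0] == "pop" and a[1] == b[1]:
            return []
        # mov r, x ; mov r, y  ->  mov r, y (the first store is dead;
        # sound here because the second source is a constant, not r)
        if a[0] == "mov" and b[0] == "mov" and a[1] == b[1]:
            return [b]
        return None

    out = []
    for instr in code:
        out.append(instr)
        while len(out) >= 2:
            replacement = rewrite(out[-2], out[-1])
            if replacement is None:
                break
            out[-2:] = replacement     # shrinking may expose new matches
    return out

code = [("push", "ax"), ("pop", "ax"), ("mov", "bx", 1), ("mov", "bx", 2)]
print(peephole(code))                  # [('mov', 'bx', 2)]
```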
This is an incomplete comparison of assemblers. Some assemblers are components of a compiler system for a high-level programming language and may have limited or no usable functionality outside of the compiler system. Some assemblers are hosted on the target processor and operating system, while other assemblers (cross-assemblers) may run under an unrelated operating system or processor. For example, assemblers for embedded systems are not usually hosted on the target system since it would not have the storage and terminal I/O to permit entry of a program from a keyboard. An assembler may have a single target processor or may have options to support multiple processor types.
The Tiny C Compiler is an x86, x86-64 and ARM processor C compiler initially written by Fabrice Bellard. It is designed to work on slow computers with little disk space. Windows operating system support was added in version 0.9.23. TCC is distributed under the GNU Lesser General Public License.
In C and related programming languages, long double refers to a floating-point data type that is often more precise than double precision, though the language standard only requires it to be at least as precise as double. As with C's other floating-point types, it may not necessarily map to an IEEE format.
In computer science, ahead-of-time compilation is the act of compiling an (often) higher-level programming language into an (often) lower-level language before execution of a program, usually at build-time, to reduce the amount of work needed to be performed at run time.
Alexia Massalin is an American computer scientist and programmer. She pioneered the concept of superoptimization, and designed the Synthesis kernel, a small kernel with a Unix compatibility layer that makes heavy use of self-modifying code for efficiency.
Clojure is a dynamic and functional dialect of the programming language Lisp on the Java platform.
CompCert is a formally verified optimizing compiler for a large subset of the C99 programming language which currently targets PowerPC, ARM, RISC-V, x86 and x86-64 architectures. This project, led by Xavier Leroy, started officially in 2005, funded by the French institutes ANR and INRIA. The compiler is specified, programmed and proven in Coq. It aims to be used for programming embedded systems requiring reliability. The performance of its generated code is often close to that of GCC at optimization level -O1, and always better than that of GCC without optimizations.
Advanced Matrix Extensions (AMX), also known as Intel Advanced Matrix Extensions, are extensions to the x86 instruction set architecture (ISA) for microprocessors from Intel designed to work on matrices to accelerate artificial intelligence (AI) and machine learning (ML) workloads.
Cranelift is an optimizing compiler backend that converts a target-independent intermediate representation into executable machine code. It is written in Rust. The project started in 2016 and is currently developed by the Bytecode Alliance. Unlike compiler backends such as LLVM that focus more on ahead-of-time compilation, Cranelift instead focuses on just-in-time compilation, with short compile time being an explicit goal of the project.
AlphaDev is an artificial intelligence system developed by Google DeepMind to discover enhanced computer science algorithms using reinforcement learning. AlphaDev is based on AlphaZero, a system that mastered the games of chess, shogi, and Go by self-play. AlphaDev applies the same approach to finding faster algorithms for fundamental tasks such as sorting and hashing.
Massalin's 1987 paper summarized the approach as follows: given an instruction set, the superoptimizer finds the shortest program to compute a function. Startling programs have been generated, many of them engaging in convoluted bit-fiddling bearing little resemblance to the source programs which defined the functions. The key idea in the superoptimizer is a probabilistic test that makes exhaustive searches practical for programs of useful size.
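A minimal sketch of that probabilistic test, with assumed simplifications (hypothetical helper names, a 16-bit toy domain): every candidate is first screened on a few random inputs, and only the rare survivors undergo the expensive full equivalence check:

```python
import random

def probably_equivalent(candidate, target, trials=16, bits=16):
    """Cheap screen: run both programs on a few random inputs.
    Almost all wrong candidates die here, so the expensive check
    below runs only on rare survivors."""
    for _ in range(trials):
        x = random.getrandbits(bits)
        if candidate(x) != target(x):
            return False               # definitely not equivalent
    return True                        # probably equivalent; verify fully

def verified_equivalent(candidate, target, bits=16):
    # Placeholder for the full check: exhaustive here, feasible only
    # for small bit-widths (modern superoptimizers use an SMT solver).
    return all(candidate(x) == target(x) for x in range(2 ** bits))

target    = lambda x: (x * 2) & 0xFFFF
candidate = lambda x: (x << 1) & 0xFFFF
if probably_equivalent(candidate, target):
    print("survived screening; verified:",
          verified_equivalent(candidate, target))
```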