Superoptimization

Last updated February 07, 2024

Superoptimization is the process where a compiler automatically finds the optimal sequence for a loop-free sequence of instructions. Real-world compilers generally cannot produce genuinely optimal code, and while most standard compiler optimizations only improve code partly, a superoptimizer's goal is to find the optimal sequence, the canonical form. Superoptimizers can be used to improve conventional optimizers by highlighting missed opportunities so a human can write additional rules.

History

The term superoptimization was first coined by Alexia Massalin in the 1987 paper Superoptimizer: A Look at the Smallest Program.^[1] The label "program optimization" has been given to a field that does not aspire to optimize but only to improve. This misnomer forced Massalin to call her system a superoptimizer, which is actually an optimizer to find an optimal program.^[2]

In 1992, the GNU Superoptimizer (GSO) was developed to integrate into the GNU Compiler Collection (GCC).^[3]^[4] Later work further developed and extended these ideas.

Techniques

Traditionally, superoptimizing is performed via exhaustive brute-force search in the space of valid instruction sequences. This is a costly method, and thus impractical for general-purpose compilers. Yet, it has been shown to be useful in optimizing performance-critical inner loops. It is also possible to use a SMT solver to approach the problem, vastly improving the search efficiency (although inputs more complex than a basic block remains out of reach).^[5]

In 2001, goal-directed superoptimizing was demonstrated in the Denali project by Compaq research.^[6] In 2006, answer set declarative programming was used for superoptimising in the Total Optimisation using Answer Set Technology (TOAST) project at the University of Bath.^[7]^[8]

Superoptimization can be used to automatically generate general-purpose peephole optimizers.^[9]

Publicly available superoptimizers

Several superoptimizers are available for free download.

For the x86 family of instruction sets:
- GNU superoptimizer (superopt)^[10] (GSO) (1992)^[3]^[4]– also supports many other ISAs
- STOKE,^[11] a stochastic optimizer^[12] for x86-64 x86 assembly language.
For ARM:
- Unbounded Superoptimizer,^[5] transforming LLVM IR into ARMv7-A assembly
For embedded systems:
- PIC microcontroller SuperOptimizer (2003)^[13]^[14]
- A feasibility study by Embecosm (2014) for AVR, based on GSO^[15]^[16]
For the JVM:
- Clojure superoptimizer for the Java virtual machine (2012)^[17]
For LLVM IR:
- souper^[18] superoptimizer for programs in the LLVM intermediate language.
For WebAssembly
- slumps^[19] provides superoptimization for WASM programs based on souper.

Related Research Articles

In computing, a compiler is a computer program that translates computer code written in one programming language into another language. The name "compiler" is primarily used for programs that translate source code from a high-level programming language to a low-level programming language to create an executable program.

In computing, binary translation is a form of binary recompilation where sequences of instructions are translated from a source instruction set to the target instruction set. In some cases such as instruction set simulation, the target instruction set may be the same as the source instruction set, providing testing and debugging features such as instruction trace, conditional breakpoints and hot spot detection.

In compiler design, static single assignment form is a property of an intermediate representation (IR) that requires each variable to be assigned exactly once and defined before it is used. Existing variables in the original IR are split into versions, new variables typically indicated by the original name with a subscript in textbooks, so that every definition gets its own version. In SSA form, use-def chains are explicit and each contains a single element.

LLVM is a set of compiler and toolchain technologies that can be used to develop a frontend for any programming language and a backend for any instruction set architecture. LLVM is designed around a language-independent intermediate representation (IR) that serves as a portable, high-level assembly language that can be optimized with a variety of transformations over multiple passes. The name LLVM originally stood for Low Level Virtual Machine, though the project has expanded and the name is no longer officially an initialism.

In computer science, a tail call is a subroutine call performed as the final action of a procedure. If the target of a tail is the same subroutine, the subroutine is said to be tail recursive, which is a special case of direct recursion. Tail recursion is particularly useful, and is often easy to optimize in implementations.

In computer science, instruction scheduling is a compiler optimization used to improve instruction-level parallelism, which improves performance on machines with instruction pipelines. Put more simply, it tries to do the following without changing the meaning of the code:

In software engineering, retargeting is an attribute of software development tools that have been specifically designed to generate code for more than one computing platform.

In computer science, instruction selection is the stage of a compiler backend that transforms its middle-level intermediate representation (IR) into a low-level IR. In a typical compiler, instruction selection precedes both instruction scheduling and register allocation; hence its output IR has an infinite set of pseudo-registers and may still be – and typically is – subject to peephole optimization. Otherwise, it closely resembles the target machine code, bytecode, or assembly language.

Peephole optimization is an optimization technique performed on a small set of compiler-generated instructions; the small set is known as the peephole or window.

This is an incomplete list of assemblers: computer programs that translate assembly language source code into binary programs. Some assemblers are components of a compiler system for a high level language and may have limited or no usable functionality outside of the compiler system. Some assemblers are hosted on the target processor and operating system, while other assemblers (cross-assemblers) may run under an unrelated operating system or processor. For example, assemblers for embedded systems are not usually hosted on the target system since it would not have the storage and terminal I/O to permit entry of a program from a keyboard. An assembler may have a single target processor or may have options to support multiple processor types. Very simple assemblers may lack features, such as macros, present in more powerful versions.

The Tiny C Compiler is an x86, X86-64 and ARM processor C compiler initially written by Fabrice Bellard. It is designed to work for slow computers with little disk space. Windows operating system support was added in version 0.9.23. TCC is distributed under the GNU Lesser General Public License.

In C and related programming languages, long double refers to a floating-point data type that is often more precise than double precision though the language standard only requires it to be at least as precise as double. As with C's other floating-point types, it may not necessarily map to an IEEE format.

In computer science, ahead-of-time compilation is the act of compiling an (often) higher-level programming language into an (often) lower-level language before execution of a program, usually at build-time, to reduce the amount of work needed to be performed at run time.

Alexia Massalin is an American computer scientist and programmer. She pioneered the concept of superoptimization, and designed the Synthesis kernel, a small kernel with a Unix compatibility layer that makes heavy use of self-modifying code for efficiency.

Clojure is a dynamic and functional dialect of the Lisp programming language on the Java platform.

CompCert is a formally verified optimizing compiler for a large subset of the C99 programming language which currently targets PowerPC, ARM, RISC-V, x86 and x86-64 architectures. This project, led by Xavier Leroy, started officially in 2005, funded by the French institutes ANR and INRIA. The compiler is specified, programmed and proven in Coq. It aims to be used for programming embedded systems requiring reliability. The performance of its generated code is often close to that of GCC at optimization level -O1, and always better than that of GCC without optimizations.

Tracing just-in-time compilation is a technique used by virtual machines to optimize the execution of a program at runtime. This is done by recording a linear sequence of frequently executed operations, compiling them to native machine code and executing them. This is opposed to traditional just-in-time (JIT) compilers that work on a per-method basis.

HipHop Virtual Machine (HHVM) is an open-source virtual machine based on just-in-time (JIT) compilation that serves as an execution engine for the Hack programming language. By using the principle of JIT compilation, Hack code is first transformed into intermediate HipHop bytecode (HHBC), which is then dynamically translated into x86-64 machine code, optimized, and natively executed. This contrasts with PHP's usual interpreted execution, in which the Zend Engine transforms PHP source code into opcodes that serve as a form of bytecode, and executes the opcodes directly on the Zend Engine's virtual CPU.

Advanced Matrix Extensions (AMX), also known as Intel Advanced Matrix Extensions, are extensions to the x86 instruction set architecture (ISA) for microprocessors from Intel designed to work on matrices to accelerate artificial intelligence (AI) and machine learning (ML) workloads.

Cranelift is an optimizing compiler backend that converts a target-independent intermediate representation into executable machine code. It is written in Rust. The project started in 2016 and is currently developed by Bytecode Alliance. Unlike compiler backends such as LLVM that focus more on ahead-of-time compilation, Cranelift instead focuses on just-in-time compilation with short compile time being an explicit goal of the project.

References

↑ Massalin, Henry (1987). "Superoptimizer: A look at the smallest program" (PDF). ACM SIGARCH Computer Architecture News. 15 (5): 122–126. doi:10.1145/36177.36194 . Retrieved 2023-05-01. Given an instruction set, the superoptimizer finds the shortest program to compute a function. Startling programs have been generated, many of them engaging in convoluted bit-fiddling bearing little resemblance to the source programs which defined the functions. The key idea in the superoptimizer is a probabilistic test that makes exhaustive searches practical for programs of useful size.
↑ Joshi, Rajeev; Nelson, Greg; Randall, Keith (2002). "Denali: A Goal-directed Superoptimizer". ACM SIGPLAN Notices. 37 (5): 304–314. doi:10.1145/543552.512566.
1 2 Granlund, Torbjörn; Kenner, Richard (July 1992). "Eliminating branches using a superoptimizer and the GNU C compiler". ACM SIGPLAN Notices. 27 (7): 341–352. CiteSeerX 10.1.1.58.3509 . doi:10.1145/143095.143146. ISBN 978-0-89791475-8. S2CID 8825539.
1 2 "Index of /gnu/superopt". GNU Operating System. Free Software Foundation, Inc. 1995-06-14. Retrieved 2023-05-01.
1 2 Jangda, Abhinav; Yorsh, Greta (2017-10-25). Unbounded superoptimization. Onward!’17, October 25–27, 2017, Vancouver, Canada. pp. 78–88. doi:10.1145/3133850.3133856.
↑ Joshi, Rajeev; Nelson, Greg; Randall, Keith (2001-07-30). "Denali: a goal-directed superoptimizer". Compaq Systems Research Center. HP Labs. Hewlett-Packard Co. Retrieved 2023-05-01.
↑ Brain, Martin; Crick, Tom; De Vos, Marina; Fitch, John (2006-08-17). "TOAST: Applying Answer Set Programming to Superoptimisation". In Etalle, Sandro; Truszczyński, Mirosław (eds.). Logic Programming. Springer-Verlag, Berlin / Heidelberg. pp. 270–284. ISBN 978-3-540-36636-2.
↑ "TOAST – KRRwiki". Department of Computer Science, Mathematical Foundations Group. Knowledge Representation and Reasoning (KRR) group. University of Bath. 2007-08-07. Archived from the original on 2012-11-28. Retrieved 2016-09-03.
↑ Bansal, Sorav; Aiken, Alex (2006). "Automatic Generation of Peephole Superoptimizers" (PDF). Retrieved 2023-05-01.
↑ "GNU Superoptimizer".
↑ StanfordPL. "STOKE". GitHub .
↑ Bansal, Sorav; Aiken, Alex (2008). "Binary Translation Using Peephole Superoptimizers" (PDF). Retrieved 2023-05-01.
↑ Serpell, Daniel (2003). "SuperOptimizer for Microchip's PIC microcontrollers". Google Sites. Archived from the original on 2016-10-11. Retrieved 2016-09-06.
↑ Serpell, Daniel (2003-06-21). "PIC Microcontroller SuperOptimizer". Freecode. Slashdot Media. Archived from the original on 2016-09-17. Retrieved 2016-09-06.
↑ "A feasibility study by Embecosm".
↑ Superoptimization – Embecosm Source Code
↑ Hume, Tom (2012-08-21). "Clojure program to exhaustively search for optimal Java programs". GitHub. Archived from the original on 2018-06-10. Retrieved 2016-09-06.
↑ Sasnauskas, Raimondas; Chen, Yang; Collingbourne, Peter; Ketema, Jeroen; Lup, Gratian; Taneja, Jubi; Regehr, John (2017). "Souper: A Synthesizing Superoptimizer". arXiv: 1711.04422 . GitHub source code
↑ Cabrera Arteaga, Javier; Donde, Shrinish; Gu, Jian; Floros, Orestis; Satabin, Lucas; Baudry, Benoit; Monperrus, Martin (2020-03-23). Superoptimization of WebAssembly bytecode. MoreVMs: Workshop on Modern Language Runtimes, Ecosystems, and VMs. pp. 36–40. arXiv: 2002.10213 . doi:10.1145/3397537.3397567. GitHub source code

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[Massalin_1987_Superoptimizer-1] Massalin, Henry (1987). "Superoptimizer: A look at the smallest program" (PDF). ACM SIGARCH Computer Architecture News. 15 (5): 122–126. doi:10.1145/36177.36194 . Retrieved 2023-05-01. Given an instruction set, the superoptimizer finds the shortest program to compute a function. Startling programs have been generated, many of them engaging in convoluted bit-fiddling bearing little resemblance to the source programs which defined the functions. The key idea in the superoptimizer is a probabilistic test that makes exhaustive searches practical for programs of useful size.

[Denali-2] Joshi, Rajeev; Nelson, Greg; Randall, Keith (2002). "Denali: A Goal-directed Superoptimizer". ACM SIGPLAN Notices. 37 (5): 304–314. doi:10.1145/543552.512566.

[GCCsuperACM-3] 1 2 Granlund, Torbjörn; Kenner, Richard (July 1992). "Eliminating branches using a superoptimizer and the GNU C compiler". ACM SIGPLAN Notices. 27 (7): 341–352. CiteSeerX 10.1.1.58.3509 . doi:10.1145/143095.143146. ISBN 978-0-89791475-8. S2CID 8825539.

[GCCsuperopt-4] 1 2 "Index of /gnu/superopt". GNU Operating System. Free Software Foundation, Inc. 1995-06-14. Retrieved 2023-05-01.

[jangda-5] 1 2 Jangda, Abhinav; Yorsh, Greta (2017-10-25). Unbounded superoptimization. Onward!’17, October 25–27, 2017, Vancouver, Canada. pp. 78–88. doi:10.1145/3133850.3133856.

[Joshi_2001-6] Joshi, Rajeev; Nelson, Greg; Randall, Keith (2001-07-30). "Denali: a goal-directed superoptimizer". Compaq Systems Research Center. HP Labs. Hewlett-Packard Co. Retrieved 2023-05-01.

[TOAST-book-7] Brain, Martin; Crick, Tom; De Vos, Marina; Fitch, John (2006-08-17). "TOAST: Applying Answer Set Programming to Superoptimisation". In Etalle, Sandro; Truszczyński, Mirosław (eds.). Logic Programming. Springer-Verlag, Berlin / Heidelberg. pp. 270–284. ISBN 978-3-540-36636-2.

[TOAST-KRR-8] "TOAST – KRRwiki". Department of Computer Science, Mathematical Foundations Group. Knowledge Representation and Reasoning (KRR) group. University of Bath. 2007-08-07. Archived from the original on 2012-11-28. Retrieved 2016-09-03.

[Bansal_2006-9] Bansal, Sorav; Aiken, Alex (2006). "Automatic Generation of Peephole Superoptimizers" (PDF). Retrieved 2023-05-01.

[GSO_1992-10] "GNU Superoptimizer".

[STOKE-11] StanfordPL. "STOKE". GitHub .

[Bansal_2008-12] Bansal, Sorav; Aiken, Alex (2008). "Binary Translation Using Peephole Superoptimizers" (PDF). Retrieved 2023-05-01.

[Serpell_2003_1-13] Serpell, Daniel (2003). "SuperOptimizer for Microchip's PIC microcontrollers". Google Sites. Archived from the original on 2016-10-11. Retrieved 2016-09-06.

[Serpell_2003_2-14] Serpell, Daniel (2003-06-21). "PIC Microcontroller SuperOptimizer". Freecode. Slashdot Media. Archived from the original on 2016-09-17. Retrieved 2016-09-06.

[Embecosm_2014-15] "A feasibility study by Embecosm".

[16] Superoptimization – Embecosm Source Code

[Hume_2012-17] Hume, Tom (2012-08-21). "Clojure program to exhaustively search for optimal Java programs". GitHub. Archived from the original on 2018-06-10. Retrieved 2016-09-06.

[souper-18] Sasnauskas, Raimondas; Chen, Yang; Collingbourne, Peter; Ketema, Jeroen; Lup, Gratian; Taneja, Jubi; Regehr, John (2017). "Souper: A Synthesizing Superoptimizer". arXiv: 1711.04422 . GitHub source code

[slumps-19] Cabrera Arteaga, Javier; Donde, Shrinish; Gu, Jian; Floros, Orestis; Satabin, Lucas; Baudry, Benoit; Monperrus, Martin (2020-03-23). Superoptimization of WebAssembly bytecode. MoreVMs: Workshop on Modern Language Runtimes, Ecosystems, and VMs. pp. 36–40. arXiv: 2002.10213 . doi:10.1145/3397537.3397567. GitHub source code

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

[13]

[14]

[15]

[16]

[17]

[18]

[19]

v t e Compiler optimizations
Basic block	Peephole optimization Local value numbering
Loop optimization	Automatic parallelization Automatic vectorization Induction variable Loop fusion Loop-invariant code motion Loop inversion Loop interchange Loop nest optimization Loop splitting Loop unrolling Loop unswitching Software pipelining Strength reduction
Data-flow analysis	Available expression Common subexpression elimination Constant folding Dead-store elimination Induction variable recognition and elimination Live-variable analysis Use-define chain
SSA-based	Global value numbering Sparse conditional constant propagation
Code generation	Instruction scheduling Instruction selection Register allocation Rematerialization
Functional	Deforestation Tail-call elimination
Global	Interprocedural optimization
Other	Bounds-checking elimination Compile-time function execution Dead-code elimination Expression templates Inline expansion Jump threading Partial evaluation Profile-guided optimization
Static analysis	Alias analysis Array-access analysis Control-flow analysis Data-flow analysis Dependence analysis Escape analysis Pointer analysis Shape analysis Value range analysis