Static single-assignment form

In compiler design, static single assignment form (often abbreviated as SSA form or simply SSA) is a property of an intermediate representation (IR) that requires each variable to be assigned exactly once and defined before it is used. Existing variables in the original IR are split into versions, with new variables typically indicated in textbooks by the original name plus a subscript, so that every definition gets its own version. In SSA form, use-def chains are explicit and each contains a single element.

One can expect to find SSA in a compiler for Fortran, C, C++, [1] or Java (Android Runtime); [2] [3] whereas in functional language compilers, such as those for Scheme and ML, continuation-passing style (CPS) is generally used. SSA is formally equivalent to a well-behaved subset of CPS excluding non-local control flow, which does not occur when CPS is used as intermediate representation. [1] So optimizations and transformations formulated in terms of one immediately apply to the other.

History

SSA was developed in the 1980s by several researchers at IBM and Kenneth Zadeck of Brown University. [4] [5] A 1986 paper introduced birthpoints, identity assignments, and variable renaming such that variables had a single static assignment. [6] A subsequent 1987 paper by Jeanne Ferrante and Ronald Kaplan [7] proved that the renaming done in the previous paper removes all false dependencies for scalars. [5] In 1988, Barry Rosen, Mark N. Wegman, and Kenneth Zadeck replaced the identity assignments with Φ-functions, introduced the name "static single-assignment form", and demonstrated a now-common SSA optimization. [8] The name Φ-function was chosen by Rosen to be a more publishable version of "phony function". [5] Alpern, Wegman, and Zadeck presented another optimization, but using the name "single static assignment". [9] Finally, in 1989, Rosen, Wegman, and Zadeck and (independently) Cytron and Ferrante worked on methods of computing SSA form. A few days before the submission deadline, the researchers realized they had substantially the same algorithm, [5] and a 5-author paper was the result. [10]

Benefits

The primary usefulness of SSA comes from how it simultaneously simplifies and improves the results of a variety of compiler optimizations, by simplifying the properties of variables. For example, consider this piece of code:

y := 1
y := 2
x := y

Humans can see that the first assignment is not necessary, and that the value of y being used in the third line comes from the second assignment of y. A program would have to perform reaching definition analysis to determine this. But if the program is in SSA form, both of these are immediate:

y1 := 1
y2 := 2
x1 := y2

Compiler optimization algorithms that are either enabled or strongly enhanced by the use of SSA include:

- constant propagation
- value range propagation [11]
- sparse conditional constant propagation
- dead-code elimination
- global value numbering
- partial-redundancy elimination
- strength reduction
- register allocation

Converting to SSA

Converting ordinary code into SSA form is primarily a matter of replacing the target of each assignment with a new variable, and replacing each use of a variable with the "version" of the variable reaching that point. For example, consider the following control-flow graph:

[Figure: An example control-flow graph, before conversion to SSA]

Changing the name on the left-hand side of "x ← x - 3" and changing the following uses of x to that new name would leave the program unaltered. This can be exploited in SSA by creating two new variables: x1 and x2, each of which is assigned only once. Likewise, giving distinguishing subscripts to all the other variables yields:

[Figure: An example control-flow graph, partially converted to SSA]

It is clear which definition each use is referring to, except for one case: both uses of y in the bottom block could be referring to either y1 or y2, depending on which path the control flow took.

To resolve this, a special statement called a Φ (Phi) function is inserted in the last block. This statement generates a new definition of y, called y3, by "choosing" either y1 or y2, depending on which path control arrived from.
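Schematically, in the subscripted notation used above, the pattern looks like this (a generic sketch where P stands for an arbitrary condition, not the exact program shown in the figures):

if P
    y1 := 1
else
    y2 := 2
end
y3 := Φ(y1, y2)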

[Figure: An example control-flow graph, fully converted to SSA]

Now, the last block can simply use y3, and the correct value will be obtained either way. A Φ function for x is not needed: only one version of x, namely x2, reaches this point, so there is no problem (in other words, Φ(x2, x2) = x2).

Given an arbitrary control-flow graph, it can be difficult to tell where to insert Φ functions, and for which variables. This general question has an efficient solution that can be computed using a concept called dominance frontiers (see below).

Φ functions are not implemented as machine operations on most machines. A compiler can implement a Φ function by inserting "move" operations at the end of every predecessor block. In the example above, the compiler might insert a move from y1 to y3 at the end of the middle-left block and a move from y2 to y3 at the end of the middle-right block. Depending on the compiler's register allocation procedure, these move operations might not end up in the final code. However, this approach may not work when simultaneous operations are speculatively producing inputs to a Φ function, as can happen on wide-issue machines. Typically, a wide-issue machine has a selection instruction that the compiler uses in such situations to implement the Φ function.
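The naive lowering just described can be sketched in Python as follows; the toy block representation (a dict holding a "phis" list that maps each Φ destination to one source per predecessor, and a "code" list of instruction strings) is assumed for illustration only and is not part of any real compiler's API:

def lower_phis(blocks, preds):
    # Replace every Φ by explicit moves at the end of the predecessor blocks.
    # blocks: dict block -> {"phis": [(dest, {pred: src, ...}), ...],
    #                        "code": [instruction strings]}
    # preds:  dict block -> list of predecessor blocks
    for b, data in blocks.items():
        for dest, sources in data["phis"]:
            for p in preds[b]:
                # append "dest := source-from-p" to predecessor p; a real
                # compiler must order these moves carefully when several Φs
                # in the same block read each other's results
                blocks[p]["code"].append(f"{dest} := {sources[p]}")
        data["phis"].clear()

In the running example, this appends y3 := y1 to the middle-left block and y3 := y2 to the middle-right block, exactly the moves described above.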

Computing minimal SSA using dominance frontiers

In a control-flow graph, a node A is said to strictly dominate a different node B if it is impossible to reach B without passing through A first. In other words, if node B is reached, then it can be assumed that A has run. A is said to dominate B (or B to be dominated by A) if either A strictly dominates B or A = B.

A node which transfers control to a node A is called an immediate predecessor of A.

The dominance frontier of node A is the set of nodes B where A does not strictly dominate B, but does dominate some immediate predecessor of B. These are the points at which multiple control paths merge back together into a single path.

For example, in the following code:

[1] x = random()
if x < 0.5
    [2] result = "heads"
else
    [3] result = "tails"
end
[4] print(result)

node 1 strictly dominates nodes 2, 3, and 4, and the immediate predecessors of node 4 are nodes 2 and 3.

Dominance frontiers define the points at which Φ functions are needed. In the above example, when control is passed to node 4, the definition of result used depends on whether control was passed from node 2 or 3. Φ functions are not needed for variables defined in a dominator, as there is only one possible definition that can apply.
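As a sketch of how this is used, the following Python fragment implements the standard iterated-dominance-frontier worklist for deciding where Φ functions go. The inputs defs (the set of blocks in which each variable is assigned) and df (the dominance frontier of each block, computed as described below) are assumed inputs, not defined by this article:

def place_phis(defs, df):
    # For each variable, compute the set of blocks that need a Φ function.
    # defs: dict variable -> set of blocks assigning it
    # df:   dict block -> its dominance frontier (a set of blocks)
    phi_blocks = {}
    for var, def_blocks in defs.items():
        needs_phi = set()
        worklist = list(def_blocks)
        while worklist:
            block = worklist.pop()
            for fb in df[block]:
                if fb not in needs_phi:
                    needs_phi.add(fb)
                    # a Φ is itself a new definition of var, so it can
                    # force further Φs in its own dominance frontier
                    worklist.append(fb)
        phi_blocks[var] = needs_phi
    return phi_blocks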

There is an efficient algorithm for finding the dominance frontier of each node. This algorithm was originally described in "Efficiently Computing Static Single Assignment Form and the Control Dependence Graph" by Ron Cytron, Jeanne Ferrante, et al. in 1991. [12]

Keith D. Cooper, Timothy J. Harvey, and Ken Kennedy of Rice University describe an algorithm in their paper titled A Simple, Fast Dominance Algorithm: [13]

for each node b
    dominance_frontier(b) := {}
for each node b
    if the number of immediate predecessors of b ≥ 2
        for each p in immediate predecessors of b
            runner := p
            while runner ≠ idom(b)
                dominance_frontier(runner) := dominance_frontier(runner) ∪ { b }
                runner := idom(runner)

In the code above, idom(b) is the immediate dominator of b, the unique node that strictly dominates b but does not strictly dominate any other node that strictly dominates b.
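For concreteness, the pseudocode above translates directly into Python. In this sketch the control-flow graph is assumed to be given as a node list, a predecessor map preds, and an immediate-dominator map idom (with the entry node mapped to itself); these names are illustrative only:

def dominance_frontiers(nodes, preds, idom):
    # Cooper-Harvey-Kennedy dominance-frontier computation.
    df = {b: set() for b in nodes}
    for b in nodes:
        if len(preds.get(b, [])) >= 2:  # only join points can be in a frontier
            for p in preds[b]:
                runner = p
                # walk up the dominator tree until reaching idom(b)
                while runner != idom[b]:
                    df[runner].add(b)
                    runner = idom[runner]
    return df

On the four-node example above (preds = {2: [1], 3: [1], 4: [2, 3]}, idom = {1: 1, 2: 1, 3: 1, 4: 1}), this yields dominance_frontier(2) = dominance_frontier(3) = {4}, matching the observation that node 4 is where the two paths merge.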

Variations that reduce the number of Φ functions

"Minimal" SSA inserts the minimal number of Φ functions required to ensure that each name is assigned a value exactly once and that each reference (use) of a name in the original program can still refer to a unique name. (The latter requirement is needed to ensure that the compiler can write down a name for each operand in each operation.)

However, some of these Φ functions could be dead. For this reason, minimal SSA does not necessarily produce the fewest Φ functions needed by a specific procedure. For some types of analysis, these Φ functions are superfluous and can cause the analysis to run less efficiently.

Pruned SSA

Pruned SSA form is based on a simple observation: Φ functions are only needed for variables that are "live" after the Φ function. (Here, "live" means that the value is used along some path that begins at the Φ function in question.) If a variable is not live, the result of the Φ function cannot be used and the assignment by the Φ function is dead.

Construction of pruned SSA form uses live-variable information in the Φ function insertion phase to decide whether a given Φ function is needed. If the original variable name isn't live at the Φ function insertion point, the Φ function isn't inserted.
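Relative to the placement sketch given earlier, pruning amounts to one extra guard. In this sketch, live_in(var, block) is an assumed query answered by a separate live-variable analysis:

def place_pruned_phis(defs, df, live_in):
    # As place_phis, but skip any block where the variable is not live on entry.
    phi_blocks = {}
    for var, def_blocks in defs.items():
        needs_phi = set()
        worklist = list(def_blocks)
        while worklist:
            block = worklist.pop()
            for fb in df[block]:
                if fb not in needs_phi and live_in(var, fb):
                    needs_phi.add(fb)
                    worklist.append(fb)
        phi_blocks[var] = needs_phi
    return phi_blocks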

Another possibility is to treat pruning as a dead-code elimination problem. Then, a Φ function is live only if any use in the input program will be rewritten to it, or if it will be used as an argument in another Φ function. When entering SSA form, each use is rewritten to the nearest definition that dominates it. A Φ function will then be considered live as long as it is the nearest definition that dominates at least one use, or at least one argument of a live Φ.

Semi-pruned SSA

Semi-pruned SSA form [14] is an attempt to reduce the number of Φ functions without incurring the relatively high cost of computing live-variable information. It is based on the following observation: if a variable is never live upon entry into a basic block, it never needs a Φ function. During SSA construction, Φ functions for any "block-local" variables are omitted.

Computing the set of block-local variables is a simpler and faster procedure than full live-variable analysis, making semi-pruned SSA form more efficient to compute than pruned SSA form. On the other hand, semi-pruned SSA form will contain more Φ functions.
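The filter can be sketched as follows, in the spirit of Briggs et al. [14] The per-block representation (a list of (defined-names, used-names) pairs, one per instruction in order) is an assumption of this sketch. Any name used in some block before being defined there must be live on entry to that block, so it cannot be block-local:

def global_names(blocks):
    # Return the names that are live on entry to at least one block;
    # only these receive Φ functions in semi-pruned SSA.
    result = set()
    for instrs in blocks.values():
        defined_here = set()
        for defs, uses in instrs:
            result.update(u for u in uses if u not in defined_here)
            defined_here.update(defs)
    return result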

Block arguments

Block arguments are an alternative to Φ functions that is representationally identical but can be more convenient in practice during optimization. Blocks are named and take a list of block arguments, notated as function parameters. When a block is called, the block arguments are bound to specified values. Swift SIL and MLIR both use block arguments. [15]
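Schematically, the two-way merge from the earlier example might be written as follows (a notational sketch only; the concrete syntax of SIL and MLIR differs):

block1:
    y1 := 1
    branch block3(y1)
block2:
    y2 := 2
    branch block3(y2)
block3(y3):
    ... use y3 ...

The block argument y3 plays exactly the role of the Φ function y3 := Φ(y1, y2), with each branch supplying the value for its own path.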

Converting out of SSA form

SSA form is not normally used for direct execution (although it is possible to interpret SSA [16] ), and it is frequently used "on top of" another IR with which it remains in direct correspondence. This can be accomplished by "constructing" SSA as a set of functions that map between parts of the existing IR (basic blocks, instructions, operands, etc.) and its SSA counterpart. When the SSA form is no longer needed, these mapping functions may be discarded, leaving only the now-optimized IR.

Performing optimizations on SSA form usually leads to entangled SSA webs, meaning there are Φ instructions whose operands do not all have the same root operand. In such cases, color-out algorithms are used to come out of SSA. Naive algorithms introduce a copy along each predecessor path that caused a source with a root symbol different from the Φ destination's to be placed in the Φ. There are multiple algorithms for coming out of SSA with fewer copies; most use interference graphs, or some approximation thereof, to do copy coalescing. [17]

Extensions

Extensions to SSA form can be divided into two categories.

Renaming scheme extensions alter the renaming criterion. Recall that SSA form renames each variable when it is assigned a value. Alternative schemes include static single use form (which renames each variable at each statement when it is used) and static single information form [24] (which renames each variable when it is assigned a value, and at the post-dominance frontier).

Feature-specific extensions retain the single assignment property for variables, but incorporate new semantics to model additional features. Some feature-specific extensions model high-level programming language features like arrays, objects and aliased pointers. Other feature-specific extensions model low-level architectural features like speculation and predication.

Compilers using SSA form

SSA form is a relatively recent development in the compiler community. As such, many older compilers only use SSA form for some part of the compilation or optimization process, but most do not rely on it. Examples of compilers that rely heavily on SSA form include:

- the LLVM compiler infrastructure, whose primary code representation uses SSA form for all scalar register values
- the GNU Compiler Collection (GCC), whose GIMPLE mid-level representation has used SSA since version 4.0
- the HotSpot Java virtual machine [18]
- the Microsoft Visual C++ back end [19]
- the Illinois Concert compiler [20]
- Mozilla's IonMonkey JavaScript JIT compiler [21]
- LuaJIT [22]
- the HHVM virtual machine for PHP and Hack [23]
- IBM's open-source adaptive Java virtual machine, Jikes RVM [25]
- the Go compiler, since version 1.7 for x86-64 and version 1.8 for all architectures [26] [27]
- the SPIR-V shading language standard [28]
- Mesa's NIR intermediate representation [29]
- WebKit's FTL and B3 JavaScript JIT compilers [30] [31]
- the Swift compiler's intermediate language, SIL [32] [33]
- the Erlang/OTP compiler, since OTP 22.0 [34]


References

Notes

  1. Kelsey, Richard A. (1995). "A correspondence between continuation passing style and static single assignment form" (PDF). Papers from the 1995 ACM SIGPLAN workshop on Intermediate representations. pp. 13–22. doi:10.1145/202529.202532. ISBN 0897917545. S2CID 6207179.
  2. Rogers, Ian (2020). "Efficient global register allocation". arXiv:2011.05608 [cs.PL].
  3. The Evolution of ART - Google I/O 2016. Google. 25 May 2016. Event occurs at 3m47s.
  4. Rastello & Tichadou 2022, sec. 1.4.
  5. Zadeck, Kenneth (April 2009). The Development of Static Single Assignment Form (PDF). Static Single-Assignment Form Seminar. Autrans, France.
  6. Cytron, Ron; Lowry, Andy; Zadeck, F. Kenneth (1986). "Code motion of control structures in high-level languages". Proceedings of the 13th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages - POPL '86: 70–85. doi:10.1145/512644.512651. S2CID 9099471.
  7. Cytron, Ronald Kaplan, and Jeanne Ferrante. What's in a name? Or, the value of renaming for parallelism detection and storage allocation. IBM TJ Watson Research Center, 1987.
  8. Rosen, Barry; Wegman, Mark N.; Zadeck, F. Kenneth (1988). "Global value numbers and redundant computations" (PDF). Proceedings of the 15th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages.
  9. Alpern, B.; Wegman, M. N.; Zadeck, F. K. (1988). "Detecting equality of variables in programs". Proceedings of the 15th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages - POPL '88. pp. 1–11. doi:10.1145/73560.73561. ISBN 0897912527. S2CID 18384941.
  10. Cytron, Ron; Ferrante, Jeanne; Rosen, Barry K.; Wegman, Mark N.; Zadeck, F. Kenneth (1991). "Efficiently computing static single assignment form and the control dependence graph" (PDF). ACM Transactions on Programming Languages and Systems. 13 (4): 451–490. CiteSeerX 10.1.1.100.6361. doi:10.1145/115372.115320. S2CID 13243943.
  11. value range propagation
  12. Cytron, Ron; Ferrante, Jeanne; Rosen, Barry K.; Wegman, Mark N.; Zadeck, F. Kenneth (1 October 1991). "Efficiently computing static single assignment form and the control dependence graph". ACM Transactions on Programming Languages and Systems. 13 (4): 451–490. doi:10.1145/115372.115320. S2CID 13243943.
  13. Cooper, Keith D.; Harvey, Timothy J.; Kennedy, Ken (2001). "A Simple, Fast Dominance Algorithm" (PDF). Archived from the original (PDF) on 2022-03-26.
  14. Briggs, Preston; Cooper, Keith D.; Harvey, Timothy J.; Simpson, L. Taylor (1998). "Practical Improvements to the Construction and Destruction of Static Single Assignment Form" (PDF). Archived from the original (PDF) on 2010-06-07.
  15. "Block Arguments vs PHI nodes - MLIR Rationale". mlir.llvm.org. Retrieved 4 March 2022.
  16. von Ronne, Jeffery; Ning Wang; Michael Franz (2004). "Interpreting programs in static single assignment form". Proceedings of the 2004 workshop on Interpreters, virtual machines and emulators - IVME '04. p. 23. doi:10.1145/1059579.1059585. ISBN 1581139098. S2CID 451410.
  17. Boissinot, Benoit; Darte, Alain; Rastello, Fabrice; Dinechin, Benoît Dupont de; Guillon, Christophe (2008). "Revisiting Out-of-SSA Translation for Correctness, Code Quality, and Efficiency". HAL-Inria Cs.DS: 14.
  18. "The Java HotSpot Performance Engine Architecture". Oracle Corporation.
  19. "Introducing a new, advanced Visual C++ code optimizer". 4 May 2016.
  20. "Illinois Concert Project".
  21. "IonMonkey Overview".
  22. "Bytecode Optimizations". The LuaJIT project.
  23. "HipHop Intermediate Representation (HHIR)". GitHub. 30 October 2021.
  24. Ananian, C. Scott; Rinard, Martin (1999). "Static Single Information Form". CiteSeerX 10.1.1.1.9976.
  25. Encyclopedia of Parallel Computing.
  26. "Go 1.7 Release Notes - The Go Programming Language". golang.org. Retrieved 2016-08-17.
  27. "Go 1.8 Release Notes - The Go Programming Language". golang.org. Retrieved 2017-02-17.
  28. "SPIR-V spec" (PDF).
  29. Ekstrand, Jason (16 December 2014). "Reintroducing NIR, a new IR for mesa".
  30. "Introducing the WebKit FTL JIT". 13 May 2014.
  31. "Introducing the B3 JIT Compiler". 15 February 2016.
  32. "Swift Intermediate Language (GitHub)". GitHub. 30 October 2021.
  33. "Swift's High-Level IR: A Case Study of Complementing LLVM IR with Language-Specific Optimization, LLVM Developers Meetup 10/2015". YouTube. Archived from the original on 2021-12-21.
  34. "OTP 22.0 Release Notes".

General references

Rastello, Fabrice; Bouchez Tichadou, Florent, eds. (2022). SSA-based Compiler Design. Springer.