Value numbering

Last updated October 06, 2024

Value numbering is a technique of determining when two computations in a program are equivalent and eliminating one of them with a semantics-preserving optimization.

Global value numbering

Global value numbering (GVN) is a compiler optimization based on the static single assignment form (SSA) intermediate representation. It sometimes helps eliminate redundant code that common subexpression elimination (CSE) does not. At the same time, however, CSE may eliminate code that GVN does not, so both are often found in modern compilers. Global value numbering is distinct from local value numbering in that the value-number mappings hold across basic block boundaries as well, and different algorithms are used to compute the mappings.

Global value numbering works by assigning a value number to variables and expressions. The same value number is assigned to those variables and expressions which are probably equivalent. For instance, in the following code:

w := 3 x := 3 y := x + 4 z := w + 4

a good GVN routine would assign the same value number to w and x, and the same value number to y and z. For instance, the map $\left\{{w}\mapsto 1,{x}\mapsto 1,{y}\mapsto 2,{z}\mapsto 2\right\}$ would constitute an optimal value-number mapping for this block. Using this information, the previous code fragment may be safely transformed into:

w := 3 x := w y := w + 4 z := y

Depending on the code following this fragment, copy propagation may be able to remove the assignments to x and to z.

The reason that GVN is sometimes more powerful than CSE comes from the fact that CSE matches lexically identical expressions whereas the GVN tries to determine an underlying equivalence. For instance, in the code:

a := c × d e := c f := e × d

Without copy propagation, CSE would not eliminate the recomputation assigned to f, but even a poor GVN algorithm should discover and eliminate this redundancy.

In IRs and source languages where rebinding (assigning to the same variable more than once) is possible, SSA form is required to perform GVN so that false $\left\{{\text{variable name}}\mapsto {\text{value number}}\right\}$ mappings are not created.

Local value numbering

Local value numbering (LVN) is a compiler optimization that aims to find multiple instances of equivalent expressions (i.e. expressions which yield the same result) and replace them with the first occurrence. LVN is a local optimization, meaning that unlike global value numbering, it operates on a single basic block at a time.

Local value numbering works by assigning a unique number to each operation and remembering these associations. Subsequent instructions are then looked up and, in case an identical instruction has already been registered, replaced with the previous instruction's result. For example:

a ← 4         a is tagged as #1 b ← 5         b is tagged as #2 c ← a + b     c (#1 + #2) is tagged as #3 d ← 5         d is tagged as #2, the same as b e ← a + d     e, being '#1 + #2' is tagged as #3

By assigning numbers to instructions, comparing for duplicates is turned into simple integer comparisons. In this particular example, c and e are assigned the same number (#3), thus signalling to the compiler that any references to e may simply be replaced with ones to c.

Difficulties and extensions

Issues when not using SSA

A naive implementation might attempt to perform the optimization by directly using the variable names instead of numbers. However, this approach does not work when the values of variables can change. Consider the pseudocode:

a ← 1         a is tagged as #1 b ← 2         b is tagged as #2 c ← a + b     c is tagged as #3 b ← 3 d ← a + b     d is incorrectly tagged as #3

In this scenario, d is incorrectly assigned the number 3 because the arguments match those of c. This is incorrect, however, because b has changed value from 2 to 3, making the actual results differ. Using the SSA representation resolves this disparity.

Using mathematical identities

A simple implementation might also be unable to catch all equivalent expressions, even when they only differ by the order of their operands. In the following example, a and b could be assigned the same number:

a ← 1 + 2 b ← 2 + 1

This issue can easily be resolved either by assigning the same number to both cases (i.e. a + b and b + a are both recorded with the same number) or by sorting the operands before checking for equivalents.^[1]

Local value numbering optimizers may also be aware of mathematical identities. Assuming a is an integer, all of the following expressions can be assigned the same value:^[2]

b ← a + 0 c ← a * 1 d ← min(a, MAX_INT) e ← max(a, a) f ← a & 0xFF..FF       (assuming '&' denotes bitwise AND)

Related Research Articles

An optimizing compiler is a compiler designed to generate code that is optimized in aspects such as minimizing program execution time, memory use, storage size, and power consumption. Optimization is generally implemented as a sequence of optimizing transformations, algorithms that transform code to produce semantically equivalent code optimized for some aspect. It is typically CPU and memory intensive. In practice, factors such as available memory and a programmer's willingness to wait for compilation limit the optimizations that a compiler might provide. Research indicates that some optimization problems are NP-complete, or even undecidable.

In logic and computer science, specifically automated reasoning, unification is an algorithmic process of solving equations between symbolic expressions, each of the form Left-hand side = Right-hand side. For example, using x,y,z as variables, and taking f to be an uninterpreted function, the singleton equation set { f(1,y) = f(x,2) } is a syntactic first-order unification problem that has the substitution { x ↦ 1, y ↦ 2 } as its only solution.

In mathematics, equality is a relationship between two quantities or, more generally, two mathematical expressions, asserting that the quantities have the same value, or that the expressions represent the same mathematical object. Equality between $A$ and $B$ is written $A = B$ , and pronounced " $A$ equals $B$ ". In this equality, $A$ and $B$ are the members of the equality and are distinguished by calling them left-hand side or left member, and right-hand side or right member. Two objects that are not equal are said to be distinct.

Constant folding and constant propagation are related compiler optimizations used by many modern compilers. An advanced form of constant propagation known as sparse conditional constant propagation can more accurately propagate constants and simultaneously remove dead code.

In computer programming, the exclusive or swap is an algorithm that uses the exclusive or bitwise operation to swap the values of two variables without using the temporary variable which is normally required.

In computer programming, a type system is a logical system comprising a set of rules that assigns a property called a type to every term. Usually the terms are various language constructs of a computer program, such as variables, expressions, functions, or modules. A type system dictates the operations that can be performed on a term. For variables, the type system determines the allowed values of that term.

In computer programming, an assignment statement sets and/or re-sets the value stored in the storage location(s) denoted by a variable name; in other words, it copies a value into the variable. In most imperative programming languages, the assignment statement is a fundamental construct.

In compiler design, static single assignment form is a type of intermediate representation (IR) where each variable is assigned exactly once. SSA is used in most high-quality optimizing compilers for imperative languages, including LLVM, the GNU Compiler Collection, and many commercial compilers.

BLISS is a system programming language developed at Carnegie Mellon University (CMU) by W. A. Wulf, D. B. Russell, and A. N. Habermann around 1970. It was perhaps the best known system language until C debuted a few years later. Since then, C became popular and common, and BLISS faded into obscurity. When C was in its infancy, a few projects within Bell Labs debated the merits of BLISS vs. C.

The syntax of the C programming language is the set of rules governing writing of software in C. It is designed to allow for programs that are extremely terse, have a close relationship with the resulting object code, and yet provide relatively high-level data abstraction. C was the first widely successful high-level language for portable operating-system development.

In compiler optimization, register allocation is the process of assigning local automatic variables and expression results to a limited number of processor registers.

In compiler theory, common subexpression elimination (CSE) is a compiler optimization that searches for instances of identical expressions, and analyzes whether it is worthwhile replacing them with a single variable holding the computed value.

In computer programming, the ternary conditional operator is a ternary operator that is part of the syntax for basic conditional expressions in several programming languages. It is commonly referred to as the conditional operator, conditional expression, ternary if, or inline if. An expression if a then b else c or a ? b : c evaluates to b if the value of a is true, and otherwise to c. One can read it aloud as "if a then b otherwise c". The form a ? b : c is the most common, but alternative syntax do exist; for example, Raku uses the syntax a ?? b !! c to avoid confusion with the infix operators ? and !, whereas in Visual Basic .NET, it instead takes the form If(a, b, c).

In functional programming, continuation-passing style (CPS) is a style of programming in which control is passed explicitly in the form of a continuation. This is contrasted with direct style, which is the usual style of programming. Gerald Jay Sussman and Guy L. Steele, Jr. coined the phrase in AI Memo 349 (1975), which sets out the first version of the Scheme programming language. John C. Reynolds gives a detailed account of the numerous discoveries of continuations.

In computer programming languages, a switch statement is a type of selection control mechanism used to allow the value of a variable or expression to change the control flow of program execution via search and map.

The computer programming languages C and Pascal have similar times of origin, influences, and purposes. Both were used to design their own compilers early in their lifetimes. The original Pascal definition appeared in 1969 and a first compiler in 1970. The first version of C appeared in 1972.

In compiler theory, partial redundancy elimination (PRE) is a compiler optimization that eliminates expressions that are redundant on some but not necessarily all paths through a program. PRE is a form of common subexpression elimination.

In the Java programming language, the final keyword is used in several contexts to define an entity that can only be assigned once.

The syntax and semantics of PHP, a programming language, form a set of rules that define how a PHP program can be written and interpreted.

The syntax and semantics of Prolog, a programming language, are the sets of rules that define how a Prolog program is written and how it is interpreted, respectively. The rules are laid out in ISO standard ISO/IEC 13211 although there are differences in the Prolog implementations.

References

↑ Cooper, Keith D.; Torczon, Linda. "Terminology, Principles, and Concerns (with examples from local value numbering)". elsevier. Retrieved 15 May 2017.
↑ Cooper, Keith D.; Torczon, Linda. "Local Optimization: Value Numbering" (PDF). Rice University. Retrieved 15 May 2017.

v t e Compiler optimizations
Basic block	Peephole optimization Local value numbering
Loop	Automatic parallelization Automatic vectorization Induction variable Loop fusion Loop-invariant code motion Loop inversion Loop interchange Loop nest optimization Loop splitting Loop unrolling Loop unswitching Software pipelining Strength reduction
Data-flow analysis	Available expression Common subexpression elimination Constant folding Dead store elimination Induction variable recognition and elimination Live-variable analysis Use-define chain
SSA-based	Global value numbering Sparse conditional constant propagation
Code generation	Instruction scheduling Instruction selection Register allocation Rematerialization
Functional	Deforestation Tail-call elimination
Global	Interprocedural optimization
Other	Bounds-checking elimination Compile-time function execution Dead-code elimination Expression templates Inline expansion Jump threading Partial evaluation Profile-guided optimization
Static analysis	Alias analysis Array-access analysis Control-flow analysis Data-flow analysis Dependence analysis Escape analysis Pointer analysis Shape analysis Value range analysis