Live-variable analysis

In compilers, live variable analysis (or simply liveness analysis) is a classic data-flow analysis to calculate the variables that are live at each point in the program. A variable is live at some point if it holds a value that may be needed in the future, or equivalently if its value may be read before the next time the variable is written to.

Example

Consider the following program:

b = 3
c = 5
a = f(b * c)

The set of live variables between lines 2 and 3 is {b, c} because both are used in the multiplication on line 3. But the set of live variables after line 1 is only {b}, since variable c is updated later, on line 2. The value of variable a is not used in this code.

Note that the assignment to a may be eliminated as a is not used later, but there is insufficient information to justify removing all of line 3 as f may have side effects (printing b * c, perhaps).
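The same sets can be computed mechanically by walking the three statements backwards and, for each one, removing the variable that is written and then adding the variables that are read. The following Python sketch is purely illustrative: the statement list is a hand transcription of the example above, and the helper names are not part of the example itself.

# Each statement is modelled as (variable written, set of variables read).
statements = [
    ("b", set()),        # line 1: b = 3
    ("c", set()),        # line 2: c = 5
    ("a", {"b", "c"}),   # line 3: a = f(b * c)
]

live = set()             # nothing is live after the last statement
for target, reads in reversed(statements):
    live = (live - {target}) | reads   # kill the write, then add the reads
    print("live before the assignment to", target, ":", sorted(live))

# Printed sets, from line 3 up to line 1: ['b', 'c'], ['b'], []
# matching {b, c} between lines 2 and 3 and {b} after line 1; a is never live.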

Expression in terms of dataflow equations

Liveness analysis is a "backwards may" analysis. The analysis is done in backwards order, and the dataflow confluence operator is set union. In other words, when liveness analysis is applied to a function containing logical branches, the analysis is performed starting from the end of the function and working towards the beginning (hence "backwards"), and a variable is considered live if any of the branches moving forward within the function might potentially (hence "may") need the variable's current value. This is in contrast to a "backwards must" analysis, which would instead require the condition to hold on all branches moving forward.

The dataflow equations used for a given basic block s and exiting block f in live-variable analysis are the following:

GEN[s]: the set of variables that are used in s before any assignment to them in the same basic block.
KILL[s]: the set of variables that are assigned a value in s (in many books that discuss compiler design, KILL[s] is also defined as the set of variables assigned a value in s before any use, but this does not change the solution of the dataflow equations).

LIVE_in[s] = GEN[s] ∪ (LIVE_out[s] \ KILL[s])
LIVE_out[f] = ∅
LIVE_out[s] = ⋃ { LIVE_in[p] : p ∈ succ[s] }

For an assignment statement d: y ← f(x1, …, xn), GEN[d] = {x1, …, xn} and KILL[d] = {y}.

The in-state of a block is the set of variables that are live at the start of the block. Its out-state is the set of variables that are live at the end of it. The out-state is the union of the in-states of the block's successors. The transfer function of a statement is applied by making the variables that are written dead, then making the variables that are read live.
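Written as Python set operations, the two equations amount to a pair of small helpers. This is only an illustrative sketch; the function and parameter names are invented for this article and do not come from any particular compiler.

def block_live_in(gen, kill, live_out):
    # LIVE_in[s] = GEN[s] ∪ (LIVE_out[s] \ KILL[s])
    return gen | (live_out - kill)

def block_live_out(successor_live_ins):
    # LIVE_out[s] = union of LIVE_in[p] over all successors p of s (empty for the exit block)
    out = set()
    for live_in_p in successor_live_ins:
        out |= live_in_p
    return out

The work-list sketch after the second example below applies this same update repeatedly until a fixed point is reached.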

Second example

// in: {}; predecessor blocks: none
b1: a = 3;
    b = 5;
    d = 4;
    x = 100; // x is never used later, thus not in the out set {a,b,d}
    if a > b then
// out: {a,b,d}   // union of all (in) successors of b1 => b2: {a,b}, and b3: {b,d}

// in: {a,b}; predecessor blocks: b1
b2: c = a + b;
    d = 2;
// out: {b,d}

// in: {b,d}; predecessor blocks: b1 and b2
b3: endif
    c = 4;
    return b * d + c;
// out: {}

The in-state of b3 only contains b and d, since c has been written. The out-state of b1 is the union of the in-states of b2 and b3. The definition of c in b2 can be removed, since c is not live immediately after the statement.
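Removing such a dead definition is a straightforward backwards pass once liveness is known. The sketch below is an assumption-laden illustration (the (target, reads, has_side_effects) record format and the function name are inventions for this example), not a description of any particular compiler:

def eliminate_dead_stores(statements, live_out):
    # statements: list of (target, reads, has_side_effects); target may be None.
    kept = []
    live = set(live_out)
    for target, reads, has_side_effects in reversed(statements):
        if target is not None and target not in live and not has_side_effects:
            continue                      # dead store without side effects: drop it
        kept.append((target, reads, has_side_effects))
        if target is not None:
            live.discard(target)          # the write kills its target ...
        live |= set(reads)                # ... and the reads become live
    kept.reverse()
    return kept

# Block b2 from the example, with live-out {b, d}: "c = a + b" is dropped,
# "d = 2" is kept because d is live afterwards.
b2 = [("c", {"a", "b"}, False), ("d", set(), False)]
print(eliminate_dead_stores(b2, {"b", "d"}))   # [('d', set(), False)]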

Solving the data flow equations starts with initializing all in-states and out-states to the empty set. The work list is initialized with the exit point b3, as is typical for a backward flow problem. Its computed in-state differs from the previous one, so its predecessors b1 and b2 are inserted and the process continues. The progress is summarized in the table below.

processing | out-state | old in-state | new in-state | work list
b3         | {}        | {}           | {b,d}        | (b1, b2)
b1         | {b,d}     | {}           | {}           | (b2)
b2         | {b,d}     | {}           | {a,b}        | (b1)
b1         | {a,b,d}   | {}           | {}           | ()

Note that b1 was entered in the list before b2, which forced processing b1 twice (b1 was re-entered as predecessor of b2). Inserting b2 before b1 would have allowed earlier completion.

Initializing with the empty set is an optimistic initialization: all variables start out as dead. Note that the out-states cannot shrink from one iteration to the next, although an out-state can be smaller than the corresponding in-state. This can be seen from the fact that, after the first iteration, an out-state can only change through a change in the in-state of one of the block's successors. Since every in-state starts as the empty set, it can only grow in further iterations.
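The whole fixed-point computation can be expressed as a short backward work-list solver. The following Python sketch is illustrative only: the GEN and KILL sets are transcribed by hand from the second example, and the data structures and names are assumptions rather than part of the example. Seeded with b3, it reproduces the processing order b3, b1, b2, b1 from the table above.

from collections import deque

# Control-flow graph of the second example: b1 branches to b2 and b3; b2 falls through to b3.
successors   = {"b1": ["b2", "b3"], "b2": ["b3"], "b3": []}
predecessors = {"b1": [], "b2": ["b1"], "b3": ["b1", "b2"]}

# GEN: variables read before being written in the block; KILL: variables written in the block.
gen  = {"b1": set(),                "b2": {"a", "b"}, "b3": {"b", "d"}}
kill = {"b1": {"a", "b", "d", "x"}, "b2": {"c", "d"}, "b3": {"c"}}

live_in  = {b: set() for b in successors}   # optimistic initialization: everything starts dead
live_out = {b: set() for b in successors}

worklist = deque(["b3"])                    # seed with the exit block (backward analysis)
while worklist:
    b = worklist.popleft()
    out = set()
    for s in successors[b]:                 # LIVE_out[b] = union of successors' in-states
        out |= live_in[s]
    live_out[b] = out
    new_in = gen[b] | (live_out[b] - kill[b])
    if new_in != live_in[b]:                # in-state changed: revisit the predecessors
        live_in[b] = new_in
        worklist.extend(p for p in predecessors[b] if p not in worklist)

print(live_in)    # b1: {}, b2: {a, b}, b3: {b, d}
print(live_out)   # b1: {a, b, d}, b2: {b, d}, b3: {}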
