Use-define chain

Last updated February 08, 2024

Within computer science, a use-definition chain (or UD chain) is a data structure that consists of a use U, of a variable, and all the definitions D of that variable that can reach that use without any other intervening definitions.^[1]^[2] A UD Chain generally means the assignment of some value to a variable.

Purpose

Making the use-define or define-use chains is a step in liveness analysis, so that logical representations of all the variables can be identified and tracked through the code.

Consider the following snippet of code:

intx=0;/* A */x=x+y;/* B *//* 1, some uses of x */x=35;/* C *//* 2, some more uses of x */

Notice that x is assigned a value at three points (marked A, B, and C). However, at the point marked "1", the use-def chain for x should indicate that its current value must have come from line B (and its value at line B must have come from line A). Contrariwise, at the point marked "2", the use-def chain for x indicates that its current value must have come from line C. Since the value of the x in block 2 does not depend on any definitions in block 1 or earlier, x might as well be a different variable there; practically speaking, it is a different variable — call it x2.

intx=0;/* A */x=x+y;/* B *//* 1, some uses of x */intx2=35;/* C *//* 2, some uses of x2 */

The process of splitting x into two separate variables is called live range splitting. See also static single assignment form.

Setup

The list of statements determines a strong order among statements.

Statements are labeled using the following conventions: $s(i)$ , where i is an integer in $[1,n]$ ; and n is the number of statements in the basic block
Variables are identified in italic (e.g., v,u and t)
Every variable is assumed to have a definition in the context or scope. (In static single assignment form, use-define chains are explicit because each chain contains a single element.)

For a variable, such as v, its declaration is identified as V (italic capital letter), and for short, its declaration is identified as $s(0)$ . In general, a declaration of a variable can be in an outer scope (e.g., a global variable).

Definition of a variable

When a variable, v, is on the LHS of an assignment statement, such as $s(j)$ , then $s(j)$ is a definition of v. Every variable (v) has at least one definition by its declaration (V) (or initialization).

Use of a variable

If variable, v, is on the RHS of statement $s(j)$ , there is a statement, $s(i)$ with i<j and $\min(j-i)$ , that it is a definition of v and it has a use at $s(j)$ (or, in short, when a variable, v, is on the RHS of a statement $s(j)$ , then v has a use at statement $s(j)$ ).

Execution

Consider the sequential execution of the list of statements, $s(i)$ , and what can now be observed as the computation at statement, j:

A definition at statement $s(i)$ with i<j is alive at j, if it has a use at a statement $s(k)$ with k ≥ j. The set of alive definitions at statement i is denoted as $A(i)$ and the number of alive definitions as $|A(i)|$ . ( $A(i)$ is a simple but powerful concept: theoretical and practical results in space complexity theory, access complexity(I/O complexity), register allocation and cache locality exploitation are based on $A(i)$ .)
A definition at statement $s(i)$ kills all previous definitions ( $s(k)$ with k<i) for the same variables.

Execution example for def-use-chain

This example is based on a Java algorithm for finding the gcd. (It is not important to understand what this function does.)

/** * @param(a, b) The values used to calculate the divisor. * @return The greatest common divisor of a and b. */intgcd(inta,intb){intc=a;intd=b;if(c==0)returnd;while(d!=0){if(c>d)c=c-d;elsed=d-c;}returnc;}

To find out all def-use-chains for variable d, do the following steps:

Search for the first time the variable is defined (write access).
In this case it is "d=b" (l.7)
Search for the first time the variable is read.
In this case it is "return d"
Write down this information in the following style: [name of the variable you are creating a def-use-chain for, the concrete write access, the concrete read access]
In this case it is: [d, d=b, return d]

Repeat these steps in the following style: combine each write access with each read access (but NOT the other way round).

The result should be:

[d,d=b,returnd][d,d=b,while(d!=0)][d,d=b,if(c>d)][d,d=b,c=c-d][d,d=b,d=d-c][d,d=d-c,while(d!=0)][d,d=d-c,if(c>d)][d,d=d-c,c=c-d][d,d=d-c,d=d-c]

You have to take care, if the variable is changed by the time.

For example: From line 7 down to line 13 in the source code, d is not redefined / changed. At line 14, d could be redefined. This is why you have to recombine this write access on d with all possible read accesses which could be reached. In this case, only the code beyond line 10 is relevant. Line 7, for example, cannot be reached again. For your understanding, you can imagine 2 different variables d:

[d1,d1=b,returnd1][d1,d1=b,while(d1!=0)][d1,d1=b,if(c>d1)][d1,d1=b,c=c-d1][d1,d1=b,d1=d1-c][d2,d2=d2-c,while(d2!=0)][d2,d2=d2-c,if(c>d2)][d2,d2=d2-c,c=c-d2][d2,d2=d2-c,d2=d2-c]

As a result, you could get something like this. The variable d1 would be replaced by b

/** * @param(a, b) The values used to calculate the divisor. * @return The greatest common divisor of a and b. **/intgcd(inta,intb){intc=a;intd;if(c==0)returnb;if(b!=0){if(c>b){c=c-b;d=b;}elsed=b-c;while(d!=0){if(c>d)c=c-d;elsed=d-c;}}returnc;}

Method of building a use-def (or ud) chain

Set definitions in statement $s(0)$
For each $i$ in $[1,n]$ , find live definitions that have use in statement $s(i)$
Make a link among definitions and uses
Set the statement $s(i)$ , as definition statement
Kill previous definitions

With this algorithm, two things are accomplished:

A directed acyclic graph (DAG) is created on the variable uses and definitions. The DAG specifies a data dependency among assignment statements, as well as a partial order (therefore parallelism among statements).
When statement $s(i)$ is reached, there is a list of live variable assignments. If only one assignment is live, for example, constant propagation might be used.

Related Research Articles

In number theory, an arithmetic, arithmetical, or number-theoretic function is generally any function f(n) whose domain is the positive integers and whose range is a subset of the complex numbers. Hardy & Wright include in their definition the requirement that an arithmetical function "expresses some arithmetical property of n". There is a larger class of number-theoretic functions that do not fit this definition, for example, the prime-counting functions. This article provides links to functions of both classes.

In mathematics, the Euclidean algorithm, or Euclid's algorithm, is an efficient method for computing the greatest common divisor (GCD) of two integers (numbers), the largest number that divides them both without a remainder. It is named after the ancient Greek mathematician Euclid, who first described it in his Elements . It is an example of an algorithm, a step-by-step procedure for performing a calculation according to well-defined rules, and is one of the oldest algorithms in common use. It can be used to reduce fractions to their simplest form, and is a part of many other number-theoretic and cryptographic calculations.

A mathematical symbol is a figure or a combination of figures that is used to represent a mathematical object, an action on mathematical objects, a relation between mathematical objects, or for structuring the other symbols that occur in a formula. As formulas are entirely constituted with symbols of various types, many symbols are needed for expressing all mathematics.

<span class="mw-page-title-main">Curve</span> Mathematical idealization of the trace left by a moving point

In mathematics, a curve is an object similar to a line, but that does not have to be straight.

In arithmetic and computer programming, the extended Euclidean algorithm is an extension to the Euclidean algorithm, and computes, in addition to the greatest common divisor (gcd) of integers a and b, also the coefficients of Bézout's identity, which are integers x and y such that

Constant folding and constant propagation are related compiler optimizations used by many modern compilers. An advanced form of constant propagation known as sparse conditional constant propagation can more accurately propagate constants and simultaneously remove dead code.

D, also known as dlang, is a multi-paradigm system programming language created by Walter Bright at Digital Mars and released in 2001. Andrei Alexandrescu joined the design and development effort in 2007. Though it originated as a re-engineering of C++, D is now a very different language drawing inspiration from other high-level programming languages, notably Java, Python, Ruby, C#, and Eiffel.

In computer programming, an assignment statement sets and/or re-sets the value stored in the storage location(s) denoted by a variable name; in other words, it copies a value into the variable. In most imperative programming languages, the assignment statement is a fundamental construct.

In mathematics, an Apollonian gasket or Apollonian net is a fractal generated by starting with a triple of circles, each tangent to the other two, and successively filling in more circles, each tangent to another three. It is named after Greek mathematician Apollonius of Perga.

In computer science, corecursion is a type of operation that is dual to recursion. Whereas recursion works analytically, starting on data further from a base case and breaking it down into smaller data and repeating until one reaches a base case, corecursion works synthetically, starting from a base case and building it up, iteratively producing data further removed from a base case. Put simply, corecursive algorithms use the data that they themselves produce, bit by bit, as they become available, and needed, to produce further bits of data. A similar but distinct concept is generative recursion which may lack a definite "direction" inherent in corecursion and recursion.

In compiler theory, a reaching definition for a given instruction is an earlier instruction whose target variable can reach the given one without an intervening assignment. For example, in the following code:

d1 : y := 3 d2 : x := y

In algebra, Gauss's lemma, named after Carl Friedrich Gauss, is a statement about polynomials over the integers, or, more generally, over a unique factorization domain. Gauss's lemma underlies all the theory of factorization and greatest common divisors of such polynomials.

In number theory, a narcissistic number in a given number base $is a number that is the sum of its own digits each raised to the power of the number of digits.$

The syntax of JavaScript is the set of rules that define a correctly structured JavaScript program.

<span class="mw-page-title-main">Recursion (computer science)</span> Use of functions that call themselves

In computer science, recursion is a method of solving a computational problem where the solution depends on solutions to smaller instances of the same problem. Recursion solves such recursive problems by using functions that call themselves from within their own code. The approach can be applied to many types of problems, and recursion is one of the central ideas of computer science.

The power of recursion evidently lies in the possibility of defining an infinite set of objects by a finite statement. In the same manner, an infinite number of computations can be described by a finite recursive program, even if this program contains no explicit repetitions.

In mathematics, a natural number a is a unitary divisor of a number b if a is a divisor of b and if a and $are coprime, having no common factor other than 1. Equivalently, a divisor a of b is a unitary divisor if and only if every prime factor of a has the same multiplicity in a as it has in b .$

In computer programming, an anonymous function is a function definition that is not bound to an identifier. Anonymous functions are often arguments being passed to higher-order functions or used for constructing the result of a higher-order function that needs to return a function. If the function is only used once, or a limited number of times, an anonymous function may be syntactically lighter than using a named function. Anonymous functions are ubiquitous in functional programming languages and other languages with first-class functions, where they fulfil the same role for the function type as literals do for other data types.

In algebra, the greatest common divisor of two polynomials is a polynomial, of the highest possible degree, that is a factor of both the two original polynomials. This concept is analogous to the greatest common divisor of two integers.

In compiler theory, a greatest common divisor test is the test used in study of loop optimization and loop dependence analysis to test the dependency between loop statements.

In probability theory, conditional probability is a measure of the probability of an event occurring, given that another event (by assumption, presumption, assertion or evidence) is already known to have occurred. This particular method relies on event A occurring with some sort of relationship with another event B. In this situation, the event A can be analyzed by a conditional probability with respect to B. If the event of interest is $A$ and the event $B$ is known or assumed to have occurred, "the conditional probability of $A$ given $B$ ", or "the probability of $A$ under the condition $B$ ", is usually written as $P(A | B)$ or occasionally $P B (A)$ . This can also be understood as the fraction of probability B that intersects with A, or the ratio of the probabilities of both events happening to the "given" one happening (how many times A occurs rather than not assuming B has occurred): $.$

References

↑ Kennedy, Ken (January 1978). "Use-definition chains with applications". Computer Languages. 3 (3): 163–179. doi:10.1016/0096-0551(78)90009-7.
↑ Searle, Aaron; Gough, John; Abramson, David (2003). "DUCT: An Interactive Define-Use Chain Navigation Tool for Relative Debugging". arXiv: cs/0311037 .
↑ Leiss, Ernst L. (26 September 2006). A Programmer's Companion to Algorithm Analysis. CRC Press. ISBN 978-1-4200-1170-8.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[kennedy-1] Kennedy, Ken (January 1978). "Use-definition chains with applications". Computer Languages. 3 (3): 163–179. doi:10.1016/0096-0551(78)90009-7.

[duct-2] Searle, Aaron; Gough, John; Abramson, David (2003). "DUCT: An Interactive Define-Use Chain Navigation Tool for Relative Debugging". arXiv: cs/0311037 .

[leiss-3] Leiss, Ernst L. (26 September 2006). A Programmer's Companion to Algorithm Analysis. CRC Press. ISBN 978-1-4200-1170-8.

[1]

[2]

[3]

v t e Compiler optimizations
Basic block	Peephole optimization Local value numbering
Loop optimization	Automatic parallelization Automatic vectorization Induction variable Loop fusion Loop-invariant code motion Loop inversion Loop interchange Loop nest optimization Loop splitting Loop unrolling Loop unswitching Software pipelining Strength reduction
Data-flow analysis	Available expression Common subexpression elimination Constant folding Dead-store elimination Induction variable recognition and elimination Live-variable analysis Use-define chain
SSA-based	Global value numbering Sparse conditional constant propagation
Code generation	Instruction scheduling Instruction selection Register allocation Rematerialization
Functional	Deforestation Tail-call elimination
Global	Interprocedural optimization
Other	Bounds-checking elimination Compile-time function execution Dead-code elimination Expression templates Inline expansion Jump threading Partial evaluation Profile-guided optimization
Static analysis	Alias analysis Array-access analysis Control-flow analysis Data-flow analysis Dependence analysis Escape analysis Pointer analysis Shape analysis Value range analysis