Control-flow graph

Some CFG examples:
(a) an if-then-else
(b) a while loop
(c) a natural loop with two exits, e.g. a while loop with an if...break in the middle; non-structured but reducible
(d) an irreducible CFG: a loop with two entry points, e.g. a goto into a while or for loop

A control-flow graph used by the Rust compiler to perform codegen.

In computer science, a control-flow graph (CFG) is a representation, using graph notation, of all paths that might be traversed through a program during its execution. The control-flow graph was discovered by Frances E. Allen, [1] who noted that Reese T. Prosser had used Boolean connectivity matrices for flow analysis earlier. [2]

The CFG is essential to many compiler optimizations and static-analysis tools.

Definition

In a control-flow graph each node in the graph represents a basic block, i.e. a straight-line piece of code without any jumps or jump targets; jump targets start a block, and jumps end a block. Directed edges are used to represent jumps in the control flow. There are, in most presentations, two specially designated blocks: the entry block, through which control enters into the flow graph, and the exit block, through which all control flow leaves. [3]

Because of its construction procedure, in a CFG, every edge A→B has the property that:

outdegree(A) > 1 or indegree(B) > 1 (or both). [4]

The CFG can thus be obtained, at least conceptually, by starting from the program's (full) flow graph—i.e. the graph in which every node represents an individual instruction—and performing an edge contraction for every edge that falsifies the predicate above, i.e. contracting every edge whose source has a single exit and whose destination has a single entry. This contraction-based algorithm is of no practical importance, except as a visualization aid for understanding the CFG construction, because the CFG can be more efficiently constructed directly from the program by scanning it for basic blocks. [4]
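As a concrete illustration of the direct construction, the following Python sketch uses a minimal, hypothetical instruction encoding (not any particular compiler's IR) to identify leaders, i.e. the first instruction, every jump target, and the instruction following each jump, and then groups consecutive instructions into basic blocks. It is applied to the fragment shown in the Example section below.

def basic_blocks(instrs):
    """Split a list of instructions into basic blocks.

    Each instruction is a dict; an entry with key 'jump' transfers
    control (conditionally or not) to the instruction at that index.
    This is a minimal sketch, not a real compiler IR.
    """
    n = len(instrs)
    leaders = {0}                        # the first instruction is always a leader
    for i, ins in enumerate(instrs):
        if 'jump' in ins:
            leaders.add(ins['jump'])     # a jump target starts a block
            if i + 1 < n:
                leaders.add(i + 1)       # the instruction after a jump starts a block
    order = sorted(leaders)
    return [(start, order[k + 1] - 1 if k + 1 < len(order) else n - 1)
            for k, start in enumerate(order)]

# The fragment from the Example section below (jump targets are 4 and 5)
program = [
    {},             # 0: t0 = read_num
    {'jump': 4},    # 1: if t0 mod 2 == 0 (falls through to 2 when even, jumps to 4 when odd)
    {},             # 2: print t0 + " is even."
    {'jump': 5},    # 3: goto 5
    {},             # 4: print t0 + " is odd."
    {},             # 5: end program
]
print(basic_blocks(program))   # [(0, 1), (2, 3), (4, 4), (5, 5)], i.e. blocks A, B, C, D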

Example

Consider the following fragment of code:

0: (A) t0 = read_num
1: (A) if t0 mod 2 == 0
2: (B)     print t0 + " is even."
3: (B)     goto 5
4: (C) print t0 + " is odd."
5: (D) end program

In the above, we have 4 basic blocks: A from 0 to 1, B from 2 to 3, C at 4 and D at 5. In particular, in this case, A is the "entry block", D the "exit block" and lines 4 and 5 are jump targets. A graph for this fragment has edges from A to B, A to C, B to D and C to D.
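In code, such a CFG is often represented simply as a map from each basic block to its successors. The short Python sketch below is purely illustrative; it encodes the four blocks and four edges listed above and derives the predecessor lists from the successor map.

# Successor map for the example CFG (entry block 'A', exit block 'D')
cfg = {
    'A': ['B', 'C'],   # conditional branch: even goes to B, odd goes to C
    'B': ['D'],        # after printing "is even", goto the end
    'C': ['D'],        # after printing "is odd", fall through to the end
    'D': [],           # exit block
}

# Predecessor lists can be derived from the successor map when needed
preds = {n: [] for n in cfg}
for n, succs in cfg.items():
    for s in succs:
        preds[s].append(n)
print(preds)   # {'A': [], 'B': ['A'], 'C': ['A'], 'D': ['B', 'C']}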

Reachability

Reachability is a graph property useful in optimization.

If a subgraph is not reachable from the subgraph containing the entry block, that subgraph is unreachable during any execution and constitutes unreachable code; under normal conditions it can be safely removed.

If the exit block is unreachable from the entry block, an infinite loop may exist; not all infinite loops are detectable (see Halting problem). Such code may instead end in an explicit halt instruction that never reaches the exit block.

Unreachable code and infinite loops are possible even if the programmer does not explicitly code them: optimizations like constant propagation and constant folding followed by jump threading can collapse multiple basic blocks into one, cause edges to be removed from a CFG, etc., thus possibly disconnecting parts of the graph.
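A reachability check is an ordinary graph traversal over such a successor map. The Python sketch below is illustrative and not tied to any particular compiler: it collects every block reachable from the entry by depth-first search, so that blocks left out are unreachable code, and testing whether the exit block is in the reachable set covers the second case above.

def reachable_from(cfg, start):
    """Return the set of blocks reachable from `start` via successor edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(cfg[node])
    return seen

# Example CFG with a block 'X' that no edge reaches
cfg = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D'], 'D': [], 'X': ['D']}

live = reachable_from(cfg, 'A')
print(set(cfg) - live)   # {'X'}, a candidate for dead-code removal
print('D' in live)       # True: the exit block is reachable from the entry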

Domination relationship

A block M dominates a block N if every path from the entry that reaches block N has to pass through block M. The entry block dominates all blocks.

In the reverse direction, block M postdominates block N if every path from N to the exit has to pass through block M. The exit block postdominates all blocks.

It is said that a block M immediately dominates block N if M dominates N, and there is no intervening block P such that M dominates P and P dominates N. In other words, M is the last dominator on all paths from entry to N. Each block other than the entry block has a unique immediate dominator.

Similarly, there is a notion of immediate postdominator, analogous to immediate dominator.

The dominator tree is an ancillary data structure depicting the dominator relationships. There is an arc from block M to block N if M is an immediate dominator of N. This graph is a tree, since each block has a unique immediate dominator. This tree is rooted at the entry block. The dominator tree can be calculated efficiently using the Lengauer–Tarjan algorithm.

A postdominator tree is analogous to the dominator tree. This tree is rooted at the exit block.
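Dominator sets can also be computed with a straightforward iterative data-flow algorithm, sketched below in Python for illustration; production compilers usually prefer the more efficient Lengauer–Tarjan algorithm mentioned above.

def dominators(cfg, entry):
    """Iterative fixed-point computation of dominator sets.

    dom(entry) = {entry}
    dom(n)     = {n} union the intersection of dom(p) over all predecessors p of n
    """
    preds = {n: [] for n in cfg}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].append(n)

    dom = {n: set(cfg) for n in cfg}     # start from "every block dominates every block"
    dom[entry] = {entry}

    changed = True
    while changed:
        changed = False
        for n in cfg:
            if n == entry:
                continue
            new = {n} | (set.intersection(*(dom[p] for p in preds[n]))
                         if preds[n] else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom

# Diamond-shaped CFG from the Example section
doms = dominators({'A': ['B', 'C'], 'B': ['D'], 'C': ['D'], 'D': []}, 'A')
print(doms['D'])   # {'A', 'D'}: every path from the entry to D passes through A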

Special edges

A back edge is an edge that points to a block that has already been visited during a depth-first search (DFS) traversal of the graph. Back edges are typical of loops.
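Back edges can be found during the DFS itself by tracking which blocks are still on the traversal stack. The Python sketch below is illustrative; it reports every edge whose target is an ancestor of its source in the traversal.

def back_edges(cfg, entry):
    """Return the back edges discovered by a depth-first traversal from `entry`."""
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited / on the DFS stack / finished
    color = {n: WHITE for n in cfg}
    found = set()

    def dfs(n):
        color[n] = GRAY
        for s in cfg[n]:
            if color[s] == GRAY:          # target is an ancestor still being visited
                found.add((n, s))
            elif color[s] == WHITE:
                dfs(s)
        color[n] = BLACK

    dfs(entry)
    return found

# A small loop: the edge from B back to A closes it
print(back_edges({'A': ['B'], 'B': ['C', 'A'], 'C': []}, 'A'))   # {('B', 'A')}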

A critical edge is an edge which is neither the only edge leaving its source block, nor the only edge entering its destination block. These edges must be split: a new block must be created in the middle of the edge, in order to insert computations on the edge without affecting any other edges.
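The Python sketch below is illustrative (block names such as split0 are invented here); it detects critical edges by counting outgoing and incoming edges and splits each one by placing a fresh empty block on it.

def split_critical_edges(cfg):
    """Insert a new, empty block on every critical edge of `cfg` (a successor map)."""
    preds = {n: [] for n in cfg}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].append(n)

    counter = 0
    for a in list(cfg):
        for i, b in enumerate(cfg[a]):
            if len(cfg[a]) > 1 and len(preds[b]) > 1:   # neither sole exit nor sole entry
                new = 'split%d' % counter               # invented naming scheme
                counter += 1
                cfg[new] = [b]                          # the new block jumps on to b
                cfg[a][i] = new                         # a now targets the new block
    return cfg

# A->C is critical: A has two successors and C has two predecessors
print(split_critical_edges({'A': ['B', 'C'], 'B': ['C'], 'C': []}))
# {'A': ['B', 'split0'], 'B': ['C'], 'C': [], 'split0': ['C']}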

An abnormal edge is an edge whose destination is unknown. Exception handling constructs can produce them. These edges tend to inhibit optimization.

An impossible edge (also known as a fake edge) is an edge which has been added to the graph solely to preserve the property that the exit block postdominates all blocks. It cannot ever be traversed.

Loop management

A loop header (sometimes called the entry point of the loop) is a dominator that is the target of a loop-forming back edge. The loop header dominates all blocks in the loop body. A block may be a loop header for more than one loop. A loop may have multiple entry points, in which case it has no "loop header".

Suppose block M is a dominator with several incoming edges, some of them being back edges (so M is a loop header). It is advantageous to several optimization passes to break M up into two blocks Mpre and Mloop. The contents of M and back edges are moved to Mloop, the rest of the edges are moved to point into Mpre, and a new edge from Mpre to Mloop is inserted (so that Mpre is the immediate dominator of Mloop). In the beginning, Mpre would be empty, but passes like loop-invariant code motion could populate it. Mpre is called the loop pre-header, and Mloop would be the loop header.
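A minimal sketch of pre-header insertion is shown below in Python (illustrative; a real pass would also move code and update dominator and loop information): every incoming edge of the header that is not a back edge is redirected to a new block, which then falls through to the header.

def insert_preheader(cfg, header, back_edge_sources):
    """Insert a pre-header in front of `header`.

    `back_edge_sources` are the blocks whose edge to `header` is a back
    edge; those edges keep pointing at the header, while every other
    incoming edge is redirected to the new pre-header block.
    """
    pre = header + '_pre'                    # invented naming scheme
    for n, succs in cfg.items():
        if n in back_edge_sources:
            continue
        cfg[n] = [pre if s == header else s for s in succs]
    cfg[pre] = [header]                      # the pre-header falls through to the header
    return cfg

# A enters the loop at H; L branches back to H (back edge L -> H)
print(insert_preheader({'A': ['H'], 'H': ['L', 'X'], 'L': ['H'], 'X': []}, 'H', {'L'}))
# {'A': ['H_pre'], 'H': ['L', 'X'], 'L': ['H'], 'X': [], 'H_pre': ['H']}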

Reducibility

A reducible CFG is one whose edges can be partitioned into two disjoint sets, forward edges and back edges, such that the forward edges form a directed acyclic graph in which every node is reachable from the entry node, and, for each back edge (A, B), the node B dominates the node A. [5]

Structured programming languages are often designed such that all CFGs they produce are reducible, and common structured programming statements such as IF, FOR, WHILE, BREAK, and CONTINUE produce reducible graphs. To produce irreducible graphs, statements such as GOTO are needed. Irreducible graphs may also be produced by some compiler optimizations.
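One classical way to test reducibility, sketched below in Python for illustration, repeatedly applies the T1 and T2 transformations: delete self-loops (T1) and merge any non-entry node with a single predecessor into that predecessor (T2). The CFG is reducible exactly when this process collapses it to a single node.

def is_reducible(cfg, entry):
    """Test reducibility by exhaustively applying the T1/T2 transformations."""
    succ = {n: set(s) for n, s in cfg.items()}   # mutable copy: node -> successor set

    def preds_of(n):
        return {p for p, ss in succ.items() if n in ss}

    changed = True
    while changed:
        changed = False
        for n in succ:                           # T1: remove self-loops
            if n in succ[n]:
                succ[n].discard(n)
                changed = True
        for n in list(succ):                     # T2: merge single-predecessor nodes
            if n == entry:
                continue
            ps = preds_of(n)
            if len(ps) == 1:
                (p,) = ps
                succ[p].discard(n)
                succ[p] |= succ[n]               # p inherits n's successors
                del succ[n]                      # n had no other predecessors
                changed = True
                break                            # restart the scans after a merge
    return len(succ) == 1

reducible   = {'A': ['B'], 'B': ['C', 'A'], 'C': []}        # a simple while-style loop
irreducible = {'A': ['B', 'C'], 'B': ['C'], 'C': ['B']}     # loop with two entry points
print(is_reducible(reducible, 'A'), is_reducible(irreducible, 'A'))   # True False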

Loop connectedness

The loop connectedness of a CFG is defined with respect to a given depth-first search tree (DFST) of the CFG. This DFST should be rooted at the start node and cover every node of the CFG.

Edges in the CFG which run from a node to one of its DFST ancestors (including itself) are called back edges.

The loop connectedness is the largest number of back edges found in any cycle-free path of the CFG. In a reducible CFG, the loop connectedness is independent of the DFST chosen. [6] [7]

Loop connectedness has been used to reason about the time complexity of data-flow analysis. [6]

Inter-procedural control-flow graph

While control-flow graphs represent the control flow of a single procedure, inter-procedural control-flow graphs represent the control flow of whole programs. [8]

References

  1. Frances E. Allen (July 1970). "Control flow analysis". SIGPLAN Notices. 5 (7): 1–19. doi:10.1145/390013.808479.
  2. Reese T. Prosser (1959). "Applications of Boolean matrices to the analysis of flow diagrams". Papers presented at the December 1–3, 1959, eastern joint IRE-AIEE-ACM computer conference. pp. 133–138. doi:10.1145/1460299.1460314.
  3. Yousefi, Javad (2015). "Masking wrong-successor Control Flow Errors employing data redundancy". 2015 5th International Conference on Computer and Knowledge Engineering (ICCKE). IEEE. pp. 201–205. doi:10.1109/ICCKE.2015.7365827. ISBN 978-1-4673-9280-8.
  4. Peri L. Tarr; Alexander L. Wolf (2011). Engineering of Software: The Continuing Contributions of Leon J. Osterweil. Springer Science & Business Media. p. 58. ISBN 978-3-642-19823-6.
  5. "Archived copy" (PDF). Archived from the original (PDF) on 2020-08-01. Retrieved 2018-03-24.
  6. Kam, John B.; Ullman, Jeffrey D. (1976-01-01). "Global Data Flow Analysis and Iterative Algorithms". Journal of the ACM. 23 (1): 158–171. doi:10.1145/321921.321938. ISSN 0004-5411. S2CID 162375.
  7. Offner, Carl. "Notes on Graph Algorithms Used in Optimizing Compilers" (PDF). Archived (PDF) from the original on 2008-07-25. Retrieved 13 April 2018.
  8. "Control Flow Analysis" (PDF). 2016. Archived (PDF) from the original on 2016-12-19.