Basic block

Last updated November 04, 2023

In compiler construction, a basic block is a straight-line code sequence with no branches in except to the entry and no branches out except at the exit.^[1]^[2] This restricted form makes a basic block highly amenable to analysis.^[3] Compilers usually decompose programs into their basic blocks as a first step in the analysis process. Basic blocks form the vertices or nodes in a control-flow graph.

Definition

The code in a basic block has:

One entry point, meaning that no code within it, other than the first instruction, is the destination of a jump instruction anywhere in the program.
One exit point, meaning that only the last instruction can cause the program to begin executing code in a different basic block.

Under these circumstances, whenever the first instruction in a basic block is executed, the rest of the instructions are necessarily executed exactly once and in order.^[4]^[5]

The code may be source code, assembly code, or some other sequence of instructions.

More formally, a sequence of instructions forms a basic block if:

The instruction in each position dominates (always executes before) all those in later positions.
No other instruction executes between two instructions in the sequence.

This definition is more general than the intuitive one in some ways. For example, it allows unconditional jumps to labels not targeted by other jumps. This definition embodies the properties that make basic blocks easy to work with when constructing an algorithm.

The blocks to which control may transfer after reaching the end of a block are called that block's successors, while the blocks from which control may have come when entering a block are called that block's predecessors. The start of a basic block may be jumped to from more than one location.
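As a minimal sketch of the successor and predecessor relations, assume the control-flow graph is already available as a mapping from block name to successor names. The block names (`entry`, `loop`, `exit`) and the `predecessors` helper are hypothetical and only serve the illustration. The predecessor relation is then simply the inverted successor relation:

```python
from typing import Dict, List


def predecessors(successors: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert a successor map: for each block, list the blocks that can reach it directly."""
    preds: Dict[str, List[str]] = {name: [] for name in successors}
    for block, succs in successors.items():
        for s in succs:
            preds.setdefault(s, []).append(block)
    return preds


# Hypothetical three-block program: a loop whose header is entered both
# from the entry block and from its own back edge.
cfg = {
    "entry": ["loop"],
    "loop": ["loop", "exit"],
    "exit": [],
}
preds = predecessors(cfg)
```

Here `loop` ends up with two predecessors (`entry` and itself), which illustrates that the start of a basic block may be reached from more than one location.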

Creation algorithm

The algorithm for generating basic blocks from a listing of code is simple: the analyser scans over the code and marks block boundaries, that is, instructions that may begin or end a block because they either transfer control or accept control from another point. The listing is then "cut" at each of these points, and the basic blocks remain.

Note that this method does not always generate maximal basic blocks in the sense of the formal definition, but the blocks it produces are usually sufficient. Maximal basic blocks are basic blocks that cannot be extended by including adjacent blocks without violating the definition of a basic block.^[6]
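One possible way to move toward maximal blocks after cutting is to merge a block with its only successor whenever that successor has no other predecessor. The sketch below is an illustration under stated assumptions, not a prescribed method: blocks are lists of instruction strings, control transfers are recorded only in the hypothetical `succ` map rather than as explicit jump instructions, and `merge_chains` and the `entry` parameter are invented names.

```python
from typing import Dict, List, Optional, Tuple


def merge_chains(
    blocks: Dict[str, List[str]],
    succ: Dict[str, List[str]],
    entry: Optional[str] = None,
) -> Tuple[Dict[str, List[str]], Dict[str, List[str]]]:
    """Merge block a with block b when b is a's only successor and a is b's only predecessor."""
    blocks = {k: list(v) for k, v in blocks.items()}   # work on copies
    succ = {k: list(v) for k, v in succ.items()}
    changed = True
    while changed:
        changed = False
        # Recount incoming edges after every merge.
        pred_count = {name: 0 for name in blocks}
        for targets in succ.values():
            for t in targets:
                pred_count[t] += 1
        for a in list(blocks):
            if len(succ[a]) != 1:
                continue
            b = succ[a][0]
            # The entry block has an implicit predecessor (program start),
            # so it is never absorbed into another block.
            if b != a and b != entry and pred_count[b] == 1:
                blocks[a].extend(blocks.pop(b))
                succ[a] = succ.pop(b)
                changed = True
                break
    return blocks, succ


blocks = {"A": ["x = 1"], "B": ["y = x + 1"], "C": ["z = y * 2"], "D": ["return z"]}
succ = {"A": ["B"], "B": ["C", "D"], "C": ["D"], "D": []}
merged, merged_succ = merge_chains(blocks, succ, entry="A")
```

In this example `A` and `B` are merged, while `C` and `D` stay separate because `D` has two predecessors.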

Input: A sequence of instructions (typically three-address code).^[7]
Output: A list of basic blocks with each three-address instruction in exactly one block.

Identify the leaders in the code. An instruction is a leader if it falls under any of the following three categories:
1. It is the first instruction.
2. It is the target of a conditional or unconditional goto/jump instruction.
3. It immediately follows a conditional or unconditional goto/jump instruction.
Starting from a leader, the basic block consists of the leader and all following instructions up to, but not including, the next leader. Thus every basic block has a leader.
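The leader rules above can be sketched as follows. The instruction representation is an assumption made for the example: each instruction is a pair of its text and the index of its jump target, or `None` if it is not a jump. The names `find_leaders` and `partition` are hypothetical.

```python
from typing import List, Optional, Tuple

# Hypothetical instruction form: (text, jump_target), where jump_target is the
# index of the instruction jumped to, or None for non-jump instructions.
Instr = Tuple[str, Optional[int]]


def find_leaders(code: List[Instr]) -> List[int]:
    """Return the sorted indices of all leaders in the instruction list."""
    leaders = set()
    if code:
        leaders.add(0)                  # rule 1: the first instruction
    for i, (_, target) in enumerate(code):
        if target is not None:
            leaders.add(target)         # rule 2: the target of a jump
            if i + 1 < len(code):
                leaders.add(i + 1)      # rule 3: the instruction after a jump
    return sorted(leaders)


def partition(code: List[Instr]) -> List[List[Instr]]:
    """Cut the listing at each leader; every instruction lands in exactly one block."""
    bounds = find_leaders(code) + [len(code)]
    return [code[start:end] for start, end in zip(bounds, bounds[1:])]


code = [
    ("i = 0", None),             # index 0
    ("t = i * 2", None),         # index 1
    ("i = i + 1", None),         # index 2
    ("if i < 10 goto 1", 1),     # index 3
    ("return t", None),          # index 4
]
blocks = partition(code)
```

For this listing the leaders are the instructions at indices 0, 1 and 4, which gives three blocks: the initialisation, the loop body ending in the conditional jump, and the return.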

Instructions that end a basic block include the following:

unconditional and conditional branches, both direct and indirect;
returns to a calling procedure;
instructions that may throw an exception;
calls to functions that cannot return, such as functions that throw exceptions or special calls like C's longjmp and exit.

Instructions that begin a new basic block include the following:

procedure and function entry points;
targets of jumps or branches;
"fall-through" instructions following some conditional branches;
instructions following ones that throw exceptions;
exception handlers.

Note that, because control can never pass through the end of a basic block, some instructions may have to be modified to find the basic blocks. In particular, fall-through conditional branches must be changed to two-way branches, and function calls throwing exceptions must have unconditional jumps added after them. Doing so may require adding labels to the beginning of other blocks.
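A minimal sketch of this rewriting step, under assumed conventions: instructions are tuples whose first element is a hypothetical opcode (`"br"` for a conditional branch with fall-through, `"br2"` for a two-way branch, `"jmp"`, `"ret"`, and `"op"` for everything else), and generated labels of the form `fallN` are assumed not to clash with existing labels. The `Block` class and `make_exits_explicit` are illustrative names, not part of any particular compiler.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Block:
    label: Optional[str]
    instrs: List[Tuple[str, ...]] = field(default_factory=list)


def make_exits_explicit(blocks: List[Block]) -> None:
    """Rewrite blocks in place so that no block falls through into the next one."""
    fresh = 0
    for i, block in enumerate(blocks):
        nxt = blocks[i + 1] if i + 1 < len(blocks) else None
        last = block.instrs[-1] if block.instrs else None
        if last is not None and last[0] in ("jmp", "ret", "br2"):
            continue                    # already ends in an explicit transfer
        if nxt is None:
            continue                    # last block: nothing to fall through to
        if nxt.label is None:
            nxt.label = f"fall{fresh}"  # add a label so the next block can be named
            fresh += 1
        if last is not None and last[0] == "br":
            # ("br", cond, target) falls through when cond is false;
            # make both targets explicit.
            block.instrs[-1] = ("br2", last[1], last[2], nxt.label)
        else:
            # Plain fall-through, for example after a call: add a jump.
            block.instrs.append(("jmp", nxt.label))


blocks = [
    Block("entry", [("op", "i = 0")]),
    Block("loop", [("op", "i = i + 1"), ("br", "i < 10", "loop")]),
    Block(None, [("ret",)]),
]
make_exits_explicit(blocks)
```

After the call, `entry` ends with an explicit jump to `loop`, the conditional branch in `loop` names both of its targets, and the previously unlabelled final block has received the label `fall0`.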

References

[1] Hennessy, John L.; Patterson, David A. (2011). Computer Architecture: A Quantitative Approach. Elsevier.
[2] Cooper, Keith Daniel; Torczon, Linda (2012). Engineering a Compiler (2nd ed.). Amsterdam: Elsevier/Morgan Kaufmann. p. 231. ISBN 978-0120884780. OCLC 714113472.
[3] Allen, Frances E. "Control Flow Analysis".
[4] Yousefi, Javad (2015). "Masking wrong-successor Control Flow Errors employing data redundancy". 2015 5th International Conference on Computer and Knowledge Engineering (ICCKE). IEEE. pp. 201–205. doi:10.1109/ICCKE.2015.7365827. ISBN 978-1-4673-9280-8.
[5] Cocke, John. "Global Common Subexpression Elimination".
[6] Grune, Dick; Bal, Henri E.; Jacobs, Ceriel J. H.; Langendoen, Koen G. Modern Compiler Design. p. 320.
[7] Aho, Alfred V.; Sethi, Ravi; Ullman, Jeffrey D. Compilers: Principles, Techniques, and Tools.

External links

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
