Branch (computer science)

A branch is an instruction in a computer program that can cause a computer to begin executing a different instruction sequence and thus deviate from its default behavior of executing instructions in order.[note 1] Branch (or branching, branched) may also refer to the act of switching execution to a different instruction sequence as a result of executing a branch instruction. Branch instructions are used to implement control flow in program loops and conditionals (i.e., executing a particular sequence of instructions only if certain conditions are satisfied).

A branch instruction can be either an unconditional branch, which always results in branching, or a conditional branch, which may or may not cause branching depending on some condition. Also, depending on how it specifies the address of the new instruction sequence (the "target" address), a branch instruction is generally classified as direct (the instruction itself contains the target address), indirect (the instruction specifies where the target address is to be found, e.g., a register or memory location), or relative (the instruction specifies the difference between the current and target addresses).
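The distinction can be made concrete with a small sketch. The following C fragment models a toy instruction set (the types and encodings are invented for illustration, not taken from any real architecture) and computes the next program counter for a taken direct, indirect, or relative branch:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical branch kinds for a toy instruction set. */
enum branch_kind { DIRECT, INDIRECT, RELATIVE };

struct branch {
    enum branch_kind kind;
    uint32_t operand;   /* target address, register index, or signed offset */
};

/* Compute the next program counter for a taken branch. */
uint32_t branch_target(uint32_t pc, struct branch b, const uint32_t regs[])
{
    switch (b.kind) {
    case DIRECT:   return b.operand;                 /* target encoded in the instruction */
    case INDIRECT: return regs[b.operand];           /* target read from a register */
    case RELATIVE: return pc + (int32_t)b.operand;   /* target = current PC + displacement */
    }
    return pc;
}

int main(void)
{
    uint32_t regs[4] = { 0x2000, 0, 0, 0 };
    struct branch b = { RELATIVE, (uint32_t)-8 };      /* branch 8 bytes backwards */
    printf("0x%x\n", branch_target(0x1000, b, regs));  /* prints 0xff8 */
    return 0;
}
```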

Implementation

Branch instructions can alter the contents of the CPU's program counter (PC), called the instruction pointer on Intel microprocessors. The PC holds the memory address of the next machine instruction to be fetched and executed. Therefore, a branch, if taken, causes the CPU to execute code from a new memory address, changing the program logic according to the algorithm planned by the programmer.

One type of machine-level branch is the jump instruction. A jump may or may not load the PC with a new value different from the one it would otherwise have held (the address of the following instruction, reached by incrementing the PC past the current instruction). Jumps typically have unconditional and conditional forms, where the latter may be taken or not taken (the PC is modified or not) depending on some condition.

The second type of machine-level branch is the call instruction, which is used to implement subroutines. Like jump instructions, calls may or may not modify the PC according to condition codes; in addition, however, a return address is saved in a safe place in memory (usually in a memory-resident data structure called a stack). Upon completion of the subroutine, this return address is restored to the PC, so program execution resumes with the instruction following the call instruction.

The third type of machine level branch is the return instruction. This "pops" a return address off the stack and loads it into the PC register, thus returning control to the calling routine. Return instructions may also be conditionally executed. This description pertains to ordinary practice; however, the machine programmer has considerable powers to manipulate the return address on the stack, and so redirect program execution in any number of different ways.
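As a rough sketch of this call/return mechanism (a toy model with an invented return-address stack, not any particular architecture's calling convention), consider:

```c
#include <stdio.h>

/* Minimal sketch: a call pushes the return address onto a stack and loads the
 * PC with the subroutine address; a return pops that address back into the PC. */

static unsigned stack[16];
static int sp = 0;                 /* stack pointer into the return-address stack */

unsigned do_call(unsigned pc, unsigned target, unsigned call_size)
{
    stack[sp++] = pc + call_size;  /* save the address of the instruction after the call */
    return target;                 /* new PC: first instruction of the subroutine */
}

unsigned do_return(void)
{
    return stack[--sp];            /* restore the saved return address into the PC */
}

int main(void)
{
    unsigned pc = 0x100;
    pc = do_call(pc, 0x400, 4);    /* call a subroutine at 0x400 from a 4-byte call */
    pc = do_return();              /* execution resumes after the call */
    printf("0x%x\n", pc);          /* prints 0x104 */
    return 0;
}
```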

Depending on the processor, jump and call instructions may alter the contents of the PC register in different ways. An absolute address may be loaded, or a value (displacement) may be added to or subtracted from the current contents of the PC, making the destination address relative to the current place in the program. The source of the displacement value may vary: it can be an immediate value embedded within the instruction, the contents of a processor register or memory location, or the contents of some location added to an index value.

The term branch is also used when referring to programs in high-level programming languages. In these languages, branches usually take the form of conditional statements of various forms that encapsulate the instruction sequence to be executed if the conditions are satisfied. Unconditional branch statements such as GOTO are used to jump unconditionally to a different instruction sequence. If the algorithm requires a conditional branch, the GOTO (or GOSUB subroutine call) is preceded by an IF-THEN statement specifying the condition(s). All high-level languages support loops, control structures that repeat a sequence of instructions until some condition is satisfied and causes the loop to terminate. Loops also rely on branching: at the machine level, they are implemented as ordinary conditional jumps that redirect execution back to the repeated code.
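A short C sketch of this correspondence (the goto form is only illustrative of the conditional-jump structure a compiler typically emits, not literal compiler output):

```c
#include <stdio.h>

int main(void)
{
    int i, sum = 0;

    /* High-level loop: the compiler lowers this to a conditional jump. */
    for (i = 0; i < 5; i++)
        sum += i;

    /* Roughly the same control flow spelled out with if/goto. */
    int j = 0, sum2 = 0;
top:
    if (j >= 5) goto done;   /* conditional branch out of the loop */
    sum2 += j;
    j++;
    goto top;                /* unconditional branch back to the loop head */
done:
    printf("%d %d\n", sum, sum2);   /* prints 10 10 */
    return 0;
}
```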

In CPUs with flag registers, an earlier instruction, usually an arithmetic or logic instruction, sets a condition in the flag register. That instruction is often close to the branch, though not necessarily the instruction immediately before it. The stored condition is then used by a branch such as jump if overflow-flag set. This temporary information is most often held in a flag register but may also be located elsewhere. A flag register design is simple in slower, simpler computers. In fast computers a flag register can become a bottleneck, because instructions that could otherwise operate in parallel (in several execution units) need to set the flag bits in a particular sequence.
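A minimal C sketch of this two-step pattern (the flag layout is invented for illustration): a compare-like operation records flags, and a later test plays the role of the conditional branch.

```c
#include <stdint.h>
#include <stdio.h>

/* Toy flag register: one earlier instruction sets it, a later "branch" tests it. */
struct flags { int zero; int negative; };

struct flags compare(int32_t a, int32_t b)   /* like a CMP instruction */
{
    int32_t r = a - b;
    struct flags f = { r == 0, r < 0 };
    return f;
}

int main(void)
{
    struct flags f = compare(7, 7);
    if (f.zero)                 /* like "branch if equal": tests only the stored flag */
        printf("taken\n");
    return 0;
}
```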

There are also machines (or particular instructions) where the condition may be checked by the jump instruction itself, such as branch <label> if register X negative. In simple computer designs, comparison branches execute more arithmetic and can use more power than flag register branches. In fast computer designs comparison branches can run faster than flag register branches, because comparison branches can access the registers with more parallelism, using the same CPU mechanisms as a calculation.

Some early and simple CPU architectures, still found in microcontrollers, may not implement a conditional jump, but rather only a conditional "skip the next instruction" operation. A conditional jump or call is thus implemented as a conditional skip of an unconditional jump or call instruction.
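A sketch of how this works, using a tiny invented instruction set interpreted in C: the conditional skip either falls into or hops over an unconditional jump, and together they behave like a conditional jump.

```c
#include <stdio.h>

enum op { SKIP_IF_ZERO, JUMP, PRINT, HALT };
struct instr { enum op op; int arg; };

int main(void)
{
    int reg = 0;                     /* the value being tested */
    struct instr program[] = {
        { SKIP_IF_ZERO, 0 },         /* if reg == 0, skip the JUMP below */
        { JUMP, 4 },                 /* otherwise jump over the first PRINT */
        { PRINT, 0 },
        { HALT, 0 },
        { PRINT, 1 },
        { HALT, 0 },
    };

    int pc = 0;
    for (;;) {
        struct instr i = program[pc++];
        switch (i.op) {
        case SKIP_IF_ZERO: if (reg == 0) pc++;     break;
        case JUMP:         pc = i.arg;             break;
        case PRINT:        printf("%d\n", i.arg);  break;
        case HALT:         return 0;
        }
    }
}
```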

Examples

Depending on the computer architecture, the assembly language mnemonic for a jump instruction is typically some shortened form of the word jump or the word branch, often along with other informative letters (or an extra parameter) representing the condition. Sometimes other details are included as well, such as the range of the jump (the offset size) or a special addressing mode that should be used to locate the actual effective offset.

This table lists the machine level branch or jump instructions found in several well-known architectures:

condition or result                        | x86     | PDP-11, VAX | ARM (partly 6502) | equation
zero (implies equal for sub/cmp)           | JZ; JNZ | BEQ; BNE    | BEQ; BNE          | zero; not zero
negative (N), sign (S), or minus (M)       | JS; JNS | BMI; BPL    | BMI; BPL          | negative; not negative
arithmetic overflow (flag called O or V)   | JO; JNO | BVS; BVC    | BVS; BVC          | overflow; not overflow
carry (from add, cmp, shift, etc.)         | JC; JNC | BCS; BCC    | BCS; BCC          | carry; not carry
unsigned below (lower)                     | JB      | BLO         | BLO *             | borrow
unsigned below or equal (lower or same)    | JBE     | BLOS        | BLS *             | borrow or zero
unsigned above or equal (higher or same)   | JAE     | BHIS        | BHS *             | not borrow
unsigned above (higher)                    | JA      | BHI         | BHI *             | not borrow and not zero
signed less than                           | JL      | BLT         | BLT               | sign≠overflow
signed less or equal                       | JLE     | BLE         | BLE               | (sign≠overflow) or zero
signed greater or equal                    | JGE     | BGE         | BGE               | sign=overflow
signed greater than                        | JG      | BGT         | BGT               | (sign=overflow) and not zero

* x86, the PDP-11, VAX, and some others set the carry flag to signal a borrow and clear the carry flag to signal no borrow. ARM, 6502, the PIC, and some others do the opposite for subtractive operations. This inverted role of the carry flag for certain instructions is marked by (*): in those rows borrow = not carry, but where not otherwise noted, borrow ≡ carry. Carry on additive operations, however, is handled the same way by most architectures.
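The underlying borrow condition itself is the same everywhere: for unsigned operands, a − b borrows exactly when a < b. The C sketch below (the flag names are only illustrative) shows the two conventions for reporting it:

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t a = 3, b = 5;
    int borrow = a < b;              /* 1: the subtraction 3 - 5 borrows */

    int carry_x86 = borrow;          /* x86, PDP-11, VAX: carry set means borrow */
    int carry_arm = !borrow;         /* ARM, 6502, PIC: carry clear means borrow */

    printf("borrow=%d x86 CF=%d ARM C=%d\n", borrow, carry_x86, carry_arm);
    return 0;
}
```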

Performance problems with branch instructions

To achieve high performance, modern processors are pipelined. They consist of multiple stages that each partially process an instruction, feed their results to the next stage in the pipeline, and start working on the next instruction in the program. This design assumes that instructions execute in a known, unchanging sequence. Conditional branch instructions make it impossible to know that sequence in advance, so conditional branches can cause "stalls" in which the pipeline has to be restarted on a different part of the program.

Improving performance by reducing stalls from branches

Several techniques improve speed by reducing stalls from conditional branches.

Branch prediction hints

Historically, branch prediction was done by gathering statistics and using the results to optimize code. A programmer would compile a test version of a program and run it with test data. The test code counted how the branches were actually taken, and the compiler then used these statistics to optimize the branches of the released code. The optimization would arrange the code so that the most frequently taken control-flow path was also the fastest branch direction (taken or not taken). For this to work, CPUs must be designed with (or at least have) predictable branch timing. Some instruction sets (such as the Power ISA) were designed with "branch hints" so that a compiler can tell the CPU how each branch is expected to be taken.
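A small illustration of compiler-visible hints, assuming the GCC/Clang builtin __builtin_expect (other toolchains expose hints differently, and on many CPUs such hints influence code layout rather than the hardware predictor):

```c
#include <stdio.h>
#include <stdlib.h>

#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

int process(int *p)
{
    if (unlikely(p == NULL))   /* tell the compiler this path is rare */
        abort();
    return *p + 1;             /* the expected, fall-through path */
}

int main(void)
{
    int x = 41;
    printf("%d\n", process(&x));   /* prints 42 */
    return 0;
}
```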

The problem with software branch prediction is that it requires a complex software development process.

Hardware branch predictors

Hardware branch predictors move the statistics gathering into the electronics, so that any software can benefit without a separate profiling step. Branch predictors are parts of a processor that guess the outcome of a conditional branch. The processor's logic then gambles on the guess by beginning to execute the expected instruction flow. A simple hardware branch prediction scheme is to assume that all backward branches (i.e., to a smaller program counter) are taken (because they are likely part of a loop), and all forward branches (to a larger program counter) are not taken (because they likely leave a loop). Better branch predictors are developed and validated statistically by running them in simulation on a variety of test programs. Good predictors usually count the outcomes of previous executions of a branch. Faster, more expensive computers can then run faster by investing in better branch prediction electronics. In a CPU with hardware branch prediction, branch hints let the compiler's presumably superior branch prediction override the hardware's simpler prediction.
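As an example of counting previous outcomes, the sketch below implements a classic two-bit saturating-counter predictor in C (the table size and indexing are invented for illustration):

```c
#include <stdio.h>

/* One two-bit counter per table entry: 0-1 predict "not taken", 2-3 predict
 * "taken"; each actual outcome nudges the counter toward that outcome. */

#define TABLE_SIZE 1024
static unsigned char counters[TABLE_SIZE];   /* each entry holds 0..3 */

int predict(unsigned pc)
{
    return counters[pc % TABLE_SIZE] >= 2;   /* 1 = predict taken */
}

void update(unsigned pc, int taken)
{
    unsigned char *c = &counters[pc % TABLE_SIZE];
    if (taken  && *c < 3) (*c)++;
    if (!taken && *c > 0) (*c)--;
}

int main(void)
{
    /* A loop branch taken 9 times then not taken: after a few iterations the
     * counter saturates and the predictor is right most of the time. */
    int correct = 0;
    for (int i = 0; i < 10; i++) {
        int taken = (i < 9);
        correct += (predict(0x40) == taken);
        update(0x40, taken);
    }
    printf("%d/10 correct\n", correct);   /* prints 7/10 correct */
    return 0;
}
```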

Branch-free code

Some logic can be written without branches, or with fewer branches. It is often possible to use bitwise operations, conditional moves or other predication instead of branches. [1] [2] Branch-free code is essential in cryptography, where branches that depend on secret data can leak information through timing attacks. [3]
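A common branch-free idiom is to turn a condition into an all-ones or all-zeros mask and select with bitwise operations, as in this C sketch:

```c
#include <stdint.h>
#include <stdio.h>

/* Branch-free selection: cond ? a : b without a conditional jump. */
uint32_t select_u32(int cond, uint32_t a, uint32_t b)
{
    uint32_t mask = (uint32_t)-(cond != 0);   /* 0xFFFFFFFF if cond, else 0 */
    return (a & mask) | (b & ~mask);
}

int main(void)
{
    printf("%u\n", select_u32(1, 10, 20));    /* prints 10 */
    printf("%u\n", select_u32(0, 10, 20));    /* prints 20 */
    return 0;
}
```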

Delay slot

Another technique is a branch delay slot. In this approach, at least one instruction following a branch is always executed, with some exceptions such as the "branch likely" instructions of the legacy MIPS architecture. The computer can therefore use the delay-slot instruction to do useful work whether or not the branch is taken. This approach was historically popular in RISC computers. In a family of compatible CPUs, it complicates multicycle CPUs (with no pipeline), faster CPUs with longer-than-expected pipelines, and superscalar CPUs (which can execute instructions out of order).

Notes

  1. At least conceptually; see out-of-order execution.

References

  1. Knuth, Donald (2008). The Art of Computer Programming. Vol. 4, Pre-fascicle 1A (Revision 6 ed.). pp. 48–49.
  2. "Avoiding Branches". Chessprogramming wiki.
  3. "Constant-Time Crypto". BearSSL.