Peephole optimization is an optimization technique performed on a small set of compiler-generated instructions, known as a peephole or window, [1] [2] that involves replacing the instructions with a logically equivalent set that has better performance.
For example, multiplying an integer x by 2 can be replaced with the equivalent left shift x << 1.
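Below is a minimal Python sketch of one such replacement: a two-instruction window slides over the code and rewrites "multiply by 2" as "shift left by 1". The stack-machine encoding (tuples such as ("push", 2) and ("mul",)) and the helper names are hypothetical, chosen only for illustration. A tiny interpreter is included to check that the two sequences agree.

```python
from typing import List, Tuple

# Hypothetical stack-machine encoding: (opcode, *operands).
Instr = Tuple

def mul2_to_shift(code: List[Instr]) -> List[Instr]:
    """Slide a two-instruction window and rewrite 'push 2; mul' as 'push 1; shl'."""
    out: List[Instr] = []
    i = 0
    while i < len(code):
        if code[i:i + 2] == [("push", 2), ("mul",)]:
            # For integers, x * 2 == x << 1, so the whole window is replaced.
            out.extend([("push", 1), ("shl",)])
            i += 2
        else:
            out.append(code[i])
            i += 1
    return out

def run(code: List[Instr], x: int) -> int:
    """Tiny interpreter for the hypothetical machine; x starts on the stack."""
    stack = [x]
    for op, *args in code:
        if op == "push":
            stack.append(args[0])
        else:
            # Only the binary ops 'mul' and 'shl' are modelled here.
            b, a = stack.pop(), stack.pop()
            stack.append(a * b if op == "mul" else a << b)
    return stack.pop()

code = [("push", 2), ("mul",)]
optimized = mul2_to_shift(code)
print(optimized)  # [('push', 1), ('shl',)]
```

Whether the shift is actually faster depends on the target machine; the rewrite is only worthwhile where it is.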
The term peephole optimization was introduced by William Marshall McKeeman in 1965. [3]
Peephole optimization replacements include but are not limited to: [4]
Modern compilers often implement peephole optimizations with a pattern matching algorithm. [5]
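A pattern-matching peephole pass can be sketched as a table of (pattern, replacement) rules applied until none fires. The Python below is an illustration under assumptions, not any particular compiler's implementation: instructions are hypothetical string tuples, tokens beginning with "?" are pattern variables that must bind consistently within a window, and each rule is assumed to shrink the code so the loop terminates.

```python
from typing import Dict, List, Optional, Tuple

Instr = Tuple[str, ...]
Rule = Tuple[List[Instr], List[Instr]]

def match(pattern: List[Instr], window: List[Instr]) -> Optional[Dict[str, str]]:
    """Return variable bindings if the window matches the pattern, else None."""
    if len(pattern) != len(window):
        return None
    env: Dict[str, str] = {}
    for p_ins, w_ins in zip(pattern, window):
        if len(p_ins) != len(w_ins):
            return None
        for p, w in zip(p_ins, w_ins):
            if p.startswith("?"):
                # Pattern variable: the same name must bind the same operand.
                if env.setdefault(p, w) != w:
                    return None
            elif p != w:
                return None
    return env

def substitute(template: List[Instr], env: Dict[str, str]) -> List[Instr]:
    """Fill pattern variables in the replacement with their bound operands."""
    return [tuple(env.get(t, t) for t in ins) for ins in template]

def peephole(code: List[Instr], rules: List[Rule]) -> List[Instr]:
    """Rewrite the first match found, then rescan until no rule applies."""
    changed = True
    while changed:
        changed = False
        for i in range(len(code)):
            for pattern, replacement in rules:
                env = match(pattern, code[i:i + len(pattern)])
                if env is not None:
                    code = code[:i] + substitute(replacement, env) + code[i + len(pattern):]
                    changed = True
                    break
            if changed:
                break
    return code

rules: List[Rule] = [
    ([("push", "?r"), ("pop", "?r")], []),  # null sequence
    ([("add", "?r", "0")], []),             # adding 0 to ?r, assuming flags are unused
]
code = [("push", "BC"), ("push", "AF"), ("pop", "AF"), ("pop", "BC"), ("mov", "R0", "a")]
print(peephole(code, rules))  # [('mov', 'R0', 'a')]
```

Removing the inner push/pop pair exposes the outer one, which the rescan then removes.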
The following Java bytecode, which squares the int in local variable 1:
iload_1
iload_1
imul
can be replaced with the following, which executes faster:
iload_1
dup
imul
As with most peephole optimizations, this is based on the relative efficiency of different instructions. In this case, dup
(which duplicates the value on top of the stack and pushes the copy) is known or assumed to be more efficient than iload
(which loads a local variable and pushes it onto the stack).
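The load/load-to-load/dup rewrite can be sketched in Python as follows, assuming a hypothetical tuple encoding of JVM instructions such as ("iload", 1) and a straight-line block with no branch target between the two loads. Only single-slot loads (iload, fload, aload) qualify, because dup copies one stack slot; long and double values loaded by lload or dload would need dup2 instead.

```python
from typing import List, Tuple

Instr = Tuple  # hypothetical encoding, e.g. ("iload", 1), ("dup",), ("imul",)

# dup copies a single stack slot, so only category-1 loads are rewritten.
CAT1_LOADS = {"iload", "fload", "aload"}

def load_load_to_dup(code: List[Instr]) -> List[Instr]:
    """Rewrite '<load> n; <load> n' as '<load> n; dup' in a straight-line block."""
    out: List[Instr] = []
    for ins in code:
        if out and ins == out[-1] and ins[0] in CAT1_LOADS:
            out.append(("dup",))  # the same value is already on top of the stack
        else:
            out.append(ins)
    return out

print(load_load_to_dup([("iload", 1), ("iload", 1), ("imul",)]))
# [('iload', 1), ('dup',), ('imul',)]
```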
The following source code:
a = b + c; d = a + e;
is straightforwardly compiled to:
MOV b,R0 ; Copy b to the register
ADD c,R0 ; Add c to the register, the register is now b+c
MOV R0,a ; Copy the register to a
MOV a,R0 ; Copy a to the register
ADD e,R0 ; Add e to the register, the register is now a+e [(b+c)+e]
MOV R0,d ; Copy the register to d
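but because R0 still holds the value that was just stored to a, the reload MOV a,R0 is redundant, and the code can be optimized to: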
MOV b,R0 ; Copy b to the register
ADD c,R0 ; Add c to the register, which is now b+c (a)
MOV R0,a ; Copy the register to a
ADD e,R0 ; Add e to the register, which is now b+c+e [(a)+e]
MOV R0,d ; Copy the register to d
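A minimal Python sketch of this store/reload elimination follows. It assumes a hypothetical tuple encoding in the example's "OP src,dst" order, treats names starting with "R" as registers and everything else as ordinary (non-volatile) memory, and assumes no jump lands between the two instructions. A small interpreter checks that both versions leave memory in the same state.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding in the example's "OP src,dst" order:
# ("MOV", src, dst) copies src into dst; ("ADD", src, dst) adds src to dst.
Instr = Tuple[str, str, str]

def remove_redundant_reload(code: List[Instr]) -> List[Instr]:
    """Drop 'MOV y,x' when it directly follows 'MOV x,y'; y already equals x."""
    out: List[Instr] = []
    for ins in code:
        prev = out[-1] if out else None
        if (prev is not None and ins[0] == prev[0] == "MOV"
                and ins[1] == prev[2] and ins[2] == prev[1]):
            continue  # e.g. MOV R0,a then MOV a,R0: R0 already holds a
        out.append(ins)
    return out

def run(code: List[Instr], memory: Dict[str, int]) -> Dict[str, int]:
    """Execute the code; names starting with 'R' are registers, the rest memory."""
    mem = dict(memory)
    regs: Dict[str, int] = {}

    def read(name: str) -> int:
        return regs[name] if name.startswith("R") else mem[name]

    def write(name: str, value: int) -> None:
        if name.startswith("R"):
            regs[name] = value
        else:
            mem[name] = value

    for op, src, dst in code:
        if op == "MOV":
            write(dst, read(src))
        elif op == "ADD":
            write(dst, read(dst) + read(src))
    return mem

original = [("MOV", "b", "R0"), ("ADD", "c", "R0"), ("MOV", "R0", "a"),
            ("MOV", "a", "R0"), ("ADD", "e", "R0"), ("MOV", "R0", "d")]
optimized = remove_redundant_reload(original)
inputs = {"b": 1, "c": 2, "e": 3}
print(run(original, inputs) == run(optimized, inputs))  # True
```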
If the compiler saves registers on the stack before calling a subroutine and restores them when returning, consecutive calls to subroutines may have redundant stack instructions.
Suppose the compiler generates the following Z80 instructions for each procedure call:
PUSH AF
PUSH BC
PUSH DE
PUSH HL
CALL _ADDR
POP HL
POP DE
POP BC
POP AF
If there were two consecutive subroutine calls, they would look like this:
PUSH AF
PUSH BC
PUSH DE
PUSH HL
CALL _ADDR1
POP HL
POP DE
POP BC
POP AF
PUSH AF
PUSH BC
PUSH DE
PUSH HL
CALL _ADDR2
POP HL
POP DE
POP BC
POP AF
The sequence POP regs followed by PUSH for the same registers is generally redundant. In cases where it is redundant, a peephole optimization would remove these instructions. In the example, this would cause another redundant POP/PUSH pair to appear in the peephole, and these would be removed in turn. Assuming that subroutine _ADDR2 does not depend on previous register values, removing all of the redundant code in the example above would eventually leave the following code:
PUSH AF
PUSH BC
PUSH DE
PUSH HL
CALL _ADDR1
CALL _ADDR2
POP HL
POP DE
POP BC
POP AF
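The cascading removal of POP/PUSH pairs can be sketched in Python by letting the output list act as the peephole: after a pair cancels, the instruction now exposed at the end of the output is compared with the next PUSH, so nested pairs disappear in a single pass. The tuple encoding and helper names are hypothetical, and the sketch makes the same assumption as the text, namely that _ADDR2 does not depend on the register values the removed POPs would have restored.

```python
from typing import List, Tuple

Instr = Tuple[str, str]  # hypothetical encoding, e.g. ("PUSH", "AF"), ("CALL", "_ADDR1")

def remove_pop_push_pairs(code: List[Instr]) -> List[Instr]:
    """Cancel 'POP r' immediately followed by 'PUSH r' (assumed redundant here)."""
    out: List[Instr] = []
    for ins in code:
        if ins[0] == "PUSH" and out and out[-1] == ("POP", ins[1]):
            out.pop()  # drop both; the next pair is now adjacent in 'out'
        else:
            out.append(ins)
    return out

def call_sequence(target: str) -> List[Instr]:
    """Save/restore wrapper the compiler is assumed to emit around each call."""
    saved = ["AF", "BC", "DE", "HL"]
    return ([("PUSH", r) for r in saved] + [("CALL", target)]
            + [("POP", r) for r in reversed(saved)])

optimized = remove_pop_push_pairs(call_sequence("_ADDR1") + call_sequence("_ADDR2"))
for op, arg in optimized:
    print(op, arg)
```

Printed one instruction per line, the result is the ten-instruction sequence shown above.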