Instruction window

Last updated November 18, 2023

An instruction window in computer architecture refers to the set of instructions which can execute out-of-order in a speculative processor.

In particular, in a conventional design, the instruction window consists of all instructions which are in the re-order buffer (ROB).^[1] In such a processor, any instruction within the instruction window can be executed when its operands are ready. Out-of-order processors derive their name because this may occur out-of-order (if operands to a younger instruction are ready before those of an older instruction).

The instruction window has a finite size, and new instructions can enter the window (usually called dispatch or allocate) only when other instructions leave the window (usually called retire or commit). Instructions enter and leave the instruction window in program order, and an instruction can only leave the window when it is the oldest instruction in the window, and it has been completed. Hence, the instruction window can be seen as a sliding window in which the instructions can become out-of-order. All execution within the window is speculative (i.e., side-effects are not applied outside the CPU) until it is committed in order to support asynchronous exception handling like interrupts.

This paradigm is also known as restricted dataflow^[2] because instructions within the window execute in dataflow order (not necessarily in program order) but the window in which this occurs is restricted (of finite size).

The instruction window is distinct from pipelining: instructions in an in-order pipeline are not in an instruction window in the conventionally understood sense, because they cannot execute out of order with respect to one another. Out-of-order processors are usually built around pipelines, but many of the pipeline stages (e.g., front-end instruction fetch and decode stages) are not considered to be part of the instruction window.

Related Research Articles

A central processing unit (CPU)—also called a central processor or main processor—is the most important processor in a given computer. Its electronic circuitry executes instructions of a computer program, such as arithmetic, logic, controlling, and input/output (I/O) operations. This role contrasts with that of external components, such as main memory and I/O circuitry, and specialized coprocessors such as graphics processing units (GPUs).

The control unit (CU) is a component of a computer's central processing unit (CPU) that directs the operation of the processor. A CU typically uses a binary decoder to convert coded instructions into timing and control signals that direct the operation of the other units.

The Pentium is a fifth generation, 32-bit x86 microprocessor that was introduced by Intel on March 22, 1993, as the very first CPU in the Pentium brand. It was instruction set compatible with the 80486 but was a new and very different microarchitecture design from previous iterations. The P5 Pentium was the first superscalar x86 microarchitecture and the world's first superscalar microprocessor to be in mass production—meaning it generally executes at least 2 instructions per clock mainly because of a design-first dual integer pipeline design previously thought impossible to implement on a CISC microarchitecture. Additional features include a faster floating-point unit, wider data bus, separate code and data caches, and many other techniques and features to enhance performance and support security, encryption, and multiprocessing, for workstations and servers when compared to the next best previous industry standard processor implementation before it, the Intel 80486.

<span class="mw-page-title-main">Superscalar processor</span> CPU that implements instruction-level parallelism within a single processor

A superscalar processor is a CPU that implements a form of parallelism called instruction-level parallelism within a single processor. In contrast to a scalar processor, which can execute at most one single instruction per clock cycle, a superscalar processor can execute more than one instruction during a clock cycle by simultaneously dispatching multiple instructions to different execution units on the processor. It therefore allows more throughput than would otherwise be possible at a given clock rate. Each execution unit is not a separate processor, but an execution resource within a single CPU such as an arithmetic logic unit.

Very long instruction word (VLIW) refers to instruction set architectures designed to exploit instruction level parallelism (ILP). Whereas conventional central processing units mostly allow programs to specify instructions to execute in sequence only, a VLIW processor allows programs to explicitly specify instructions to execute in parallel. This design is intended to allow higher performance without the complexity inherent in some other designs.

Simultaneous multithreading (SMT) is a technique for improving the overall efficiency of superscalar CPUs with hardware multithreading. SMT permits multiple independent threads of execution to better use the resources provided by modern processor architectures.

Speculative execution is an optimization technique where a computer system performs some task that may not be needed. Work is done before it is known whether it is actually needed, so as to prevent a delay that would have to be incurred by doing the work after it is known that it is needed. If it turns out the work was not needed after all, most changes made by the work are reverted and the results are ignored.

Tomasulo's algorithm is a computer architecture hardware algorithm for dynamic scheduling of instructions that allows out-of-order execution and enables more efficient use of multiple execution units. It was developed by Robert Tomasulo at IBM in 1967 and was first implemented in the IBM System/360 Model 91’s floating point unit.

In the domain of central processing unit (CPU) design, hazards are problems with the instruction pipeline in CPU microarchitectures when the next instruction cannot execute in the following clock cycle, and can potentially lead to incorrect computation results. Three common types of hazards are data hazards, structural hazards, and control hazards.

In computer architecture, register renaming is a technique that abstracts logical registers from physical registers. Every logical register has a set of physical registers associated with it. When a machine language instruction refers to a particular logical register, the processor transposes this name to one specific physical register on the fly. The physical registers are opaque and cannot be referenced directly but only via the canonical names.

In computer engineering, out-of-order execution is a paradigm used in most high-performance central processing units to make use of instruction cycles that would otherwise be wasted. In this paradigm, a processor executes instructions in an order governed by the availability of input data and execution units, rather than by their original order in a program. In doing so, the processor can avoid being idle while waiting for the preceding instruction to complete and can, in the meantime, process the next instructions that are able to run immediately and independently.

In computing, a pipeline, also known as a data pipeline, is a set of data processing elements connected in series, where the output of one element is the input of the next one. The elements of a pipeline are often executed in parallel or in time-sliced fashion. Some amount of buffer storage is often inserted between elements.

Dataflow architecture is a dataflow-based computer architecture that directly contrasts the traditional von Neumann architecture or control flow architecture. Dataflow architectures have no program counter, in concept: the executability and execution of instructions is solely determined based on the availability of input arguments to the instructions, so that the order of instruction execution may be hard to predict.

In electronics, computer science and computer engineering, microarchitecture, also called computer organization and sometimes abbreviated as µarch or uarch, is the way a given instruction set architecture (ISA) is implemented in a particular processor. A given ISA may be implemented with different microarchitectures; implementations may vary due to different goals of a given design or due to shifts in technology.

A register file is an array of processor registers in a central processing unit (CPU). Register banking is the method of using a single name to access multiple different physical registers depending on the operating mode. Modern integrated circuit-based register files are usually implemented by way of fast static RAMs with multiple ports. Such RAMs are distinguished by having dedicated read and write ports, whereas ordinary multiported SRAMs will usually read and write through the same ports.

<span class="mw-page-title-main">Binary Modular Dataflow Machine</span>

Binary Modular Dataflow Machine (BMDFM) is a software package that enables running an application in parallel on shared memory symmetric multiprocessing (SMP) computers using the multiple processors to speed up the execution of single applications. BMDFM automatically identifies and exploits parallelism due to the static and mainly dynamic scheduling of the dataflow instruction sequences derived from the formerly sequential program.

The history of general-purpose CPUs is a continuation of the earlier history of computing hardware.

In computer architecture, multithreading is the ability of a central processing unit (CPU) to provide multiple threads of execution concurrently, supported by the operating system. This approach differs from multiprocessing. In a multithreaded application, the threads share the resources of a single or multiple cores, which include the computing units, the CPU caches, and the translation lookaside buffer (TLB).

Explicit data graph execution, or EDGE, is a type of instruction set architecture (ISA) which intends to improve computing performance compared to common processors like the Intel x86 line. EDGE combines many individual instructions into a larger group known as a "hyperblock". Hyperblocks are designed to be able to easily run in parallel.

Latency oriented processor architecture is the microarchitecture of a microprocessor designed to serve a serial computing thread with a low latency. This is typical of most central processing units (CPU) being developed since the 1970s. These architectures, in general, aim to execute as many instructions as possible belonging to a single serial thread, in a given window of time; however, the time to execute a single instruction completely from fetch to retire stages may vary from a few cycles to even a few hundred cycles in some cases. Latency oriented processor architectures are the opposite of throughput-oriented processors which concern themselves more with the total throughput of the system, rather than the service latencies for all individual threads that they work on.

References

↑ Shen and Lipasti. Modern Processor Design: Fundamentals of Superscalar Processors. McGraw-Hill, 2005.
↑ Patt et al., HPS, A New Microarchitecture: Rationale and Introduction. In IEEE International Symposium on Microarchitecture (MICRO), 1985.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[1] Shen and Lipasti. Modern Processor Design: Fundamentals of Superscalar Processors. McGraw-Hill, 2005.

[2] Patt et al., HPS, A New Microarchitecture: Rationale and Introduction. In IEEE International Symposium on Microarchitecture (MICRO), 1985.

[1]

[2]

Instruction window

See also

Related Research Articles

References