Reservation station

Last updated
Reservation station as part of Intel's Nehalem microarchitecture Intel Nehalem arch.svg
Reservation station as part of Intel's Nehalem microarchitecture

A unified reservation station, also known as unified scheduler, is a decentralized feature of the microarchitecture of a CPU that allows for register renaming, and is used by the Tomasulo algorithm for dynamic instruction scheduling.

Contents

Reservation stations permit the CPU to fetch and re-use a data value as soon as it has been computed, rather than waiting for it to be stored in a register and re-read. When instructions are issued, they can designate the reservation station from which they want their input to read. When multiple instructions need to write to the same register, all can proceed and only the (logically) last one need actually be written. It checks if the operands are available (RAW) and if execution unit is free (Structural hazard) before starting execution.

Instructions are stored with available parameters, and executed when ready. Results are identified by the unit that will execute the corresponding instruction. Implicitly register renaming solves WAR and WAW hazards. Since this is a fully associative structure, it has a very high cost in comparators (need to compare all results returned from processing units with all stored addresses).

In Tomasulo's algorithm, instructions are issued in sequence to Reservation Stations which buffer the instruction as well as the operands of the instruction. If the operand is not available, the Reservation Station listens on a Common Data Bus for the operand to become available. When the operand becomes available, the Reservation Station buffers it, and the execution of the instruction can begin.

Functional Units (such as an adder or a multiplier), each have their own corresponding Reservation Stations. The output of the Functional Unit connects to the Common Data Bus, where Reservation Stations are listening for the operands they need.

Bibliography

Related Research Articles

The control unit (CU) is a component of a computer's central processing unit (CPU) that directs the operation of the processor. A CU typically uses a binary decoder to convert coded instructions into timing and control signals that direct the operation of the other units.

In computer science, an instruction set architecture (ISA) is an abstract model of a computer. A device that executes instructions described by that ISA, such as a central processing unit (CPU), is called an implementation.

<span class="mw-page-title-main">Intel 8085</span> 8-bit microprocessor by Intel

The Intel 8085 ("eighty-eighty-five") is an 8-bit microprocessor produced by Intel and introduced in March 1976. It is software-binary compatible with the more-famous Intel 8080 with only two minor instructions added to support its added interrupt and serial input/output features. However, it requires less support circuitry, allowing simpler and less expensive microcomputer systems to be built. The "5" in the part number highlighted the fact that the 8085 uses a single +5-volt (V) power supply by using depletion-mode transistors, rather than requiring the +5 V, −5 V and +12 V supplies needed by the 8080. This capability matched that of the competing Z80, a popular 8080-derived CPU introduced the year before. These processors could be used in computers running the CP/M operating system.

A re-order buffer (ROB) is a hardware unit used in an extension to the Tomasulo algorithm to support out-of-order and speculative instruction execution. The extension forces instructions to be committed in-order.

Tomasulo's algorithm is a computer architecture hardware algorithm for dynamic scheduling of instructions that allows out-of-order execution and enables more efficient use of multiple execution units. It was developed by Robert Tomasulo at IBM in 1967 and was first implemented in the IBM System/360 Model 91’s floating point unit.

In the domain of central processing unit (CPU) design, hazards are problems with the instruction pipeline in CPU microarchitectures when the next instruction cannot execute in the following clock cycle, and can potentially lead to incorrect computation results. Three common types of hazards are data hazards, structural hazards, and control hazards.

In computer architecture, register renaming is a technique that abstracts logical registers from physical registers. Every logical register has a set of physical registers associated with it. When a machine language instruction refers to a particular logical register, the processor transposes this name to one specific physical register on the fly. The physical registers are opaque and cannot be referenced directly but only via the canonical names.

In computer science, computer engineering and programming language implementations, a stack machine is a computer processor or a virtual machine in which the primary interaction is moving short-lived temporary values to and from a push down stack. In the case of a hardware processor, a hardware stack is used. The use of a stack significantly reduces the required number of processor registers. Stack machines extend push-down automata with additional load/store operations or multiple stacks and hence are Turing-complete.

<span class="mw-page-title-main">Instruction cycle</span> Basic operation cycle of a computer

The instruction cycle is the cycle that the central processing unit (CPU) follows from boot-up until the computer has shut down in order to process instructions. It is composed of three main stages: the fetch stage, the decode stage, and the execute stage.

<span class="mw-page-title-main">Intel 8087</span> Floating-point microprocessor made by Intel

The Intel 8087, announced in 1980, was the first floating-point coprocessor for the 8086 line of microprocessors. The purpose of the chip was to speed up floating-point arithmetic operations, such as addition, subtraction, multiplication, division, and square root. It also computes transcendental functions such as exponential, logarithmic or trigonometric calculations. The performance enhancements were from approximately 20% to over 500%, depending on the specific application. The 8087 could perform about 50,000 FLOPS using around 2.4 watts.

In computer engineering, out-of-order execution is a paradigm used in most high-performance central processing units to make use of instruction cycles that would otherwise be wasted. In this paradigm, a processor executes instructions in an order governed by the availability of input data and execution units, rather than by their original order in a program. In doing so, the processor can avoid being idle while waiting for the preceding instruction to complete and can, in the meantime, process the next instructions that are able to run immediately and independently.

In computing, a pipeline, also known as a data pipeline, is a set of data processing elements connected in series, where the output of one element is the input of the next one. The elements of a pipeline are often executed in parallel or in time-sliced fashion. Some amount of buffer storage is often inserted between elements.

Dataflow architecture is a dataflow-based computer architecture that directly contrasts the traditional von Neumann architecture or control flow architecture. Dataflow architectures have no program counter, in concept: the executability and execution of instructions is solely determined based on the availability of input arguments to the instructions, so that the order of instruction execution may be hard to predict.

<span class="mw-page-title-main">Hardware acceleration</span> Specialized computer hardware

Hardware acceleration is the use of computer hardware designed to perform specific functions more efficiently when compared to software running on a general-purpose central processing unit (CPU). Any transformation of data that can be calculated in software running on a generic CPU can also be calculated in custom-made hardware, or in some mix of both.

In computer architecture, a transport triggered architecture (TTA) is a kind of processor design in which programs directly control the internal transport buses of a processor. Computation happens as a side effect of data transports: writing data into a triggering port of a functional unit triggers the functional unit to start a computation. This is similar to what happens in a systolic array. Due to its modular structure, TTA is an ideal processor template for application-specific instruction set processors (ASIP) with customized datapath but without the inflexibility and design cost of fixed function hardware accelerators.

The R8000 is a microprocessor chipset developed by MIPS Technologies, Inc. (MTI), Toshiba, and Weitek. It was the first implementation of the MIPS IV instruction set architecture. The R8000 is also known as the TFP, for Tremendous Floating-Point, its name during development.

Scoreboarding is a centralized method, first used in the CDC 6600 computer, for dynamically scheduling instructions so that they can execute out of order when there are no conflicts and the hardware is available.

<span class="mw-page-title-main">Arithmetic logic unit</span> Combinational digital circuit

In computing, an arithmetic logic unit (ALU) is a combinational digital circuit that performs arithmetic and bitwise operations on integer binary numbers. This is in contrast to a floating-point unit (FPU), which operates on floating point numbers. It is a fundamental building block of many types of computing circuits, including the central processing unit (CPU) of computers, FPUs, and graphics processing units (GPUs).

Kepler is the codename for a GPU microarchitecture developed by Nvidia, first introduced at retail in April 2012, as the successor to the Fermi microarchitecture. Kepler was Nvidia's first microarchitecture to focus on energy efficiency. Most GeForce 600 series, most GeForce 700 series, and some GeForce 800M series GPUs were based on Kepler, all manufactured in 28 nm. Kepler also found use in the GK20A, the GPU component of the Tegra K1 SoC, as well as in the Quadro Kxxx series, the Quadro NVS 510, and Nvidia Tesla computing modules. Kepler was followed by the Maxwell microarchitecture and used alongside Maxwell in the GeForce 700 series and GeForce 800M series.

The Hack Computer is a theoretical computer design created by Noam Nisan and Shimon Schocken and described in their book, The Elements of Computing Systems: Building a Modern Computer from First Principles.  In using the term “modern”, the authors refer to a digital, binary machine that is patterned according to the von Neumann architecture model.

References