Slot (computer architecture)

Last updated

A slot comprises the operation issue and data path machinery surrounding a set of one or more execution unit (also called a functional unit (FU)) which share these resources. [1] [2] The term slot is common for this purpose in very long instruction word (VLIW) computers, where the relationship between operation in an instruction and pipeline to execute it is explicit. In dynamically scheduled machines, the concept is more commonly called an execute pipeline.

Modern conventional central processing units (CPU) have several compute pipelines, for example: two arithmetic logic units (ALU), one floating point unit (FPU), one Streaming SIMD Extensions (SSE) (such as MMX), one branch. Each of them can issue one instruction per basic instruction cycle, but can have several instructions in process. These are what correspond to slots. The pipelines may have several FUs, such as an adder and a multiplier, but only one FU in a pipeline can be issued to in a given cycle. The FU population of a pipeline (slot) is a design option in a CPU.

Related Research Articles

<span class="mw-page-title-main">Central processing unit</span> Central computer component which executes instructions

A central processing unit (CPU), also called a central processor, main processor, or just processor, is the most important processor in a given computer. Its electronic circuitry executes instructions of a computer program, such as arithmetic, logic, controlling, and input/output (I/O) operations. This role contrasts with that of external components, such as main memory and I/O circuitry, and specialized coprocessors such as graphics processing units (GPUs).

The control unit (CU) is a component of a computer's central processing unit (CPU) that directs the operation of the processor. A CU typically uses a binary decoder to convert coded instructions into timing and control signals that direct the operation of the other units.

In processor design, microcode serves as an intermediary layer situated between the central processing unit (CPU) hardware and the programmer-visible instruction set architecture of a computer, also known as its machine code. It consists of a set of hardware-level instructions that implement the higher-level machine code instructions or control internal finite-state machine sequencing in many digital processing components. While microcode is utilized in Intel and AMD general-purpose CPUs in contemporary desktops and laptops, it functions only as a fallback path for scenarios that the faster hardwired control unit is unable to manage.

<span class="mw-page-title-main">Floating-point unit</span> Part of a computer system

A floating-point unit (FPU), numeric processing unit (NPU), colloquially math coprocessor, is a part of a computer system specially designed to carry out operations on floating-point numbers. Typical operations are addition, subtraction, multiplication, division, and square root. Some FPUs can also perform various transcendental functions such as exponential or trigonometric calculations, but the accuracy can be low, so some systems prefer to compute these functions in software.

<span class="mw-page-title-main">Superscalar processor</span> CPU that implements instruction-level parallelism within a single processor

A superscalar processor is a CPU that implements a form of parallelism called instruction-level parallelism within a single processor. In contrast to a scalar processor, which can execute at most one single instruction per clock cycle, a superscalar processor can execute more than one instruction during a clock cycle by simultaneously dispatching multiple instructions to different execution units on the processor. It therefore allows more throughput than would otherwise be possible at a given clock rate. Each execution unit is not a separate processor, but an execution resource within a single CPU such as an arithmetic logic unit.

Very long instruction word (VLIW) refers to instruction set architectures that are designed to exploit instruction-level parallelism (ILP). A VLIW processor allows programs to explicitly specify instructions to execute in parallel, whereas conventional central processing units (CPUs) mostly allow programs to specify instructions to execute in sequence only. VLIW is intended to allow higher performance without the complexity inherent in some other designs.

<span class="mw-page-title-main">CDC 6600</span> Mainframe computer by Control Data

The CDC 6600 was the flagship of the 6000 series of mainframe computer systems manufactured by Control Data Corporation. Generally considered to be the first successful supercomputer, it outperformed the industry's prior recordholder, the IBM 7030 Stretch, by a factor of three. With performance of up to three megaFLOPS, the CDC 6600 was the world's fastest computer from 1964 to 1969, when it relinquished that status to its successor, the CDC 7600.

The Intel i860 is a RISC microprocessor design introduced by Intel in 1989. It is one of Intel's first attempts at an entirely new, high-end instruction set architecture since the failed Intel iAPX 432 from the beginning of the 1980s. It was the world's first million-transistor chip. It was released with considerable fanfare, slightly obscuring the earlier Intel i960, which was successful in some niches of embedded systems. The i860 never achieved commercial success and the project was terminated in the mid-1990s.

Reconfigurable computing is a computer architecture combining some of the flexibility of software with the high performance of hardware by processing with flexible hardware platforms like field-programmable gate arrays (FPGAs). The principal difference when compared to using ordinary microprocessors is the ability to add custom computational blocks using FPGAs. On the other hand, the main difference from custom hardware, i.e. application-specific integrated circuits (ASICs) is the possibility to adapt the hardware during runtime by "loading" a new circuit on the reconfigurable fabric, thus providing new computational blocks without the need to manufacture and add new chips to the existing system.

In computer engineering, instruction pipelining is a technique for implementing instruction-level parallelism within a single processor. Pipelining attempts to keep every part of the processor busy with some instruction by dividing incoming instructions into a series of sequential steps performed by different processor units with different parts of instructions processed in parallel.

In computer architecture, a delay slot is an instruction slot being executed without the effects of a preceding instruction. The most common form is a single arbitrary instruction located immediately after a branch instruction on a RISC or DSP architecture; this instruction will execute even if the preceding branch is taken. This makes the instruction execute out-of-order compared to its location in the original assembler language code.

In the history of computer hardware, some early reduced instruction set computer central processing units used a very similar architectural solution, now called a classic RISC pipeline. Those CPUs were: MIPS, SPARC, Motorola 88000, and later the notional CPU DLX invented for education.

Execution in computer and software engineering is the process by which a computer or virtual machine interprets and acts on the instructions of a computer program. Each instruction of a program is a description of a particular action which must be carried out, in order for a specific problem to be solved. Execution involves repeatedly following a "fetch–decode–execute" cycle for each instruction done by the control unit. As the executing machine follows the instructions, specific effects are produced in accordance with the semantics of those instructions.

<span class="mw-page-title-main">Instruction cycle</span> Basic operation cycle of a computer

The instruction cycle is the cycle that the central processing unit (CPU) follows from boot-up until the computer has shut down in order to process instructions. It is composed of three main stages: the fetch stage, the decode stage, and the execute stage.

In computer engineering, out-of-order execution is a paradigm used in high-performance central processing units to make use of instruction cycles that would otherwise be wasted. In this paradigm, a processor executes instructions in an order governed by the availability of input data and execution units, rather than by their original order in a program. In doing so, the processor can avoid being idle while waiting for the preceding instruction to complete and can, in the meantime, process the next instructions that are able to run immediately and independently.

A branch, jump or transfer is an instruction in a computer program that can cause a computer to begin executing a different instruction sequence and thus deviate from its default behavior of executing instructions in order. Branch may also refer to the act of switching execution to a different instruction sequence as a result of executing a branch instruction. Branch instructions are used to implement control flow in program loops and conditionals.

In computing, a pipeline, also known as a data pipeline, is a set of data processing elements connected in series, where the output of one element is the input of the next one. The elements of a pipeline are often executed in parallel or in time-sliced fashion. Some amount of buffer storage is often inserted between elements.

<span class="mw-page-title-main">Microarchitecture</span> Component of computer engineering

In electronics, computer science and computer engineering, microarchitecture, also called computer organization and sometimes abbreviated as μarch or uarch, is the way a given instruction set architecture (ISA) is implemented in a particular processor. A given ISA may be implemented with different microarchitectures; implementations may vary due to different goals of a given design or due to shifts in technology.

<span class="mw-page-title-main">Hardware acceleration</span> Specialized computer hardware

Hardware acceleration is the use of computer hardware designed to perform specific functions more efficiently when compared to software running on a general-purpose central processing unit (CPU). Any transformation of data that can be calculated in software running on a generic CPU can also be calculated in custom-made hardware, or in some mix of both.

In computer architecture, a transport triggered architecture (TTA) is a kind of processor design in which programs directly control the internal transport buses of a processor. Computation happens as a side effect of data transports: writing data into a triggering port of a functional unit triggers the functional unit to start a computation. This is similar to what happens in a systolic array. Due to its modular structure, TTA is an ideal processor template for application-specific instruction set processors (ASIP) with customized datapath but without the inflexibility and design cost of fixed function hardware accelerators.

References

  1. Majer, Mateusz; Teich, Jürgen; Ahmadinia, Ali; Bobda, Christophe (2007-04-01). "The Erlangen Slot Machine: A Dynamically Reconfigurable FPGA-based Computer". The Journal of VLSI Signal Processing Systems for Signal, Image, and Video Technology. 47 (1): 15–31. doi:10.1007/s11265-006-0017-6. ISSN   1573-109X. S2CID   15629606.
  2. Ye, Zhi Alex; Moshovos, Andreas; Hauck, Scott; Banerjee, Prithviraj (2000-05-01). "CHIMAERA: a high-performance architecture with a tightly-coupled reconfigurable functional unit". ACM SIGARCH Computer Architecture News. 28 (2): 225–235. doi:10.1145/342001.339687. ISSN   0163-5964.