Replay system

Last updated

The replay system is a subsystem within the Intel Pentium 4 processor. [1] Its primary function is to catch operations that have been mistakenly sent for execution by the processor's scheduler. Operations caught by the replay system are then re-executed in a loop until the conditions necessary for their proper execution have been fulfilled. [2]

Contents

Overview

The replay system came about as a result of Intel's quest for ever-increasing clock speeds. These higher clock speeds necessitated very lengthy pipelines (up to 31 stages in the Prescott core). Because of this, there are six stages between the scheduler and the execution units in the Prescott core. In an attempt to maintain acceptable performance, Intel engineers had to design the scheduler to be very optimistic. [2]

The scheduler in a Pentium 4 processor is so aggressive that it will send operations for execution without a guarantee that they can be successfully executed. (Among other things, the scheduler assumes all data is in level 1 "trace cache" CPU cache.) The most common reason execution fails is that the requisite data is not available, which itself is most likely due to a cache miss. When this happens, the replay system signals the scheduler to stop, then repeatedly executes the failed string of dependent operations until they have completed successfully. [2] [3]

Performance considerations

Not surprisingly, in some cases the replay system can have a very bad impact on performance. Under normal circumstances, the execution units in the Pentium 4 are in use roughly 33% of the time. When the replay system is invoked, it will occupy execution units nearly every available cycle. This wastes power, which is an increasingly important architectural design metric, but poses no performance penalty because the execution units would be sitting idle anyway. However, if hyper-threading is in use, the replay system will prevent the other thread from utilizing the execution units. This is the true cause of any performance degradation concerning hyper-threading. In Prescott, the Pentium 4 gained a replay queue, which reduces the time the replay system will occupy the execution units. [2]

In other cases, where each thread is processing different types of operations, the replay system will not interfere, and a performance increase can appear. This explains why performance with hyper-threading is application-dependent. [2]

See also

Related Research Articles

Central processing unit Central computer component which executes instructions

A central processing unit (CPU), also called a central processor, main processor or just processor, is the electronic circuitry that executes instructions comprising a computer program. The CPU performs basic arithmetic, logic, controlling, and input/output (I/O) operations specified by the instructions in the program. This contrasts with external components such as main memory and I/O circuitry, and specialized processors such as graphics processing units (GPUs).

Superscalar processor CPU that implements instruction-level parallelism within a single processor

A superscalar processor is a CPU that implements a form of parallelism called instruction-level parallelism within a single processor. In contrast to a scalar processor that can execute at most one single instruction per clock cycle, a superscalar processor can execute more than one instruction during a clock cycle by simultaneously dispatching multiple instructions to different execution units on the processor. It therefore allows for more throughput than would otherwise be possible at a given clock rate. Each execution unit is not a separate processor, but an execution resource within a single CPU such as an arithmetic logic unit.

Celeron Brand name by Intel

Celeron is Intel's brand name for low-end IA-32 and x86-64 computer microprocessor models targeted at low-cost personal computers.

IA-64 Instruction set architecture of the Itanium family of 64-bit Intel microprocessors

IA-64 is the instruction set architecture (ISA) of the Itanium family of 64-bit Intel microprocessors. The basic ISA specification originated at Hewlett-Packard (HP), and was evolved and then implemented in a new processor microarchitecture by Intel with HP's continued partnership and expertise on the underlying EPIC design concepts. In order to establish what was their first new ISA in 20 years and bring an entirely new product line to market, Intel made a massive investment in product definition, design, software development tools, OS, software industry partnerships, and marketing. To support this effort Intel created the largest design team in their history and a new marketing and industry enabling team completely separate from x86. The first Itanium processor, codenamed Merced, was released in 2001.

Hyper-threading Proprietary simultaneous multithreading implementation by Intel

Hyper-threading is Intel's proprietary simultaneous multithreading (SMT) implementation used to improve parallelization of computations performed on x86 microprocessors. It was introduced on Xeon server processors in February 2002 and on Pentium 4 desktop processors in November 2002. Since then, Intel has included this technology in Itanium, Atom, and Core 'i' Series CPUs, among others.

Pentium 4 Brand by Intel

Pentium 4 is a series of single-core CPUs for desktops, laptops and entry-level servers manufactured by Intel. The processors were shipped from November 20, 2000 until August 8, 2008. The production of Netburst processors was active from 2000 until May 21, 2010.

Pentium Pro Sixth-generation x86 microprocessor by Intel

The Pentium Pro is a sixth-generation x86 microprocessor developed and manufactured by Intel and introduced on November 1, 1995. It introduced the P6 microarchitecture and was originally intended to replace the original Pentium in a full range of applications. While the Pentium and Pentium MMX had 3.1 and 4.5 million transistors, respectively, the Pentium Pro contained 5.5 million transistors. Later, it was reduced to a more narrow role as a server and high-end desktop processor and was used in supercomputers like ASCI Red, the first computer to reach the trillion floating point operations per second (teraFLOPS) performance mark. The Pentium Pro was capable of both dual- and quad-processor configurations. It only came in one form factor, the relatively large rectangular Socket 8. The Pentium Pro was succeeded by the Pentium II Xeon in 1998.

Tejas was a code name for Intel's microprocessor, which was to be a successor to the latest Pentium 4 with the Prescott core and was sometimes referred to as Pentium V. Jayhawk was a code name for its Xeon counterpart. The cancellation of the processors in May 2004 underscored Intel's historical transition of its focus on single-core processors to multi-core processors.

In computer science, instruction pipelining is a technique for implementing instruction-level parallelism within a single processor. Pipelining attempts to keep every part of the processor busy with some instruction by dividing incoming instructions into a series of sequential steps performed by different processor units with different parts of instructions processed in parallel.

Simultaneous multithreading (SMT) is a technique for improving the overall efficiency of superscalar CPUs with hardware multithreading. SMT permits multiple independent threads of execution to better use the resources provided by modern processor architectures.

The NetBurst microarchitecture, called P68 inside Intel, was the successor to the P6 microarchitecture in the x86 family of central processing units (CPUs) made by Intel. The first CPU to use this architecture was the Willamette-core Pentium 4, released on November 20, 2000 and the first of the Pentium 4 CPUs; all subsequent Pentium 4 and Pentium D variants have also been based on NetBurst. In mid-2004, Intel released the Foster core, which was also based on NetBurst, thus switching the Xeon CPUs to the new architecture as well. Pentium 4-based Celeron CPUs also use the NetBurst architecture.

Microarchitecture Component of computer engineering

In computer engineering, microarchitecture, also called computer organization and sometimes abbreviated as µarch or uarch, is the way a given instruction set architecture (ISA) is implemented in a particular processor. A given ISA may be implemented with different microarchitectures; implementations may vary due to different goals of a given design or due to shifts in technology.

The Pentium D brand refers to two series of desktop dual-core 64-bit x86-64 microprocessors with the NetBurst microarchitecture, which is the dual-core variant of Pentium 4 "Prescott" manufactured by Intel. Each CPU comprised two dies, each containing a single core, residing next to each other on a multi-chip module package. The brand's first processor, codenamed Smithfield, was released by Intel on May 25, 2005. Nine months later, Intel introduced its successor, codenamed Presler, but without offering significant upgrades in design, still resulting in relatively high power consumption. By 2004, the NetBurst processors reached a clock speed barrier at 3.8 GHz due to a thermal limit exemplified by the Presler's 130 watt thermal design power. The future belonged to more energy efficient and slower clocked dual-core CPUs on a single die instead of two. The final shipment date of the dual die Presler chips was August 8, 2008, which marked the end of the Pentium D brand and also the NetBurst microarchitecture.

The megahertz myth, or in more recent cases the gigahertz myth, refers to the misconception of only using clock rate to compare the performance of different microprocessors. While clock rates are a valid way of comparing the performance of different speeds of the same model and type of processor, other factors such as an amount of execution units, pipeline depth, cache hierarchy, branch prediction, and instruction sets can greatly affect the performance when considering different processors. For example, one processor may take two clock cycles to add two numbers and another clock cycle to multiply by a third number, whereas another processor may do the same calculation in two clock cycles. Comparisons between different types of processors are difficult because performance varies depending on the type of task. A benchmark is a more thorough way of measuring and comparing computer performance.

The P6 microarchitecture is the sixth-generation Intel x86 microarchitecture, implemented by the Pentium Pro microprocessor that was introduced in November 1995. It is frequently referred to as i686. It was succeeded by the NetBurst microarchitecture in 2000, but eventually revived in the Pentium M line of microprocessors. The successor to the Pentium M variant of the P6 microarchitecture is the Core microarchitecture which in turn is also derived from the P6 microarchitecture.

The Intel Core microarchitecture is a multi-core processor microarchitecture unveiled by Intel in Q1 2006. It is based on the Yonah processor design and can be considered an iteration of the P6 microarchitecture introduced in 1995 with Pentium Pro. High power consumption and heat intensity, the resulting inability to effectively increase clock rate, and other shortcomings such as an inefficient pipeline were the primary reasons why Intel abandoned the NetBurst microarchitecture and switched to a different architectural design, delivering high efficiency through a small pipeline rather than high clock rates. The Core microarchitecture initially did not reach the clock rates of the NetBurst microarchitecture, even after moving to 45 nm lithography. However after many generations of successor microarchitectures which used Core as their basis, Intel managed to eventually surpass the clock rates of Netburst with the Devil's Canyon microarchitecture reaching a base frequency of 4 GHz and a maximum tested frequency of 4.4 GHz using 22 nm lithography.

Pentium Brand of microprocessors produced by Intel

Pentium is a brand used for a series of x86 architecture-compatible microprocessors produced by Intel. The original Pentium was released in 1993. After that, the Pentium II and Pentium III were released.

Multithreading (computer architecture) Ability of a CPU to provide multiple threads of execution concurrently

In computer architecture, multithreading is the ability of a central processing unit (CPU) to provide multiple threads of execution concurrently, supported by the operating system. This approach differs from multiprocessing. In a multithreaded application, the threads share the resources of a single or multiple cores, which include the computing units, the CPU caches, and the translation lookaside buffer (TLB).

References

  1. Carmean, Doug (Spring 2002). "The Intel® Pentium® 4 Processor" (PDF).
  2. 1 2 3 4 5 Replay: Unknown Features of the NetBurst Core (2005-06-06). "Replay: Unknown Features of the NetBurst Core". X-bit labs. Archived from the original on 2014-04-08. Retrieved 2014-04-07.
  3. González, Antonio; Latorre, Fernando; Magklis, Grigorios (2010). Processor Microarchitecture: An Implementation Perspective. Morgan & Claypool Publishers. p. 68. ISBN   978-1-60845-452-5.