POWER3

Last updated
POWER3
POWER3.jpg
POWER 3 microprocessor
General information
Launched1998
Designed byIBM
Architecture and classification
Instruction set PowerPC
History
Predecessor POWER2
Successor POWER4
Dual 375 MHz IBM POWER3-II processors on the CPU module of a RS/6000 44P 270. POWER3-board.jpg
Dual 375 MHz IBM POWER3-II processors on the CPU module of a RS/6000 44P 270.

The POWER3 is a microprocessor, designed and exclusively manufactured by IBM, that implemented the 64-bit version of the PowerPC instruction set architecture (ISA), including all of the optional instructions of the ISA (at the time) such as instructions present in the POWER2 version of the POWER ISA but not in the PowerPC ISA. It was introduced on 5 October 1998, debuting in the RS/6000 43P Model 260, a high-end graphics workstation. [1] The POWER3 was originally supposed to be called the PowerPC 630 but was renamed, probably to differentiate the server-oriented POWER processors it replaced from the more consumer-oriented 32-bit PowerPCs. The POWER3 was the successor of the P2SC derivative of the POWER2 and completed IBM's long-delayed transition from POWER to PowerPC, which was originally scheduled to conclude in 1995. The POWER3 was used in IBM RS/6000 servers and workstations at 200 MHz. It competed with the Digital Equipment Corporation (DEC) Alpha 21264 and the Hewlett-Packard (HP) PA-8500.

Contents

Description

The logic schema of the POWER3 processor Power3 chip schema.png
The logic schema of the POWER3 processor

The POWER3 was based on the PowerPC 620, an earlier 64-bit PowerPC implementation that was late, under-performing and commercially unsuccessful. Like the PowerPC 620, the POWER3 has three fixed-point units, but the single floating-point unit (FPU) was replaced with two floating-point fused multiply–add units, and an extra load-store unit was added (for a total of two) to improve floating-point performance. The POWER3 is a superscalar design that executed instructions out of order. It has a seven-stage integer pipeline, a minimal eight-stage load/store pipeline and a ten-stage floating-point pipeline.

The front end consists of two stages: fetch and decode. During the first stage, eight instructions were fetched from a 32 KB instruction cache and placed in a 12-entry instruction buffer. During the second stage, four instructions were taken from the instruction buffer, decoded, and issued to instruction queues. Restrictions on instruction issue are few: of the two integer instruction queues, only one can accept one instruction, the other can accept up to four, as does the floating-point instruction queue. If the queues do not have enough unused entries, instructions cannot be issued. The front end has a short pipeline, resulting in a small three-cycle branch misprediction penalty.

In stage three, instructions in the instruction queues that are ready for execution have their operands read from the register files. The general-purpose register file contains 48 registers, of which 32 are general-purpose registers and 16 are rename registers for register renaming. To reduce the number of ports required to provide data and receive results, the general purpose register file is duplicated so that there are two copies, the first supporting three integer execution units and the second supporting the two load/store units. This scheme was similar to a contemporary microprocessor, the DEC Alpha 21264, but was simpler as it did not require an extra clock cycle to synchronize the two copies due to the POWER3's higher cycle times. The floating-point register file contains 56 registers, of which 32 are floating-point registers and 24 rename registers. Compared to the PowerPC 620, there were more rename registers, which allowed more instructions to be executed out of order, improving performance.

Execution begins in stage four. The instruction queues dispatch up to eight instructions to the execution units. Integer instructions are executed in three integer execution units (termed "fixed-point units" by IBM). Two of the units are identical and execute all integer instructions except for multiply and divide. All instructions executed by them have a one-cycle latency. The third unit executes multiply and divide instructions. These instructions are not pipelined and have multi-cycle latencies. 64-bit multiply has a nine-cycle latency and 64-bit divide has a 37-cycle latency.

Floating-point instructions are executed in two floating-point units (FPUs). The FPUs are capable of fused multiply–add, where multiplication and addition is performed simultaneously. Such instructions, along with individual add and multiply, have a four-cycle latency. Divide and square-root instructions are executed in the same FPUs, but are assisted by specialized hardware. Single-precision (32-bit) divide and square-root instructions have a 14-cycle latency, whereas double-precision (64-bit) divide and square-root instructions have an 18-cycle and a 22-cycle latency, respectively.

After execution is completed, the instructions are held in buffers before being committed and made visible to software. Execution finishes in stage five for integer instructions and stage eight for floating-point. Committing occurs during stage six for integers, stage nine for floating-point. Writeback occurs in the stage after commit. The POWER3 can retire up to four instructions per cycle.

The PowerPC 620 data cache was optimized for technical and scientific applications. Its capacity was doubled to 64 KB, to improve the cache-hit rate; the cache was dual-ported, implemented by interleaving eight banks, to enable two loads or two stores to be performed in one cycle in certain cases; and the line-size was increased to 128-bytes. The L2 cache bus was doubled in width to 256 bits to compensate for the larger cache line size and to retain a four-cycle latency for cache refills.

The POWER3 contained 15 million transistors on a 270 mm2 die. It was fabricated in IBM's CMOS-6S2 process, a complementary metal–oxide–semiconductor process that is a hybrid of 0.25 μm feature sizes and 0.35 μm metal layers. The process features five layers of aluminium. It was packaged in the same 1,088-column ceramic column grid array as the P2SC, but with a different pin out.

POWER3-II

POWER3-II KL IBM Power3 II.jpg
POWER3-II

The POWER3-II was an improved POWER3 that increased the clock frequency to 450 MHz. It contains 23 million transistors and measured 170 mm2. It was fabricated in the IBM CMOS7S process, a 0.22 μm CMOS process with six levels of copper interconnect. It was succeeded by the POWER4 in 2001.

See also

Notes

  1. New IBM POWER3 chip.

Related Research Articles

<span class="mw-page-title-main">UltraSPARC</span> Microprocessor developed by Sun Microsystems

The UltraSPARC is a microprocessor developed by Sun Microsystems and fabricated by Texas Instruments, introduced in mid-1995. It is the first microprocessor from Sun to implement the 64-bit SPARC V9 instruction set architecture (ISA). Marc Tremblay was a co-microarchitect.

In computer engineering, out-of-order execution is a paradigm used in most high-performance central processing units to make use of instruction cycles that would otherwise be wasted. In this paradigm, a processor executes instructions in an order governed by the availability of input data and execution units, rather than by their original order in a program. In doing so, the processor can avoid being idle while waiting for the preceding instruction to complete and can, in the meantime, process the next instructions that are able to run immediately and independently.

<span class="mw-page-title-main">Emotion Engine</span>

The Emotion Engine is a central processing unit developed and manufactured by Sony Computer Entertainment and Toshiba for use in the PlayStation 2 video game console. It was also used in early PlayStation 3 models sold in Japan and North America to provide PlayStation 2 game support. Mass production of the Emotion Engine began in 1999 and ended in late 2012 with the discontinuation of the PlayStation 2.

The POWER1 is a multi-chip CPU developed and fabricated by IBM that implemented the POWER instruction set architecture (ISA). It was originally known as the RISC System/6000 CPU or, when in an abbreviated form, the RS/6000 CPU, before introduction of successors required the original name to be replaced with one that used the same naming scheme (POWERn) as its successors in order to differentiate it from the newer designs.

SPARC64 is a microprocessor developed by HAL Computer Systems and fabricated by Fujitsu. It implements the SPARC V9 instruction set architecture (ISA), the first microprocessor to do so. SPARC64 was HAL's first microprocessor and was the first in the SPARC64 brand. It operates at 101 and 118 MHz. The SPARC64 was used exclusively by Fujitsu in their systems; the first systems, the Fujitsu HALstation Model 330 and Model 350 workstations, were formally announced in September 1995 and were introduced in October 1995, two years late. It was succeeded by the SPARC64 II in 1996.

<span class="mw-page-title-main">RISC Single Chip</span>

The RISC Single Chip, or RSC, is a single-chip microprocessor developed and fabricated by International Business Machines (IBM). The RSC was a feature-reduced single-chip implementation of the POWER1, a multi-chip central processing unit (CPU) which implemented the POWER instruction set architecture (ISA). It was used in entry-level workstation models of the IBM RS/6000 family, such as the Model 220 and 230.

<span class="mw-page-title-main">R10000</span> MIPS microprocessor

The R10000, code-named "T5", is a RISC microprocessor implementation of the MIPS IV instruction set architecture (ISA) developed by MIPS Technologies, Inc. (MTI), then a division of Silicon Graphics, Inc. (SGI). The chief designers are Chris Rowen and Kenneth C. Yeager. The R10000 microarchitecture is known as ANDES, an abbreviation for Architecture with Non-sequential Dynamic Execution Scheduling. The R10000 largely replaces the R8000 in the high-end and the R4400 elsewhere. MTI was a fabless semiconductor company; the R10000 was fabricated by NEC and Toshiba. Previous fabricators of MIPS microprocessors such as Integrated Device Technology (IDT) and three others did not fabricate the R10000 as it was more expensive to do so than the R4000 and R4400.

<span class="mw-page-title-main">R4000</span>

The R4000 is a microprocessor developed by MIPS Computer Systems that implements the MIPS III instruction set architecture (ISA). Officially announced on 1 October 1991, it was one of the first 64-bit microprocessors and the first MIPS III implementation. In the early 1990s, when RISC microprocessors were expected to replace CISC microprocessors such as the Intel i486, the R4000 was selected to be the microprocessor of the Advanced Computing Environment (ACE), an industry standard that intended to define a common RISC platform. ACE ultimately failed for a number of reasons, but the R4000 found success in the workstation and server markets.

<span class="mw-page-title-main">R5000</span>

The R5000 is a 64-bit, bi-endian, superscalar, in-order execution 2-issue design microprocessor, that implements the MIPS IV instruction set architecture (ISA) developed by Quantum Effect Design (QED) in 1996. The project was funded by MIPS Technologies, Inc (MTI), also the licensor. MTI then licensed the design to Integrated Device Technology (IDT), NEC, NKK, and Toshiba. The R5000 succeeded the QED R4600 and R4700 as their flagship high-end embedded microprocessor. IDT marketed its version of the R5000 as the 79RV5000, NEC as VR5000, NKK as the NR5000, and Toshiba as the TX5000. The R5000 was sold to PMC-Sierra when the company acquired QED. Derivatives of the R5000 are still in production today for embedded systems.

The R8000 is a microprocessor chipset developed by MIPS Technologies, Inc. (MTI), Toshiba, and Weitek. It was the first implementation of the MIPS IV instruction set architecture. The R8000 is also known as the TFP, for Tremendous Floating-Point, its name during development.

The PowerPC e200 is a family of 32-bit Power ISA microprocessor cores developed by Freescale for primary use in automotive and industrial control systems. The cores are designed to form the CPU part in system-on-a-chip (SoC) designs with speed ranging up to 600 MHz, thus making them ideal for embedded applications.

<span class="mw-page-title-main">Gekko (processor)</span> CPU for the GameCube

Gekko is a superscalar out-of-order 32-bit PowerPC microprocessor custom-made by IBM in 2000 for Nintendo to use as the CPU in their sixth generation game console, the GameCube, and later the Triforce Arcade Board.

<span class="mw-page-title-main">Alpha 21064</span>

The Alpha 21064 is a microprocessor developed and fabricated by Digital Equipment Corporation that implemented the Alpha instruction set architecture (ISA). It was introduced as the DECchip 21064 before it was renamed in 1994. The 21064 is also known by its code name, EV4. It was announced in February 1992 with volume availability in September 1992. The 21064 was the first commercial implementation of the Alpha ISA, and the first microprocessor from Digital to be available commercially. It was succeeded by a derivative, the Alpha 21064A in October 1993. This last version was replaced by the Alpha 21164 in 1995.

<span class="mw-page-title-main">Alpha 21164</span>

The Alpha 21164, also known by its code name, EV5, is a microprocessor developed and fabricated by Digital Equipment Corporation that implemented the Alpha instruction set architecture (ISA). It was introduced in January 1995, succeeding the Alpha 21064A as Digital's flagship microprocessor. It was succeeded by the Alpha 21264 in 1998.

<span class="mw-page-title-main">Alpha 21264</span>

The Alpha 21264 is a Digital Equipment Corporation RISC microprocessor launched on 19 October 1998. The 21264 implemented the Alpha instruction set architecture (ISA).

<span class="mw-page-title-main">PA-8000</span>

The PA-8000 (PCX-U), code-named Onyx, is a microprocessor developed and fabricated by Hewlett-Packard (HP) that implemented the PA-RISC 2.0 instruction set architecture (ISA). It was a completely new design with no circuitry derived from previous PA-RISC microprocessors. The PA-8000 was introduced on 2 November 1995 when shipments began to members of the Precision RISC Organization (PRO). It was used exclusively by PRO members and was not sold on the merchant market. All follow-on PA-8x00 processors are based on the basic PA-8000 processor core.

<span class="mw-page-title-main">TurboSPARC</span>

The TurboSPARC is a microprocessor that implements the SPARC V8 instruction set architecture (ISA) developed by Fujitsu Microelectronics, Inc. (FMI), the United States subsidiary of the Japanese multinational information technology equipment and services company Fujitsu Limited located in San Jose, California. It was a low-end microprocessor primarily developed as an upgrade for the Sun Microsystems microSPARC-II-based SPARCstation 5 workstation. It was introduced on 30 September 1996, with a 170 MHz version priced at US$499 in quantities of 1,000. The TurboSPARC was mostly succeeded in the low-end SPARC market by the UltraSPARC IIi in late 1997, but remained available.

<span class="mw-page-title-main">R4600</span>

The R4600, code-named "Orion", is a microprocessor developed by Quantum Effect Design (QED) that implemented the MIPS III instruction set architecture (ISA). As QED was a design firm that did not fabricate or sell their designs, the R4600 was first licensed to Integrated Device Technology (IDT), and later to Toshiba and then NKK. These companies fabricated the microprocessor and marketed it. The R4600 was designed as a low-end workstation or high-end embedded microprocessor. Users included Silicon Graphics, Inc. (SGI) for their Indy workstation and DeskStation Technology for their Windows NT workstations. The R4600 was instrumental in making the Indy successful by providing good integer performance at a competitive price. In embedded systems, prominent users included Cisco Systems in their network routers and Canon in their printers.

IBM POWER is a reduced instruction set computer (RISC) instruction set architecture (ISA) developed by IBM. The name is an acronym for Performance Optimization With Enhanced RISC.

The Power Processing Element (PPE) comprises a Power Processing Unit (PPU) and a 512 KB L2 cache. In most instances the PPU is used in a PPE. The PPU is a 64-bit dual-threaded in-order PowerPC 2.02 microprocessor core designed by IBM for use primarily in the game consoles PlayStation 3 and Xbox 360, but has also found applications in high performance computing in supercomputers such as the record setting IBM Roadrunner.

References