Alpha 21264

Last updated
Alpha 21264 microarchitecture. Alpha 21264.png
Alpha 21264 microarchitecture.

The Alpha 21264 is a Digital Equipment Corporation RISC microprocessor launched on 19 October 1998. The 21264 implemented the Alpha instruction set architecture (ISA).

Contents

Description

The Alpha 21264 is a four-issue superscalar microprocessor with out-of-order execution and speculative execution. It has a peak execution rate of six instructions per cycle and could sustain four instructions per cycle. It has a seven-stage instruction pipeline.

Out of order execution

At any given stage, the microprocessor could have up to 80 instructions in various stages of execution, surpassing any other contemporary microprocessor.

Decoded instructions are held in instruction queues and are issued when their operands are available. The integer queue contained 20 entries and the floating-point queue 15. Each queue could issue as many instructions as there were pipelines.

Ebox

The Ebox executes integer, load and store instructions. It has two integer units, two load store units and two integer register files. Each integer register file contained 80 entries, of which 32 are architectural registers, 40 are rename registers and 8 are PAL shadow registers. There was no entry for register R31 because in the Alpha architecture, R31 is hardwired to zero and is read-only.

Each register file served an integer unit and a load store unit, and the register file and its two units are referred to as a "cluster". The two clusters were designated U0 and U1. This scheme was used as it reduced the number of write and read ports required to serve operands and receive results, thus reducing the physical size of the register file, enabling the microprocessor to operate at higher clock frequencies. Writes to any of the register files thus have to be synchronized, which required a clock cycle to complete, negatively impacting performance by one percent. The reduction of performance resulting from the synchronization was compensated in two ways. Firstly, the higher clock frequency achievable offset the loss. Secondly, the logic responsible for instruction issue avoided creating situations where the register file had to be synchronized by issuing instructions that were not dependent on data held in other register file where possible.

The clusters are near identical except for two differences: U1 has a seven-cycle pipelined multiplier while U0 has a three-cycle pipeline for executing Motion Video Instructions (MVI), an extension to the Alpha Architecture defining single instruction multiple data (SIMD) instructions for multimedia.

The load store units are simple arithmetic logic units used to calculate virtual addresses for memory access. They are also capable of executing simple arithmetic and logic instructions. The Alpha 21264 instruction issue logic utilized this capability, issuing instructions to these units when they were available for use (not performing address arithmetic).

The Ebox therefore has four 64-bit adders, four logic units, two barrel shifters, byte-manipulation logic, two sets of conditional branch logic equally divided between U1 and U0.

Fbox

The Fbox is responsible for executing floating-point instructions. It consists of two floating-point pipelines and a floating-point register file. The pipelines are not identical, one executes the majority of instructions and the other only multiply instructions. The adder pipeline has two non-pipelined units connected to it, a divide unit and a square root unit. Adds, multiplies and most other instructions have a 4-cycle latency, a double-precision divide has 16-cycle latency and a double-precision square root has a 33-cycle latency. The floating point register file contains 72 entries, of which 32 are architectural registers and 40 are rename registers.

Cache

The Alpha 21264 has two levels of cache, a primary cache and secondary cache. The level three (L3, or "victim") cache of the Alpha 21164 was not used due to problems with bandwidth.

Primary caches

The primary cache is split into separate caches for instructions and data ("modified Harvard architecture"), the I-cache and D-cache, respectively. Both caches have a capacity of 64 KB. The D-cache is dual-ported by transferring data on both the rising and falling edges of the clock signal. This method of dual-porting enabled any combination of reads or writes to the cache every processor cycle. It also avoided duplication the cache so there are two, as in the Alpha 21164. Duplicating the cache restricted the capacity of the cache, as it required more transistors to provide the same amount of capacity, and in turn increased the area required and power consumed.

B-cache

The secondary cache, termed the B-cache, is an external cache with a capacity of 1 to 16 MB. It is controlled by the microprocessor and is implemented by synchronous static random access memory (SSRAM) chips that operate at two thirds, half, one-third or one-fourth the internal clock frequency, or 133 to 333 MHz at 500 MHz. The B-cache was accessed with a dedicated 128-bit bus that operates at the same clock frequency as the SSRAM or at twice the clock frequency if double data rate SSRAM is used. The B-cache is direct-mapped. [1]

Branch prediction

Branch prediction is performed by a tournament branch prediction algorithm. The algorithm was developed by Scott McFarling at Digital's Western Research Laboratory (WRL) and was described in a 1993 paper. This predictor was used as the Alpha 21264 has a minimum branch misprediction penalty of seven cycles. Due to the instruction cache's two cycle latency and the instruction queues, the average branch misprediction penalty is 11 cycles. The algorithm maintains two history tables, Local and Global, and the table used to predict the outcome of a branch is determined by a Choice predictor.

The local predictor is a two-level table which records the history of individual branches. It consists of a 1,024-entry by 10-bit branch history table. A two-level table was used as the prediction accuracy is similar to that of a larger single-level table while requiring fewer bits of storage. It has a 1,024-entry branch prediction table. Each entry is a 3-bit saturating counter. The value of the counter determines whether the current branch is taken or not taken.

The global predictor is a single-level, 4096-entry branch history table. Each entry is a 2-bit saturating counter; the value of this counter determines whether the current branch is taken or not taken.

The choice predictor records the history of the local and global predictors to determine which predictor is the best for a particular branch. It has a 4,096-entry branch history table. Each entry is a 2-bit saturating counter. The value of the counter determines if the local or global predictor is used.

External interface

The external interface consisted of a bidirectional 64-bit double data rate (DDR) data bus and two 15-bit unidirectional time-multiplexed address and control buses, one for signals originating from the Alpha 21264 and one for signals originating from the system. Digital licensed the bus to Advanced Micro Devices (AMD), and it was subsequently used in their Athlon microprocessors, where it was known as the EV6 bus.

Memory addressing

Alpha 21264 CPU supports 48-bit or 43-bit virtual address (256 TiB or 8 TiB virtual address space respectively), selectable under IPR control (using VA_CTL control register). Alpha 21264 supports a 44-bit physical address (up to 16 TiB of physical memory). This is an increase from previous Alpha CPUs (43-bit virtual and 40-bit physical for Alpha 21164, and 43-bit virtual and 34-bit physical for Alpha 21064). [2]

Fabrication

The Alpha 21264 contained 15.2 million transistors. The logic consisted of approximately six million transistors, with the rest contained in the caches and branch history tables. The die measured 16.7 mm by 18.8 mm (313.96 mm²). [3] It was fabricated in a 0.35 μm complementary metaloxidesemiconductor (CMOS) process with six levels of interconnect.

Packaging

The Alpha 21264 was packaged in a 587-pin ceramic interstitial pin grid array (IPGA).

Alpha Processor, Inc. later sold the Alpha 21264 in a Slot B package containing the microprocessor mounted on a printed circuit board with the B-cache and voltage regulators. The design was intended to use the success of slot-based microprocessors from Intel and AMD. Slot B was originally developed to be used by AMD's Athlon as well, so that API could obtain materials for the Slot B at commodity prices in order to reduce the cost of the Alpha 21264 to gain a wider market share. This never materialized as AMD chose to use Slot A for their slot-based Athlons.

Derivatives

Alpha 21264A

Alpha 21264A DEC Alpha 21264A (EV67) (8875557248) (cropped).jpg
Alpha 21264A

The Alpha 21264A, code-named EV67 was a shrink of the Alpha 21264 introduced in late 1999. There were six versions: 600, 667, 700, 733, 750, 833 MHz. The EV67 was the first Alpha microprocessor to implement the count extension (CIX), which extended the instruction set with instructions for performing population count. It was fabricated by Samsung Electronics in a 0.25 μm CMOS process that had 0.25 μm transistors but 0.35 μm metal layers. The die had an area of 210 mm². The EV68 used a 2.0 V power supply. It dissipated a maximum of 73 W at 600 MHz, 80 W at 667 MHz, 85 W at 700 MHz, 88 W at 733 MHz and 90 W at 750 MHz.

Alpha 21264B

The Alpha 21264B is a further development for increased clock frequencies. There were two models, one fabricated by IBM, code-named EV68C, and one by Samsung, code-named EV68A.

The EV68A was fabricated in a 0.18 μm CMOS process with aluminium interconnects. It had a die size of 125 mm², a third smaller than the Alpha 21264A, and used a 1.7 V power supply. It was available in volume in 2001 at clock frequencies of 750, 833, 875 and 940 MHz. The EV68A dissipated a maximum of 60 W at 750 MHz, 67 W at 833 MHz, 70 W at 875 MHz and 75 W at 940 MHz. [4]

The EV68C was fabricated in a 0.18 μm CMOS process with copper interconnects. It was sampled in early 2000 and achieved a maximum clock frequency of 1.25 GHz.

In September 1998, Samsung announced they would fabricate a variant of the Alpha 21264B in a 0.18 μm fully depleted silicon-on-insulator (SOI) process with copper interconnects that was capable of achieving a clock frequency of 1.5 GHz. This version never materialized.

Alpha 21264C

The Alpha 21264C, code-named EV68CB was a derivative of the Alpha 21264. It was available at clock frequencies of 1.0, 1.25 and 1.33 GHz. The EV68CB contained 15.5 million transistors and measured 120 mm². It was fabricated by IBM in a 0.18 μm CMOS process with seven levels of copper interconnect and low-K dielectric. It was packaged in a 675-pad flip-chip ceramic land grid array (CLGA) measuring 49.53 by 49.53 mm. The EV68CB used a 1.7 V power supply, dissipating a maximum of 64 W at 1.0 GHz, 75 W at 1.25 GHz and 80 W at 1.33 GHz. [5]

Alpha 21264D

The Alpha 21264D, code-named EV68CD is a faster derivative fabricated by IBM.

Alpha 21264E

The Alpha 21264E, code-named EV68E, was a cancelled derivative developed by Samsung first announced on 10 October 2000 at Microprocessor Forum 2000 slated for introduction at around mid-2001. Improvements were a higher operating frequency of 1.25 GHz and the addition of an on-die 1.85 MB secondary cache. It was to be fabricated in a 0.18 micrometre CMOS process with copper interconnects.

Chipsets

Digital and Advanced Micro Devices (AMD) both developed chipsets for the Alpha 21264.

21272/21274

The Digital 21272, also known as the Tsunami, and the 21274, also known as the Typhoon, were the first chipset for the Alpha 21264. The 21272 chipset supported one- or two-way multiprocessing and up to 8GB of memory, while the 21274 supported one-, two-, three- or four-way multiprocessing, up to 64GB of memory, and both supported one or two 64-bit 33 MHz PCI buses. They had 128- to 512-bit memory bus which operated at 83 MHz, yielding a maximum bandwidth of 5,312 MB/s. The chipset supported 100 MHz registered ECC SDRAM.

The chipset consisted of three devices, a C-chip, a D-chip and a P-chip. The number of devices which made up the chipset varied as it was determined by the configuration of the chipset. The C-chip is the control chip containing the memory controller. One C-chip was required for every microprocessor.

The P-chip is the PCI controller, implementing a 33 MHz PCI bus. The 21272 could have one or two P-chips.

The D-chip is the DRAM controller, implementing access to/from the CPUs, and to/from the P-chip. The 21272 could have two or four D-chips and the 21274 could have two, four, or eight D-chips.

The 21272 and 21274 were used extensively by Digital, Compaq and Hewlett Packard in their entry-level to mid-range AlphaServers and in all models of the AlphaStation. It was also used in third-party products from Alpha Processor, Inc. (later known as API NetWorks) such as their UP2000+ motherboard.

Irongate

AMD developed two Alpha 21264-compatible chipsets, the Irongate, also known as the AMD-751, and its successor, Irongate-2, also known as the AMD-761. These chipsets were developed for their Athlon microprocessors but due to AMD licensing the EV6 bus used in the Alpha from Digital, the Athlon and Alpha 21264 were compatible in terms of bus protocol. The Irongate was used by Samsung in their UP1000 and UP1100 motherboards. The Irongate-2 was used by Samsung in their UP1500 motherboard.

See also

Notes

  1. The Alpha 21264 Microprocessor Architecture, p. 5.
  2. "Alpha 21264 Microprocessor Data Sheet" (PDF). Compaq Computer Corporation. Retrieved 2020-06-03.
  3. Gronowski, "High Performance Microprocessor Design", p. 676.
  4. Compaq, "21264/EV68A Microprocessor Hardware Reference Manual".
  5. Compaq, "21264/EV68CB and 21264/EV68DC Hardware Reference Manual".

Related Research Articles

<span class="mw-page-title-main">Athlon</span> Brand name for several AMD processors

Athlon is the brand name applied to a series of x86-compatible microprocessors designed and manufactured by Advanced Micro Devices (AMD). The original Athlon was the first seventh-generation x86 processor and the first desktop processor to reach speeds of one gigahertz (GHz). It made its debut as AMD's high-end processor brand on June 23, 1999. Over the years AMD has used the Athlon name with the 64-bit Athlon 64 architecture, the Athlon II, and Accelerated Processing Unit (APU) chips targeting the Socket AM1 desktop SoC architecture, and Socket AM4 Zen microarchitecture. The modern Zen-based Athlon with a Radeon Graphics processor was introduced in 2019 as AMD's highest-performance entry-level processor.

<span class="mw-page-title-main">DEC Alpha</span> 64-bit RISC instruction set architecture

Alpha is a 64-bit reduced instruction set computer (RISC) instruction set architecture (ISA) developed by Digital Equipment Corporation (DEC). Alpha was designed to replace 32-bit VAX complex instruction set computers (CISC) and to be a highly competitive RISC processor for Unix workstations and similar markets.

<span class="mw-page-title-main">Pentium (original)</span> Intel microprocessor

The Pentium is a fifth generation, 32-bit x86 microprocessor that was introduced by Intel on March 22, 1993, as the very first CPU in the Pentium brand. It was instruction set compatible with the 80486 but was a new and very different microarchitecture design from previous iterations. The P5 Pentium was the first superscalar x86 microarchitecture and the world's first superscalar microprocessor to be in mass production—meaning it generally executes at least 2 instructions per clock mainly because of a design-first dual integer pipeline design previously thought impossible to implement on a CISC microarchitecture. Additional features include a faster floating-point unit, wider data bus, separate code and data caches, and many other techniques and features to enhance performance and support security, encryption, and multiprocessing, for workstations and servers when compared to the next best previous industry standard processor implementation before it, the Intel 80486.

<span class="mw-page-title-main">Pentium III</span> Line of desktop and mobile microprocessors produced by Intel

The Pentium III brand refers to Intel's 32-bit x86 desktop and mobile CPUs based on the sixth-generation P6 microarchitecture introduced on February 28, 1999. The brand's initial processors were very similar to the earlier Pentium II-branded processors. The most notable differences were the addition of the Streaming SIMD Extensions (SSE) instruction set, and the introduction of a controversial serial number embedded in the chip during manufacturing. The Pentium III is also a single-core processor.

<span class="mw-page-title-main">Pentium Pro</span> Sixth-generation x86 microprocessor by Intel

The Pentium Pro is a sixth-generation x86 microprocessor developed and manufactured by Intel and introduced on November 1, 1995. It introduced the P6 microarchitecture and was originally intended to replace the original Pentium in a full range of applications. While the Pentium and Pentium MMX had 3.1 and 4.5 million transistors, respectively, the Pentium Pro contained 5.5 million transistors. Later, it was reduced to a more narrow role as a server and high-end desktop processor and was used in supercomputers like ASCI Red, the first computer to reach the trillion floating point operations per second (teraFLOPS) performance mark. The Pentium Pro was capable of both dual- and quad-processor configurations. It only came in one form factor, the relatively large rectangular Socket 8. The Pentium Pro was succeeded by the Pentium II Xeon in 1998.

<span class="mw-page-title-main">Transmeta Crusoe</span>

The Transmeta Crusoe is a family of x86-compatible microprocessors developed by Transmeta and introduced in 2000.

SPARC64 is a microprocessor developed by HAL Computer Systems and fabricated by Fujitsu. It implements the SPARC V9 instruction set architecture (ISA), the first microprocessor to do so. SPARC64 was HAL's first microprocessor and was the first in the SPARC64 brand. It operates at 101 and 118 MHz. The SPARC64 was used exclusively by Fujitsu in their systems; the first systems, the Fujitsu HALstation Model 330 and Model 350 workstations, were formally announced in September 1995 and were introduced in October 1995, two years late. It was succeeded by the SPARC64 II in 1996.

<span class="mw-page-title-main">POWER3</span> 1998 family of microprocessors by IBM

The POWER3 is a microprocessor, designed and exclusively manufactured by IBM, that implemented the 64-bit version of the PowerPC instruction set architecture (ISA), including all of the optional instructions of the ISA such as instructions present in the POWER2 version of the POWER ISA but not in the PowerPC ISA. It was introduced on 5 October 1998, debuting in the RS/6000 43P Model 260, a high-end graphics workstation. The POWER3 was originally supposed to be called the PowerPC 630 but was renamed, probably to differentiate the server-oriented POWER processors it replaced from the more consumer-oriented 32-bit PowerPCs. The POWER3 was the successor of the P2SC derivative of the POWER2 and completed IBM's long-delayed transition from POWER to PowerPC, which was originally scheduled to conclude in 1995. The POWER3 was used in IBM RS/6000 servers and workstations at 200 MHz. It competed with the Digital Equipment Corporation (DEC) Alpha 21264 and the Hewlett-Packard (HP) PA-8500.

<span class="mw-page-title-main">CVAX</span>

The CVAX is a microprocessor chipset developed and fabricated by Digital Equipment Corporation (DEC) that implemented the VAX instruction set architecture (ISA). The chipset consisted of the CVAX 78034 CPU, CFPA floating-point accelerator, CVAX clock chip, and the associated support chips, the CVAX System Support Chip (CSSC), CVAX Memory Controller (CMCTL), and CVAX Q-Bus Interface Chip (CQBIC).

<span class="mw-page-title-main">R10000</span> MIPS microprocessor

The R10000, code-named "T5", is a RISC microprocessor implementation of the MIPS IV instruction set architecture (ISA) developed by MIPS Technologies, Inc. (MTI), then a division of Silicon Graphics, Inc. (SGI). The chief designers are Chris Rowen and Kenneth C. Yeager. The R10000 microarchitecture is known as ANDES, an abbreviation for Architecture with Non-sequential Dynamic Execution Scheduling. The R10000 largely replaces the R8000 in the high-end and the R4400 elsewhere. MTI was a fabless semiconductor company; the R10000 was fabricated by NEC and Toshiba. Previous fabricators of MIPS microprocessors such as Integrated Device Technology (IDT) and three others did not fabricate the R10000 as it was more expensive to do so than the R4000 and R4400.

<span class="mw-page-title-main">R4000</span>

The R4000 is a microprocessor developed by MIPS Computer Systems that implements the MIPS III instruction set architecture (ISA). Officially announced on 1 October 1991, it was one of the first 64-bit microprocessors and the first MIPS III implementation. In the early 1990s, when RISC microprocessors were expected to replace CISC microprocessors such as the Intel i486, the R4000 was selected to be the microprocessor of the Advanced Computing Environment (ACE), an industry standard that intended to define a common RISC platform. ACE ultimately failed for a number of reasons, but the R4000 found success in the workstation and server markets.

<span class="mw-page-title-main">R5000</span>

The R5000 is a 64-bit, little endian (mipsel) superscalar, in-order execution 2-issue design microprocessor, that implements the MIPS IV instruction set architecture (ISA) developed by Quantum Effect Design (QED) in 1996. The project was funded by MIPS Technologies, Inc (MTI), also the licensor. MTI then licensed the design to Integrated Device Technology (IDT), NEC, NKK, and Toshiba. The R5000 succeeded the QED R4600 and R4700 as their flagship high-end embedded microprocessor. IDT marketed its version of the R5000 as the 79RV5000, NEC as VR5000, NKK as the NR5000, and Toshiba as the TX5000. The R5000 was sold to PMC-Sierra when the company acquired QED. Derivatives of the R5000 are still in production today for embedded systems.

The PowerPC 600 family was the first family of PowerPC processors built. They were designed at the Somerset facility in Austin, Texas, jointly funded and staffed by engineers from IBM and Motorola as a part of the AIM alliance. Somerset was opened in 1992 and its goal was to make the first PowerPC processor and then keep designing general purpose PowerPC processors for personal computers. The first incarnation became the PowerPC 601 in 1993, and the second generation soon followed with the PowerPC 603, PowerPC 604 and the 64-bit PowerPC 620.

<span class="mw-page-title-main">Alpha 21064</span>

The Alpha 21064 is a microprocessor developed and fabricated by Digital Equipment Corporation that implemented the Alpha instruction set architecture (ISA). It was introduced as the DECchip 21064 before it was renamed in 1994. The 21064 is also known by its code name, EV4. It was announced in February 1992 with volume availability in September 1992. The 21064 was the first commercial implementation of the Alpha ISA, and the first microprocessor from Digital to be available commercially. It was succeeded by a derivative, the Alpha 21064A in October 1993. This last version was replaced by the Alpha 21164 in 1995.

<span class="mw-page-title-main">Alpha 21164</span>

The Alpha 21164, also known by its code name, EV5, is a microprocessor developed and fabricated by Digital Equipment Corporation that implemented the Alpha instruction set architecture (ISA). It was introduced in January 1995, succeeding the Alpha 21064A as Digital's flagship microprocessor. It was succeeded by the Alpha 21264 in 1998.

The Alpha 21364, code-named "Marvel", also known as EV7 is a microprocessor developed by Digital Equipment Corporation (DEC), later Compaq Computer Corporation, that implemented the Alpha instruction set architecture (ISA).

<span class="mw-page-title-main">PA-8000</span>

The PA-8000 (PCX-U), code-named Onyx, is a microprocessor developed and fabricated by Hewlett-Packard (HP) that implemented the PA-RISC 2.0 instruction set architecture (ISA). It was a completely new design with no circuitry derived from previous PA-RISC microprocessors. The PA-8000 was introduced on 2 November 1995 when shipments began to members of the Precision RISC Organization (PRO). It was used exclusively by PRO members and was not sold on the merchant market. All follow-on PA-8x00 processors are based on the basic PA-8000 processor core.

<span class="mw-page-title-main">UltraSPARC III</span> Microprocessor developed by Sun Microsystems

The UltraSPARC III, code-named "Cheetah", is a microprocessor that implements the SPARC V9 instruction set architecture (ISA) developed by Sun Microsystems and fabricated by Texas Instruments. It was introduced in 2001 and operates at 600 to 900 MHz. It was succeeded by the UltraSPARC IV in 2004. Gary Lauterbach was the chief architect.

The Alpha 21464 is an unfinished microprocessor that implements the Alpha instruction set architecture (ISA) developed by Digital Equipment Corporation and later by Compaq after it acquired Digital. The microprocessor was also known as EV8. Slated for a 2004 release, it was canceled on 25 June 2001 when Compaq announced that Alpha would be phased out in favor of Itanium by 2004. When it was canceled, the Alpha 21464 was at a late stage of development but had not been taped out.

<span class="mw-page-title-main">R4600</span>

The R4600, code-named "Orion", is a microprocessor developed by Quantum Effect Design (QED) that implemented the MIPS III instruction set architecture (ISA). As QED was a design firm that did not fabricate or sell their designs, the R4600 was first licensed to Integrated Device Technology (IDT), and later to Toshiba and then NKK. These companies fabricated the microprocessor and marketed it. The R4600 was designed as a low-end workstation or high-end embedded microprocessor. Users included Silicon Graphics, Inc. (SGI) for their Indy workstation and DeskStation Technology for their Windows NT workstations. The R4600 was instrumental in making the Indy successful by providing good integer performance at a competitive price. In embedded systems, prominent users included Cisco Systems in their network routers and Canon in their printers.

References

Further reading