Alpha 21364

The Alpha 21364, code-named "Marvel" and also known as EV7, is a microprocessor developed by Digital Equipment Corporation (DEC), later Compaq Computer Corporation, that implemented the Alpha instruction set architecture (ISA).

History

The Alpha 21364 was revealed by Compaq in October 1998 at the 11th Annual Microprocessor Forum, where it was described as an Alpha 21264 with a 1.5 MB 6-way set-associative on-die secondary cache, an integrated Direct Rambus DRAM memory controller and an integrated network controller for connecting to other microprocessors. Changes to the Alpha 21264 core included a larger victim buffer, quadrupled in capacity to 32 entries: 16 for the Dcache and 16 for the Scache. Microprocessor Report also reported that Compaq was considering minor changes to the branch predictor to improve prediction accuracy, and doubling the capacity of the miss buffer from the Alpha 21264's 8 entries to 16.[1]

It was expected to tape out in late 1999, with samples available in early 2000 and volume shipments in late 2000. However, the schedule slipped, and tape-out took place in April 2001 instead of late 1999.[2] The Alpha 21364 was introduced on 20 January 2002, when systems using the microprocessor debuted. It operated at 1.25 GHz, but production models in the AlphaServer ES47, ES80 and GS1280 operated at 1.0 GHz or 1.15 GHz. Unlike previous Alpha microprocessors, the Alpha 21364 was not sold on the open market.[citation needed]

The Alpha 21364 was originally intended to be succeeded by the Alpha 21464, code-named EV8, a new implementation of the Alpha ISA with four-way simultaneous multithreading (SMT).[3] It was first presented in October 1999 at the 12th Annual Microprocessor Forum,[3] but was cancelled on 25 June 2001 at a late stage of development.[4]

Development

The development of the Alpha 21364 focused mostly on features that would improve memory performance and multiprocessor scalability. The emphasis on memory performance stemmed from a forward-looking article in Microprocessor Report titled "It's the Memory, Stupid!", written by Richard L. Sites, who co-led the definition of the Alpha architecture.[5] The article concluded that "Over the coming decade, memory subsystem design will be the only important design issue for microprocessors."

Description

The Alpha 21364 combined an Alpha 21264 core with a 1.75 MB on-die secondary cache, two integrated memory controllers and an integrated network controller.

Core

The Alpha 21364's core is based on the EV68CB, a derivative of the Alpha 21264. The only modification was a larger victim buffer, quadrupled in capacity to 32 entries, which are divided equally between the Dcache and the Scache (16 entries each). Although the Alpha 21364 is a fourth-generation implementation of the Alpha architecture, its core is otherwise identical to the EV68CB.[6]

Scache

The secondary cache (termed "Scache") is a unified cache with a capacity of 1.75 MB. It is 7-way set associative, uses a 64-byte line size, and has a write-back policy. The cache is protected by single-bit error correction, double-bit error detection (SECDED) error-correcting code (ECC). It is connected to the cache controller by a 128-bit data path. Access to the cache is fully pipelined, yielding a sustainable bandwidth of 16 GB/s at 1.0 GHz.
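
The stated geometry fixes how an address maps into the Scache. Treating 1.75 MB as 1.75 × 2^20 = 1,835,008 bytes, 7 ways of 64-byte lines give 1,835,008 / (7 × 64) = 4,096 sets, so a byte address splits into a 6-bit line offset, a 12-bit set index and a tag. The 16 GB/s figure likewise follows from moving 128 bits (16 bytes) per cycle at 1.0 GHz. The minimal Python sketch below checks this arithmetic; it assumes the conventional offset/index/tag split, which is not a documented detail of the 21364's tag format, and its function names are illustrative only.

```python
# Scache parameters as stated in the text
CAPACITY = 1_835_008   # 1.75 MB, assuming MB = 2**20 bytes
WAYS = 7
LINE_SIZE = 64         # bytes per line
DATA_PATH_BITS = 128   # width of the path to the cache controller

SETS = CAPACITY // (WAYS * LINE_SIZE)        # 4096 sets
OFFSET_BITS = LINE_SIZE.bit_length() - 1     # log2(64) = 6
INDEX_BITS = SETS.bit_length() - 1           # log2(4096) = 12


def split_address(addr: int) -> tuple[int, int, int]:
    """Split a byte address into (tag, set index, line offset).

    Assumed layout: low bits pick the byte within the line, the next
    bits pick the set, and all remaining high bits form the tag.
    """
    offset = addr & (LINE_SIZE - 1)
    index = (addr >> OFFSET_BITS) & (SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


def pipelined_bandwidth(clock_hz: float) -> float:
    """Bytes per second if one 128-bit transfer completes every cycle."""
    return DATA_PATH_BITS // 8 * clock_hz
```

Because the set count is a power of two, the index can be taken with a shift and a mask rather than a division.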

The time between a request to the cache and the point at which the data can be used is 12 cycles.[7] Observers such as Microprocessor Report considered this latency significant. It was not reduced further because doing so would not have improved performance: the Alpha 21264 core on which the Alpha 21364 was based was designed to use an external cache built from commodity SRAM, which has significantly higher latency than the on-die Scache of the Alpha 21364, and thus could only accept data at a limited rate. Compaq was not willing to remedy this limitation, as doing so would have required significant modifications to the Alpha 21264 core.[9] Since lower latency would bring no further gains, the designers focused instead on reducing the power consumed by the Scache.[8] The relatively high latency allowed the cache tags to be looked up first, to determine whether the Scache contained the requested data and in which bank it was located, before that bank was powered up and accessed. This avoided unproductive Scache accesses, reducing power consumption.
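
The power-saving idea of checking tags before touching data can be illustrated with a toy model. The Python sketch below is a hypothetical simplification, not the 21364's actual bank organization: it treats each way as a separately powered data bank, approximates energy by counting bank activations, and ignores line fills. It compares a design that reads every way in parallel with the tags against one that looks up the tags first and then powers up at most one bank.

```python
from collections import OrderedDict


class SerialLookupCache:
    """Toy set-associative cache that checks tags before touching data.

    Hypothetical simplification: each way is a separately powered bank,
    energy is approximated by counting bank activations, and line fills
    on a miss are not counted.
    """

    def __init__(self, sets: int, ways: int, line_size: int) -> None:
        self.sets, self.ways, self.line_size = sets, ways, line_size
        # One LRU-ordered collection of resident tags per set
        self.tags = [OrderedDict() for _ in range(sets)]
        self.serial_activations = 0    # tags first, then at most one bank
        self.parallel_activations = 0  # every bank read alongside the tags

    def access(self, addr: int) -> bool:
        """Look up one byte address; return True on a hit."""
        line = addr // self.line_size
        index, tag = line % self.sets, line // self.sets
        resident = self.tags[index]
        self.parallel_activations += self.ways  # parallel design reads all ways
        if tag in resident:
            resident.move_to_end(tag)           # update LRU order
            self.serial_activations += 1        # power up only the hit bank
            return True
        # Miss: the serial design powers up no data bank for the lookup
        if len(resident) == self.ways:
            resident.popitem(last=False)        # evict least recently used tag
        resident[tag] = None
        return False
```

For a trace that touches 100 distinct lines twice, the serial design activates 100 banks (one per hit, none on the initial misses) while the parallel design activates 1,400. The cost of the serial approach is added latency, which the text explains the 21364 could absorb.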

The tag store consisted of 5.75 million transistors and the data store of 108 million transistors.[8]

Memory controller

The Alpha 21364 has two integrated memory controllers that support Rambus DRAM (RDRAM). They operate at two-thirds of the microprocessor's clock frequency, or 800 MHz at 1.2 GHz. Compaq designed custom memory controllers for the Alpha 21364, giving them capabilities not found in standard RDRAM memory controllers, such as keeping all 128 pages open to reduce the latency of accesses to those pages, and proprietary fault-tolerant features.

Each memory controller provides five RDRAM channels that support PC800 Rambus inline memory modules (RIMMs). Four of the channels provide memory, while the fifth provides RAID-like redundancy.[7] Each channel is 16 bits wide and operates at 400 MHz, transferring data on both the rising and falling edges of the clock signal (double data rate) for a transfer rate of 800 MT/s and a bandwidth of 1.6 GB/s. The eight data channels (four per controller) provide a total memory bandwidth of 12.8 GB/s.
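
The bandwidth figures follow directly from the channel parameters, and the redundancy channel can be illustrated with the simplest RAID-like scheme. The Python sketch below computes the per-channel and total bandwidth, then shows XOR parity across four data channels so that the contents of any one lost channel can be rebuilt from the other three plus the parity channel. The XOR-parity scheme and the block granularity are assumptions for illustration: the source says only that the fifth channel provides RAID-like redundancy, and the 21364's actual mechanism may differ.

```python
from functools import reduce


def channel_bandwidth(width_bits: int, clock_hz: float,
                      transfers_per_clock: int = 2) -> float:
    """Peak bytes/s of one channel; double data rate moves data on both edges."""
    return width_bits / 8 * clock_hz * transfers_per_clock


PER_CHANNEL = channel_bandwidth(16, 400e6)          # 1.6 GB/s
CONTROLLERS, DATA_CHANNELS = 2, 4                   # fifth channel is redundancy
TOTAL = PER_CHANNEL * CONTROLLERS * DATA_CHANNELS   # 12.8 GB/s


def parity(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks byte by byte (RAID-4-style parity, assumed)."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def rebuild(blocks: list[bytes], parity_block: bytes, lost: int) -> bytes:
    """Recover the block of channel `lost` from the survivors and the parity."""
    survivors = [b for i, b in enumerate(blocks) if i != lost]
    return parity(survivors + [parity_block])
```

The parity channel carries no additional bandwidth for ordinary reads, which is why only the eight data channels count toward the 12.8 GB/s total.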

Cache coherence is provided by the memory controllers, each of which has a cache coherence engine. The Alpha 21364 uses a directory-based cache coherence scheme in which part of the memory is used to store Modified, Exclusive, Shared, Invalid (MESI) coherence data.
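
A directory scheme keeps, for each memory line, a record of which nodes hold copies and in what state, so the home node can send targeted messages instead of broadcasting. The Python sketch below is a heavily simplified, hypothetical model of one directory entry using MESI states: it ignores races, transient states and the data transfers themselves, and the message names "downgrade" and "invalidate" are illustrative rather than the 21364's actual protocol messages.

```python
from enum import Enum


class State(Enum):
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"


class DirectoryEntry:
    """Coherence record for one memory line, as a home node might keep it.

    Hypothetical simplification: no races, no transient states, and data
    movement (write-backs, forwarding) is implied rather than modelled.
    """

    def __init__(self) -> None:
        self.state = State.INVALID
        self.holders: set[int] = set()  # nodes currently caching the line

    def read(self, node: int) -> list[tuple[str, int]]:
        """Handle a read miss from `node`; return the messages the home sends."""
        msgs: list[tuple[str, int]] = []
        if self.state is State.INVALID:
            self.state = State.EXCLUSIVE  # a sole reader may take the line in E
        elif (self.state in (State.EXCLUSIVE, State.MODIFIED)
              and node not in self.holders):
            (owner,) = self.holders
            msgs.append(("downgrade", owner))  # owner supplies data, keeps S copy
            self.state = State.SHARED
        self.holders.add(node)
        return msgs

    def write(self, node: int) -> list[tuple[str, int]]:
        """Handle a request for write ownership from `node`."""
        msgs = [("invalidate", other)
                for other in sorted(self.holders) if other != node]
        self.state = State.MODIFIED
        self.holders = {node}
        return msgs
```

Storing this record in memory alongside the data, as the text describes, means the directory scales with memory capacity rather than requiring a separate structure sized for the whole system.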

R-box

The R-box contains the network router, which connected the microprocessor to other microprocessors through four ports named North, South, East and West. Each port consisted of two 39-bit unidirectional links operating at 800 MHz; 32 bits of each link carried data and 7 bits carried ECC. The network router also had a fifth port, used for I/O. This port connected to an IO7 application-specific integrated circuit (ASIC), which bridged to an AGP 4x channel and two PCI-X buses. The I/O port consisted of two unidirectional 32-bit links operating at 200 MHz, yielding a peak bandwidth of 3.2 GB/s. The I/O port links operated at a quarter of the frequency of the interprocessor links (200 MHz versus 800 MHz) to simplify the design of the I/O ASIC.
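
Seven check bits is exactly what a single-error-correcting, double-error-detecting (SECDED) code needs for 32 data bits: a Hamming code with r check bits requires 2^r ≥ 32 + r + 1, which first holds at r = 6, and one more overall parity bit adds double-error detection. The Python sketch below uses the textbook extended Hamming construction to show how a 39-bit link word could be protected; the 21364's actual assignment of check bits is not given in the source and likely differs.

```python
from typing import Optional, Tuple

DATA_BITS = 32
N = 38                                  # Hamming positions 1..38
PARITY_POS = [1, 2, 4, 8, 16, 32]       # 6 Hamming check bits
DATA_POS = [p for p in range(1, N + 1) if p & (p - 1)]  # 32 data slots
# Bit 0 holds an overall parity bit: 32 + 6 + 1 = 39 bits in total.


def _ones(word: int) -> int:
    return bin(word).count("1")


def encode(data: int) -> int:
    """Encode 32 data bits into a 39-bit SECDED codeword."""
    if not 0 <= data < 1 << DATA_BITS:
        raise ValueError("data must fit in 32 bits")
    word = 0
    for i, pos in enumerate(DATA_POS):
        if (data >> i) & 1:
            word |= 1 << pos
    for p in PARITY_POS:
        # Check bit p covers every position whose index has bit p set
        covered = sum((word >> q) & 1 for q in range(1, N + 1) if q & p)
        if covered % 2:
            word |= 1 << p
    if _ones(word) % 2:
        word |= 1                       # make the total parity even
    return word


def decode(word: int) -> Tuple[Optional[int], str]:
    """Return (data, status); status is 'ok', 'corrected' or 'uncorrectable'."""
    syndrome = 0
    for pos in range(1, N + 1):
        if (word >> pos) & 1:
            syndrome ^= pos             # XOR of the positions of all set bits
    odd = _ones(word) % 2 == 1          # overall parity violated?
    if syndrome == 0 and not odd:
        status = "ok"
    elif odd and syndrome <= N:
        word ^= 1 << syndrome           # single-bit error (syndrome 0 = bit 0)
        status = "corrected"
    else:
        return None, "uncorrectable"    # e.g. a double-bit error
    data = 0
    for i, pos in enumerate(DATA_POS):
        if (word >> pos) & 1:
            data |= 1 << i
    return data, status
```

A single flipped bit anywhere in the 39-bit word is corrected, while two flipped bits leave the overall parity even but the syndrome non-zero, which is detected but not corrected.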

The Alpha 21364 can connect to as many as 127 other microprocessors using one of two network topologies: a shuffle or a 2D torus. The shuffle topology had more direct paths between microprocessors, reducing latency and therefore improving performance, but by its nature it was limited to connecting up to eight microprocessors. The 2D torus topology allowed the network to include up to 128 microprocessors.
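
In a 2D torus, each node's North, South, East and West ports link to its grid neighbors, with the edges wrapping around, and the minimum hop count between two nodes is a rough proxy for the distance-dependent latency described in this section. The Python sketch below assumes nodes are addressed by (x, y) coordinates and that North decreases y; both conventions, and the 16 × 8 arrangement of 128 nodes used in the usage note, are hypothetical, since the source names only the ports.

```python
def torus_neighbors(x: int, y: int, width: int,
                    height: int) -> dict[str, tuple[int, int]]:
    """Nodes reached through the four router ports of node (x, y)."""
    return {
        "North": (x, (y - 1) % height),  # wraps from the top row to the bottom
        "South": (x, (y + 1) % height),
        "East": ((x + 1) % width, y),    # wraps from the last column to the first
        "West": ((x - 1) % width, y),
    }


def torus_hops(a: tuple[int, int], b: tuple[int, int],
               width: int, height: int) -> int:
    """Minimum number of router hops between nodes a and b."""
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    # In each dimension, go either directly or around the wraparound link
    return min(dx, width - dx) + min(dy, height - dy)
```

With 128 nodes arranged as a 16 × 8 torus, the farthest node is 8 + 4 = 12 hops away; without the wraparound links, a 16 × 8 mesh would need up to 15 + 7 = 22 hops.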

In multiprocessing systems, each microprocessor is a node with its own memory. A node can access the memory of other nodes, but at a higher latency that increases with distance; the Alpha 21364 therefore implements non-uniform memory access (NUMA) multiprocessing. I/O is distributed in the same way. An Alpha 21364 microprocessor in a multiprocessing system did not need to have its RIMM slots populated with memory or its I/O port populated with devices, as it could use another microprocessor's memory and I/O.

Fault tolerance

The Alpha 21364 could operate in lock-step for fault-tolerant computers.[10] This feature resulted from Compaq's decision to migrate Tandem's Himalaya fault-tolerant servers from the MIPS architecture to Alpha. However, these machines never used the microprocessor, as the decision to phase out Alpha in favor of Itanium was made before the Alpha 21364 became available.

Fabrication

The Alpha 21364 contained 152 million transistors. The die measured 21.1 mm by 18.8 mm, for an area of 397 mm². It was fabricated by International Business Machines (IBM) in its 0.18 μm, seven-level copper complementary metal-oxide-semiconductor (CMOS) process and packaged in a 1,443-land flip-chip land grid array (LGA).[2] It used a 1.65 V power supply and a 1.5 V external interface, with a maximum power dissipation of 155 W at 1.25 GHz.

Alpha 21364A

The Alpha 21364A, code-named EV79 (previously EV78), was a further development of the Alpha 21364. It was intended to be the last Alpha microprocessor developed. Scheduled for introduction in 2004, it was cancelled on 23 October 2003, with HP citing performance and schedule issues. A replacement, the EV7z, was announced on the same day.

A prototype of the microprocessor was presented by Hewlett-Packard at the International Solid-State Circuits Conference in February 2003. It operated at 1.45 GHz, had a die area of 251 mm², used a 1.2 V power supply, and had an estimated power dissipation of 100 W.[11]

The Alpha 21364A was to improve upon the Alpha 21364 with higher clock frequencies of approximately 1.6 to 1.7 GHz and support for 1066 Mbit/s RDRAM. It was to be fabricated by IBM in its 0.13 μm silicon on insulator (SOI) process. The more advanced process allowed reductions in die size, power supply voltage (1.2 V compared to 1.65 V), and power consumption and dissipation.

EV7z

The EV7z was a further development of the Alpha 21364 and the last Alpha microprocessor developed and introduced. It was revealed on 23 October 2003, when HP announced that it had cancelled the Alpha 21364A and would replace it with the EV7z.[12] The EV7z was introduced on 16 August 2004 with the AlphaServer GS1280, the only computer to use it, and was discontinued along with that computer on 27 April 2007. It operated at 1.3 GHz, supported PC1066 RIMMs and was fabricated in the same 0.18 μm process as the Alpha 21364. The EV7z was 14 to 16 percent faster than the Alpha 21364, but slower than the Alpha 21364A it replaced, which was estimated to outperform the Alpha 21364 by 25 percent at 1.5 GHz.

Notes

  1. "Alpha 21364 to Ease Memory Bottleneck", p. 2.
  2. "Alpha 21364 (EV7)", p. 2.
  3. "Compaq Chooses SMT for Alpha".
  4. "Design Tradeoffs for the Alpha EV8 Conditional Branch Predictor", p. 1.
  5. Sites, Richard (5 August 1996). "It's the Memory, Stupid!". Microprocessor Report. 10 (10). S2CID 6293956.
  6. Compiler Writer's Guide for the 21264/21364, p. 1-4.
  7. Compiler Writer's Guide for the 21264/21364, p. 1-5.
  8. "Power and CAD considerations for the 1.75 Mbyte, 1.2 GHz L2 cache on the Alpha 21364 CPU".
  9. "Alpha 21364 to Ease Memory Bottleneck", p. 3.
  10. "Alpha 21364 (EV7)".
  11. "Moore, Moore, and More at ISSCC", p. 3.
  12. "HP is Dealt a Delay in its HP-UX OS and Alpha Processor Roadmap".
