HyperTransport

Last updated

Logo of the HyperTransport Consortium HyperTransport Consortium logo.svg
Logo of the HyperTransport Consortium

HyperTransport (HT), formerly known as Lightning Data Transport, is a technology for interconnection of computer processors. It is a bidirectional serial/parallel high-bandwidth, low-latency point-to-point link that was introduced on April 2, 2001. [1] The HyperTransport Consortium is in charge of promoting and developing HyperTransport technology.

Contents

HyperTransport is best known as the system bus architecture of AMD central processing units (CPUs) from Athlon 64 through AMD FX and the associated motherboard chipsets. HyperTransport has also been used by IBM and Apple for the Power Mac G5 machines, as well as a number of modern MIPS systems.

The current specification HTX 3.1 remained competitive for 2014 high-speed (2666 and 3200  MT/s or about 10.4 GB/s and 12.8 GB/s) DDR4 RAM and slower (around 1 GB/s similar to high end PCIe SSDs ULLtraDIMM flash RAM) technology[ clarification needed ]—a wider range of RAM speeds on a common CPU bus than any Intel front-side bus. Intel technologies require each speed range of RAM to have its own interface, resulting in a more complex motherboard layout but with fewer bottlenecks. HTX 3.1 at 26 GB/s can serve as a unified bus for as many as four DDR4 sticks running at the fastest proposed speeds. Beyond that DDR4 RAM may require two or more HTX 3.1 buses diminishing its value as unified transport.

Overview

HyperTransport comes in four versions—1.x, 2.0, 3.0, and 3.1—which run from 200  MHz to 3.2 GHz. It is also a DDR or "double data rate" connection, meaning it sends data on both the rising and falling edges of the clock signal. This allows for a maximum data rate of 6400 MT/s when running at 3.2 GHz. The operating frequency is autonegotiated with the motherboard chipset (North Bridge) in current computing.

HyperTransport supports an autonegotiated bit width, ranging from 2 to 32 bits per link; there are two unidirectional links per HyperTransport bus. With the advent of version 3.1, using full 32-bit links and utilizing the full HyperTransport 3.1 specification's operating frequency, the theoretical transfer rate is 25.6  GB/s (3.2 GHz × 2 transfers per clock cycle × 32 bits per link) per direction, or 51.2 GB/s aggregated throughput, making it faster than most existing bus standard for PC workstations and servers as well as making it faster than most bus standards for high-performance computing and networking.

Links of various widths can be mixed together in a single system configuration as in one 16-bit link to another CPU and one 8-bit link to a peripheral device, which allows for a wider interconnect between CPUs, and a lower bandwidth interconnect to peripherals as appropriate. It also supports link splitting, where a single 16-bit link can be divided into two 8-bit links. The technology also typically has lower latency than other solutions due to its lower overhead.

Electrically, HyperTransport is similar to low-voltage differential signaling (LVDS) operating at 1.2 V. [2] HyperTransport 2.0 added post-cursor transmitter deemphasis. HyperTransport 3.0 added scrambling and receiver phase alignment as well as optional transmitter precursor deemphasis.

Packet-oriented

HyperTransport is packet-based, where each packet consists of a set of 32-bit words, regardless of the physical width of the link. The first word in a packet always contains a command field. Many packets contain a 40-bit address. An additional 32-bit control packet is prepended when 64-bit addressing is required. The data payload is sent after the control packet. Transfers are always padded to a multiple of 32 bits, regardless of their actual length.

HyperTransport packets enter the interconnect in segments known as bit times. The number of bit times required depends on the link width. HyperTransport also supports system management messaging, signaling interrupts, issuing probes to adjacent devices or processors, I/O transactions, and general data transactions. There are two kinds of write commands supported: posted and non-posted. Posted writes do not require a response from the target. This is usually used for high bandwidth devices such as uniform memory access traffic or direct memory access transfers. Non-posted writes require a response from the receiver in the form of a "target done" response. Reads also require a response, containing the read data. HyperTransport supports the PCI consumer/producer ordering model.

Power-managed

HyperTransport also facilitates power management as it is compliant with the Advanced Configuration and Power Interface specification. This means that changes in processor sleep states (C states) can signal changes in device states (D states), e.g. powering off disks when the CPU goes to sleep. HyperTransport 3.0 added further capabilities to allow a centralized power management controller to implement power management policies.

Applications

Front-side bus replacement

The primary use for HyperTransport is to replace the Intel-defined front-side bus, which is different for every type of Intel processor. For instance, a Pentium cannot be plugged into a PCI Express bus directly, but must first go through an adapter to expand the system. The proprietary front-side bus must connect through adapters for the various standard buses, like AGP or PCI Express. These are typically included in the respective controller functions, namely the northbridge and southbridge .

In contrast, HyperTransport is an open specification, published by a multi-company consortium. A single HyperTransport adapter chip will work with a wide spectrum of HyperTransport enabled microprocessors.

AMD used HyperTransport to replace the front-side bus in their Opteron, Athlon 64, Athlon II, Sempron 64, Turion 64, Phenom, Phenom II and FX families of microprocessors.

Multiprocessor interconnect

Another use for HyperTransport is as an interconnect for NUMA multiprocessor computers. AMD used HyperTransport with a proprietary cache coherency extension as part of their Direct Connect Architecture in their Opteron and Athlon 64 FX (Dual Socket Direct Connect (DSDC) Architecture) line of processors. Infinity Fabric used with the EPYC server CPUs is a superset of HyperTransport. The HORUS interconnect from Newisys extends this concept to larger clusters. The Aqua device from 3Leaf Systems virtualizes and interconnects CPUs, memory, and I/O.

Router or switch bus replacement

HyperTransport can also be used as a bus in routers and switches. Routers and switches have multiple network interfaces, and must forward data between these ports as fast as possible. For example, a four-port, 1000  Mbit/s Ethernet router needs a maximum 8000 Mbit/s of internal bandwidth (1000 Mbit/s × 4 ports × 2 directions)—HyperTransport greatly exceeds the bandwidth this application requires. However a 4 + 1 port 10 Gb router would require 100 Gbit/s of internal bandwidth. Add to that 802.11ac 8 antennas and the WiGig 60 GHz standard (802.11ad) and HyperTransport becomes more feasible (with anywhere between 20 and 24 lanes used for the needed bandwidth).

Co-processor interconnect

The issue of latency and bandwidth between CPUs and co-processors has usually been the major stumbling block to their practical implementation. Co-processors such as FPGAs have appeared that can access the HyperTransport bus and become integrated on the motherboard. Current generation FPGAs from both main manufacturers (Altera and Xilinx) directly support the HyperTransport interface, and have IP Cores available. Companies such as XtremeData, Inc. and DRC take these FPGAs (Xilinx in DRC's case) and create a module that allows FPGAs to plug directly into the Opteron socket.

AMD started an initiative named Torrenza on September 21, 2006 to further promote the usage of HyperTransport for plug-in cards and coprocessors. This initiative opened their "Socket F" to plug-in boards such as those from XtremeData and DRC.

Add-on card connector (HTX and HTX3)

Connectors from top to bottom: HTX, PCI-Express for riser card, PCI-Express HyperTransport16 pcie8riser pcie16.jpg
Connectors from top to bottom: HTX, PCI-Express for riser card, PCI-Express

A connector specification that allows a slot-based peripheral to have direct connection to a microprocessor using a HyperTransport interface was released by the HyperTransport Consortium. It is known as HyperTransport eXpansion (HTX). Using a reversed instance of the same mechanical connector as a 16-lane PCI Express slot (plus an x1 connector for power pins), HTX allows development of plug-in cards that support direct access to a CPU and DMA to the system RAM. The initial card for this slot was the QLogic InfiniPath InfiniBand HCA. IBM and HP, among others, have released HTX compliant systems.

The original HTX standard is limited to 16 bits and 800 MHz. [3]

In August 2008, the HyperTransport Consortium released HTX3, which extends the clock rate of HTX to 2.6 GHz (5.2 GT/s, 10.7 GTi, 5.2 real GHz data rate, 3 MT/s edit rate) and retains backwards compatibility. [4]

Testing

The "DUT" test connector [5] is defined to enable standardized functional test system interconnection.

Implementations

Frequency specifications

HyperTransport
version
YearMax. HT frequencyMax. link widthMax. aggregate bandwidth (GB/s)
bi-directional16-bit unidirectional32-bit unidirectional*
1.02001800 MHz32-bit12.83.26.4
1.12002800 MHz32-bit12.83.26.4
2.020041.4 GHz32-bit22.45.611.2
3.020062.6 GHz32-bit41.610.420.8
3.120083.2 GHz32-bit51.212.825.6

* AMD Athlon 64, Athlon 64 FX, Athlon 64 X2, Athlon X2, Athlon II, Phenom, Phenom II, Sempron, Turion series and later use one 16-bit HyperTransport link. AMD Athlon 64 FX (1207), Opteron use up to three 16-bit HyperTransport links. Common clock rates for these processor links are 800 MHz to 1 GHz (older single and multi socket systems on 754/939/940 links) and 1.6 GHz to 2.0 GHz (newer single socket systems on AM2+/AM3 links—most newer CPUs using 2.0 GHz). While HyperTransport itself is capable of 32-bit width links, that width is not currently utilized by any AMD processors. Some chipsets though do not even utilize the 16-bit width used by the processors. Those include the Nvidia nForce3 150, nForce3 Pro 150, and the ULi M1689—which use a 16-bit HyperTransport downstream link but limit the HyperTransport upstream link to 8 bits.

Name

There has been some marketing confusion[ citation needed ] between the use of HT referring to HyperTransport and the later use of HT to refer to Intel's Hyper-Threading feature on some Pentium 4-based and the newer Nehalem and Westmere-based Intel Core microprocessors. Hyper-Threading is officially known as Hyper-Threading Technology (HTT) or HT Technology. Because of this potential for confusion, the HyperTransport Consortium always uses the written-out form: "HyperTransport."

Infinity Fabric

Infinity Fabric (IF) is a superset of HyperTransport announced by AMD in 2016 as an interconnect for its GPUs and CPUs. It is also usable as interchip interconnect for communication between CPUs and GPUs (for Heterogeneous System Architecture), an arrangement known as Infinity Architecture. [7] [8] [9] The company said the Infinity Fabric would scale from 30 GB/s to 512 GB/s, and be used in the Zen-based CPUs and Vega GPUs which were subsequently released in 2017.

On Zen and Zen+ CPUs, the "SDF" data interconnects are run at the same frequency as the DRAM memory clock (MEMCLK), a decision made to remove the latency caused by different clock speeds. As a result, using a faster RAM module makes the entire bus faster. The links are 32-bit wide, as in HT, but 8 transfers are done per cycle (128-bit packets) compared to the original 2. Electrical changes are made for higher power efficiency. [10] On Zen 2 and Zen 3 CPUs, the IF bus is on a separate clock, either in a 1:1 or 2:1 ratio to the DRAM clock, because of Zen's early problems with high-speed DRAM affecting IF speed, and therefore system stability. The bus width has also been doubled. [11]

See also

Related Research Articles

<span class="mw-page-title-main">Athlon</span> Brand of microprocessors by AMD

Athlon is the brand name applied to a series of x86-compatible microprocessors designed and manufactured by AMD. The original Athlon was the first seventh-generation x86 processor and the first desktop processor to reach speeds of one gigahertz (GHz). It made its debut as AMD's high-end processor brand on June 23, 1999. Over the years AMD has used the Athlon name with the 64-bit Athlon 64 architecture, the Athlon II, and Accelerated Processing Unit (APU) chips targeting the Socket AM1 desktop SoC architecture, and Socket AM4 Zen microarchitecture. The modern Zen-based Athlon with a Radeon Graphics processor was introduced in 2019 as AMD's highest-performance entry-level processor.

<span class="mw-page-title-main">PCI Express</span> Computer expansion bus standard

PCI Express, officially abbreviated as PCIe or PCI-e, is a high-speed serial computer expansion bus standard, designed to replace the older PCI, PCI-X and AGP bus standards. It is the common motherboard interface for personal computers' graphics cards, sound cards, hard disk drive host adapters, SSDs, Wi-Fi and Ethernet hardware connections. PCIe has numerous improvements over the older standards, including higher maximum system bus throughput, lower I/O pin count and smaller physical footprint, better performance scaling for bus devices, a more detailed error detection and reporting mechanism, and native hot-swap functionality. More recent revisions of the PCIe standard provide hardware support for I/O virtualization.

<span class="mw-page-title-main">Pentium 4</span> Brand by Intel

Pentium 4 is a series of single-core CPUs for desktops, laptops and entry-level servers manufactured by Intel. The processors were shipped from November 20, 2000 until August 8, 2008. It was removed from the official price lists starting in 2010, being replaced by Pentium Dual-Core.

<span class="mw-page-title-main">Pentium III</span> Line of desktop and mobile microprocessors produced by Intel

The Pentium III brand refers to Intel's 32-bit x86 desktop and mobile CPUs based on the sixth-generation P6 microarchitecture introduced on February 28, 1999. The brand's initial processors were very similar to the earlier Pentium II-branded processors. The most notable differences were the addition of the Streaming SIMD Extensions (SSE) instruction set, and the introduction of a controversial serial number embedded in the chip during manufacturing. The Pentium III is also a single-core processor.

<span class="mw-page-title-main">Opteron</span> Server and workstation processor line by AMD

Opteron is AMD's x86 former server and workstation processor line, and was the first processor which supported the AMD64 instruction set architecture. It was released on April 22, 2003, with the SledgeHammer core (K8) and was intended to compete in the server and workstation markets, particularly in the same segment as the Intel Xeon processor. Processors based on the AMD K10 microarchitecture were announced on September 10, 2007, featuring a new quad-core configuration. The last released Opteron CPUs are the Piledriver-based Opteron 4300 and 6300 series processors, codenamed "Seoul" and "Abu Dhabi" respectively.

<span class="mw-page-title-main">Front-side bus</span> Type of computer communication interface

The front-side bus (FSB) is a computer communication interface (bus) that was often used in Intel-chip-based computers during the 1990s and 2000s. The EV6 bus served the same function for competing AMD CPUs. Both typically carry data between the central processing unit (CPU) and a memory controller hub, known as the northbridge.

<span class="mw-page-title-main">Athlon 64</span> Series of CPUs by AMD

The Athlon 64 is a ninth-generation, AMD64-architecture microprocessor produced by Advanced Micro Devices (AMD), released on September 23, 2003. It is the third processor to bear the name Athlon, and the immediate successor to the Athlon XP. The Athlon 64 was the second processor to implement the AMD64 architecture and the first 64-bit processor targeted at the average consumer. Variants of the Athlon 64 have been produced for Socket 754, Socket 939, Socket 940, and Socket AM2. It was AMD's primary consumer CPU, and primarily competed with Intel's Pentium 4, especially the Prescott and Cedar Mill core revisions.

<span class="mw-page-title-main">Xeon</span> Line of Intel server and workstation processors

Xeon is a brand of x86 microprocessors designed, manufactured, and marketed by Intel, targeted at the non-consumer workstation, server, and embedded markets. It was introduced in June 1998. Xeon processors are based on the same architecture as regular desktop-grade CPUs, but have advanced features such as support for error correction code (ECC) memory, higher core counts, more PCI Express lanes, support for larger amounts of RAM, larger cache memory and extra provision for enterprise-grade reliability, availability and serviceability (RAS) features responsible for handling hardware exceptions through the Machine Check Architecture (MCA). They are often capable of safely continuing execution where a normal processor cannot due to these extra RAS features, depending on the type and severity of the machine-check exception (MCE). Some also support multi-socket systems with two, four, or eight sockets through use of the Ultra Path Interconnect (UPI) bus.

<span class="mw-page-title-main">Chipset</span> Electronic component to manage data flow of a CPU

In a computer system, a chipset is a set of electronic components on one or more integrated circuits that manages the data flow between the processor, memory and peripherals. The chipset is usually found on the motherboard of computers. Chipsets are usually designed to work with a specific family of microprocessors. Because it controls communications between the processor and external devices, the chipset plays a crucial role in determining system performance.

nForce4 Motherboard chipset

The nForce4 is a motherboard chipset released by Nvidia in October 2004. The chipset supports AMD 64-bit processors and Intel Pentium 4 LGA 775 processors.

<span class="mw-page-title-main">Socket 939</span> CPU socket for old AMD CPUs

Socket 939 is a CPU socket released by AMD in June 2004 to supersede the previous Socket 754 for Athlon 64 processors. Socket 939 was succeeded by Socket AM2 in May 2006. It was the second socket designed for AMD's AMD64 range of processors.

<span class="mw-page-title-main">Geode (processor)</span> Series of x86-compatible processor

Geode was a series of x86-compatible system-on-a-chip (SoC) microprocessors and I/O companions produced by AMD, targeted at the embedded computing market.

<span class="mw-page-title-main">Pentium D</span> Family of Intel microprocessors

Pentium D is a range of desktop 64-bit x86-64 processors based on the NetBurst microarchitecture, which is the dual-core variant of the Pentium 4 manufactured by Intel. Each CPU comprised two cores. The brand's first processor, codenamed Smithfield and manufactured on the 90 nm process, was released on May 25, 2005, followed by the 65 nm Presler nine months later. The core implementation on the 90 nm "Smithfield" and later 65 nm "Presler" are designed differently but are functionally the same. The 90 nm "Smithfield" contains a single die, with two adjoined but functionally separate CPU cores cut from the same wafer. The later 65 nm "Presler" utilized a multi-chip module package, where two discrete dies each containing a single core reside on the CPU substrate. Neither the 90nm "Smithfield" nor the 65 nm "Presler" were capable of direct core to core communication, relying instead on the northbridge link to send information between the 2 cores.

The Intel QuickPath Interconnect (QPI) is a point-to-point processor interconnect developed by Intel which replaced the front-side bus (FSB) in Xeon, Itanium, and certain desktop platforms starting in 2008. It increased the scalability and available bandwidth. Prior to the name's announcement, Intel referred to it as Common System Interface (CSI). Earlier incarnations were known as Yet Another Protocol (YAP) and YAP+.

The AMD Quad FX platform is an AMD platform targeted at enthusiasts which allows users to plug two Socket F Athlon 64 FX or 2-way Opteron processors (CPUs) into a single motherboard for a total of four physical cores. This is a type of dual processor setup, where two CPUs are installed on a motherboard to increase computing power. The major difference between the platform and past dual processor systems like Xeon is that each processor has its own dedicated memory stores. The Quad FX platform also has HyperTransport capability targeted toward consumer platforms.

The AMD 700 chipset series is a set of chipsets designed by ATI for AMD Phenom processors to be sold under the AMD brand. Several members were launched in the end of 2007 and the first half of 2008, others launched throughout the rest of 2008.

AMD Turion is the brand name AMD applies to its x86-64 low-power consumption (mobile) processors codenamed K8L. The Turion 64 and Turion 64 X2/Ultra processors compete with Intel's mobile processors, initially the Pentium M and the Intel Core and Intel Core 2 processors.

<span class="mw-page-title-main">Intel X58</span> Chip designed by Intel

The Intel X58 is an Intel chip designed to connect Intel processors with Intel QuickPath Interconnect (QPI) interface to peripheral devices. Supported processors implement the Nehalem microarchitecture and therefore have an integrated memory controller (IMC), so the X58 does not have a memory interface. Initially supported processors were the Core i7, but the chip also supported Nehalem and Westmere-based Xeon processors.

<span class="mw-page-title-main">Alpha 21264</span> RISC microprocessor

The Alpha 21264 is a Digital Equipment Corporation RISC microprocessor launched on 19 October 1998. The 21264 implemented the Alpha instruction set architecture (ISA).

Coherent Accelerator Processor Interface (CAPI), is a high-speed processor expansion bus standard for use in large data center computers, initially designed to be layered on top of PCI Express, for directly connecting central processing units (CPUs) to external accelerators like graphics processing units (GPUs), ASICs, FPGAs or fast storage. It offers low latency, high speed, direct memory access connectivity between devices of different instruction set architectures.

References

  1. "API NetWorks Accelerates Use of HyperTransport Technology With Launch of Industry's First HyperTransport Technology-to-PCI Bridge Chip". HyperTransport Consortium (Press release). April 2, 2001. Archived from the original on October 10, 2006.
  2. "Overview" (PDF). HyperTransport Consortium. Archived from the original (PDF) on July 16, 2011.
  3. Emberson, David; Holden, Brian (December 12, 2007). "HTX specification" (PDF). HyperTransport Consortium. p. 4. Archived from the original (PDF) on March 8, 2012. Retrieved January 30, 2008.
  4. Emberson, David (June 25, 2008). "HTX3 specification" (PDF). HyperTransport Consortium. p. 4. Archived from the original (PDF) on March 8, 2012. Retrieved August 17, 2008.
  5. Holden, Brian; Meschke, Mike; Abu-Lebdeh, Ziad; D'Orfani, Renato. "DUT Connector and Test Environment for HyperTransport" (PDF). HyperTransport Consortium. Archived from the original (PDF) on September 3, 2006. Retrieved November 12, 2022.
  6. Apple (June 25, 2003). "WWDC 2003 Keynote". YouTube. Archived from the original on July 8, 2012. Retrieved October 16, 2009.
  7. AMD. "AMD_presentation_EPYC". Archived from the original on August 21, 2017. Retrieved May 24, 2017.
  8. Merritt, Rick (December 13, 2016). "AMD Clocks Ryzen at 3.4 GHz+". EE Times. Archived from the original on August 8, 2019. Retrieved January 17, 2017.
  9. Alcorn, Paul (March 5, 2020). "AMD's CPU-to-GPU Infinity Fabric Detailed". Tom's Hardware. Retrieved November 12, 2022.
  10. "Infinity Fabric (IF) - AMD". WikiChip.
  11. Cutress, Ian (June 10, 2019). "AMD Zen 2 Microarchitecture Analysis: Ryzen 3000 and EPYC Rome". AnandTech. Retrieved November 12, 2022.