Intel QuickPath Interconnect

Last updated

The Intel QuickPath Interconnect (QPI) [1] [2] is a point-to-point processor interconnect developed by Intel which replaced the front-side bus (FSB) in Xeon, Itanium, and certain desktop platforms starting in 2008. It increased the scalability and available bandwidth. Prior to the name's announcement, Intel referred to it as Common System Interface (CSI). [3] Earlier incarnations were known as Yet Another Protocol (YAP) and YAP+.

Contents

QPI 1.1 is a significantly revamped version introduced with Sandy Bridge-EP (Romley platform). [4]

QPI was replaced by Intel Ultra Path Interconnect (UPI) in Skylake-SP Xeon processors based on LGA 3647 socket. [5]

Background

Although sometimes called a "bus", QPI is a point-to-point interconnect. It was designed to compete with HyperTransport that had been used by Advanced Micro Devices (AMD) since around 2003. [6] [7] Intel developed QPI at its Massachusetts Microprocessor Design Center (MMDC) by members of what had been the Alpha Development Group, which Intel had acquired from Compaq and HP and in turn originally came from Digital Equipment Corporation (DEC). [8] Its development had been reported as early as 2004. [9]

Intel first delivered it for desktop processors in November 2008 on the Intel Core i7-9xx and X58 chipset. It was released in Xeon processors code-named Nehalem in March 2009 and Itanium processors in February 2010 (code named Tukwila). [10]

It was supplanted by the Intel Ultra Path Interconnect starting in 2017 on the Xeon Skylake-SP platforms. [11]

Implementation

QPI is an uncore component in Intel's Nehalem microarchitecture. Intel Nehalem arch.svg
QPI is an uncore component in Intel's Nehalem microarchitecture.

The QPI is an element of a system architecture that Intel calls the QuickPath architecture that implements what Intel calls QuickPath technology. [12] In its simplest form on a single-processor motherboard, a single QPI is used to connect the processor to the IO Hub (e.g., to connect an Intel Core i7 to an X58). In more complex instances of the architecture, separate QPI link pairs connect one or more processors and one or more IO hubs or routing hubs in a network on the motherboard, allowing all of the components to access other components via the network. As with HyperTransport, the QuickPath Architecture assumes that the processors will have integrated memory controllers, and enables a non-uniform memory access (NUMA) architecture.

Each QPI comprises two 20-lane point-to-point data links, one in each direction (full duplex), with a separate clock pair in each direction, for a total of 42 signals. Each signal is a differential pair, so the total number of pins is 84. The 20 data lanes are divided onto four "quadrants" of 5 lanes each. The basic unit of transfer is the 80-bit flit, which has 8 bits for error detection, 8 bits for "link-layer header", and 64 bits for data. One 80-bit flit is transferred in two clock cycles (four 20-bit transfers, two per clock tick.) QPI bandwidths are advertised by computing the transfer of 64 bits (8 bytes) of data every two clock cycles in each direction. [8]

Although the initial implementations use single four-quadrant links, the QPI specification permits other implementations. Each quadrant can be used independently. On high-reliability servers, a QPI link can operate in a degraded mode. If one or more of the 20+1 signals fails, the interface will operate using 10+1 or even 5+1 remaining signals, even reassigning the clock to a data signal if the clock fails. [8] The initial Nehalem implementation used a full four-quadrant interface to achieve 25.6 GB/s (6.4GT/s × 1 byte × 4), which provides exactly double the theoretical bandwidth of Intel's 1600 MHz FSB used in the X48 chipset.

Although some high-end Core i7 processors expose QPI, other "mainstream" Nehalem desktop and mobile processors intended for single-socket boards (e.g. LGA 1156 Core i3, Core i5, and other Core i7 processors from the Lynnfield/Clarksfield and successor families) do not expose QPI externally, because these processors are not intended to participate in multi-socket systems.

However, QPI is used internally on these chips to communicate with the "uncore", which is part of the chip containing memory controllers, CPU-side PCI Express and GPU, if present; the uncore may or may not be on the same die as the CPU core, for instance it is on a separate die in the Westmere-based Clarkdale/Arrandale. [13] [14] [15] [16] :3

In post-2009 single-socket chips starting with Lynnfield, Clarksfield, Clarkdale and Arrandale, the traditional northbridge functions are integrated into these processors, which therefore communicate externally via the slower DMI and PCI Express interfaces.

Thus, there is no need to incur the expense of exposing the (former) front-side bus interface via the processor socket. [17]

Although the core–uncore QPI link is not present in desktop and mobile Sandy Bridge processors (as it was on Clarkdale, for example), the internal ring interconnect between on-die cores is also based on the principles behind QPI, at least as far as cache coherency is concerned. [16] :10

Frequency specifications

Being a synchronous circuit the QPI operates at a clock rate of 2.4 GHz, 2.93 GHz, 3.2 GHz, 3.6 GHz, 4.0 GHz or 4.8 GHz (3.6 GHz and 4.0 GHz frequencies were introduced with the Sandy Bridge-E/EP platform and 4.8 GHz with the Haswell-E/EP platform). The clock rate for a particular link depends on the capabilities of the components at each end of the link and the signal characteristics of the signal path on the printed circuit board. The non-extreme Core i7 9xx processors are restricted to a 2.4 GHz frequency at stock reference clocks.

Bit transfers occur on both the rising and the falling edges of the clock, so the transfer rate is double the clock rate.

Intel describes the data throughput (in GB/s) by counting only the 64-bit data payload in each 80-bit flit. However, Intel then doubles the result because the unidirectional send and receive link pair can be simultaneously active. Thus, Intel describes a 20-lane QPI link pair (send and receive) with a 3.2 GHz clock as having a data rate of 25.6 GB/s. A clock rate of 2.4 GHz yields a data rate of 19.2 GB/s. More generally, by this definition a two-link 20-lane QPI transfers eight bytes per clock cycle, four in each direction.

The rate is computed as follows:

3.2 GHz
× 2 bits/Hz (double data rate)
× 16(20) (data bits/QPI link width)
× 2 (unidirectional send and receive operating simultaneously)
÷ 8 (bits/byte)
= 25.6 GB/s

Protocol layers

QPI is specified as a five-layer architecture, with separate physical, link, routing, transport, and protocol layers. [1] In devices intended only for point-to-point QPI use with no forwarding, such as the Core i7-9xx and Xeon DP processors, the transport layer is not present and the routing layer is minimal.

Physical layer
The physical layer comprises the actual wiring and the differential transmitters and receivers, plus the lowest-level logic that transmits and receives the physical-layer unit. The physical-layer unit is the 20-bit "phit." The physical layer transmits a 20-bit "phit" using a single clock edge on 20 lanes when all 20 lanes are available, or on 10 or 5 lanes when the QPI is reconfigured due to a failure. Note that in addition to the data signals, a clock signal is forwarded from the transmitter to receiver (which simplifies clock recovery at the expense of additional pins).
Link layer
The link layer is responsible for sending and receiving 80-bit flits. Each flit is sent to the physical layer as four 20-bit phits. Each flit contains an 8-bit CRC generated by the link layer transmitter and a 72-bit payload. If the link layer receiver detects a CRC error, the receiver notifies the transmitter via a flit on the return link of the pair and the transmitter resends the flit. The link layer implements flow control using a credit/debit scheme to prevent the receiver's buffer from overflowing. The link layer supports six different classes of message to permit the higher layers to distinguish data flits from non-data messages primarily for maintenance of cache coherence. In complex implementations of the QuickPath architecture, the link layer can be configured to maintain separate flows and flow control for the different classes. It is not clear if this is needed or implemented for single-processor and dual-processor implementations.
Routing layer
The routing layer sends a 72-bit unit consisting of an 8-bit header and a 64-bit payload. The header contains the destination and the message type. When the routing layer receives a unit, it examines its routing tables to determine if the unit has reached its destination. If so it is delivered to the next-higher layer. If not, it is sent on the correct outbound QPI. On a device with only one QPI, the routing layer is minimal. For more complex implementations, the routing layer's routing tables are more complex, and are modified dynamically to avoid failed QPI links.
Transport layer
The transport layer is not needed and is not present in devices that are intended for only point-to-point connections. This includes the Core i7. The transport layer sends and receives data across the QPI network from its peers on other devices that may not be directly connected (i.e., the data may have been routed through an intervening device.) the transport layer verifies that the data is complete, and if not, it requests retransmission from its peer.
Protocol layer
The protocol layer sends and receives packets on behalf of the device. A typical packet is a memory cache row. The protocol layer also participates in maintenance of cache coherence by sending and receiving relevant messages.

See also

Related Research Articles

HyperTransport (HT), formerly known as Lightning Data Transport, is a technology for interconnection of computer processors. It is a bidirectional serial/parallel high-bandwidth, low-latency point-to-point link that was introduced on April 2, 2001. The HyperTransport Consortium is in charge of promoting and developing HyperTransport technology.

<span class="mw-page-title-main">Front-side bus</span> Type of computer communication interface

The front-side bus (FSB) is a computer communication interface (bus) that was often used in Intel-chip-based computers during the 1990s and 2000s. The EV6 bus served the same function for competing AMD CPUs. Both typically carry data between the central processing unit (CPU) and a memory controller hub, known as the northbridge.

<span class="mw-page-title-main">Xeon</span> Line of Intel server and workstation processors

Xeon is a brand of x86 microprocessors designed, manufactured, and marketed by Intel, targeted at the non-consumer workstation, server, and embedded system markets. It was introduced in June 1998. Xeon processors are based on the same architecture as regular desktop-grade CPUs, but have advanced features such as support for ECC memory, higher core counts, more PCI Express lanes, support for larger amounts of RAM, larger cache memory and extra provision for enterprise-grade reliability, availability and serviceability (RAS) features responsible for handling hardware exceptions through the Machine Check Architecture. They are often capable of safely continuing execution where a normal processor cannot due to these extra RAS features, depending on the type and severity of the machine-check exception (MCE). Some also support multi-socket systems with two, four, or eight sockets through use of the Ultra Path Interconnect (UPI) bus.

The Arm Advanced Microcontroller Bus Architecture (AMBA) is an open-standard, on-chip interconnect specification for the connection and management of functional blocks in system-on-a-chip (SoC) designs. It facilitates development of multi-processor designs with large numbers of controllers and components with a bus architecture. Since its inception, the scope of AMBA has, despite its name, gone far beyond microcontroller devices. Today, AMBA is widely used on a range of ASIC and SoC parts including applications processors used in modern portable mobile devices like smartphones. AMBA is a registered trademark of Arm Ltd.

In the fields of digital electronics and computer hardware, multi-channel memory architecture is a technology that increases the data transfer rate between the DRAM memory and the memory controller by adding more channels of communication between them. Theoretically, this multiplies the data rate by exactly the number of channels present. Dual-channel memory employs two channels. The technique goes back as far as the 1960s having been used in IBM System/360 Model 91 and in CDC 6600.

<span class="mw-page-title-main">Pentium</span> Brand of semi-discontinued microprocessors produced by Intel

Pentium is a semi-discontinued series of x86 architecture-compatible microprocessors produced by Intel. The original Pentium was first released on March 22, 1993. The name "Pentium" is originally derived from the Greek word pente (πεντε), meaning "five", a reference to the prior numeric naming convention of Intel's 80x86 processors (8086–80486), with the Latin ending -ium since the processor would otherwise have been named 80586 using that convention.

<span class="mw-page-title-main">Nehalem (microarchitecture)</span> CPU microarchitecture by Intel

Nehalem is the codename for Intel's 45 nm microarchitecture released in November 2008. It was used in the first generation of the Intel Core i5 and i7 processors, and succeeds the older Core microarchitecture used on Core 2 processors. The term "Nehalem" comes from the Nehalem River.

<span class="mw-page-title-main">Sandy Bridge</span> Intel processor microarchitecture

Sandy Bridge is the codename for Intel's 32 nm microarchitecture used in the second generation of the Intel Core processors. The Sandy Bridge microarchitecture is the successor to Nehalem and Westmere microarchitecture. Intel demonstrated a A1 stepping Sandy Bridge processor in 2009 during IDF, and released first products based on the architecture in January 2011 under the Core brand.

<span class="mw-page-title-main">Direct Media Interface</span> Intel bus for connecting CPU and I/O chipset

In computing, Direct Media Interface (DMI) is Intel's proprietary link between the northbridge and southbridge chipset on a computer motherboard. It was first used between the 9xx chipsets and the ICH6, released in 2004. Previous Intel chipsets had used the Intel Hub Architecture to perform the same function, and server chipsets use a similar interface called Enterprise Southbridge Interface (ESI). While the "DMI" name dates back to ICH6, Intel mandates specific combinations of compatible devices, so the presence of a DMI does not guarantee by itself that a particular northbridge–southbridge combination is allowed.

<span class="mw-page-title-main">Intel X58</span> Chip designed by Intel

The Intel X58 is an Intel chip designed to connect Intel processors with Intel QuickPath Interconnect (QPI) interface to peripheral devices. Supported processors implement the Nehalem microarchitecture and therefore have an integrated memory controller (IMC), so the X58 does not have a memory interface. Initially supported processors were the Core i7, but the chip also supported Nehalem and Westmere-based Xeon processors.

<span class="mw-page-title-main">LGA 1366</span> CPU socket for Intel processors

LGA 1366, also known as Socket B, is an Intel CPU socket. This socket supersedes Intel's LGA 775 in the high-end and performance desktop segments. It also replaces the server-oriented LGA 771 in the entry level and is superseded itself by LGA 2011. This socket has 1,366 protruding pins which touch contact points on the underside of the processor (CPU) and accesses up to three channels of DDR3 memory via the processor's internal memory controller.

"Uncore" is a term used by Intel to describe the functions of a microprocessor that are not in the core, but which must be closely connected to the core to achieve high performance. It has been called "system agent" since the release of the Sandy Bridge microarchitecture.

Gulftown or Westmere-E is the codename of an up to six-core hyperthreaded Intel processor able to run up to 12 threads in parallel. It is based on Westmere microarchitecture, the 32 nm shrink of Nehalem. Originally rumored to be called the Intel Core i9, it is sold as an Intel Core i7. The first release was the Core i7 980X in the first quarter of 2010, along with its server counterpart, the Xeon 3600 and the dual-socket Xeon 5600 (Westmere-EP) series using identical chips.

Bloomfield is the code name for Intel high-end desktop processors sold as Core i7-9xx and single-processor servers sold as Xeon 35xx., in almost identical configurations, replacing the earlier Yorkfield processors. The Bloomfield core is closely related to the dual-processor Gainestown, which has the same CPUID value of 0106Ax and which uses the same socket. Bloomfield uses a different socket than the later Lynnfield and Clarksfield processors based on the same 45 nm Nehalem microarchitecture, even though some of these share the same Intel Core i7 brand.

<span class="mw-page-title-main">Intel Core</span> Line of CPUs by Intel

Intel Core is a line of streamlined midrange consumer, workstation and enthusiast computer central processing units (CPUs) marketed by Intel Corporation. These processors displaced the existing mid- to high-end Pentium processors at the time of their introduction, moving the Pentium to the entry level. Identical or more capable versions of Core processors are also sold as Xeon processors for the server and workstation markets.

<span class="mw-page-title-main">LGA 2011</span> CPU socket created by Intel

LGA 2011, also called Socket R, is a CPU socket by Intel released on November 14, 2011. It launched along with LGA 1356 to replace its predecessor, LGA 1366 and LGA 1567. While LGA 1356 was designed for dual-processor or low-end servers, LGA 2011 was designed for high-end desktops and high-performance servers. The socket has 2011 protruding pins that touch contact points on the underside of the processor.

<span class="mw-page-title-main">Westmere (microarchitecture)</span> CPU microarchitecture by Intel

Westmere is the code name given to the 32 nm die shrink of Nehalem. While sharing the same CPU sockets, Westmere included Intel HD Graphics, while Nehalem did not.

Coherent Accelerator Processor Interface (CAPI), is a high-speed processor expansion bus standard for use in large data center computers, initially designed to be layered on top of PCI Express, for directly connecting central processing units (CPUs) to external accelerators like graphics processing units (GPUs), ASICs, FPGAs or fast storage. It offers low latency, high speed, direct memory access connectivity between devices of different instruction set architectures.

The Intel Ultra Path Interconnect (UPI) is a point-to-point processor interconnect developed by Intel which replaced the Intel QuickPath Interconnect (QPI) in Xeon Skylake-SP platforms starting in 2017.

References

  1. 1 2 "An Introduction to the Intel QuickPath Interconnect" (PDF). Intel Corporation. January 30, 2009. Retrieved June 14, 2011.
  2. DailyTech report Archived 2013-10-17 at the Wayback Machine , retrieved August 21, 2007
  3. Eva Glass (May 16, 2007). "Intel CSI name revealed: Slow, slow, quick quick slow". The Inquirer. Archived from the original on June 10, 2012. Retrieved September 13, 2013.{{cite news}}: CS1 maint: unfit URL (link)
  4. David Kanter (2011-07-20). "Intel's Quick Path Evolved". Realworldtech.com. Retrieved 2014-01-21.
  5. SoftPedia: Intel Plans to Replace Xeon with Its New Skylake-Based “Purley” Super Platform
  6. Gabriel Torres (August 25, 2008). "Everything You Need to Know About The QuickPath Interconnect (QPI)". Hardware Secrets. Retrieved January 23, 2017.
  7. Charlie Demerjian (December 13, 2005). "Intel Intel gets knickers in a twist over Tanglewood". The Inquirer. Archived from the original on September 3, 2010. Retrieved September 13, 2013.{{cite news}}: CS1 maint: unfit URL (link)
  8. 1 2 3 David Kanter (August 28, 2007). "The Common System Interface: Intel's Future Interconnect". Real World Tech. Retrieved August 14, 2014.
  9. Eva Glass (December 12, 2004). "Intel's Whitefield takes four core IA-32 shape". The Inquirer. Archived from the original on May 24, 2009. Retrieved September 13, 2013.{{cite news}}: CS1 maint: unfit URL (link)
  10. David Kanter (May 5, 2006). "Intel's Tukwila Confirmed to be Quad Core". Real World Tech. Archived from the original on May 10, 2012. Retrieved September 13, 2013.
  11. "Intel® Xeon® Processor Scalable Family Technical Overview".
  12. "Intel Demonstrates Industry's First 32nm Chip and Next-Generation Nehalem Microprocessor Architecture". Archived from the original on 2008-01-02. Retrieved 2007-12-31.
  13. Chris Angelini (2009-09-07). "QPI, Integrated Memory, PCI Express, And LGA 1156 - Intel Core i5 And Core i7: Intel's Mainstream Magnum Opus". Tomshardware.com. Retrieved 2014-01-21.
  14. Published on 25th January 2010 by Richard Swinburne (2010-01-25). "Feature - Intel GMA HD Graphics Performance". bit-tech.net. Retrieved 2014-01-21.{{cite web}}: CS1 maint: numeric names: authors list (link)
  15. "Intel Clarkdale 32nm CPU-and-GPU chip benchmarked (again) - CPU - Feature". HEXUS.net. 2009-09-25. Retrieved 2014-01-21.
  16. 1 2 Oded Lempel (2013-07-28). "2nd Generation Intel Core Processor Family: Intel Core i7, i5 and i3" (PDF). hotchips.org. Archived from the original (PDF) on 2020-07-29. Retrieved 2014-01-21.
  17. Lily Looi, Stephan Jourdan, Transitioning the Intel® Next Generation Microarchitectures (Nehalem and Westmere) into the Mainstream, Hot Chips 21, August 24, 2009