Intel Ultra Path Interconnect

Last updated

The Intel Ultra Path Interconnect (UPI) [1] [2] is a point-to-point processor interconnect developed by Intel which replaced the Intel QuickPath Interconnect (QPI) in Xeon Skylake-SP platforms starting in 2017.

Contents

Interconnect

UPI is a low-latency coherent interconnect for scalable multiprocessor systems with a shared address space. It uses a directory-based home snoop coherency protocol with a transfer speed of up to 10.4  GT/s. Supporting processors typically have two or three UPI links.

Comparing to QPI, it improves power efficiency with a new low-power state, improves transfer efficiency with a new packetization format, and improves scalability with protocol layer that does not require preallocation of resources.

UPI only supports directory-based coherency, unlike previous QPI processors which supported multiple snoop modes (no snoop, early snoop, home snoop, and directory).

A combined caching and home agent (CHA) handles resolution of coherency across multiple processors, as well as snoop requests from processor cores and local and remote agents. Separate physical CHAs are placed within each processor core and last level cache (LLC) bank to improve scalability according to the number of cores, memory controllers, or the sub-NUMA clustering mode. The address space is interleaved across different CHAs, which act like a single logical agent.

See also

Related Research Articles

<span class="mw-page-title-main">Non-uniform memory access</span> Computer memory design used in multiprocessing

Non-uniform memory access (NUMA) is a computer memory design used in multiprocessing, where the memory access time depends on the memory location relative to the processor. Under NUMA, a processor can access its own local memory faster than non-local memory. The benefits of NUMA are limited to particular workloads, notably on servers where the data is often associated strongly with certain tasks or users.

<span class="mw-page-title-main">Scalable Coherent Interface</span> High-speed interconnect standard for shared memory multiprocessing and message passing

The Scalable Coherent Interface or Scalable Coherent Interconnect (SCI), is a high-speed interconnect standard for shared memory multiprocessing and message passing. The goal was to scale well, provide system-wide memory coherence and a simple interface; i.e. a standard to replace existing buses in multiprocessor systems with one with no inherent scalability and performance limitations.

<span class="mw-page-title-main">Xeon</span> Line of Intel server processors

Xeon is a brand of x86 microprocessors designed, manufactured, and marketed by Intel, targeted at the non-consumer workstation, server, and embedded system markets. It was introduced in June 1998. Xeon processors are based on the same architecture as regular desktop-grade CPUs, but have advanced features such as support for ECC memory, higher core counts, more PCI Express lanes, support for larger amounts of RAM, larger cache memory and extra provision for enterprise-grade reliability, availability and serviceability (RAS) features responsible for handling hardware exceptions through the Machine Check Architecture. They are often capable of safely continuing execution where a normal processor cannot due to these extra RAS features, depending on the type and severity of the machine-check exception (MCE). Some also support multi-socket systems with two, four, or eight sockets through use of the Ultra Path Interconnect (UPI) bus.

The Itanium 9300 series, code-named Tukwila, is the generation of Intel's Itanium processor family following Itanium 2 and Montecito. It was released on 8 February 2010. It utilizes both multiple processor cores (multi-core) and SMT techniques. The engineers said to be working on this project were from the DEC Alpha project, specifically those who worked on the Alpha 21464 (EV8), which was focused on SMT.

The Intel QuickPath Interconnect (QPI) is a point-to-point processor interconnect developed by Intel which replaced the front-side bus (FSB) in Xeon, Itanium, and certain desktop platforms starting in 2008. It increased the scalability and available bandwidth. Prior to the name's announcement, Intel referred to it as Common System Interface (CSI). Earlier incarnations were known as Yet Another Protocol (YAP) and YAP+.

The ARM Advanced Microcontroller Bus Architecture (AMBA) is an open-standard, on-chip interconnect specification for the connection and management of functional blocks in system-on-a-chip (SoC) designs. It facilitates development of multi-processor designs with large numbers of controllers and components with a bus architecture. Since its inception, the scope of AMBA has, despite its name, gone far beyond microcontroller devices. Today, AMBA is widely used on a range of ASIC and SoC parts including applications processors used in modern portable mobile devices like smartphones. AMBA is a registered trademark of ARM Ltd.

<span class="mw-page-title-main">P6 (microarchitecture)</span> Intel processor microarchitecture

The P6 microarchitecture is the sixth-generation Intel x86 microarchitecture, implemented by the Pentium Pro microprocessor that was introduced in November 1995. It is frequently referred to as i686. It was succeeded by the NetBurst microarchitecture in 2000, but eventually revived in the Pentium M line of microprocessors. The successor to the Pentium M variant of the P6 microarchitecture is the Core microarchitecture which in turn is also derived from P6.

The Intel Core microarchitecture is a multi-core processor microarchitecture launched by Intel in mid-2006. It is a major evolution over the Yonah, the previous iteration of the P6 microarchitecture series which started in 1995 with Pentium Pro. It also replaced the NetBurst microarchitecture, which suffered from high power consumption and heat intensity due to an inefficient pipeline designed for high clock rate. In early 2004 the new version of NetBurst (Prescott) needed very high power to reach the clocks it needed for competitive performance, making it unsuitable for the shift to dual/multi-core CPUs. On May 7, 2004 Intel confirmed the cancellation of the next NetBurst. Intel had been developing Merom, the 64-bit evolution of the Pentium M, since 2001, and decided to expand it to all market segments, replacing NetBurst in desktop computers and servers. It inherited from Pentium M the choice of a short and efficient pipeline, delivering superior performance despite not reaching the high clocks of NetBurst.

Yonah was the code name of Intel's first generation 65 nm process CPU cores, based on cores of the earlier Banias / Dothan Pentium M microarchitecture. Yonah CPU cores were used within Intel's Core Solo and Core Duo mobile microprocessor products. SIMD performance on Yonah improved through the addition of SSE3 instructions and improvements to SSE and SSE2 implementations; integer performance decreased slightly due to higher latency cache. Additionally, Yonah included support for the NX bit.

<span class="mw-page-title-main">Nehalem (microarchitecture)</span> CPU microarchitecture by Intel

Nehalem is the codename for Intel's 45 nm microarchitecture released in November 2008. It was used in the first-generation of the Intel Core i5 and i7 processors, and succeeds the older Core microarchitecture used on Core 2 processors. The term "Nehalem" comes from the Nehalem River.

<span class="mw-page-title-main">Intel X58</span> Chip designed by Intel

The Intel X58 is an Intel chip designed to connect Intel processors with Intel QuickPath Interconnect (QPI) interface to peripheral devices. Supported processors implement the Nehalem microarchitecture and therefore have an integrated memory controller (IMC), so the X58 does not have a memory interface. Initially supported processors were the Core i7, but the chip also supported Nehalem and Westmere-based Xeon processors.

"Uncore" is a term used by Intel to describe the functions of a microprocessor that are not in the core, but which must be closely connected to the core to achieve high performance. It has been called "system agent" since the release of the Sandy Bridge microarchitecture.

Bloomfield is the code name for Intel high-end desktop processors sold as Core i7-9xx and single-processor servers sold as Xeon 35xx., in almost identical configurations, replacing the earlier Yorkfield processors. The Bloomfield core is closely related to the dual-processor Gainestown, which has the same CPUID value of 0106Ax and which uses the same socket. Bloomfield uses a different socket than the later Lynnfield and Clarksfield processors based on the same 45 nm Nehalem microarchitecture, even though some of these share the same Intel Core i7 brand.

<span class="mw-page-title-main">LGA 2011</span>

LGA 2011, also called Socket R, is a CPU socket by Intel released on November 14, 2011. It launched along with LGA 1356 to replace its predecessor, LGA 1366 and LGA 1567. While LGA 1356 was designed for dual-processor or low-end servers, LGA 2011 was designed for high-end desktops and high-performance servers. The socket has 2011 protruding pins that touch contact points on the underside of the processor.

<span class="mw-page-title-main">Xeon Phi</span> Series of x86 manycore processors from Intel

Xeon Phi was a series of x86 manycore processors designed and made by Intel. It was intended for use in supercomputers, servers, and high-end workstations. Its architecture allowed use of standard programming languages and application programming interfaces (APIs) such as OpenMP.

<span class="mw-page-title-main">LGA 3647</span> Intel CPU socket for servers (released 2016)

LGA 3647 is an Intel microprocessor compatible socket used by Xeon Phi x200, Xeon Phi 72x5, Skylake-SP, Cascade Lake-SP/AP, and Cascade Lake-W microprocessors.

Coherent Accelerator Processor Interface (CAPI), is a high-speed processor expansion bus standard for use in large data center computers, initially designed to be layered on top of PCI Express, for directly connecting central processing units (CPUs) to external accelerators like graphics processing units (GPUs), ASICs, FPGAs or fast storage. It offers low latency, high speed, direct memory access connectivity between devices of different instruction set architectures.

Compute Express Link (CXL) is an open standard for high-speed central processing unit (CPU)-to-device and CPU-to-memory connections, designed for high performance data center computers. CXL is built on the PCI Express (PCIe) physical and electrical interface and includes PCIe-based block input/output protocol (CXL.io) and new cache-coherent protocols for accessing system memory (CXL.cache) and device memory (CXL.mem).

References

  1. David Mulnix (September 14, 2017). "Intel® Xeon® Processor Scalable Family Technical Overview". Intel Corporation. Retrieved September 17, 2017.
  2. "The Mesh Topology & UPI - Intel Xeon Platinum 8176 Scalable Processor Review". 11 July 2017.