Compute Express Link

Year created: 2019
Created by: Intel
No. of devices: 4096
Speed: Full duplex
1.x, 2.0 (32 GT/s):
  • 3.938 GB/s (×1)
  • 63.015 GB/s (×16)
3.x (64 GT/s):
  • 7.563 GB/s (×1)
  • 121.0 GB/s (×16)
Style: Serial
Website: www.computeexpresslink.org

Compute Express Link (CXL) is an open standard interconnect for high-speed, high-capacity central processing unit (CPU)-to-device and CPU-to-memory connections, designed for high-performance data center computers. [1] [2] [3] [4] CXL is built on the serial PCI Express (PCIe) physical and electrical interface and includes a PCIe-based block input/output protocol (CXL.io) and new cache-coherent protocols for accessing system memory (CXL.cache) and device memory (CXL.mem). The serial communication and pooling capabilities allow CXL memory to overcome the performance and socket packaging limitations of common DIMM memory when implementing high storage capacities. [5] [6]

History

The CXL technology was primarily developed by Intel. The CXL Consortium was formed in March 2019 by founding members Alibaba Group, Cisco Systems, Dell EMC, Meta, Google, Hewlett Packard Enterprise (HPE), Huawei, Intel Corporation and Microsoft, [7] [8] and officially incorporated in September 2019. [9] As of January 2022, AMD, Nvidia, Samsung Electronics and Xilinx joined the founders on the board of directors, while ARM, Broadcom, Ericsson, IBM, Keysight, Kioxia, Marvell Technology, Mellanox, Microchip Technology, Micron, Oracle Corporation, Qualcomm, Rambus, Renesas, Seagate, SK Hynix, Synopsys, and Western Digital, among others, were contributing members. [10] [11] Industry partners include the PCI-SIG, [12] Gen-Z, [13] SNIA, [14] and DMTF. [15]

On April 2, 2020, the Compute Express Link and Gen-Z Consortiums announced plans to implement interoperability between the two technologies, [16] [17] with initial results presented in January 2021. [18] On November 10, 2021, Gen-Z specifications and assets were transferred to CXL, to focus on developing a single industry standard. [19] At the time of this announcement, 70% of Gen-Z members had already joined the CXL Consortium. [20]

On August 1, 2022, OpenCAPI specifications and assets were transferred to the CXL Consortium, [21] [22] which now includes companies behind memory coherent interconnect technologies such as OpenCAPI (IBM), Gen-Z (HPE), and CCIX (Xilinx) open standards, and proprietary InfiniBand / RoCE (Mellanox), Infinity Fabric (AMD), Omni-Path and QuickPath/Ultra Path (Intel), and NVLink/NVSwitch (Nvidia) protocols. [23]

Specifications

On March 11, 2019, the CXL Specification 1.0, based on PCIe 5.0, was released. [8] It allows the host CPU to access shared memory on accelerator devices with a cache-coherent protocol. The CXL Specification 1.1 was released in June 2019.

On November 10, 2020, the CXL Specification 2.0 was released. The new version adds support for CXL switching, to allow connecting multiple CXL 1.x and 2.0 devices to a CXL 2.0 host processor, and/or pooling each device to multiple host processors, in distributed shared memory and disaggregated storage configurations; it also implements device integrity and data encryption. [24] There is no bandwidth increase from CXL 1.x, because CXL 2.0 still utilizes PCIe 5.0 PHY.

On August 2, 2022, the CXL Specification 3.0 was released, based on the PCIe 6.0 physical interface and PAM-4 coding with double the bandwidth; new features include fabric capabilities with multi-level switching and multiple device types per port, and enhanced coherency with peer-to-peer DMA and memory sharing. [25] [26]
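
The per-lane and ×16 throughput of these generations can be reproduced with a short calculation. The sketch below is illustrative only: it assumes 128b/130b line encoding at 32 GT/s and a 242/256 usable fraction of each 256-byte FLIT at 64 GT/s. Neither efficiency factor is stated in this article, and the function and constant names are hypothetical.

```python
from fractions import Fraction


def link_bandwidth_gb_s(gt_per_s, efficiency, lanes=1):
    """Usable one-direction bandwidth in GB/s for a serial link.

    gt_per_s   -- raw transfer rate per lane in GT/s (one bit per transfer)
    efficiency -- fraction of raw bits left after encoding/framing overhead
    lanes      -- link width (1 for x1, 16 for x16, ...)
    """
    return float(Fraction(gt_per_s) * efficiency * lanes / 8)


# Assumed overheads (not taken from the article text):
ENC_128B_130B = Fraction(128, 130)  # PCIe 5.0 PHY, used by CXL 1.x and 2.0
FLIT_242_256 = Fraction(242, 256)   # PCIe 6.0 FLIT mode, used by CXL 3.x

cxl2_x1 = link_bandwidth_gb_s(32, ENC_128B_130B)
cxl2_x16 = link_bandwidth_gb_s(32, ENC_128B_130B, lanes=16)
cxl3_x1 = link_bandwidth_gb_s(64, FLIT_242_256)
cxl3_x16 = link_bandwidth_gb_s(64, FLIT_242_256, lanes=16)

print(f"CXL 1.x/2.0: {cxl2_x1:.4f} GB/s (x1), {cxl2_x16:.4f} GB/s (x16)")
print(f"CXL 3.x:     {cxl3_x1:.4f} GB/s (x1), {cxl3_x16:.4f} GB/s (x16)")
```

Under these assumptions the 64 GT/s link delivers slightly less than twice the 32 GT/s figure, because the FLIT framing overhead is larger than the 128b/130b encoding overhead.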

On November 14, 2023, the CXL Specification 3.1 was released.

Implementations

On April 2, 2019, Intel announced their family of Agilex FPGAs featuring CXL. [27]

On May 11, 2021, Samsung announced a 128 GB DDR5 based memory expansion module that allows for terabyte level memory expansion along with high performance for use in data centres and potentially next generation PCs. [28] An updated 512 GB version based on a proprietary memory controller was released on May 10, 2022. [29]

In 2021, CXL 1.1 support was announced for Intel Sapphire Rapids processors [30] and AMD Zen 4 EPYC "Genoa" and "Bergamo" processors. [31]

CXL devices were shown at the ACM/IEEE Supercomputing Conference (SC21) by vendors including Intel, [32] Astera, Rambus, Synopsys, Samsung, and Teledyne LeCroy. [33] [34] [35]

Protocols

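The CXL transaction layer is composed of three sub-protocols that are dynamically multiplexed (according to demand) on a single link: [36] [37] [24]

  • CXL.io: PCIe-based block input/output protocol
  • CXL.cache: cache-coherent protocol for accessing system memory
  • CXL.mem: cache-coherent protocol for accessing device memory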

The CXL.cache and CXL.mem protocols operate with a common link/transaction layer, which is separate from the CXL.io protocol link and transaction layer. These protocols/layers are multiplexed together by an Arbitration and Multiplexing (ARB/MUX) block before being transported over the standard PCIe 5.0 PHY using a fixed-width 528-bit (66-byte) Flow Control Unit (FLIT) block consisting of four 16-byte data 'slots' and a two-byte cyclic redundancy check (CRC) value. [37] CXL FLITs encapsulate PCIe standard Transaction Layer Packet (TLP) and Data Link Layer Packet (DLLP) data with a variable frame size format. [39] [40]
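
The FLIT size arithmetic above (four 16-byte slots plus a two-byte CRC giving 66 bytes, or 528 bits) can be illustrated with a minimal sketch. It only models the framing; the CRC used here is the CRC-CCITT routine from the Python standard library, chosen as a stand-in and not the polynomial defined by the CXL specification. The function names are hypothetical.

```python
import binascii

SLOT_BYTES = 16
SLOTS_PER_FLIT = 4
CRC_BYTES = 2
FLIT_BYTES = SLOT_BYTES * SLOTS_PER_FLIT + CRC_BYTES  # 66 bytes = 528 bits


def build_flit(slots):
    """Pack four 16-byte slots and a 2-byte CRC into one 66-byte FLIT."""
    if len(slots) != SLOTS_PER_FLIT:
        raise ValueError("a FLIT carries exactly four slots")
    if any(len(s) != SLOT_BYTES for s in slots):
        raise ValueError("each slot must be exactly 16 bytes")
    payload = b"".join(slots)
    # Stand-in checksum (CRC-CCITT), NOT the CRC defined by the CXL spec.
    crc = binascii.crc_hqx(payload, 0)
    return payload + crc.to_bytes(CRC_BYTES, "big")


def check_flit(flit):
    """Return True if the trailing CRC matches the 64-byte payload."""
    if len(flit) != FLIT_BYTES:
        return False
    payload, crc = flit[:-CRC_BYTES], flit[-CRC_BYTES:]
    return binascii.crc_hqx(payload, 0).to_bytes(CRC_BYTES, "big") == crc


flit = build_flit([bytes([i]) * SLOT_BYTES for i in range(SLOTS_PER_FLIT)])
print(len(flit), len(flit) * 8, check_flit(flit))
```

A fixed-size frame means the receiver always knows where the CRC sits, so it can validate each FLIT without parsing the slots first.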

CXL 3.0 introduces a 256-byte FLIT in PAM-4 transfer mode.

Device types

CXL is designed to support three primary device types: [24]

Type 2 devices implement two memory coherence modes, managed by the device driver. In device bias mode, the device directly accesses local memory, and no caching is performed by the CPU; in host bias mode, the host CPU's cache controller handles all access to device memory. The coherence mode can be set individually for each 4 KB page and is stored in a translation table in the local memory of Type 2 devices. Unlike other CPU-to-CPU memory coherency protocols, this arrangement only requires the host CPU memory controller to implement the cache agent; this asymmetric approach reduces implementation complexity and latency. [37]
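
The per-page bias bookkeeping described above can be sketched as a small lookup table. This is a toy model under stated assumptions: the class and method names are hypothetical, the default of host bias for untouched pages is an assumption, and a real device keeps this table in hardware-managed local memory rather than a Python dictionary.

```python
from enum import Enum

PAGE_SHIFT = 12  # 4 KB pages, as described in the text


class Bias(Enum):
    HOST = "host"
    DEVICE = "device"


class BiasTable:
    """Hypothetical per-page bias table for a Type 2 device's local memory."""

    def __init__(self, default=Bias.HOST):
        # Assumption: pages that were never configured use the default mode.
        self._default = default
        self._pages = {}  # page number -> Bias

    def set_bias(self, addr, length, mode):
        """Set the mode of every 4 KB page overlapping [addr, addr + length)."""
        if length <= 0:
            raise ValueError("length must be positive")
        first = addr >> PAGE_SHIFT
        last = (addr + length - 1) >> PAGE_SHIFT
        for page in range(first, last + 1):
            self._pages[page] = mode

    def bias_of(self, addr):
        """Return the bias mode of the page containing addr."""
        return self._pages.get(addr >> PAGE_SHIFT, self._default)

    def device_may_access_directly(self, addr):
        """Device bias: direct local access. Host bias: go through the host."""
        return self.bias_of(addr) is Bias.DEVICE


table = BiasTable()
table.set_bias(0x1000, 0x2000, Bias.DEVICE)  # covers pages 1 and 2
print(table.bias_of(0x0FFF), table.bias_of(0x1000), table.bias_of(0x3000))
```

Tracking the mode per page lets a driver flip only the buffers an accelerator is actively working on to device bias, leaving the rest of device memory under host control.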

CXL 2.0 added support for switching in tree-based device fabrics, allowing PCIe, CXL 1.1 and CXL 2.0 devices to form virtual hierarchies of single- and multi-logical devices that can be managed by multiple hosts. [41]

CXL 3.0 replaced bias modes with enhanced coherency semantics, allowing Type 2 and Type 3 devices to back invalidate the data in the host cache when the device has made a change to the local memory. Enhanced coherency also helps implement peer-to-peer transfers within a virtual hierarchy of devices in the same coherency domain. It also supports memory sharing of the same memory segment between multiple devices, as opposed to memory pooling where each device was assigned a separate segment. [42]
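
The difference between pooling and sharing can be illustrated with a toy ownership model. The sketch below is not a CXL API; the class, its parameters, and the node names are hypothetical, and it only captures the rule that a pooled segment has one owner while a shared segment may have several.

```python
class MemoryRegion:
    """Toy model of a memory device divided into segments (hypothetical)."""

    def __init__(self, num_segments, allow_sharing):
        self.allow_sharing = allow_sharing
        self.owners = [set() for _ in range(num_segments)]

    def attach(self, segment, node):
        """Give node access to a segment, enforcing the pooling rule."""
        current = self.owners[segment]
        if current and node not in current and not self.allow_sharing:
            raise PermissionError(
                f"segment {segment} is already assigned to {sorted(current)}"
            )
        current.add(node)


# Pooling: each node receives its own, separate segment.
pooled = MemoryRegion(num_segments=4, allow_sharing=False)
pooled.attach(0, "node-a")
pooled.attach(1, "node-b")

# Sharing: several nodes may be attached to the same segment.
shared = MemoryRegion(num_segments=4, allow_sharing=True)
shared.attach(0, "node-a")
shared.attach(0, "node-b")
print(sorted(shared.owners[0]))
```

In the pooled region, a second node asking for segment 0 is rejected; in the shared region the same request succeeds, which is why sharing depends on the enhanced coherency described above to keep the attached nodes consistent.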

CXL 3.0 allows multiple Type 1 and Type 2 devices per CXL root port; it also adds multi-level switching, helping implement device fabrics with non-tree topologies like mesh, ring, or spine/leaf. Each node can be a host or a device of any type. Type 3 devices can implement Global Fabric Attached Memory (GFAM) mode, which connects a memory device to a switch node without requiring a direct host connection. Devices and hosts use the Port Based Routing (PBR) addressing mechanism, which supports up to 4,096 nodes. [42]
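
A limit of 4,096 nodes corresponds to a 12-bit identifier space. The sketch below derives that width and shows, as a simplified and hypothetical model, how a switch might forward traffic by destination ID; the class, its methods, and the port numbers are illustrative and are not taken from the CXL specification.

```python
MAX_PBR_NODES = 4096
PBR_ID_BITS = (MAX_PBR_NODES - 1).bit_length()  # bits needed for IDs 0..4095


class PbrSwitch:
    """Hypothetical switch that forwards by destination node ID."""

    def __init__(self):
        self._routes = {}  # destination ID -> local egress port

    def add_route(self, dest_id, egress_port):
        """Register which local port leads towards dest_id."""
        if not 0 <= dest_id < MAX_PBR_NODES:
            raise ValueError(
                f"ID {dest_id} does not fit in {PBR_ID_BITS} bits"
            )
        self._routes[dest_id] = egress_port

    def forward(self, dest_id):
        """Return the egress port for dest_id (KeyError if unknown)."""
        return self._routes[dest_id]


leaf = PbrSwitch()
leaf.add_route(0x001, egress_port=0)  # e.g. a directly attached host
leaf.add_route(0xFFF, egress_port=3)  # e.g. a memory device behind a spine
print(PBR_ID_BITS, leaf.forward(0xFFF))
```

Because every switch routes on a flat destination ID rather than on a position in a tree, the same lookup works for mesh, ring, and spine/leaf topologies.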

Devices

In May 2022, the first 512 GB devices became available, with four times the storage capacity of previous devices. [43]

Latency

CXL memory controllers typically add about 200 ns of latency. [44]

References

  1. "ABOUT CXL". Compute Express Link. Retrieved 2019-08-09.
  2. "Synopsys Delivers Industry's First Compute Express Link (CXL) IP Solution for Breakthrough Performance in Data-Intensive SoCs". finance.yahoo.com. Yahoo! Finance. Retrieved 2019-11-09.
  3. "A Milestone in Moving Data". Intel Newsroom. Intel. Retrieved 2019-11-09.
  4. "Compute Express Link Consortium (CXL) Officially Incorporates; Announces Expanded Board of Directors". www.businesswire.com. Business Wire. 2019-09-17. Retrieved 2019-11-09.
  5. "StackPath". www.electronicdesign.com. 13 October 2021. Retrieved 2023-02-03.
  6. Mann, Tobias (2022-12-05). "Just How Bad Is CXL Memory Latency?". The Next Platform. Retrieved 2023-02-03.
  7. Calvert, Will (March 13, 2019). "Intel, Google and others join forces for CXL interconnect". www.datacenterdynamics.com.
  8. Cutress, Ian. "CXL Specification 1.0 Released: New Industry High-Speed Interconnect From Intel". Anandtech. Retrieved 2019-08-09.
  9. "Compute Express Link Consortium (CXL) Officially Incorporates; Announces Expanded Board of Directors". www.businesswire.com. September 17, 2019.
  10. "Compute Express Link: Our Members". CXL Consortium. 2020. Retrieved 2020-09-25.
  11. Papermaster, Mark (July 18, 2019). "AMD Joins Consortia to Advance CXL, a New High-Speed Interconnect for Breakthrough Performance". Community.AMD. Retrieved 2020-09-25.
  12. "CXL Consortium and PCI-SIG Announce Marketing MOU Agreement". 23 September 2021. Archived from the original on 29 August 2023. Retrieved 18 January 2022.
  13. "Industry Liaisons".
  14. "SNIA and CXL Consortium Form Strategic Alliance". 3 November 2020. Archived from the original on 16 January 2022. Retrieved 16 January 2022.
  15. "DMTF and CXL Consortium Establish Work Register". 14 April 2020. Archived from the original on 29 August 2023. Retrieved 16 January 2022.
  16. "CXL Consortium and Gen-Z Consortium Announce MOU Agreement" (PDF). Beaverton, Oregon. April 2, 2020. Retrieved September 25, 2020.
  17. "CXL Consortium and Gen-Z Consortium Announce MOU Agreement". April 2, 2020. Retrieved April 11, 2020.
  18. "CXL™ Consortium and Gen-Z Consortium™ MoU Update: A Path to Protocol". 24 June 2021.
  19. CXL Consortium (November 10, 2021). "Exploring the Future". Compute Express Link.
  20. "CXL Will Absorb Gen-Z". 9 December 2021.
  21. "OpenCAPI to Fold into CXL - CXL Set to Become Dominant CPU Interconnect Standard".
  22. "CXL Consortium and OpenCAPI Consortium Sign Letter of Intent to Transfer OpenCAPI Specifications to CXL".
  23. Morgan, Timothy Prickett (November 23, 2021). "Finally, A Coherent Interconnect Strategy: CXL Absorbs Gen-Z". The Next Platform.
  24. "Compute Express Link (CXL): All you need to know". Rambus.
  25. "Compute Express Link (CXL) 3.0 Announced: Doubled Speeds and Flexible Fabrics".
  26. "Compute Express Link (CXL) 3.0 Debuts, Wins CPU Interconnect Wars". 2 August 2022.
  27. "How do the new Intel Agilex FPGA family and the CXL coherent interconnect fabric intersect?". PSG@Intel. 2019-05-03. Retrieved 2019-08-09.
  28. "Samsung Unveils Industry-First Memory Module Incorporating New CXL Interconnect Standard". Samsung. 2021-05-11. Retrieved 2021-05-11.
  29. "Samsung Electronics Introduces Industry's First 512GB CXL Memory Module".
  30. "Intel Architecture Day 2021". Intel.
  31. Paul Alcorn (November 8, 2021). "AMD Unveils Zen 4 CPU Roadmap: 96-Core 5nm Genoa in 2022, 128-Core Bergamo in 2023". Tom's Hardware.
  32. Patrick Kennedy (December 7, 2021). "Intel Sapphire Rapids CXL with Emmitsburg PCH Shown at SC21". Serve the Home. Retrieved November 18, 2022.
  33. "CXL Put Through Its Paces". December 10, 2021.
  34. "CXL Consortium Showcases First Public Demonstrations of Compute Express Link Technology at SC21". HPCwire.
  35. CXL Consortium (December 16, 2021). "CXL Consortium Makes a Splash at Supercomputing 2021 (SC21)". Compute Express Link.
  36. "Introduction to Compute Express Link (CXL): The CPU-To-Device Interconnect Breakthrough - Compute Express Link". computeexpresslink.org. 2019-09-23. Retrieved 2024-07-16.
  37. "Compute Express Link Standard | DesignWare IP | Synopsys". www.synopsys.com.
  38. CXL Consortium (2021-04-02). Introduction to Compute Express Link™ (CXL™) Technology. Retrieved 2024-07-16 via YouTube.
  39. CXL Consortium (September 23, 2019). "Introduction to Compute Express Link (CXL): The CPU-To-Device Interconnect Breakthrough". Compute Express Link.
  40. https://www.flashmemorysummit.com/Proceedings2019/08-07-Wednesday/20190807_CTRL-202A-1_Lender.pdf
  41. Danny Volkind and Elad Shlisberg (June 15, 2022). "CXL 1.1 vs CXL 2.0 – What's the difference?" (PDF). UnifabriX. Retrieved November 18, 2022.
  42. https://www.computeexpresslink.org/_files/ugd/0c1418_a8713008916044ae9604405d10a7773b.pdf
  43. "Samsung Electronics Introduces Industry's First 512GB CXL Memory Module" (Press release). Samsung. May 10, 2022.
  44. Mann, Tobias (2022-12-05). "Just How Bad Is CXL Memory Latency?". The Next Platform. Retrieved 2023-02-03.