Cache coherent interconnect for accelerators

Last updated August 07, 2022

The cache coherent interconnect for accelerators (CCIX) protocol is the result of an effort of a joint group of computer, hardware and software component vendors: AMD, ARM, Huawei, Mellanox Technologies, Qualcomm and Xilinx.^[1]

The mission of the CCIX Consortium is to develop and promote adoption of an industry standard specification to enable coherent interconnect technologies between general-purpose processors and acceleration devices for efficient heterogeneous computing.
The CCIX standard is available to member companies with initial products expected in 2017.

Current Status

As of 2021, the future of CCIX as widely used interface to attach accelerators to servers in a cache-coherent fashion is doubtful. Many of the original members of the CCIX Consortium have instead opted to support the competing Compute Express Link (CXL) standard, originally developed by Intel but then opened up, for the same purpose. This has led to processors originally announced with CCIX support, such as the AMD Epyc 7002, never actually shipping with that feature enabled. On the side of servers with ARM processors that also list CCIX in their specifications, such as the TaiShan 200 series from Huawei, many do not support the use of CCIX in practice, as it has not been fully validated.

Related Research Articles

Non-uniform memory access (NUMA) is a computer memory design used in multiprocessing, where the memory access time depends on the memory location relative to the processor. Under NUMA, a processor can access its own local memory faster than non-local memory. The benefits of NUMA are limited to particular workloads, notably on servers where the data is often associated strongly with certain tasks or users.

HyperTransport (HT), formerly known as Lightning Data Transport (LDT), is a technology for interconnection of computer processors. It is a bidirectional serial/parallel high-bandwidth, low-latency point-to-point link that was introduced on April 2, 2001. The HyperTransport Consortium is in charge of promoting and developing HyperTransport technology.

The Scalable Coherent Interface or Scalable Coherent Interconnect (SCI), is a high-speed interconnect standard for shared memory multiprocessing and message passing. The goal was to scale well, provide system-wide memory coherence and a simple interface; i.e. a standard to replace existing buses in multiprocessor systems with one with no inherent scalability and performance limitations.

Opteron is AMD's x86 former server and workstation processor line, and was the first processor which supported the AMD64 instruction set architecture. It was released on April 22, 2003, with the SledgeHammer core (K8) and was intended to compete in the server and workstation markets, particularly in the same segment as the Intel Xeon processor. Processors based on the AMD K10 microarchitecture were announced on September 10, 2007, featuring a new quad-core configuration. The most-recently released Opteron CPUs are the Piledriver-based Opteron 4300 and 6300 series processors, codenamed "Seoul" and "Abu Dhabi" respectively.

In computing, remote direct memory access (RDMA) is a direct memory access from the memory of one computer into that of another without involving either one's operating system. This permits high-throughput, low-latency networking, which is especially useful in massively parallel computer clusters.

The RapidIO architecture is a high-performance packet-switched electrical connection technology. RapidIO supports messaging, read/write and cache coherency semantics. RapidIO fabrics guarantee in-order packet delivery, enabling power- and area- efficient protocol implementation in hardware. Based on industry-standard electrical specifications such as those for Ethernet, RapidIO can be used as a chip-to-chip, board-to-board, and chassis-to-chassis interconnect..

The ARM Advanced Microcontroller Bus Architecture (AMBA) is an open-standard, on-chip interconnect specification for the connection and management of functional blocks in system-on-a-chip (SoC) designs. It facilitates development of multi-processor designs with large numbers of controllers and components with a bus architecture. Since its inception, the scope of AMBA has, despite its name, gone far beyond microcontroller devices. Today, AMBA is widely used on a range of ASIC and SoC parts including applications processors used in modern portable mobile devices like smartphones. AMBA is a registered trademark of ARM Ltd.

A multi-core processor is a computer processor on a single integrated circuit with two or more separate processing units, called cores, each of which reads and executes program instructions. The instructions are ordinary CPU instructions but the single processor can run instructions on separate cores at the same time, increasing overall speed for programs that support multithreading or other parallel computing techniques. Manufacturers typically integrate the cores onto a single integrated circuit die or onto multiple dies in a single chip package. The microprocessors currently used in almost all personal computers are multi-core.

An Advanced Encryption Standard instruction set is now integrated into many processors. The purpose of the instruction set is to improve the speed and security of applications performing encryption and decryption using Advanced Encryption Standard (AES).

POWER8 2014 family of multi-core microprocessors by IBM

POWER8 is a family of superscalar multi-core microprocessors based on the Power ISA, announced in August 2013 at the Hot Chips conference. The designs are available for licensing under the OpenPOWER Foundation, which is the first time for such availability of IBM's highest-end processors.

Heterogeneous System Architecture (HSA) is a cross-vendor set of specifications that allow for the integration of central processing units and graphics processors on the same bus, with shared memory and tasks. The HSA is being developed by the HSA Foundation, which includes AMD and ARM. The platform's stated aim is to reduce communication latency between CPUs, GPUs and other compute devices, and make these various devices more compatible from a programmer's perspective, relieving the programmer of the task of planning the moving of data between devices' disjoint memories.

HiSilicon Chinese fabless semiconductor manufacturing company, fully owned by Huawei

HiSilicon is a Chinese fabless semiconductor company based in Shenzhen, Guangdong and wholly owned by Huawei. HiSilicon purchases licenses for CPU designs from ARM Holdings, including the ARM Cortex-A9 MPCore, ARM Cortex-M3, ARM Cortex-A7 MPCore, ARM Cortex-A15 MPCore, ARM Cortex-A53, ARM Cortex-A57 and also for their Mali graphics cores. HiSilicon has also purchased licenses from Vivante Corporation for their GC4000 graphics core.

The OpenPOWER Foundation is a collaboration around Power ISA-based products initiated by IBM and announced as the "OpenPOWER Consortium" on August 6, 2013. IBM is opening up technology surrounding their Power Architecture offerings, such as processor specifications, firmware and software with a liberal license, and will be using a collaborative development model with their partners.

Heterogeneous computing refers to systems that use more than one kind of processor or cores. These systems gain performance or energy efficiency not just by adding the same type of processors, but by adding dissimilar coprocessors, usually incorporating specialized processing capabilities to handle particular tasks.

The Gen-Z Consortium is a trade group of technology vendors involved in designing CPUs, random access memory, servers, storage, and accelerators. The goal was to design an open and royalty-free "memory-semantic" bus protocol, which is not limited by the memory controller of a CPU, to be used in either a switched fabric or a point-to-point device link on a standard connector.

Coherent Accelerator Processor Interface (CAPI), is a high-speed processor expansion bus standard for use in large data center computers, initially designed to be layered on top of PCI Express, for directly connecting central processing units (CPUs) to external accelerators like graphics processing units (GPUs), ASICs, FPGAs or fast storage. It offers low latency, high speed, direct memory access connectivity between devices of different instruction set architectures.

Epyc is a brand of multi-core x86-64 microprocessors designed and sold by AMD, based on the company's Zen microarchitecture. Introduced in June 2017, they are specifically targeted for the server and embedded system markets. Epyc processors share the same microarchitecture as their regular desktop-grade counterparts, but have enterprise-grade features such as higher core counts, more PCI Express lanes, support for larger amounts of RAM, and larger cache memory. They also support multi-chip and dual-socket system configurations by using the Infinity Fabric interconnect.

Compute Express Link (CXL) is an open standard for high-speed central processing unit (CPU)-to-device and CPU-to-memory connections, designed for high performance data center computers. CXL is built on the PCI Express (PCIe) physical and electrical interface and includes PCIe-based block input/output protocol (CXL.io) and new cache-coherent protocols for accessing system memory (CXL.cache) and device memory (CXL.mem).

The A64FX is a 64-bit ARM architecture microprocessor designed by Fujitsu. The processor is replacing the SPARC64 V as Fujitsu's processor for supercomputer applications. It powers the Fugaku supercomputer, the fastest supercomputer in the world by TOP500 rankings as of June 2020 as well as November 2020, June 2021 and November 2021.

Universal Chiplet Interconnect Express (UCIe) is an open specification for a die-to-die interconnect and serial bus between chiplets. It is co-developed by AMD, Arm, ASE Group, Google Cloud, Intel, Meta, Microsoft, Qualcomm, Samsung, and TSMC.

References

↑ Jeff Defilippi (October 11, 2016). "How do AMBA, CCIX and GenZ address the needs of the data center?". ARM Community blog. Retrieved March 30, 2017.

External links

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[1] Jeff Defilippi (October 11, 2016). "How do AMBA, CCIX and GenZ address the needs of the data center?". ARM Community blog. Retrieved March 30, 2017.

[1]

v t e Technical and de facto standards for wired computer buses
General	System bus Front-side bus Back-side bus Daisy chain Control bus Address bus Bus contention Bus mastering Network on a chip Plug and play List of bus bandwidths
Standards	SS-50 bus S-100 bus Multibus Unibus VAXBI MBus STD Bus SMBus Q-Bus Europe Card Bus ISA STEbus Zorro II Zorro III CAMAC FASTBUS LPC HP Precision Bus EISA VME VXI VXS NuBus TURBOchannel MCA SBus VLB HP GSC bus InfiniBand Ethernet UPA PCI PCI Extended (PCI-X) PXI PCI Express (PCIe) AGP Compute Express Link (CXL) Direct Media Interface (DMI) RapidIO Intel QuickPath Interconnect NVLink HyperTransport Infinity Fabric Intel Ultra Path Interconnect Coherent Accelerator Processor Interface (CAPI) SpaceWire
Storage	ST-506 ESDI IPI SMD Parallel ATA (PATA) SSA DSSI HIPPI Serial ATA (SATA) SCSI Parallel SAS Fibre Channel SATAe PCI Express (via AHCI or NVMe logical device interface)
Peripheral	Apple Desktop Bus Atari SIO DCB Commodore bus HP-IL HIL MIDI RS-232 RS-422 RS-423 RS-485 Lightning DMX512-A IEEE-488 (GPIB) IEEE-1284 (parallel port) UNI/O 1-Wire I²C (ACCESS.bus, PMBus, SMBus) I3C SPI D²B Parallel SCSI Profibus IEEE 1394 (FireWire) USB Camera Link External PCIe Thunderbolt
Audio	ADAT Lightpipe AES3 Intel HD Audio I²S MADI McASP S/PDIF TOSLINK
Portable	PC Card ExpressCard
Embedded	Multidrop bus CoreConnect AMBA (AXI) Wishbone SLIMbus
Interfaces are listed by their speed in the (roughly) ascending order, so the interface at the end of each section should be the fastest. Category

Cache coherent interconnect for accelerators

Contents

Current Status

Related Research Articles

References

External links