Release date | November 16, 2020 |
---|---|
Designed by | AMD |
Fabrication process | TSMC N7, N6, N5 |
History | |
Predecessor | AMD FirePro |
Variant | RDNA (consumer, professional) |
CDNA (Compute DNA) is a compute-centered graphics processing unit (GPU) microarchitecture designed by Advanced Micro Devices (AMD) for datacenters. CDNA is a successor to the Graphics Core Next (GCN) microarchitecture; GCN's other successor is RDNA (Radeon DNA), a consumer-graphics-focused microarchitecture. CDNA was first announced on March 5, 2020, [2] and was featured in the AMD Instinct MI100, launched November 16, 2020. [3] The MI100, manufactured on TSMC's N7 FinFET process, is the only product built on first-generation CDNA.
The second iteration of the CDNA line implements a multi-chip module (MCM) approach, departing from its predecessor's monolithic design. Featured in the AMD Instinct MI250X and MI250, this MCM design uses an Elevated Fanout Bridge (EFB) [4] to connect the dies. Both products were announced November 8, 2021, and launched November 11. The CDNA 2 line also contains a later product using a monolithic design, the MI210. [5] The MI250X and MI250 are the first AMD products to use the Open Compute Project (OCP)'s OCP Accelerator Module (OAM) socket form factor; lower-wattage PCIe versions are also available.
The third iteration of CDNA switches to an MCM design built from chiplets manufactured on multiple nodes. Its first product, the MI300, contains 15 unique dies connected with advanced 3D packaging techniques. The MI300 was announced January 5, 2023, [6] and launched December 6, 2023.
Release date | November 16, 2020 |
---|---|
Fabrication process | TSMC N7 (FinFET) |
History | |
Predecessor | AMD FirePro |
Successor | CDNA 2 |
The CDNA family consists of one die, named Arcturus. The die measures 750 mm², contains 25.6 billion transistors, and is manufactured on TSMC's N7 node. [7] Arcturus possesses 120 Compute Units and a 4096-bit memory bus connected to four HBM2 stacks, giving the die 32 GB of memory and just over 1200 GB/s of memory bandwidth. Compared to its predecessor, CDNA removes all hardware related to graphics acceleration, including but not limited to the graphics caches, tessellation hardware, render output units (ROPs), and the display engine. CDNA retains the VCN block for HEVC, H.264, and VP9 decoding. [8] CDNA also adds dedicated matrix compute hardware, similar to that introduced in Nvidia's Volta architecture.
The 120 Compute Units (CUs) are organized into four Asynchronous Compute Engines (ACEs), each maintaining its own independent command execution and dispatch. At the CU level, CDNA units are organized similarly to GCN units: every CU contains four SIMD16 units, each executing its 64-thread wavefront (Wave64) over four cycles.
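The peak vector throughput follows directly from this organization. As a rough sketch (figures are taken from this article; the one-FMA-per-lane-per-cycle convention is the usual way such peak numbers are derived, not an AMD-stated formula):

```python
# Back-of-the-envelope check of MI100 peak vector FP32 throughput,
# assuming one FMA (2 FLOPs) per SIMD lane per cycle.

CUS = 120          # Compute Units on the Arcturus die
SIMDS_PER_CU = 4   # SIMD16 units per CU
LANES = 16         # lanes per SIMD16
FLOPS_PER_FMA = 2  # fused multiply-add counts as two FLOPs
BOOST_CLOCK_GHZ = 1.502

lanes_total = CUS * SIMDS_PER_CU * LANES  # 7680 shader lanes
peak_fp32_tflops = lanes_total * FLOPS_PER_FMA * BOOST_CLOCK_GHZ / 1000

print(f"{lanes_total} lanes, {peak_fp32_tflops:.2f} FP32 TFLOPS")
# 7680 lanes, 23.07 FP32 TFLOPS — in line with the ~23.1 TFLOPS cited for the MI100
```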
CDNA raises the HBM clock by 20%, yielding roughly a 200 GB/s bandwidth increase over Vega 20 (GCN 5.0). The die has a shared 4 MB L2 cache that delivers 2 kB per clock to the CUs. Each CU has its own L1 cache and a 64 kB Local Data Store (LDS); a 4 kB Global Data Store (GDS) is shared by all CUs. The GDS can be used to store control data, perform reduction operations, or act as a small global shared surface. [8] [9]
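The headline memory bandwidth can be reproduced from the bus width and transfer rate listed in the specification table below (a standard DRAM bandwidth calculation, not an AMD formula):

```python
# MI100 HBM2 bandwidth from its 4096-bit bus at 2400 MT/s:
# bytes per transfer × transfers per second.

BUS_WIDTH_BITS = 4096
TRANSFER_RATE_MTS = 2400  # mega-transfers per second

bandwidth_gbs = (BUS_WIDTH_BITS / 8) * TRANSFER_RATE_MTS / 1000
print(f"{bandwidth_gbs:.1f} GB/s")  # 1228.8 GB/s
```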
In October 2022, Samsung demonstrated a processing-in-memory (PIM) specialized version of the MI100. In December 2022, Samsung showed a cluster of 96 modified MI100s, boasting large increases in processing throughput for various workloads and a significant reduction in power consumption. [10]
The individual compute units remain very similar to GCN's, but with the addition of four matrix units per CU. Support for more datatypes was added, including BF16, INT8, and INT4. [8] For an extensive list of operations utilizing the matrix units and new datatypes, refer to the CDNA ISA Reference Guide.
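The per-CU matrix rate implied by the published figures can be checked the same way as the vector rate. The 1024 FP16 FLOPs per CU per clock used below is inferred from the table's numbers, not taken from AMD documentation:

```python
# Consistency check on the matrix-engine FP16 rate: 120 CUs at 1.502 GHz
# reach the cited 184.57 TFLOPS only if each CU delivers 1024 FP16 FLOPs
# per clock (i.e. 256 per matrix unit) — an inferred figure, shown as an
# assumption here.

CUS = 120
MATRIX_FP16_FLOPS_PER_CU_PER_CLK = 1024  # inferred from published numbers
BOOST_CLOCK_GHZ = 1.502

matrix_fp16_tflops = CUS * MATRIX_FP16_FLOPS_PER_CU_PER_CLK * BOOST_CLOCK_GHZ / 1000
print(f"{matrix_fp16_tflops:.2f} TFLOPS")  # 184.57 TFLOPS
```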
Model (Code name) | Released | Architecture & fab | Transistors & die size | Core | Fillrate [lower-alpha 1] | Processing power (TFLOPS) | Memory | TBP | Software interface | Physical interface | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Vector [lower-alpha 1] [lower-alpha 2] | Matrix [lower-alpha 1] [lower-alpha 2] | |||||||||||||||||||||
Config [lower-alpha 3] | Clock [lower-alpha 1] (MHz) | Texture [lower-alpha 4] (GT/s) | Pixel [lower-alpha 5] (GP/s) | Half (FP16) | Single (FP32) | Double (FP64) | INT8 | BF16 | FP16 | FP32 | FP64 | Bus type & width | Size (GB) | Clock (MT/s) | Bandwidth (GB/s) | |||||||
AMD Instinct MI100 (Arcturus) [11] [12] | Nov 16, 2020 | CDNA TSMC N7 | 25.6×10⁹ 750 mm² | 7680:480:- 120 CU | 1000 1502 | 480 720.96 | - | ? | 15.72 23.10 | 7.86 11.5 | 122.88 184.57 | 61.44 92.28 | 122.88 184.57 | 30.72 46.14 | 15.36 23.07 | HBM2 4096-bit | 32 | 2400 | 1228 | 300 W | PCIe 4.0 ×16 | PCIe ×16 |
Release date | November 8, 2021 |
---|---|
Fabrication process | TSMC N6 |
History | |
Predecessor | CDNA 1 |
Successor | CDNA 3 |
Like CDNA, CDNA 2 consists of a single die, named Aldebaran. The die is estimated at 790 mm² and contains 28 billion transistors, manufactured on TSMC's N6 node. [13] Aldebaran contains only 112 Compute Units, a 6.67% decrease from Arcturus. Like the previous generation, the die has a 4096-bit memory bus, now using HBM2e with double the capacity, up to 64 GB. The largest change in CDNA 2 is the ability to place two dies on the same package: the MI250X consists of two Aldebaran dies, 220 CUs (110 per die), and 128 GB of HBM2e. The dies are connected by four Infinity Fabric links and are addressed as independent GPUs by the host system. [14]
The 112 CUs are organized as in CDNA, into four Asynchronous Compute Engines, each with 28 CUs instead of the prior generation's 30. As in CDNA, each CU contains four SIMD16 units executing a 64-thread wavefront over four cycles. The four matrix engines and the vector units add support for full-rate FP64, enabling a significant uplift over the prior generation. [15] CDNA 2 also revises multiple internal caches, doubling bandwidth across the board.
The memory system in CDNA 2 sees across-the-board improvements, starting with the move to HBM2e, which doubles capacity to 64 GB and increases bandwidth by roughly one third (~1200 GB/s to ~1600 GB/s). [14] At the cache level, each GCD has a 16-way, 8 MB L2 cache partitioned into 32 slices. The cache delivers 4 kB per clock (128 B per clock per slice), double the bandwidth of CDNA. [14] Additionally, the 4 kB Global Data Store was removed. [15] All caches, including the L2 and the LDS, add support for FP64 data.
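The "roughly one third" jump follows from a faster transfer rate on the same 4096-bit bus. The 3200 MT/s HBM2e rate below is an assumption chosen to match the ~1600 GB/s per die that the text quotes:

```python
# CDNA vs CDNA 2 per-die bandwidth on the same 4096-bit bus.
# 2400 MT/s is the MI100 figure from this article; 3200 MT/s is an
# assumed HBM2e rate consistent with the cited ~1600 GB/s.

BUS_WIDTH_BITS = 4096
HBM2_MTS = 2400    # CDNA (MI100)
HBM2E_MTS = 3200   # CDNA 2, assumed

def bandwidth_gbs(bus_bits, mts):
    """Bytes per transfer times transfers per second, in GB/s."""
    return (bus_bits / 8) * mts / 1000

old = bandwidth_gbs(BUS_WIDTH_BITS, HBM2_MTS)   # 1228.8 GB/s
new = bandwidth_gbs(BUS_WIDTH_BITS, HBM2E_MTS)  # 1638.4 GB/s
print(f"{new / old - 1:.0%} increase")  # 33% increase
```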
CDNA 2 brings the first AMD product with multiple GPUs on the same package. The two GPUs are connected by four Infinity Fabric links with a total bidirectional bandwidth of 400 GB/s. [15] Each die provides eight Infinity Fabric links, each physically implemented as a 16-lane Infinity link. When paired with an AMD processor, these operate as Infinity Fabric; when paired with any other x86 processor, they fall back to 16 lanes of PCIe 4.0. [15]
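The per-link budget implied by those figures is straightforward to work out, assuming (as is conventional) that the bidirectional number is the sum of both directions:

```python
# Die-to-die Infinity Fabric budget on the MI250X package:
# 400 GB/s bidirectional split across four links (figures from the text;
# the sum-of-both-directions reading is an assumption).

TOTAL_BIDIR_GBS = 400
DIE_TO_DIE_LINKS = 4

per_link_bidir = TOTAL_BIDIR_GBS / DIE_TO_DIE_LINKS
print(f"{per_link_bidir:.0f} GB/s per link "
      f"({per_link_bidir / 2:.0f} GB/s each way)")
# 100 GB/s per link (50 GB/s each way)
```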
The most significant change up front is the addition of full-rate FP64 support across all compute elements. This yields a 4× increase in FP64 matrix throughput, with large increases in FP64 vector throughput as well. [14] Additionally, support for packed FP32 operations was added, with opcodes such as 'V_PK_FMA_F32' and 'V_PK_MUL_F32'. [16] Packed FP32 operations can enable up to 2× throughput but require code modification. [14] As with CDNA, for further information on CDNA 2 operations, refer to the CDNA 2 ISA Reference Guide.
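The semantics of a packed FP32 FMA can be sketched in plain Python: one instruction applies the same fused multiply-add to the low and high halves of a register pair, which is where the up-to-2× throughput comes from. This is an illustrative model of the operation, not actual GPU code:

```python
# Toy model of a packed FP32 FMA such as V_PK_FMA_F32: each operand is a
# (lo, hi) pair of FP32 lanes, and one "instruction" computes a*b + c on
# both lanes at once.

def v_pk_fma_f32(a, b, c):
    """a, b, c are (lo, hi) pairs; returns per-lane fused multiply-add."""
    return (a[0] * b[0] + c[0], a[1] * b[1] + c[1])

result = v_pk_fma_f32((2.0, 3.0), (4.0, 5.0), (1.0, 1.0))
print(result)  # (9.0, 16.0)
```

Existing code sees no benefit automatically; kernels must be rewritten (or recompiled with suitable vectorization) to pair values into packed registers, which is the "code modification" the text refers to.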
Release date | December 6, 2023 |
---|---|
Fabrication process | TSMC N5 & N6 |
History | |
Predecessor | CDNA 2 |
Unlike its predecessors, CDNA 3 consists of multiple dies combined in a multi-chip package, similar to AMD's Zen 2, 3, and 4 products. The MI300 package is comparatively massive, with nine chiplets produced on a 5 nm process placed on top of four 6 nm chiplets. [6] These are combined with 128 GB of HBM3 across eight HBM stacks. [17] The package contains an estimated 146 billion transistors. The product launched December 6, 2023. [6] [17]
Model (Code name) | Release date | Architecture & fab | Transistors & die size | Core | Fillrate [lower-alpha 1] | Vector Processing power [lower-alpha 1] [lower-alpha 2] (TFLOPS) | Matrix Processing power [lower-alpha 1] [lower-alpha 2] (TFLOPS) | Memory | TBP | Software Interface | Physical Interface | |||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Config [lower-alpha 3] | Clock [lower-alpha 1] (MHz) | Texture [lower-alpha 4] (GT/s) | Pixel [lower-alpha 5] (GP/s) | Half (FP16) | Single (FP32) | Double (FP64) | INT8 | BF16 | FP16 | FP32 | FP64 | Bus type & width | Size (GB) | Clock (MT/s) | Bandwidth (GB/s) | |||||||
Tesla V100 (PCIe) (GV100) [18] [19] | May 10, 2017 | Volta TSMC 12 nm | 12.1×10⁹ 815 mm² | 5120:320:128:640 80 SM | 1370 | 438.4 | 175.36 | 28.06 | 14.03 | 7.01 | N/A | N/A | N/A | 112.23 | N/A | HBM2 4096-bit | 16 32 | 1750 | 900 | 250 W | PCIe 3.0 ×16 | PCIe ×16
Tesla V100 (SXM) (GV100) [20] [21] | May 10, 2017 | 1455 | 465.6 | 186.24 | 29.80 | 14.90 | 7.46 | N/A | N/A | N/A | 119.19 | N/A | 300 W | NVLINK | SXM2 | |||||||
Radeon Instinct MI50 (Vega 20) [22] [23] [24] [25] [26] [27] | Nov 18, 2018 | GCN 5 TSMC 7 nm | 13.2×10⁹ 331 mm² | 3840:240:64 60 CU | 1450 1725 | 348.0 414.0 | 92.80 110.4 | 22.27 26.50 | 11.14 13.25 | 5.568 6.624 | N/A | N/A | 26.5 | 13.3 | ? | HBM2 4096-bit | 16 32 | 2000 | 1024 | 300 W | PCIe 4.0 ×16 | PCIe ×16
Radeon Instinct MI60 (Vega 20) [23] [28] [29] [30] | 4096:256:64 64 CU | 1500 1800 | 384.0 460.8 | 96.00 115.2 | 24.58 29.49 | 12.29 14.75 | 6.144 7.373 | N/A | N/A | 32 | 16 | ? | ||||||||||
Tesla A100 (PCIe) (GA100) [31] [32] | May 14, 2020 | Ampere TSMC 7 nm | 54.2×10⁹ 826 mm² | 6912:432:-:432 108 SM | 1065 1410 | 460.08 609.12 | - | 58.89 77.97 | 14.72 19.49 | 7.36 9.75 | 942.24 1247.47 | 235.56 311.87 | 235.56 311.87 | 117.78 155.93 | 14.72 19.49 | HBM2 5120-bit | 40 80 | 3186 | 2039 | 250 W | PCIe 4.0 ×16 | PCIe ×16
Tesla A100 (SXM) (GA100) [33] [34] | 1275 1410 | 550.80 609.12 | - | 70.50 77.97 | 17.63 19.49 | 8.81 9.75 | 1128.04 1247.47 | 282.01 311.87 | 282.01 311.87 | 141.00 155.93 | 17.63 19.49 | 400 W | NVLINK | SXM4
AMD Instinct MI100 (Arcturus) [35] [36] | Nov 16, 2020 | CDNA TSMC 7 nm | 25.6×10⁹ 750 mm² | 7680:480:-:480 120 CU | 1000 1502 | 480 720.96 | - | ? | 15.72 23.10 | 7.86 11.5 | 122.88 184.57 | 61.44 92.28 | 122.88 184.57 | 30.72 46.14 | 15.36 23.07 | HBM2 4096-bit | 32 | 2400 | 1228 | 300 W | PCIe 4.0 ×16 | PCIe ×16
AMD Instinct MI250X (PCIe) (Aldebaran) | Nov 8, 2021 | CDNA 2 TSMC 6 nm | 58×10⁹ 1540 mm² | 14080:880:-:880 220 CU |
AMD Instinct MI250X (OAM) (Aldebaran) | ||||||||||||||||||||||
Tesla H100 (PCIe) (GH100) | Mar 22, 2022 | Hopper TSMC 4 nm | 80×10⁹ 814 mm² |
Tesla H100 (SXM) (GH100) |