| Release date | June 20, 2017 |
|---|---|
| Designed by | AMD |
| Marketed by | AMD |
| Models | MI Series |
| Cores | 36–304 Compute Units (CUs) |
AMD Instinct is AMD's brand of data center GPUs. [1] [2] It replaced AMD's FirePro S brand in 2016. Compared to the Radeon brand of mainstream consumer/gamer products, the Instinct product line is intended to accelerate deep learning, artificial neural network, and high-performance computing/GPGPU applications.
The AMD Instinct product line directly competes with Nvidia's Tesla and Intel's Xeon Phi and Data Center GPU lines of machine learning and GPGPU cards.
The brand was originally known as AMD Radeon Instinct, but AMD dropped the Radeon brand from the name before AMD Instinct MI100 was introduced in November 2020.
In June 2022, supercomputers based on AMD's Epyc CPUs and Instinct GPUs took the top four spots on the Green500 list of the most power-efficient supercomputers, leading every other system by over 50%. [3] One of them, the AMD-based Frontier, has been the fastest supercomputer in the world on the TOP500 list since June 2022, a position it still held as of 2023. [4] [5]
| Accelerator | Launch date | Architecture | Lithography | Compute Units | Memory size | Memory type | Bandwidth (GB/s) | PCIe support | Form factor | FP16 | BF16 | FP32 | FP32 matrix | FP64 | FP64 matrix | INT8 | INT4 | TBP |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MI6 | 2016-12-12 [6] | GCN 4 | 14 nm | 36 | 16 GB | GDDR5 | 224 | 3.0 | PCIe | 5.7 TFLOPS | N/A | 5.7 TFLOPS | N/A | 358 GFLOPS | N/A | N/A | N/A | 150 W |
| MI8 | 2016-12-12 [6] | GCN 3 | 28 nm | 64 | 4 GB | HBM | 512 | 3.0 | PCIe | 8.2 TFLOPS | N/A | 8.2 TFLOPS | N/A | 512 GFLOPS | N/A | N/A | N/A | 175 W |
| MI25 | 2016-12-12 [6] | GCN 5 | 14 nm | 64 | 16 GB | HBM2 | 484 | 3.0 | PCIe | 24.6 TFLOPS | N/A | 12.3 TFLOPS | N/A | 768 GFLOPS | N/A | N/A | N/A | 300 W |
| MI50 | 2018-11-06 [7] | GCN 5 | 7 nm | 60 | 16 GB | HBM2 | 1024 | 4.0 | PCIe | 26.5 TFLOPS | N/A | 13.3 TFLOPS | N/A | 6.6 TFLOPS | N/A | 53 TOPS | — | 300 W |
| MI60 | 2018-11-06 [7] | GCN 5 | 7 nm | 64 | 32 GB | HBM2 | 1024 | 4.0 | PCIe | 29.5 TFLOPS | N/A | 14.7 TFLOPS | N/A | 7.4 TFLOPS | N/A | 59 TOPS | — | 300 W |
| MI100 | 2020-11-16 | CDNA | 7 nm | 120 | 32 GB | HBM2 | 1200 | 4.0 | PCIe | 184.6 TFLOPS | 92.3 TFLOPS | 23.1 TFLOPS | 46.1 TFLOPS | 11.5 TFLOPS | N/A | 184.6 TOPS | — | 300 W |
| MI210 | 2022-03-22 [8] | CDNA 2 | 6 nm | 104 | 64 GB | HBM2e | 1600 | 4.0 | PCIe | 181 TFLOPS | — | 22.6 TFLOPS | 45.3 TFLOPS | 22.6 TFLOPS | 45.3 TFLOPS | 181 TOPS | — | 300 W |
| MI250 | 2021-11-08 [9] | CDNA 2 | 6 nm | 208 | 128 GB | HBM2e | 3200 | 4.0 | OAM | 362.1 TFLOPS | — | 45.3 TFLOPS | 90.5 TFLOPS | 45.3 TFLOPS | 90.5 TFLOPS | 362.1 TOPS | — | 560 W |
| MI250X | 2021-11-08 [9] | CDNA 2 | 6 nm | 220 | 128 GB | HBM2e | 3200 | 4.0 | OAM | 383 TFLOPS | — | 47.9 TFLOPS | 95.7 TFLOPS | 47.9 TFLOPS | 95.7 TFLOPS | 383 TOPS | — | 560 W |
| MI300A | 2023-12-06 [10] | CDNA 3 | 6 & 5 nm | 228 | 128 GB | HBM3 | 5300 | 5.0 | APU (SH5 socket) | 980.6 TFLOPS (1961.2 with sparsity) | — | 122.6 TFLOPS | — | 61.3 TFLOPS | 122.6 TFLOPS | 1961.2 TOPS (3922.3 with sparsity) | N/A | 550 W (760 W with liquid cooling) |
| MI300X | 2023-12-06 [10] | CDNA 3 | 6 & 5 nm | 304 | 192 GB | HBM3 | 5300 | 5.0 | OAM | 1307.4 TFLOPS (2614.9 with sparsity) | — | 163.4 TFLOPS | — | 81.7 TFLOPS | 163.4 TFLOPS | 2614.9 TOPS (5229.8 with sparsity) | N/A | 750 W |
| MI325X | 2024-06-02 [11] | CDNA 3 | 6 & 5 nm | — | 288 GB | HBM3e | 6000 | 5.0 | OAM | — | — | — | — | — | — | — | — | — |
The three initial Radeon Instinct products were announced on December 12, 2016, and released on June 20, 2017, with each based on a different architecture. [12] [13]
The MI6 is a passively cooled, Polaris 10 based card with 16 GB of GDDR5 memory and with a <150 W TDP. [1] [2] At 5.7 TFLOPS (FP16 and FP32), the MI6 is expected to be used primarily for inference, rather than neural network training. The MI6 has a peak double precision (FP64) compute performance of 358 GFLOPS. [14]
The MI8 is a Fiji-based card, analogous to the R9 Nano, with a TDP under 175 W. [1] The MI8 has 4 GB of High Bandwidth Memory. At 8.2 TFLOPS (FP16 and FP32), the MI8 is marketed toward inference. The MI8 has a peak double-precision (FP64) compute performance of 512 GFLOPS. [15]
The MI25 is a Vega-based card utilizing HBM2 memory. The MI25 delivers 12.3 TFLOPS using FP32 numbers. In contrast to the MI6 and MI8, the MI25 can increase performance when using lower-precision numbers, reaching 24.6 TFLOPS with FP16. The MI25 is rated at <300 W TDP with passive cooling. The MI25 also provides 768 GFLOPS peak double precision (FP64), at 1/16th the FP32 rate. [16]
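The figures quoted for these first three cards follow directly from shader count, boost clock, and the per-precision rate ratios (on the MI25, FP16 runs at 2× and FP64 at 1/16th of the FP32 rate). A minimal sketch of the arithmetic, assuming the usual convention of two floating-point operations (one fused multiply-add) per shader per cycle:

```python
def peak_tflops(shaders: int, boost_mhz: float, ops_per_cycle: int = 2) -> float:
    """Theoretical peak throughput: shaders x ops/cycle x clock, in TFLOPS."""
    return shaders * ops_per_cycle * boost_mhz * 1e6 / 1e12

# Radeon Instinct MI25 (Vega 10): 4096 shaders at a 1500 MHz boost clock
fp32 = peak_tflops(4096, 1500)   # ~12.3 TFLOPS
fp16 = fp32 * 2                  # ~24.6 TFLOPS (double rate)
fp64 = fp32 / 16                 # ~0.768 TFLOPS, i.e. 768 GFLOPS (1/16th rate)
print(round(fp32, 1), round(fp16, 1), round(fp64 * 1000))  # → 12.3 24.6 768
```

The same formula reproduces the MI6 and MI8 numbers from their configurations (2304 shaders at 1233 MHz and 4096 shaders at 1000 MHz, respectively), with FP64 at 1/16th rate in each case.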
The MI300A and MI300X are data center accelerators that use the CDNA 3 architecture, which is optimized for high-performance computing (HPC) and generative artificial intelligence (AI) workloads. The CDNA 3 architecture features a scalable chiplet design that leverages TSMC’s advanced packaging technologies, such as CoWoS (chip-on-wafer-on-substrate) and InFO (integrated fan-out), to combine multiple chiplets on a single interposer. The chiplets are interconnected by AMD’s Infinity Fabric, which enables high-speed and low-latency data transfer between the chiplets and the host system.
The MI300A is an accelerated processing unit (APU) that integrates 24 Zen 4 CPU cores with six CDNA 3 GPU chiplets (XCDs), for a total of 228 CUs in the GPU section, alongside 128 GB of HBM3 memory. The Zen 4 CPU cores are built on the 5 nm process node and support the x86-64 instruction set, as well as the AVX-512 and BFloat16 extensions. They can run general-purpose applications and provide host-side computation for the GPU cores. The MI300A has a peak performance of 61.3 TFLOPS of FP64 (122.6 TFLOPS FP64 matrix) and 980.6 TFLOPS of FP16 (1961.2 TFLOPS with sparsity), as well as 5.3 TB/s of memory bandwidth. The MI300A supports PCIe 5.0 and CXL 2.0 interfaces, which allow it to communicate with other devices and accelerators in a heterogeneous system.
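The 5.3 TB/s figure is consistent with the 8192-bit HBM3 bus and 5200 MT/s memory clock listed in the specification table below. A quick sanity check:

```python
bus_bits = 8192     # HBM3 bus width: 8 stacks x 1024 bits
transfers = 5200e6  # memory transfer rate, 5200 MT/s
# bytes per transfer x transfers per second -> GB/s
bandwidth_gbs = bus_bits / 8 * transfers / 1e9
print(round(bandwidth_gbs))  # → 5325, i.e. ~5.3 TB/s
```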
The MI300X is a dedicated generative AI accelerator that replaces the CPU cores with additional GPU chiplets and HBM memory, resulting in a total of 304 CUs (64 stream processors per CU) and 192 GB of HBM3 memory. The MI300X is designed to accelerate generative AI applications such as natural language processing, computer vision, and deep learning. The MI300X has a peak performance of 653.7 TFLOPS of TF32 (1307.4 TFLOPS with sparsity) and 1307.4 TFLOPS of FP16 (2614.9 TFLOPS with sparsity), as well as 5.3 TB/s of memory bandwidth. The MI300X also supports PCIe 5.0 and CXL 2.0 interfaces, as well as AMD's ROCm software stack, which provides a unified programming model and tools for developing and deploying generative AI applications on AMD hardware. [17] [18] [19]
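The MI300X's headline numbers likewise follow from its configuration in the specification table below (19,456 stream processors at a 2,100 MHz boost clock), assuming the CDNA 3 rate ratios: matrix FP16 at 16× the FP32 vector rate, TF32 at 8×, and 2:4 structured sparsity doubling matrix throughput. A sketch of the arithmetic:

```python
shaders, boost_ghz = 19456, 2.1

fp32_vector = shaders * 2 * boost_ghz / 1000  # TFLOPS; 2 ops/cycle (FMA)
fp16_matrix = fp32_vector * 16                # matrix cores: 16x the FP32 vector rate
tf32_matrix = fp32_vector * 8                 # TF32: 8x the FP32 vector rate
fp16_sparse = fp16_matrix * 2                 # structured sparsity doubles throughput

print(round(fp32_vector, 1))  # → 81.7
print(round(fp16_matrix, 1))  # → 1307.4
print(round(tf32_matrix, 1))  # → 653.7
print(round(fp16_sparse, 1))  # → 2614.9
```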
The following software is, as of 2022, grouped under the ROCm (Radeon Open Compute) meta-project.
The MI6, MI8, and MI25 products all support AMD's MxGPU virtualization technology, enabling sharing of GPU resources across multiple users. [1] [20]
MIOpen is AMD's deep learning library for GPU acceleration of deep learning workloads. [1] Much of it extends GPUOpen's Boltzmann Initiative software. [20] It is intended to compete with the deep learning portions of Nvidia's CUDA library. MIOpen supports the deep learning frameworks Theano, Caffe, TensorFlow, MXNet, Microsoft Cognitive Toolkit, Torch, and Chainer. Programming is supported in OpenCL and Python, and CUDA code can be compiled through AMD's Heterogeneous-compute Interface for Portability (HIP) and Heterogeneous Compute Compiler (HCC).
| Model (Code name) | Launch | Architecture & fab | Transistors & die size | Config (shaders:TMUs:ROPs) | Core clock (MHz) | Texture fillrate (GT/s) | Pixel fillrate (GP/s) | Half precision (TFLOPS) | Single precision (TFLOPS) | Double precision (TFLOPS) | Memory size (GB) | Memory bus type & width | Bandwidth (GB/s) | Memory clock (MT/s) | TBP | Bus interface |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Radeon Instinct MI6 (Polaris 10) [21] [22] [23] [24] [25] [26] | Jun 20, 2017 | GCN 4, GloFo 14LP | 5.7×10⁹, 232 mm² | 2304:144:32 (36 CU) | 1120 / 1233 | 161.3 / 177.6 | 35.84 / 39.46 | 5.161 / 5.682 | 5.161 / 5.682 | 0.323 / 0.355 | 16 | GDDR5, 256-bit | 224 | 7000 | 150 W | PCIe 3.0 ×16 |
| Radeon Instinct MI8 (Fiji) [21] [22] [23] [27] [28] [29] | Jun 20, 2017 | GCN 3, TSMC 28 nm | 8.9×10⁹, 596 mm² | 4096:256:64 (64 CU) | 1000 | 256.0 | 64.00 | 8.192 | 8.192 | 0.512 | 4 | HBM, 4096-bit | 512 | 1000 | 175 W | PCIe 3.0 ×16 |
| Radeon Instinct MI25 (Vega 10) [21] [22] [23] [30] [31] [32] [33] | Jun 20, 2017 | GCN 5, GloFo 14LP | 12.5×10⁹, 510 mm² | 4096:256:64 (64 CU) | 1400 / 1500 | 358.4 / 384.0 | 89.60 / 96.00 | 22.94 / 24.58 | 11.47 / 12.29 | 0.717 / 0.768 | 16 | HBM2, 2048-bit | 484 | 1890 | 300 W | PCIe 3.0 ×16 |
| Radeon Instinct MI50 (Vega 20) [34] [35] [36] [37] [38] [39] | Nov 18, 2018 | GCN 5, TSMC N7 | 13.2×10⁹, 331 mm² | 3840:240:64 (60 CU) | 1450 / 1725 | 348.0 / 414.0 | 92.80 / 110.4 | 22.27 / 26.50 | 11.14 / 13.25 | 5.568 / 6.624 | 16 or 32 | HBM2, 4096-bit | 1024 | 2000 | 300 W | PCIe 4.0 ×16 |
| Radeon Instinct MI60 (Vega 20) [35] [40] [41] [42] | Nov 18, 2018 | GCN 5, TSMC N7 | 13.2×10⁹, 331 mm² | 4096:256:64 (64 CU) | 1500 / 1800 | 384.0 / 460.8 | 96.00 / 115.2 | 24.58 / 29.49 | 12.29 / 14.75 | 6.144 / 7.373 | 32 | HBM2, 4096-bit | 1024 | 2000 | 300 W | PCIe 4.0 ×16 |
| AMD Instinct MI100 (Arcturus) [43] [44] [45] | Nov 16, 2020 | CDNA, TSMC N7 | 25.6×10⁹, 750 mm² | 7680:480:— (120 CU) | 1000 / 1502 | 480.0 / 721.0 | — | 122.9 / 184.6 | 15.36 / 23.07 | 7.680 / 11.54 | 32 | HBM2, 4096-bit | 1228.8 | 2400 | 300 W | PCIe 4.0 ×16 |
| AMD Instinct MI210 (Aldebaran) [46] [47] [48] | Mar 22, 2022 | CDNA 2, TSMC N6 | 28×10⁹, ~770 mm² | 6656:416:— (104 CU, 1 × GCD) | 1000 / 1700 | 416.0 / 707.2 | — | 106.5 / 181.0 | 13.31 / 22.63 | 13.31 / 22.63 | 64 | HBM2E, 4096-bit | 1638.4 | 3200 | 300 W | PCIe 4.0 ×16 |
| AMD Instinct MI250 (Aldebaran) [49] [50] [51] | Nov 8, 2021 | CDNA 2, TSMC N6 | 58×10⁹, 1540 mm² | 13312:832:— (208 CU, 2 × GCD) | 1000 / 1700 | 832.0 / 1414 | — | 213.0 / 362.1 | 26.62 / 45.26 | 26.62 / 45.26 | 2 × 64 | HBM2E, 2 × 4096-bit | 2 × 1638.4 | 3200 | 500 W (560 W peak) | PCIe 4.0 ×16 |
| AMD Instinct MI250X (Aldebaran) [52] [50] [53] | Nov 8, 2021 | CDNA 2, TSMC N6 | 58×10⁹, 1540 mm² | 14080:880:— (220 CU, 2 × GCD) | 1000 / 1700 | 880.0 / 1496 | — | 225.3 / 383.0 | 28.16 / 47.87 | 28.16 / 47.87 | 2 × 64 | HBM2E, 2 × 4096-bit | 2 × 1638.4 | 3200 | 500 W (560 W peak) | PCIe 4.0 ×16 |
| AMD Instinct MI300A (Antares) [54] [55] [56] [57] | Dec 6, 2023 | CDNA 3, TSMC N5 & N6 | 146×10⁹, 1017 mm² | 14592:912:— (228 CU, 6 × XCD, plus 24 Zen 4 x86 CPU cores) | 2100 | 912.0 / 1550.4 | — | 980.6 (1961.2 with sparsity) | 122.6 | 61.3 (122.6 FP64 matrix) | 128 | HBM3, 8192-bit | 5300 | 5200 | 550 W (760 W liquid-cooled) | PCIe 5.0 ×16 |
| AMD Instinct MI300X (Aqua Vanjaram) [58] [59] [60] [61] | Dec 6, 2023 | CDNA 3, TSMC N5 & N6 | 153×10⁹, 1017 mm² | 19456:1216:— (304 CU, 8 × XCD) | 2100 | 1216.0 / 2062.1 | — | 1307.4 (2614.9 with sparsity) | 163.4 | 81.7 (163.4 FP64 matrix) | 192 | HBM3, 8192-bit | 5300 | 5200 | 750 W | PCIe 5.0 ×16 |
Floating point operations per second is a measure of computer performance in computing, useful in fields of scientific computations that require floating-point calculations.
A graphics processing unit (GPU) is a specialized electronic circuit initially designed for digital image processing and to accelerate computer graphics, being present either as a discrete video card or embedded on motherboards, mobile phones, personal computers, workstations, and game consoles. After their initial design, GPUs were found to be useful for non-graphic calculations involving embarrassingly parallel problems due to their parallel structure. Other non-graphical uses include the training of neural networks and cryptocurrency mining.
Radeon is a brand of computer products, including graphics processing units, random-access memory, RAM disk software, and solid-state drives, produced by Radeon Technologies Group, a division of AMD. The brand was launched in 2000 by ATI Technologies, which was acquired by AMD in 2006 for US$5.4 billion.
AMD FireStream was AMD's brand name for their Radeon-based product line targeting stream processing and/or GPGPU in supercomputers. Originally developed by ATI Technologies around the Radeon X1900 XTX in 2006, the product line was previously branded as both ATI FireSTREAM and AMD Stream Processor. The AMD FireStream can also be used as a floating-point co-processor for offloading CPU calculations, which is part of the Torrenza initiative. The FireStream line has been discontinued since 2012, when GPGPU workloads were entirely folded into the AMD FirePro line.
Graphics Core Next (GCN) is the codename for a series of microarchitectures and an instruction set architecture that were developed by AMD for its GPUs as the successor to its TeraScale microarchitecture. The first product featuring GCN was launched on January 9, 2012.
The Radeon 200 series is a series of graphics processors developed by AMD. These GPUs are manufactured on a 28 nm Gate-Last process through TSMC or Common Platform Alliance.
Video Code Engine is AMD's video encoding application-specific integrated circuit implementing the H.264/MPEG-4 AVC video codec. Since 2012 it has been integrated into all of their GPUs and APUs except Oland.
Nvidia Tesla is the former name for a line of products developed by Nvidia targeted at stream processing or general-purpose graphics processing units (GPGPU), named after pioneering electrical engineer Nikola Tesla. Its products began using GPUs from the G80 series, and have continued to accompany the release of new chips. They are programmable using the CUDA or OpenCL APIs.
The Radeon 400 series is a series of graphics processors developed by AMD. These cards were the first to feature the Polaris GPUs, using the new 14 nm FinFET manufacturing process, developed by Samsung Electronics and licensed to GlobalFoundries. The Polaris family initially included two new chips in the Graphics Core Next (GCN) family. Polaris implements the 4th generation of the Graphics Core Next instruction set, and shares commonalities with the previous GCN microarchitectures.
The Nvidia DGX represents a series of servers and workstations designed by Nvidia, primarily geared towards enhancing deep learning applications through the use of general-purpose computing on graphics processing units (GPGPU). These systems typically come in a rackmount format featuring high-performance x86 server CPUs on the motherboard.
Radeon Pro is AMD's brand of professional oriented GPUs. It replaced AMD's FirePro brand in 2016. Compared to the Radeon brand for mainstream consumer/gamer products, the Radeon Pro brand is intended for use in workstations and the running of computer-aided design (CAD), computer-generated imagery (CGI), digital content creation (DCC), high-performance computing/GPGPU applications, and the creation and running of virtual reality programs and games.
The Radeon RX Vega series is a series of graphics processors developed by AMD. These GPUs use the Graphics Core Next (GCN) 5th generation architecture, codenamed Vega, and are manufactured on 14 nm FinFET technology, developed by Samsung Electronics and licensed to GlobalFoundries. The series consists of desktop graphics cards and APUs aimed at desktops, mobile devices, and embedded applications.
ROCm is an Advanced Micro Devices (AMD) software stack for graphics processing unit (GPU) programming. ROCm spans several domains: general-purpose computing on graphics processing units (GPGPU), high performance computing (HPC), heterogeneous computing. It offers several programming models: HIP, OpenMP/Message Passing Interface (MPI), and OpenCL.
Ampere is the codename for a graphics processing unit (GPU) microarchitecture developed by Nvidia as the successor to both the Volta and Turing architectures. It was officially announced on May 14, 2020 and is named after French mathematician and physicist André-Marie Ampère.
RDNA is a graphics processing unit (GPU) microarchitecture and accompanying instruction set architecture developed by AMD. It is the successor to their Graphics Core Next (GCN) microarchitecture/instruction set. The first product lineup featuring RDNA was the Radeon RX 5000 series of video cards, launched on July 7, 2019. The architecture is also used in mobile products. It is fabricated on TSMC's N7 FinFET process and used in the Navi series of AMD Radeon graphics cards.
The Radeon RX 6000 series is a series of graphics processing units developed by AMD, based on their RDNA 2 architecture. It was announced on October 28, 2020 and is the successor to the Radeon RX 5000 series. It consists of the entry-level RX 6400, mid-range RX 6500 XT, high-end RX 6600, RX 6600 XT, RX 6650 XT, RX 6700, RX 6700 XT, upper high-end RX 6750 XT, RX 6800, RX 6800 XT, and enthusiast RX 6900 XT and RX 6950 XT for desktop computers; and the RX 6600M, RX 6700M, and RX 6800M for laptops. A sub-series for mobile, Radeon RX 6000S, was announced in CES 2022, targeting thin and light laptop designs.
RDNA 3 is a GPU microarchitecture designed by AMD, released with the Radeon RX 7000 series on December 13, 2022. Alongside powering the RX 7000 series, RDNA 3 is also featured in the SoCs designed by AMD for the Asus ROG Ally and Lenovo Legion Go consoles.
CDNA is a compute-centered graphics processing unit (GPU) microarchitecture designed by AMD for datacenters. Mostly used in the AMD Instinct line of data center graphics cards, CDNA is a successor to the Graphics Core Next (GCN) microarchitecture; the other successor being RDNA, a consumer graphics focused microarchitecture.
The Radeon RX 7000 series is a series of graphics processing units developed by AMD, based on their RDNA 3 architecture. It was announced on November 3, 2022 and is the successor to the Radeon RX 6000 series. Currently AMD has announced and released seven graphics cards of the Radeon RX 7000 series: RX 7600, RX 7600 XT, RX 7700 XT, RX 7800 XT, RX 7900 GRE, RX 7900 XT, and RX 7900 XTX. AMD officially launched the RX 7900 XT and RX 7900 XTX on December 13, 2022. AMD released the RX 7600 on May 25, 2023. AMD released their last two models of the RDNA 3 family on September 6, 2023; the 7700 XT and the 7800 XT. As of January 2024, AMD have also released the RX 7600 XT and the RX 7900 GRE.