Blackwell (microarchitecture)

Blackwell
Launched March 18, 2024
Designed by Nvidia
Manufactured by TSMC
Fabrication processTSMC 4NP
Specifications
Memory support HBM3e
PCIe support PCIe 6.0
Supported Graphics APIs
DirectX DirectX 12 Ultimate (Feature Level 12_2)
Direct3D Direct3D 12
Shader Model Shader Model 6.8
OpenCL OpenCL 3.0
OpenGL OpenGL 4.6
CUDA Compute Capability 10.x
Vulkan Vulkan 1.3
Supported Compute APIs
CUDA CUDA Toolkit 12.x
DirectCompute Yes
Media Engine
Encoder(s) supported NVENC
History
Predecessor Ada Lovelace (consumer)
Hopper (datacenter)

Blackwell is a graphics processing unit (GPU) microarchitecture developed by Nvidia as the successor to the Hopper and Ada Lovelace microarchitectures.

Named after statistician and mathematician David Blackwell, the architecture's name was leaked in 2022, and the B40 and B100 accelerators were confirmed in October 2023 on an official Nvidia roadmap shown during an investor presentation. [1] The architecture was officially announced at Nvidia's GTC 2024 keynote on March 18, 2024. [2]

History

David Blackwell (1919–2010), eponym of the architecture

In March 2022, Nvidia announced the Hopper architecture for datacenter AI accelerators. Demand for Hopper products remained high throughout the AI boom of 2023. [3] The lead time from order to delivery of H100-based servers was between 36 and 52 weeks due to shortages and high demand. [4] Nvidia reportedly sold 500,000 Hopper-based H100 accelerators in Q3 2023 alone. [4] Nvidia's AI dominance with Hopper products helped raise its market capitalization to over $2 trillion, behind only Microsoft and Apple. [5]

The Blackwell architecture is named after American mathematician David Blackwell, known for his contributions to game theory, probability theory, information theory, and statistics, fields that have influenced or are implemented in transformer-based generative AI models and their training algorithms. Blackwell was the first African American scholar inducted into the National Academy of Sciences. [6]

In Nvidia's October 2023 investor presentation, its datacenter roadmap was updated to include the B100 and B40 accelerators and the Blackwell architecture. [7] [8] Previously, the successor to Hopper had simply been labeled "Hopper-Next" on roadmaps. The updated roadmap also emphasized a move from a two-year release cadence for datacenter products to yearly releases targeting both x86 and ARM systems.

At the GPU Technology Conference (GTC) on March 18, 2024, Nvidia officially announced the Blackwell architecture, with focus placed on its B100 and B200 datacenter accelerators. Nvidia CEO Jensen Huang said that with Blackwell, "we created a processor for the generative AI era", and emphasized the overall Blackwell platform combining Blackwell accelerators with Nvidia's ARM-based Grace CPU. [9] [10] Nvidia touted endorsements of Blackwell from the CEOs of Google, Meta, Microsoft, OpenAI and Oracle. [10] The keynote did not mention gaming.

Architecture

Blackwell is an architecture designed for both datacenter compute applications and for gaming and workstation applications, with dedicated dies for each purpose. The GB100 die is used in Blackwell datacenter products, while GB202-series dies are used in GeForce RTX 50 series graphics cards.

Process node

Blackwell is fabricated on TSMC's custom 4NP node, an enhancement of the 4N node used for the Hopper and Ada Lovelace architectures that increases transistor density. On 4NP, the GB100 die contains 104 billion transistors, a 30% increase over the 80 billion transistors of the previous-generation Hopper GH100 die. [11] Because Blackwell cannot reap the benefits of a major process-node advancement, it must achieve its power-efficiency and performance gains through underlying architectural changes. [12]
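The arithmetic behind the quoted figures can be checked directly, using the per-die transistor counts given above:

```python
# Generational transistor-count comparison, using the figures from the
# article: GH100 (Hopper) at 80 billion, GB100 (Blackwell) at 104 billion.
gh100 = 80e9    # Hopper GH100 transistors
gb100 = 104e9   # Blackwell GB100 transistors (single die)

increase = gb100 / gh100 - 1
print(f"generational increase: {increase:.0%}")                    # 30%
print(f"dual-die package: {2 * gb100 / 1e9:.0f} billion transistors")  # 208 billion
```

The 30% figure follows directly, as does the 208-billion total for the dual-die package described below.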

The GB100 die is at the reticle limit of semiconductor fabrication, [13] the maximum die size that lithography machines can pattern in a single exposure. Nvidia had previously come close to TSMC's reticle limit with the GH100's 814 mm² die. To avoid being constrained by die size, Nvidia's B100 accelerator utilizes two GB100 dies in a single package, connected with a 10 TB/s link that Nvidia calls the NV-High Bandwidth Interface (NV-HBI). NV-HBI is based on the NVLink 5.0 protocol. Nvidia CEO Jensen Huang claimed in an interview with CNBC that Nvidia had spent around $10 billion on research and development for Blackwell's NV-HBI die interconnect. Veteran semiconductor engineer Jim Keller, who had worked on AMD's K7, K12 and Zen architectures, criticized this figure and claimed that the same outcome could be achieved for $1 billion by using Ultra Ethernet rather than the proprietary NVLink system. [14] The two connected GB100 dies act as one large monolithic piece of silicon, with full cache coherency between the dies. [15] The dual-die package totals 208 billion transistors. [13] The two GB100 dies are placed on top of a silicon interposer produced using TSMC's CoWoS-L 2.5D packaging technique. [16]

Streaming Multiprocessor

CUDA Cores

CUDA Compute Capability 10.0 is added with Blackwell.

Tensor Cores

The Blackwell architecture introduces fifth-generation Tensor Cores for AI compute and floating-point calculations. In the datacenter, Blackwell adds support for the FP6 and FP4 ("eighth-precision") floating-point data types. [17] The previous Hopper architecture introduced the Transformer Engine, which casts FP32 data down to FP8 to increase peak compute throughput. Blackwell's second-generation Transformer Engine allows precision to be reduced further, to FP4, doubling compute throughput relative to FP8. Using 4-bit data allows greater efficiency and throughput for model inference in generative AI workloads. [12] Nvidia claims 20 petaflops of FP4 compute for the dual-die GB100-based B100 accelerator. [18]
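To illustrate how coarse a 4-bit floating-point type is, the following sketch decodes every code of an E2M1 layout (1 sign bit, 2 exponent bits, 1 mantissa bit, exponent bias 1), the common FP4 encoding in low-precision AI formats; the article does not specify Blackwell's exact FP4 encoding, so treat this layout as an assumption:

```python
# Illustrative decoder for a 4-bit E2M1 floating-point code (assumed layout:
# 1 sign bit, 2 exponent bits, 1 mantissa bit, exponent bias 1).

def decode_fp4_e2m1(code: int) -> float:
    """Decode a 4-bit E2M1 code (0-15) to its real value."""
    sign = -1.0 if code & 0b1000 else 1.0
    exp = (code >> 1) & 0b11        # 2 exponent bits
    man = code & 0b1                # 1 mantissa bit
    if exp == 0:                    # subnormal: no implicit leading 1
        return sign * man * 0.5
    return sign * (1.0 + 0.5 * man) * 2.0 ** (exp - 1)

# The 16 codes cover only eight magnitudes; such narrow formats depend on
# per-tensor or per-block scaling to stay within this tiny dynamic range.
magnitudes = sorted({abs(decode_fp4_e2m1(c)) for c in range(16)})
print(magnitudes)  # [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
```

With only eight representable magnitudes, hardware scaling machinery like the Transformer Engine is what makes 4-bit arithmetic usable in practice.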


References

  1. "Nvidia Corporation - Nvidia Investor Presentation October 2023". Nvidia. Retrieved March 19, 2024.
  2. "Nvidia Blackwell Platform Arrives to Power a New Era of Computing". Nvidia Newsroom. Retrieved March 19, 2024.
  3. Szewczyk, Chris (August 18, 2023). "The AI hype means Nvidia is making shiploads of cash". Tom's Hardware. Retrieved March 24, 2024.
  4. Shilov, Anton (November 28, 2023). "Nvidia sold half a million H100 AI GPUs in Q3 thanks to Meta, Facebook — lead times stretch up to 52 weeks: Report". Tom's Hardware. Retrieved March 24, 2024.
  5. King, Ian (March 19, 2024). "Nvidia Looks to Extend AI Dominance With New Blackwell Chips". Yahoo! Finance. Retrieved March 24, 2024.
  6. Lee, Jane Lanhee (March 19, 2024). "Why Nvidia's New Blackwell Chip Is Key to the Next Stage of AI". Bloomberg. Retrieved March 24, 2024.
  7. "Investor Presentation" (PDF). Nvidia. October 2023. Retrieved March 24, 2024.
  8. Garreffa, Anthony (October 10, 2023). "Nvidia's next-gen GB200 'Blackwell' GPU listed on its 2024 data center roadmap". TweakTown. Retrieved March 24, 2024.
  9. Leswing, Kif (March 18, 2024). "Nvidia CEO Jensen Huang announces new AI chips: 'We need bigger GPUs'". CNBC. Retrieved March 24, 2024.
  10. Caulfield, Brian (March 18, 2024). "'We Created a Processor for the Generative AI Era,' Nvidia CEO Says". Nvidia. Retrieved March 24, 2024.
  11. Smith, Ryan (March 18, 2024). "Nvidia Blackwell Architecture and B200/B100 Accelerators Announced: Going Bigger With Smaller Data". AnandTech. Retrieved March 24, 2024.
  12. Prickett Morgan, Timothy (March 18, 2024). "With Blackwell GPUs, AI Gets Cheaper and Easier, Competing with Nvidia Gets Harder". The Next Platform. Retrieved March 24, 2024.
  13. "Nvidia Blackwell Platform Arrives to Power a New Era of Computing". Nvidia Newsroom. March 18, 2024. Retrieved March 24, 2024.
  14. Garreffa, Anthony (April 14, 2024). "Jim Keller laughs at $10B R&D cost for Nvidia Blackwell, should've used ethernet for $1B". TweakTown. Retrieved April 16, 2024.
  15. Hagedoorn, Hilbert (March 18, 2024). "Nvidia B200 and GB200 AI GPUs Technical Overview: Unveiled at GTC 2024". Guru3D. Retrieved April 7, 2024.
  16. "Nvidia Blackwell "B100" to feature 2 dies and 192GB of HBM3e memory, B200 with 288GB". VideoCardz. March 17, 2024. Retrieved March 24, 2024.
  17. Edwards, Benj (March 18, 2024). "Nvidia unveils Blackwell B200, the "world's most powerful chip" designed for AI". Ars Technica. Retrieved March 24, 2024.
  18. "Introducing the New Nvidia Blackwell: A Technical Breakdown". BIOS IT. March 17, 2024. Retrieved March 24, 2024.