Blackwell (microarchitecture)

Blackwell
Launched: Q4 2024
Designed by: Nvidia
Manufactured by: TSMC
Fabrication process: TSMC 4NP
Codename(s): GB100, GB20x
Product series (desktop): GeForce RTX 50 series
Specifications
Memory support: GDDR7 (consumer), HBM3e (datacenter)
PCIe support: PCIe 5.0 (consumer), PCIe 6.0 (datacenter)
Supported graphics APIs
DirectX: DirectX 12 Ultimate (Feature Level 12_2)
Direct3D: Direct3D 12
Shader Model: Shader Model 6.8
OpenCL: OpenCL 3.0
OpenGL: OpenGL 4.6
Vulkan: Vulkan 1.4
Supported compute APIs
CUDA: Compute Capability 10.x
DirectCompute: Yes
Media engine
Encoder(s) supported: NVENC
History
Predecessor: Ada Lovelace (consumer), Hopper (datacenter)
Successor: Rubin

Blackwell is a graphics processing unit (GPU) microarchitecture developed by Nvidia as the successor to the Hopper and Ada Lovelace microarchitectures.

Named after statistician and mathematician David Blackwell, the architecture's name leaked in 2022, and the B40 and B100 accelerators were confirmed in October 2023 via an official Nvidia roadmap shown during an investor presentation. [1] It was officially announced at Nvidia's GTC 2024 keynote on March 18, 2024. [2]

History

[Image: David Blackwell (1919–2010)]

In March 2022, Nvidia announced the Hopper datacenter architecture for AI accelerators. Demand for Hopper products remained high throughout the AI boom of 2023. [3] The lead time from order to delivery of H100-based servers was between 36 and 52 weeks due to shortages and high demand. [4] Nvidia reportedly sold 500,000 Hopper-based H100 accelerators in Q3 2023 alone. [4] Nvidia's AI dominance with Hopper products helped push its market capitalization above $2 trillion, behind only Microsoft and Apple. [5]

The Blackwell architecture is named after American mathematician David Blackwell, known for his contributions to game theory, probability theory, information theory, and statistics, fields that underpin the design and training of transformer-based generative AI models. Blackwell was the first African American scholar inducted into the National Academy of Sciences. [6]

In Nvidia's October 2023 investor presentation, the company's datacenter roadmap was updated to reference the B100 and B40 accelerators and the Blackwell architecture. [7] [8] Previously, the successor to Hopper had simply been labeled on roadmaps as "Hopper-Next". The updated roadmap also emphasized Nvidia's move from a two-year release cadence for datacenter products to yearly releases targeting x86 and ARM systems.

At the GPU Technology Conference (GTC) on March 18, 2024, Nvidia officially announced the Blackwell architecture, focusing on its B100 and B200 datacenter accelerators and associated products, such as the eight-GPU HGX B200 board and the 72-GPU NVL72 rack-scale system. [9] Nvidia CEO Jensen Huang said that with Blackwell, "we created a processor for the generative AI era", and emphasized the overall Blackwell platform, which combines Blackwell accelerators with Nvidia's ARM-based Grace CPU. [10] [11] Nvidia touted endorsements of Blackwell from the CEOs of Google, Meta, Microsoft, OpenAI and Oracle. [11] The keynote did not mention gaming.

It was reported in October 2024 that there was a design flaw in the Blackwell architecture that had been fixed in collaboration with TSMC. [12] According to Huang, the design flaw was "functional" and "caused the yield[s] to be low". [13] By November 2024, Morgan Stanley was reporting that "the entire 2025 production" of Blackwell silicon was "already sold out". [14]

Architecture

Blackwell is designed both for datacenter compute and for gaming and workstation applications, with dedicated dies for each market.

Process node

Blackwell is fabricated on the custom 4NP node from TSMC. 4NP is an enhancement of the 4N node used for the Hopper and Ada Lovelace architectures. The Nvidia-specific 4NP process likely adds metal layers to the standard TSMC N4P technology. [15] The GB100 die contains 104 billion transistors, a 30% increase over the 80 billion transistors in the previous generation Hopper GH100 die. [16] As Blackwell cannot reap the benefits that come with a major process node advancement, it must achieve power efficiency and performance gains through underlying architectural changes. [17]

The GB100 die is at the reticle limit of semiconductor fabrication. [18] The reticle limit is the maximum die area that a lithography machine can pattern in a single exposure. Previously, Nvidia had nearly hit TSMC's reticle limit with GH100's 814 mm² die. To avoid being constrained by die size, Nvidia's B100 accelerator uses two GB100 dies in a single package, connected with a 10 TB/s link that Nvidia calls the NV-High Bandwidth Interface (NV-HBI). NV-HBI is based on the NVLink 5.0 protocol. Nvidia CEO Jensen Huang claimed in an interview with CNBC that Nvidia had spent around $10 billion in research and development on Blackwell's NV-HBI die interconnect. Veteran semiconductor engineer Jim Keller, who had worked on AMD's K7, K12 and Zen architectures, criticized this figure, claiming that the same outcome could be achieved for $1 billion by using Ultra Ethernet rather than the proprietary NVLink system. [19] The two connected GB100 dies act as one large monolithic piece of silicon, with full cache coherency between the dies. [20] The dual-die package totals 208 billion transistors. [18] The two GB100 dies sit on a silicon interposer produced using TSMC's CoWoS-L 2.5D packaging technique. [21]

On the consumer side, Blackwell's largest die, GB202, measures 744 mm², which is 20% larger than AD102, Ada Lovelace's largest die. [22] GB202 contains a total of 24,576 CUDA cores, 33% more than the 18,432 CUDA cores in AD102. GB202 is the largest consumer die designed by Nvidia since the 754 mm² TU102 die in 2018, based on the Turing microarchitecture. The gap between GB202 and GB203 is also much wider than in previous generations: GB202 features more than double the CUDA cores of GB203, which was not the case for AD102 over AD103.
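The core-count comparison follows directly from the streaming multiprocessor (SM) counts, since both dies use 128 CUDA cores per SM:

\[
\frac{24{,}576}{18{,}432} = \frac{192 \times 128}{144 \times 128} = \frac{4}{3} \approx 1.33
\]

that is, GB202 carries roughly 33% more cores than AD102.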

Streaming multiprocessor

CUDA cores

CUDA Compute Capability 10.0 is added with Blackwell.
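Because the compute capability is exposed through the CUDA runtime, a program can check for it at startup. Below is a minimal sketch using the standard cudaGetDeviceProperties call; treating a major version of 10 or higher as Blackwell-generation-or-later is an inference from the capability stated above, not an official detection method.

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        std::printf("No CUDA devices found.\n");
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        // major/minor encode the compute capability, e.g. 10.0 on Blackwell.
        std::printf("Device %d: %s, compute capability %d.%d\n",
                    dev, prop.name, prop.major, prop.minor);
        if (prop.major >= 10)
            std::printf("  Blackwell-generation (or later) device detected.\n");
    }
    return 0;
}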

Tensor cores

The Blackwell architecture introduces fifth-generation Tensor Cores for AI compute and floating-point calculations. In the datacenter, Blackwell adds support for the FP4 and FP6 data types. [23] The previous Hopper architecture introduced the Transformer Engine, software that facilitates the quantization of higher-precision models (e.g., FP32) to lower-precision formats for which the hardware has greater throughput. Blackwell's second-generation Transformer Engine adds support for the newer, less precise FP4 and FP6 types. Using 4-bit data allows greater efficiency and throughput for inference of generative AI models. [17] Nvidia claims 20 petaflops of FP4 compute (excluding the 2x gain the company claims for sparsity) for the dual-GPU GB200 superchip. [24]
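To make the 4-bit argument concrete, the sketch below quantizes FP32 values to the 16-code e2m1 FP4 format (1 sign, 2 exponent, 1 mantissa bit) and packs two values per byte. The value set follows the common e2m1 definition, but the single per-tensor scale is a deliberate simplification; this is an illustrative host-side sketch, not Nvidia's Transformer Engine implementation, which manages scaling automatically and at finer granularity.

#include <cmath>
#include <cstdint>
#include <cstdio>
#include <vector>

// The 8 non-negative values representable in e2m1 FP4; the sign bit
// (bit 3 of the 4-bit code) doubles this to 16 codes.
static const float kFp4[8] = {0.0f, 0.5f, 1.0f, 1.5f, 2.0f, 3.0f, 4.0f, 6.0f};

// Quantize one scaled value to the nearest FP4 code (0..15).
static uint8_t quantize_fp4(float x) {
    uint8_t sign = x < 0.0f ? 0x8 : 0x0;
    float mag = std::fabs(x);
    int best = 0;
    for (int i = 1; i < 8; ++i)
        if (std::fabs(kFp4[i] - mag) < std::fabs(kFp4[best] - mag)) best = i;
    return sign | static_cast<uint8_t>(best);
}

int main() {
    // Toy tensor: FP32 weights quantized with one per-tensor scale, chosen
    // so the largest magnitude maps to the largest FP4 value (6.0).
    std::vector<float> w = {0.02f, -0.31f, 0.77f, -1.50f, 0.09f, 2.40f};
    float max_abs = 0.0f;
    for (float v : w) max_abs = std::fmax(max_abs, std::fabs(v));
    float scale = max_abs / 6.0f;

    // Pack two 4-bit codes per byte: a 4x storage reduction versus FP16.
    std::vector<uint8_t> packed((w.size() + 1) / 2, 0);
    for (size_t i = 0; i < w.size(); ++i) {
        uint8_t code = quantize_fp4(w[i] / scale);
        packed[i / 2] |= (i % 2 == 0) ? code : static_cast<uint8_t>(code << 4);
    }

    // Dequantize and print the round-trip value for each element.
    for (size_t i = 0; i < w.size(); ++i) {
        uint8_t code = (i % 2 == 0) ? (packed[i / 2] & 0xF) : (packed[i / 2] >> 4);
        float mag = kFp4[code & 0x7];
        float value = ((code & 0x8) ? -mag : mag) * scale;
        std::printf("w[%zu] = %+0.3f -> fp4 %+0.3f\n", i, w[i], value);
    }
    return 0;
}

Relative to FP16, this representation quarters the memory traffic per value, which is where much of the claimed inference throughput gain comes from.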

Blackwell dies

Datacenter

GB100
  Release date: Dec 2024
  Transistor count: 104 bn.
  Products: B100, B200

Consumer

GB202
  Variant(s): GB202-300-A1
  Release date: Jan 30, 2025
  CUDA cores: 24,576
  TMUs: 768
  ROPs: 384
  RT cores: 192
  Tensor cores: 768
  SMs: 192
  GPCs: 12
  Cache: 24 MB L1, 88 MB L2
  Memory interface: 512-bit
  Die size: 744 mm²
  Transistor count: 92 bn.
  Products: RTX 5090 (desktop)

GB203
  Variant(s): GB203-300-A1, GB203-400-A1
  Release date: Jan 30, 2025
  CUDA cores: 10,752
  TMUs: 336
  ROPs: 168
  RT cores: 84
  Tensor cores: 336
  SMs: 84
  GPCs: 7
  Cache: 10.5 MB L1, 64 MB L2
  Memory interface: 256-bit
  Die size: 377 mm²
  Products: RTX 5080, RTX 5070 Ti (desktop); RTX 5090 Laptop, RTX 5080 Laptop (mobile)

GB205
  Release date: Feb 2025
  CUDA cores: 6,400
  TMUs: 200
  ROPs: 100
  RT cores: 50
  Tensor cores: 200
  SMs: 50
  GPCs: 5
  Cache: 6.25 MB L1, 40 MB L2
  Memory interface: 192-bit
  Products: RTX 5070 (desktop); RTX 5070 Ti Laptop (mobile)

GB206
  Release date: Mar 2025
  CUDA cores: 4,608
  TMUs: 144
  ROPs: 72
  RT cores: 36
  Tensor cores: 144
  SMs: 36
  GPCs: 3
  Cache: 4.5 MB L1, 32 MB L2
  Memory interface: 128-bit
  Products: RTX 5070 Laptop (mobile)

GB207
  Release date: TBA
  CUDA cores: 2,560
  TMUs: 80
  ROPs: 40
  RT cores: 20
  Tensor cores: 80
  SMs: 20
  GPCs: 2
  Cache: 2.5 MB L1, 32 MB L2
  Memory interface: 128-bit

References

  1. "Nvidia Corporation - Nvidia Investor Presentation October 2023". Nvidia. Retrieved March 19, 2024.
  2. "Nvidia Blackwell Platform Arrives to Power a New Era of Computing". Nvidia Newsroom. Retrieved March 19, 2024.
  3. Szewczyk, Chris (August 18, 2023). "The AI hype means Nvidia is making shiploads of cash". Tom's Hardware. Retrieved March 24, 2024.
  4. Shilov, Anton (November 28, 2023). "Nvidia sold half a million H100 AI GPUs in Q3 thanks to Meta, Facebook — lead times stretch up to 52 weeks: Report". Tom's Hardware. Retrieved March 24, 2024.
  5. King, Ian (March 19, 2024). "Nvidia Looks to Extend AI Dominance With New Blackwell Chips". Yahoo! Finance. Retrieved March 24, 2024.
  6. Lee, Jane Lanhee (March 19, 2024). "Why Nvidia's New Blackwell Chip Is Key to the Next Stage of AI". Bloomberg. Retrieved March 24, 2024.
  7. "Investor Presentation" (PDF). Nvidia. October 2023. Retrieved March 24, 2024.
  8. Garreffa, Anthony (October 10, 2023). "Nvidia's next-gen GB200 'Blackwell' GPU listed on its 2024 data center roadmap". TweakTown. Retrieved March 24, 2024.
  9. "Nvidia GB200 NVL72". Nvidia. Retrieved July 4, 2024.
  10. Leswing, Kif (March 18, 2024). "Nvidia CEO Jensen Huang announces new AI chips: 'We need bigger GPUs'". CNBC. Retrieved March 24, 2024.
  11. Caulfield, Brian (March 18, 2024). "'We Created a Processor for the Generative AI Era,' Nvidia CEO Says". Nvidia. Retrieved March 24, 2024.
  12. Gronholt-Pedersen, Jacob; Mukherjee, Supantha (October 23, 2024). "Nvidia's design flaw with Blackwell AI chips now fixed, CEO says". Reuters. Retrieved December 17, 2024.
  13. Shilov, Anton (October 23, 2024). "Nvidia's Jensen Huang admits AI chip design flaw was '100% Nvidia's fault' — TSMC not to blame, now-fixed Blackwell chips are in production". Tom's Hardware. Retrieved December 17, 2024.
  14. Kahn, Jeremy (November 12, 2024). "60 direct reports, but no 1-on-1 meetings: How an unconventional leadership style helped Jensen Huang of Nvidia become one of the most powerful people in business". Fortune. Retrieved November 16, 2024.
  15. Byrne, Joseph (March 28, 2024). "Monster Nvidia Blackwell GPU Promises 30× Speedup, but Expect 3×". XPU.pub. Retrieved July 4, 2024.
  16. Smith, Ryan (March 18, 2024). "Nvidia Blackwell Architecture and B200/B100 Accelerators Announced: Going Bigger With Smaller Data". AnandTech. Retrieved March 24, 2024.
  17. Prickett Morgan, Timothy (March 18, 2024). "With Blackwell GPUs, AI Gets Cheaper and Easier, Competing with Nvidia Gets Harder". The Next Platform. Retrieved March 24, 2024.
  18. "Nvidia Blackwell Platform Arrives to Power a New Era of Computing". Nvidia Newsroom. March 18, 2024. Retrieved March 24, 2024.
  19. Garreffa, Anthony (April 14, 2024). "Jim Keller laughs at $10B R&D cost for Nvidia Blackwell, should've used ethernet for $1B". TweakTown. Retrieved April 16, 2024.
  20. Hagedoorn, Hilbert (March 18, 2024). "Nvidia B200 and GB200 AI GPUs Technical Overview: Unveiled at GTC 2024". Guru3D. Retrieved April 7, 2024.
  21. "Nvidia Blackwell "B100" to feature 2 dies and 192GB of HBM3e memory, B200 with 288GB". VideoCardz. March 17, 2024. Retrieved March 24, 2024.
  22. "Nvidia GeForce RTX 5090 GB202 GPU die reportedly measures 744 mm2, 20% larger than AD102". VideoCardz. November 22, 2024. Retrieved January 7, 2025.
  23. Edwards, Benj (March 18, 2024). "Nvidia unveils Blackwell B200, the "world's most powerful chip" designed for AI". Ars Technica. Retrieved March 24, 2024.
  24. "Nvidia GB200 NVL72". Nvidia. Retrieved July 4, 2024.