Nvidia BlueField

Nvidia BlueField is a line of data processing units (DPUs) designed and produced by Nvidia. Initially developed by Mellanox Technologies, the BlueField IP passed to Nvidia with its US$6.9 billion acquisition of Mellanox, announced in March 2019.[1] The first Nvidia-produced BlueField cards, named BlueField-2, were shipped for review shortly after their announcement at VMworld 2019 and were officially launched at GTC 2020.[2] Also launched at GTC 2020 was the Nvidia BlueField-2X, a BlueField card with an Ampere-generation graphics processing unit (GPU) integrated onto the same card.[2] The BlueField-3 and BlueField-4 DPUs were first announced at GTC 2021, with tentative launch dates of 2022 and 2024 respectively.[3]

Nvidia BlueField cards are targeted at datacenter and high-performance computing workloads, where low latency and high bandwidth are important for efficient computation.[4]

BlueField cards differ from ordinary network interface controllers in that they offload functions normally reserved for the host CPU, and in their on-board CPU cores (typically ARM- or MIPS-based) and memory support (typically DDR4, though BlueField-3 added support for faster memory types such as HBM and DDR5). BlueField cards also run an operating system completely independent of the host system; this is designed to reduce software overhead, as each DPU can function independently of the others and of the host.[5] It also means BlueField cards can provide remote management of systems that may not otherwise support it. A BlueField card can additionally configure its PCIe bus to act as a host rather than a device, letting it connect over a PCIe bridge to another card, such as a compute accelerator, to provide entirely network-based, high-bandwidth control of a GPU.[6]

The BlueField-X cards are DPU/GPU hybrid cards with a 100-class Nvidia datacenter GPU integrated on the same PCB as the BlueField DPU. These cards are intended for high-power GPU clusters, enabling high-bandwidth communication without crossing the host's PCIe bus and placing unnecessary load on the CPU, whose capacity may be better allocated to other types of processing. The increase in total external connectivity available to a system in this configuration allows datasets to be used across multiple nodes when they are too large for any single system to hold in memory.

Models

| Model | Announcement date | Release date | Networking port options | Bandwidth capacity | Cores | Core type | PCIe generation | Memory capacity | Memory type | GPU accelerator | SPECint (2017 rate)[7] | TOPS[7] |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BlueField-2 | October 5, 2020 | Q2 2021 | Dual QSFP56 10/25/50/100 Gb or single QSFP56 200 Gb | 200 Gbit/s | 8 | ARM A72 | 4.0 | 16/32 GB | DDR4 | N/A | 9 | 0.7 |
| BlueField-2X | October 5, 2020 | Q4 2021 | Dual QSFP56 10/25/50/100 Gb or single QSFP56 200 Gb | 200 Gbit/s | 8 | ARM A72 | 4.0 | 16/32 GB | DDR4 | Nvidia A100 | 9 | 60 |
| BlueField-3 | April 12, 2021 | Q1 2022 | Quad/dual/single QSFP56 | 400 Gbit/s | 16 | ARM A78 | 5.0 | 64 GB | DDR5 | N/A | 42 | 1.5 |
| BlueField-3X | April 12, 2021 | N/A | Quad/dual/single QSFP56 | 400 Gbit/s | 16 | ARM A78 | 5.0 | 64 GB | DDR5 | Nvidia A100 | 42 | 75 |
| BlueField-4 | April 12, 2021 | 2024 | OSFP112 | 800 Gbit/s | TBD | TBD | TBD | TBD | TBD | TBD | 160 | 400 |
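One way to read the bandwidth column is against the capacity of each model's PCIe link. As a rough illustration (the author's own sketch, not taken from the sources), the following compares each model's aggregate network line rate with the raw throughput of a 16-lane link of the matching PCIe generation, using the 128b/130b line coding that PCIe 4.0 and 5.0 share:

```python
# Rough back-of-the-envelope check (illustration only): does each BlueField
# model's network line rate fit within a PCIe x16 link of its generation?
# Per-lane raw signaling rates in GT/s; 128b/130b coding efficiency applies.

PCIE_GT_PER_LANE = {4: 16.0, 5: 32.0}   # PCIe 4.0 and 5.0 per-lane rates
ENCODING = 128 / 130                     # 128b/130b line-coding efficiency

def pcie_x16_gbit(gen: int) -> float:
    """Raw bandwidth of an x16 link in Gbit/s (before protocol overhead)."""
    return PCIE_GT_PER_LANE[gen] * ENCODING * 16

# Network bandwidth (Gbit/s) and PCIe generation from the table above.
models = {"BlueField-2": (200, 4), "BlueField-3": (400, 5)}
for name, (net_gbit, gen) in models.items():
    link = pcie_x16_gbit(gen)
    print(f"{name}: {net_gbit} Gbit/s network vs ~{link:.0f} Gbit/s PCIe {gen}.0 x16")
```

Both line rates fit within the corresponding x16 link (roughly 252 and 504 Gbit/s of raw throughput), though transaction-layer overhead narrows the margin in practice.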

H100 CNX & A100 EGX

The H100 CNX and the A100 EGX are NIC/GPU hybrid cards and, while visually similar to a BlueField-X card, are completely distinct and do not have the BlueField system-on-a-chip. They are instead equipped with a generic ConnectX network interface controller.[8][9]

Related Research Articles

Graphics card

A graphics card is a computer expansion card that generates a feed of graphics output to a display device such as a monitor. Graphics cards are sometimes called discrete or dedicated graphics cards to emphasize their distinction from an integrated graphics processor on the motherboard or the central processing unit (CPU). A graphics processing unit (GPU) that performs the necessary computations is the main component in a graphics card, but the acronym "GPU" is sometimes also used, erroneously, to refer to the graphics card as a whole.

PCI Express

PCI Express, officially abbreviated as PCIe or PCI-e, is a high-speed serial computer expansion bus standard, designed to replace the older PCI, PCI-X and AGP bus standards. It is the common motherboard interface for personal computers' graphics cards, sound cards, hard disk drive host adapters, SSDs, Wi-Fi and Ethernet hardware connections. PCIe has numerous improvements over the older standards, including higher maximum system bus throughput, lower I/O pin count and smaller physical footprint, better performance scaling for bus devices, a more detailed error detection and reporting mechanism, and native hot-swap functionality. More recent revisions of the PCIe standard provide hardware support for I/O virtualization.

Graphics processing unit

A graphics processing unit (GPU) is a specialized electronic circuit initially designed to accelerate computer graphics and image processing. GPUs were later found useful for non-graphics computations involving embarrassingly parallel problems due to their parallel structure. Other non-graphical uses include the training of neural networks and cryptocurrency mining.

Nvidia Tesla

Nvidia Tesla is the former name for a line of products developed by Nvidia targeted at stream processing and general-purpose computing on graphics processing units (GPGPU), named after pioneering electrical engineer Nikola Tesla. The line began with GPUs from the G80 series and continued to accompany the release of new chips. The products are programmable using the CUDA or OpenCL APIs.

NVLink

NVLink is a wire-based serial multi-lane near-range communications link developed by Nvidia. Unlike PCI Express, a device can consist of multiple NVLinks, and devices use mesh networking to communicate instead of a central hub. The protocol was first announced in March 2014 and uses a proprietary high-speed signaling interconnect (NVHS).

Volta (microarchitecture)

Volta is the codename, but not the trademark, for a GPU microarchitecture developed by Nvidia, succeeding Pascal. It was first announced on a roadmap in March 2013, although the first product was not announced until May 2017. The architecture is named after 18th–19th century Italian chemist and physicist Alessandro Volta. It was Nvidia's first chip to feature Tensor Cores, specially designed cores that have superior deep learning performance over regular CUDA cores. The architecture is produced with TSMC's 12 nm FinFET process. The Ampere microarchitecture is the successor to Volta.

High Bandwidth Memory

High Bandwidth Memory (HBM) is a computer memory interface for 3D-stacked synchronous dynamic random-access memory (SDRAM), initially from Samsung, AMD and SK Hynix. It is used in conjunction with high-performance graphics accelerators, network devices and high-performance datacenter AI ASICs, as on-package cache and RAM in some CPUs, in FPGAs, and in some supercomputers. The first HBM memory chip was produced by SK Hynix in 2013, and the first devices to use HBM were the AMD Fiji GPUs in 2015.

Tensor Processing Unit

Tensor Processing Unit (TPU) is an AI accelerator application-specific integrated circuit (ASIC) developed by Google for neural network machine learning, using Google's own TensorFlow software. Google began using TPUs internally in 2015, and in 2018 made them available for third-party use, both as part of its cloud infrastructure and by offering a smaller version of the chip for sale.

Nvidia DGX

Nvidia DGX is a series of servers and workstations designed by Nvidia, primarily geared toward deep learning applications through general-purpose computing on graphics processing units (GPGPU). These systems typically come in a rackmount format featuring high-performance x86 server CPUs on the motherboard.

Coherent Accelerator Processor Interface (CAPI), is a high-speed processor expansion bus standard for use in large data center computers, initially designed to be layered on top of PCI Express, for directly connecting central processing units (CPUs) to external accelerators like graphics processing units (GPUs), ASICs, FPGAs or fast storage. It offers low latency, high speed, direct memory access connectivity between devices of different instruction set architectures.

Ampere (microarchitecture)

Ampere is the codename for a graphics processing unit (GPU) microarchitecture developed by Nvidia as the successor to both the Volta and Turing architectures. It was officially announced on May 14, 2020 and is named after French mathematician and physicist André-Marie Ampère.

Hopper (microarchitecture)

Hopper is a graphics processing unit (GPU) microarchitecture developed by Nvidia. It is designed for datacenters and is the counterpart to the consumer-oriented Ada Lovelace architecture. It is the latest generation of Nvidia's datacenter GPU line, formerly branded Nvidia Tesla.

Nvidia GTC is a global artificial intelligence (AI) conference for developers that brings together developers, engineers, researchers, inventors, and IT professionals. Topics focus on AI, computer graphics, data science, machine learning and autonomous machines. Each conference begins with a keynote from Nvidia CEO and founder Jensen Huang, followed by a variety of sessions and talks with experts from around the world.

Selene is a supercomputer developed by Nvidia, capable of achieving 63.460 petaflops; it ranked as the fifth-fastest supercomputer in the world when it entered the TOP500 list. Selene is based on the Nvidia DGX SuperPOD, a turnkey high-performance supercomputer solution provided by Nvidia that tightly integrates DGX compute nodes, featuring AMD CPUs and Nvidia A100 GPUs, with fast storage and high-bandwidth Mellanox HDR networking, aimed at high-demand machine learning workloads. Selene was built in three months and was the fastest industrial system in the US, as well as the second most energy-efficient supercomputing system ever.

Leonardo (supercomputer)

Leonardo is a petascale supercomputer located at the CINECA datacenter in Bologna, Italy. The system consists of an Atos BullSequana XH2000 computer, with close to 14,000 Nvidia Ampere GPUs and 200 Gbit/s Nvidia Mellanox HDR InfiniBand connectivity. Inaugurated in November 2022, Leonardo is capable of 250 petaflops, making it one of the top five fastest supercomputers in the world. It debuted on the TOP500 in November 2022 ranking fourth in the world, and second in Europe.

SXM (socket)

SXM is a high-bandwidth socket solution for connecting Nvidia compute accelerators to a system. Every generation of Nvidia Tesla accelerator since the P100 models, along with the DGX computer series and the HGX boards, uses an SXM socket, which provides high bandwidth, power delivery and more for the matching GPU daughter card. Nvidia offers these combinations as end-user products, e.g. in its DGX system series. Socket generations are SXM for Pascal-based GPUs, SXM2 and SXM3 for Volta-based GPUs, SXM4 for Ampere-based GPUs, and SXM5 for Hopper-based GPUs. These sockets are used for specific models of the accelerators and offer higher performance per card than their PCIe equivalents. The DGX-1 was the first system to use these sockets: it initially carried form-factor-compatible SXM modules with P100 GPUs and was later revealed to support upgrading to SXM2 modules with V100 GPUs.

The ARM Neoverse is a group of 64-bit ARM processor cores licensed by Arm Holdings. The cores are intended for datacenter, edge computing, and high-performance computing use. The group consists of ARM Neoverse V-Series, ARM Neoverse N-Series, and ARM Neoverse E-Series.

CDNA (microarchitecture)

CDNA is a compute-centered graphics processing unit (GPU) microarchitecture designed by AMD for datacenters. Mostly used in the AMD Instinct line of data center graphics cards, CDNA is a successor to the Graphics Core Next (GCN) microarchitecture; the other successor being RDNA, a consumer graphics focused microarchitecture.

Ada Lovelace, also referred to simply as Lovelace, is a graphics processing unit (GPU) microarchitecture developed by Nvidia as the successor to the Ampere architecture, officially announced on September 20, 2022. It is named after the English mathematician Ada Lovelace, one of the first computer programmers. Nvidia announced the architecture along with the GeForce RTX 40 series consumer GPUs and the RTX 6000 Ada Generation workstation graphics card. The Lovelace architecture is fabricated on TSMC's custom 4N process which offers increased efficiency over the previous Samsung 8 nm and TSMC N7 processes used by Nvidia for its previous-generation Ampere architecture.

Blackwell is a graphics processing unit (GPU) microarchitecture developed by Nvidia as the successor to the Hopper and Ada Lovelace microarchitectures.

References

  1. Clifford, Tyler (2020-04-28). "Nvidia completes 'homerun deal' after closing $7 billion acquisition of Mellanox". CNBC. Retrieved 2022-03-28.
  2. servethehome (2020-10-05). "NVIDIA BlueField-2 and BlueField-2X DPU Offerings Launched". ServeTheHome. Retrieved 2022-03-28.
  3. Shilov, Anton (2021-04-12). "Nvidia Reveals BlueField-3, BlueField-4 DPUs: 400-800 Gbps, 22-64B Transistors". Tom's Hardware.
  4. "NVIDIA BLUEFIELD-2 DPU - Data Center Infrastructure on a Chip" (PDF). Nvidia.
  5. servethehome (2021-05-29). "DPU vs SmartNIC and the STH NIC Continuum Framework". ServeTheHome. Retrieved 2022-03-29.
  6. servethehome (2021-07-11). "CPU-GPU-NIC PCIe Card Realized with NVIDIA BlueField-2 A100". ServeTheHome. Retrieved 2022-04-05.
  7. Mellor, Chris (2021-04-12). "Nvidia unveils BlueField 3 DPU. It's much faster". Blocks and Files. Retrieved 2023-06-28.
  8. servethehome (2020-05-14). "NVIDIA EGX A100 Launched Tesla Plus Mellanox Vision". ServeTheHome. Retrieved 2022-04-05.
  9. "NVIDIA H100 CNX". NVIDIA. Retrieved 2022-03-29.