Manufacturer | Nvidia |
---|---|
Release date | 2016 |
The Nvidia DGX (Deep GPU Xceleration) represents a series of servers and workstations designed by Nvidia, primarily geared towards enhancing deep learning applications through the use of general-purpose computing on graphics processing units (GPGPU). These systems typically come in a rackmount format featuring high-performance x86 server CPUs on the motherboard.
The core feature of a DGX system is its inclusion of 4 to 8 Nvidia Tesla GPU modules, which are housed on an independent system board. These GPUs can be connected either via a version of the SXM socket or a PCIe x16 slot, facilitating flexible integration within the system architecture. To manage the substantial thermal output, DGX units are equipped with heatsinks and fans designed to maintain optimal operating temperatures.
This framework makes DGX units suitable for computational tasks associated with artificial intelligence and machine learning models.
DGX-1 servers feature 8 GPUs on Pascal- or Volta-based daughter cards [1] with 128 GB of total HBM2 memory, connected by an NVLink mesh network. [2] The DGX-1 was announced on April 6, 2016. [3] All models are based on a dual-socket configuration of Intel Xeon E5 CPUs, and are equipped with the following features.
The product line is intended to bridge the gap between GPUs and AI accelerators using specific features for deep learning workloads. [4] The initial Pascal-based DGX-1 delivered 170 teraflops of half precision processing, [5] while the Volta-based upgrade increased this to 960 teraflops. [6]
The DGX-1 was first available in only the Pascal-based configuration, with the first-generation SXM socket. The later revision of the DGX-1 offered support for first-generation Volta cards via the SXM2 socket. Nvidia offered upgrade kits that allowed users with a Pascal-based DGX-1 to upgrade to a Volta-based DGX-1. [7] [8]
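The headline figures for the Pascal-based DGX-1 can be reproduced from per-GPU numbers. The following is a minimal sketch, assuming the P100 values listed in the accelerator comparison table (16 GB of HBM2 and 21.2 TFLOPS of FP16 per GPU) and ideal linear scaling across all eight GPUs; the function and constant names are illustrative only.

```python
# Per-GPU figures for the P100, taken from the accelerator comparison table.
P100_MEMORY_GB = 16
P100_FP16_TFLOPS = 21.2
GPUS_PER_DGX1 = 8


def aggregate(per_gpu: float, gpu_count: int) -> float:
    """Scale a per-GPU figure linearly to the whole system (an idealization)."""
    return per_gpu * gpu_count


dgx1_memory_gb = aggregate(P100_MEMORY_GB, GPUS_PER_DGX1)
dgx1_fp16_tflops = aggregate(P100_FP16_TFLOPS, GPUS_PER_DGX1)

print(f"Total HBM2: {dgx1_memory_gb} GB")
print(f"Peak FP16: {dgx1_fp16_tflops:.1f} TFLOPS")
```

The memory total matches the quoted 128 GB, and the FP16 total rounds to the quoted 170 teraflops. Real workloads rarely reach these theoretical peaks.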
Designed as a turnkey deskside AI supercomputer, the DGX Station is a tower computer that can function completely independently without typical datacenter infrastructure such as cooling, redundant power, or 19-inch racks.
The DGX Station was first available with the following specifications. [10]
The DGX Station is water-cooled to better manage the heat of almost 1500 W of total system components, which allows it to keep noise under 35 dB under load. [12] This, among other features, made the system attractive to customers without the infrastructure to run rackmount DGX systems, which can be loud, output a lot of heat, and take up a large area. This was Nvidia's first venture into bringing high-performance computing deskside, which has since remained a prominent marketing strategy for Nvidia. [13]
The successor of the Nvidia DGX-1 is the Nvidia DGX-2, which uses sixteen Volta-based V100 32 GB (second generation) cards in a single unit. It was announced on March 27, 2018. [14] The DGX-2 delivers 2 petaflops and uses NVSwitch for high-bandwidth internal communication. It has a total of 512 GB of HBM2 memory and 1.5 TB of DDR4. Also present are eight 100 Gbit/s InfiniBand cards and 30.72 TB of SSD storage, [15] all enclosed within a 10U rackmount chassis and drawing up to 10 kW under maximum load. [16] The initial price for the DGX-2 was $399,000. [17]
The DGX-2 differs from other DGX models in that it contains two separate GPU daughterboards, each with eight GPUs. These boards are connected by an NVSwitch system that allows for full bandwidth communication across all GPUs in the system, without additional latency between boards. [16]
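The two-board layout also accounts for the DGX-2's GPU and memory totals. The sketch below models each daughterboard as a small record, assuming eight V100 32 GB GPUs per board as described above; the `GpuBoard` class is a hypothetical illustration, not part of any Nvidia software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuBoard:
    """One GPU daughterboard (hypothetical model for illustration)."""
    gpu_count: int
    memory_per_gpu_gb: int

    @property
    def memory_gb(self) -> int:
        # Total HBM2 on this board.
        return self.gpu_count * self.memory_per_gpu_gb


# Two boards, each with eight V100 32 GB GPUs.
dgx2_boards = [GpuBoard(gpu_count=8, memory_per_gpu_gb=32) for _ in range(2)]

total_gpus = sum(board.gpu_count for board in dgx2_boards)
total_hbm2_gb = sum(board.memory_gb for board in dgx2_boards)

print(f"{total_gpus} GPUs, {total_hbm2_gb} GB HBM2")
```

Because NVSwitch links both boards at full bandwidth, the sixteen GPUs and their 512 GB of HBM2 can be treated as one pool rather than two groups of eight.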
A higher-performance variant of the DGX-2, the DGX-2H, was offered as well. The DGX-2H replaced the DGX-2's dual Intel Xeon Platinum 8168s with upgraded dual Intel Xeon Platinum 8174s. This upgrade does not increase core count per system, as both CPUs have 24 cores, nor does it enable any new functions of the system, but it does increase the base frequency of the CPUs from 2.7 GHz to 3.1 GHz. [18] [19] [20]
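To put the DGX-2H change in proportion, the base-frequency uplift can be worked out directly. This is a minimal sketch using only the figures quoted above (24 cores per CPU, two CPUs, 2.7 GHz and 3.1 GHz); the helper name is illustrative.

```python
def relative_increase(old: float, new: float) -> float:
    """Fractional increase going from old to new."""
    return (new - old) / old


CORES_PER_CPU = 24
CPUS_PER_SYSTEM = 2

# Core count is identical for the DGX-2 and the DGX-2H.
cores_per_system = CORES_PER_CPU * CPUS_PER_SYSTEM

# Base clock: 2.7 GHz (Platinum 8168) to 3.1 GHz (Platinum 8174).
uplift = relative_increase(2.7, 3.1)

print(f"{cores_per_system} cores, base clock uplift {uplift:.1%}")
```

The uplift is a little under 15 percent, and it applies only to CPU base frequency; the GPUs are unchanged.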
The DGX A100, announced and released on May 14, 2020, was the third generation of DGX server, including 8 Ampere-based A100 accelerators. [21] Also included are 15 TB of PCIe Gen 4 NVMe storage, [22] 1 TB of RAM, and eight Mellanox-powered 200 Gbit/s HDR InfiniBand ConnectX-6 NICs. The DGX A100 is in a much smaller enclosure than its predecessor, the DGX-2, taking up only 6 rack units. [23]
The DGX A100 also moved to a 64-core AMD EPYC 7742 CPU, making it the first DGX server not built with an Intel Xeon CPU. The initial price for the DGX A100 server was $199,000. [21]
As the successor to the original DGX Station, the DGX Station A100 aims to fill the same niche as a quiet, efficient, turnkey cluster-in-a-box solution that can be purchased, leased, or rented by smaller companies or individuals who want to utilize machine learning. It follows many of the design choices of the original DGX Station, such as the tower orientation, single-socket CPU mainboard, a new refrigerant-based cooling system, and a reduced number of accelerators compared to the corresponding rackmount DGX A100 of the same generation. [13] The price for the DGX Station A100 320G is $149,000, and $99,000 for the 160G model. Nvidia also offers Station rental at ~US$9000 per month through partners in the US (rentacomputer.com) and Europe (iRent IT Systems) to help reduce the costs of implementing these systems at a small scale. [24] [25]
The DGX Station A100 comes in two different configurations of the built-in A100.
Announced March 22, 2022 [26] and planned for release in Q3 2022, [27] the DGX H100 is the fourth generation of DGX servers, built with 8 Hopper-based H100 accelerators, for a total of 32 PFLOPS of FP8 AI compute and 640 GB of HBM3 memory, an upgrade over the DGX A100's 640 GB of HBM2 memory.
This upgrade also increases VRAM bandwidth to 3 TB/s. [28] The DGX H100 increases the rackmount size to 8U to accommodate the 700 W TDP of each H100 SXM card. The DGX H100 also has two 1.92 TB SSDs for operating system storage, and 30.72 TB of solid-state storage for application data.
One more notable addition is the presence of two Nvidia BlueField-3 DPUs, [29] and the upgrade to 400 Gbit/s InfiniBand via Mellanox ConnectX-7 NICs, double the bandwidth of the DGX A100. The DGX H100 uses new 'Cedar Fever' cards, each with four ConnectX-7 400 Gbit/s controllers, and two cards per system. This gives the DGX H100 3.2 Tbit/s of fabric bandwidth across InfiniBand. [30]
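The fabric bandwidth figure is a simple product of card count, controllers per card, and per-controller line rate. The sketch below is a rough check that reads both per-port rates in Gbit/s (400 for ConnectX-7, which is what the 3.2 Tbit/s total implies, and 200 for the DGX A100's eight HDR NICs) and ignores protocol overhead; the function name is illustrative.

```python
def fabric_bandwidth_tbit_s(ports: int, gbit_s_per_port: float) -> float:
    """Aggregate line rate in Tbit/s, ignoring protocol overhead."""
    return ports * gbit_s_per_port / 1000


# DGX H100: two 'Cedar Fever' cards, each with four 400 Gbit/s controllers.
dgx_h100_tbit_s = fabric_bandwidth_tbit_s(ports=2 * 4, gbit_s_per_port=400)

# DGX A100: eight 200 Gbit/s HDR InfiniBand NICs (unit assumed to be Gbit/s).
dgx_a100_tbit_s = fabric_bandwidth_tbit_s(ports=8, gbit_s_per_port=200)

print(f"DGX H100: {dgx_h100_tbit_s} Tbit/s")
print(f"DGX A100: {dgx_a100_tbit_s} Tbit/s")
print(f"Ratio: {dgx_h100_tbit_s / dgx_a100_tbit_s:.1f}x")
```

Both systems expose eight InfiniBand ports, so the doubling comes entirely from the per-port rate.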
The DGX H100 has two Xeon Platinum 8480C Scalable CPUs (codenamed Sapphire Rapids) [31] and 2 TB of system memory. [32]
The DGX H100 was priced at £379,000 or ~US$482,000 at release. [33]
Announced May 2023, the DGX GH200 connects 32 Nvidia Hopper Superchips into a singular superchip that consists in total of 256 H100 GPUs, 32 Grace Neoverse V2 72-core CPUs, 32 OSFP single-port ConnectX-7 VPI adapters with 400 Gbit/s InfiniBand, and 16 dual-port 200 Gbit/s Mellanox BlueField-3 VPI DPUs. The Nvidia DGX GH200 is designed to handle terabyte-class models for recommender systems, generative AI, and graph analytics, offering 19.5 TB of shared memory with linear scalability for large AI models. [34]
Announced May 2023, the DGX Helios supercomputer features 4 DGX GH200 systems, interconnected with Nvidia Quantum-2 InfiniBand networking to increase data throughput for training large AI models. Helios includes 1,024 H100 GPUs.
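The Helios GPU count follows from the per-system figure. A minimal sketch, assuming each DGX GH200 contributes the 256 H100 GPUs stated above; the names are illustrative.

```python
GPUS_PER_DGX_GH200 = 256  # H100 GPUs per DGX GH200, as stated above


def total_gpus(system_count: int, gpus_per_system: int = GPUS_PER_DGX_GH200) -> int:
    """GPU count for a cluster of identical systems."""
    return system_count * gpus_per_system


helios_gpus = total_gpus(system_count=4)
print(f"Helios: {helios_gpus} H100 GPUs")
```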
Announced March 2024, the GB200 NVL72 connects 36 Grace Neoverse V2 72-core CPUs and 72 B100 GPUs in a liquid-cooled, rack-scale design with a 72-GPU NVLink domain that acts as a single large GPU. The Nvidia DGX GB200 offers 13.5 TB of HBM3e shared memory with linear scalability for large AI models, less than its predecessor, the DGX GH200.
The DGX SuperPod is a high-performance turnkey supercomputer system provided by Nvidia using DGX hardware. [35] It combines DGX compute nodes with fast storage and high-bandwidth networking to provide a solution to high-demand machine learning workloads. The Selene supercomputer, at the Argonne National Laboratory, is one example of a DGX SuperPod-based system.
Selene, built from 280 DGX A100 nodes, ranked 5th on the TOP500 list of most powerful supercomputers at the time of its completion in June 2020, [36] and has continued to remain high in performance. The same integration is available to customers, and the newer Hopper-based SuperPod can scale to 32 DGX H100 nodes, for a total of 256 H100 GPUs and 64 x86 CPUs. This gives the complete SuperPod 20 TB of HBM3 memory, 70.4 TB/s of bisection bandwidth, and up to 1 exaFLOP of FP8 AI compute. [37] These SuperPods can then be further joined to create larger supercomputers.
The Eos supercomputer, designed, built, and operated by Nvidia, [38] [39] [40] was constructed from 18 H100-based SuperPods, totaling 576 DGX H100 systems, 500 Quantum-2 InfiniBand switches, and 360 NVLink Switches. These allow Eos to deliver 18 EFLOPS of FP8 compute and 9 EFLOPS of FP16 compute, making Eos the 5th fastest AI supercomputer in the world, according to TOP500 (November 2023 edition).
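The SuperPod and Eos totals can be derived from the per-node DGX H100 figures given earlier. The sketch below assumes 8 GPUs and 2 CPUs per DGX H100 and 80 GB of HBM3 per H100 (from the accelerator comparison table), and uses decimal units. The Eos GPU count it prints is derived here, not quoted from a source.

```python
# Per-node figures for the DGX H100, as described in this article.
GPUS_PER_NODE = 8
CPUS_PER_NODE = 2
HBM3_PER_GPU_GB = 80

NODES_PER_SUPERPOD = 32
SUPERPODS_IN_EOS = 18

superpod_gpus = NODES_PER_SUPERPOD * GPUS_PER_NODE
superpod_cpus = NODES_PER_SUPERPOD * CPUS_PER_NODE
superpod_hbm3_tb = superpod_gpus * HBM3_PER_GPU_GB / 1000  # decimal TB

eos_nodes = SUPERPODS_IN_EOS * NODES_PER_SUPERPOD
eos_gpus = eos_nodes * GPUS_PER_NODE  # derived figure, not stated in the text

print(f"SuperPod: {superpod_gpus} GPUs, {superpod_cpus} CPUs, "
      f"{superpod_hbm3_tb:.2f} TB HBM3")
print(f"Eos: {eos_nodes} DGX H100 systems, {eos_gpus} GPUs")
```

The SuperPod values agree with the quoted 256 GPUs, 64 CPUs, and roughly 20 TB of HBM3, and 18 SuperPods of 32 nodes give the quoted 576 DGX H100 systems.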
As Nvidia does not produce any storage devices or systems, Nvidia SuperPods rely on partners to provide high-performance storage. Current storage partners for Nvidia SuperPods are Dell EMC, DDN, HPE, IBM, NetApp, Pavilion Data, and VAST Data. [41]
Comparison of accelerators used in DGX: [42] [43] [44]
Model | Architecture | Socket | FP32 CUDA cores | FP64 cores (excl. tensor) | Mixed INT32/FP32 cores | INT32 cores | Boost clock | Memory clock | Memory bus width | Memory bandwidth | VRAM | Single precision (FP32) | Double precision (FP64) | INT8 (non-tensor) | INT8 dense tensor | INT32 | FP4 dense tensor | FP16 | FP16 dense tensor | bfloat16 dense tensor | TensorFloat-32 (TF32) dense tensor | FP64 dense tensor | Interconnect (NVLink) | GPU | L1 Cache | L2 Cache | TDP | Die size | Transistor count | Process | Launched |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
P100 | Pascal | SXM/SXM2 | N/A | 1792 | 3584 | N/A | 1480 MHz | 1.4 Gbit/s HBM2 | 4096-bit | 720 GB/sec | 16 GB HBM2 | 10.6 TFLOPS | 5.3 TFLOPS | N/A | N/A | N/A | N/A | 21.2 TFLOPS | N/A | N/A | N/A | N/A | 160 GB/sec | GP100 | 1344 KB (24 KB × 56) | 4096 KB | 300 W | 610 mm² | 15.3 B | TSMC 16FF+ | Q2 2016 |
V100 16GB | Volta | SXM2 | 5120 | 2560 | N/A | 5120 | 1530 MHz | 1.75 Gbit/s HBM2 | 4096-bit | 900 GB/sec | 16 GB HBM2 | 15.7 TFLOPS | 7.8 TFLOPS | 62 TOPS | N/A | 15.7 TOPS | N/A | 31.4 TFLOPS | 125 TFLOPS | N/A | N/A | N/A | 300 GB/sec | GV100 | 10240 KB (128 KB × 80) | 6144 KB | 300 W | 815 mm² | 21.1 B | TSMC 12FFN | Q3 2017 |
V100 32GB | Volta | SXM3 | 5120 | 2560 | N/A | 5120 | 1530 MHz | 1.75 Gbit/s HBM2 | 4096-bit | 900 GB/sec | 32 GB HBM2 | 15.7 TFLOPS | 7.8 TFLOPS | 62 TOPS | N/A | 15.7 TOPS | N/A | 31.4 TFLOPS | 125 TFLOPS | N/A | N/A | N/A | 300 GB/sec | GV100 | 10240 KB (128 KB × 80) | 6144 KB | 350 W | 815 mm² | 21.1 B | TSMC 12FFN | |
A100 40GB | Ampere | SXM4 | 6912 | 3456 | 6912 | N/A | 1410 MHz | 2.4 Gbit/s HBM2 | 5120-bit | 1.52 TB/sec | 40 GB HBM2 | 19.5 TFLOPS | 9.7 TFLOPS | N/A | 624 TOPS | 19.5 TOPS | N/A | 78 TFLOPS | 312 TFLOPS | 312 TFLOPS | 156 TFLOPS | 19.5 TFLOPS | 600 GB/sec | GA100 | 20736 KB (192 KB × 108) | 40960 KB | 400 W | 826 mm² | 54.2 B | TSMC N7 | Q1 2020 |
A100 80GB | Ampere | SXM4 | 6912 | 3456 | 6912 | N/A | 1410 MHz | 3.2 Gbit/s HBM2e | 5120-bit | 1.52 TB/sec | 80 GB HBM2e | 19.5 TFLOPS | 9.7 TFLOPS | N/A | 624 TOPS | 19.5 TOPS | N/A | 78 TFLOPS | 312 TFLOPS | 312 TFLOPS | 156 TFLOPS | 19.5 TFLOPS | 600 GB/sec | GA100 | 20736 KB (192 KB × 108) | 40960 KB | 400 W | 826 mm² | 54.2 B | TSMC N7 | |
H100 | Hopper | SXM5 | 16896 | 4608 | 16896 | N/A | 1980 MHz | 5.2 Gbit/s HBM3 | 5120-bit | 3.35 TB/sec | 80 GB HBM3 | 67 TFLOPS | 34 TFLOPS | N/A | 1.98 POPS | N/A | N/A | N/A | 990 TFLOPS | 990 TFLOPS | 495 TFLOPS | 67 TFLOPS | 900 GB/sec | GH100 | 25344 KB (192 KB × 132) | 51200 KB | 700 W | 814 mm² | 80 B | TSMC 4N | Q3 2022 |
H200 | Hopper | SXM5 | 16896 | 4608 | 16896 | N/A | 1980 MHz | 6.3 Gbit/s HBM3e | 6144-bit | 4.8 TB/sec | 141 GB HBM3e | 67 TFLOPS | 34 TFLOPS | N/A | 1.98 POPS | N/A | N/A | N/A | 990 TFLOPS | 990 TFLOPS | 495 TFLOPS | 67 TFLOPS | 900 GB/sec | GH100 | 25344 KB (192 KB × 132) | 51200 KB | 1000 W | 814 mm² | 80 B | TSMC 4N | Q3 2023 |
B100 | Blackwell | SXM6 | N/A | N/A | N/A | N/A | N/A | 8 Gbit/s HBM3e | 8192-bit | 8 TB/sec | 192 GB HBM3e | N/A | N/A | N/A | 3.5 POPS | N/A | 7 PFLOPS | N/A | 1.98 PFLOPS | 1.98 PFLOPS | 989 TFLOPS | 30 TFLOPS | 1.8 TB/sec | GB100 | N/A | N/A | 700 W | N/A | 208 B | TSMC 4NP | Q4 2024 (expected) |
B200 | Blackwell | SXM6 | N/A | N/A | N/A | N/A | N/A | 8 Gbit/s HBM3e | 8192-bit | 8 TB/sec | 192 GB HBM3e | N/A | N/A | N/A | 4.5 POPS | N/A | 9 PFLOPS | N/A | 2.25 PFLOPS | 2.25 PFLOPS | 1.2 PFLOPS | 40 TFLOPS | 1.8 TB/sec | GB100 | N/A | N/A | 1000 W | N/A | 208 B | TSMC 4NP | |
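The memory bandwidth column can be roughly cross-checked against the bus width and memory clock columns. The sketch below assumes the "Memory clock" column gives the effective per-pin data rate in Gbit/s, so that peak bandwidth is approximately bus width times data rate divided by 8 bits per byte. It covers a few rows only, and the listed values are rounded vendor figures, so small deviations are expected. Function and variable names are illustrative.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, data_rate_gbit_s: float) -> float:
    """Theoretical peak: each pin carries data_rate_gbit_s; 8 bits per byte."""
    return bus_width_bits * data_rate_gbit_s / 8


def relative_error(computed: float, listed: float) -> float:
    """Absolute deviation of the computed value from the listed one."""
    return abs(computed - listed) / listed


# (bus width in bits, per-pin rate in Gbit/s, listed bandwidth in GB/s)
TABLE_ROWS = {
    "P100": (4096, 1.4, 720),
    "V100": (4096, 1.75, 900),
    "A100 40GB": (5120, 2.4, 1520),
    "H100": (5120, 5.2, 3350),
    "H200": (6144, 6.3, 4800),
    "B100": (8192, 8.0, 8000),
}

for model, (width, rate, listed) in TABLE_ROWS.items():
    computed = peak_bandwidth_gb_s(width, rate)
    error = relative_error(computed, listed)
    print(f"{model:10s} computed {computed:7.1f} GB/s, "
          f"listed {listed} GB/s, deviation {error:.1%}")
```

For these rows the computed and listed values agree to within a few percent, which suggests the three columns are mutually consistent there.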