NVLink

Manufacturer: Nvidia
Type: Multi-GPU and CPU technology
Predecessor: Scalable Link Interface

NVLink is a wire-based serial multi-lane near-range communications link developed by Nvidia. Unlike PCI Express, a device can consist of multiple NVLinks, and devices use mesh networking to communicate instead of a central hub. The protocol was first announced in March 2014 and uses a proprietary high-speed signaling interconnect (NVHS). [1]

Principle

NVLink is a wire-based communications protocol for near-range semiconductor communications developed by Nvidia that can be used for data and control code transfers in processor systems between CPUs and GPUs, as well as solely between GPUs. NVLink specifies a point-to-point connection with data rates of 20, 25 and 50 Gbit/s (v1.0, v2.0 and v3.0+ respectively) per differential pair. In NVLink 1.0 and 2.0, eight differential pairs form a "sub-link", and two sub-links, one for each direction, form a "link". Starting with NVLink 3.0, only four differential pairs form a sub-link. For NVLink 2.0 and higher, the total data rate of a sub-link is 25 GB/s and the total data rate of a link is 50 GB/s. Each V100 GPU supports up to six links, so each GPU is capable of up to 300 GB/s of total bidirectional bandwidth. [2] [3] NVLink products introduced to date focus on the high-performance application space. Announced on May 14, 2020, NVLink 3.0 increases the data rate per differential pair from 25 Gbit/s to 50 Gbit/s while halving the number of pairs per sub-link from eight to four. With 12 links, an Ampere-based A100 GPU reaches a total bandwidth of 600 GB/s. [4] Hopper has 18 NVLink 4.0 links, enabling a total of 900 GB/s of bandwidth. [5] NVLink 2.0, 3.0 and 4.0 thus all provide 50 GB/s per bidirectional link, with 6, 12 and 18 links respectively.
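
The per-generation totals follow directly from the per-pair signaling rate, the pair count per sub-link and the link count. A minimal arithmetic sketch in Python, using the figures above (the NVLink 4.0 pair layout, two pairs per direction at an effective 100 Gbit/s each, is taken from the Blackwell overview cited as [9] below):

```python
# Per-GPU NVLink bandwidth derived from per-pair rates (figures from this article).
generations = {
    # version: (Gbit/s per differential pair, pairs per sub-link, links per GPU)
    "NVLink 1.0 (P100)": (20, 8, 4),
    "NVLink 2.0 (V100)": (25, 8, 6),
    "NVLink 3.0 (A100)": (50, 4, 12),
    "NVLink 4.0 (H100)": (100, 2, 18),  # two pairs per direction, per [9]
}

for name, (gbps_pair, pairs, links) in generations.items():
    sublink = gbps_pair * pairs / 8  # GB/s, one direction of one link
    link = 2 * sublink               # a link is two sub-links, one per direction
    print(f"{name}: {sublink:.0f} GB/s per sub-link, "
          f"{link:.0f} GB/s per link, {links * link:.0f} GB/s per GPU")
# -> 160, 300, 600 and 900 GB/s per GPU, matching the totals above
```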

Performance

The following table shows a basic metrics comparison based upon standard specifications:

| Interconnect | Transfer rate | Line code / modulation | Effective payload rate per lane or NVLink (unidirectional) | Max total lane length [a] | Total links (NVLink) | Total bandwidth (PCIe x16 or NVLink) | Realized in design |
|---|---|---|---|---|---|---|---|
| PCIe 1.x | 2.5 GT/s | 8b/10b | 0.25 GB/s | 20 inches (51 cm) | – | 8 GB/s | – |
| PCIe 2.x | 5 GT/s | 8b/10b | 0.5 GB/s | 20 inches (51 cm) | – | 16 GB/s | – |
| PCIe 3.x | 8 GT/s | 128b/130b | 0.99 GB/s | 20 inches (51 cm) [6] | – | 31.51 GB/s | Pascal, Volta, Turing |
| PCIe 4.0 | 16 GT/s | 128b/130b | 1.97 GB/s | 8–12 inches (20–30 cm) [6] | – | 63.02 GB/s | Volta on Xavier, Ampere, POWER9 |
| PCIe 5.0 | 32 GT/s [7] | 128b/130b | 3.94 GB/s | – | – | 126.03 GB/s | Hopper |
| PCIe 6.0 | 64 GT/s | 236B/256B FLIT, PAM4 w/ FEC [8] | 7.56 GB/s | – | – | 242 GB/s | Blackwell |
| NVLink 1.0 | 20 GT/s | NRZ | 20 GB/s | – | 4 | 160 GB/s | Pascal, POWER8+ |
| NVLink 2.0 | 25 GT/s | NRZ | 25 GB/s | – | 6 | 300 GB/s | Volta, POWER9 |
| NVLink 3.0 | 50 GT/s | NRZ | 25 GB/s | – | 12 | 600 GB/s | Ampere |
| NVLink 4.0 | 50 GT/s [9] | PAM4 (differential pair) | 25 GB/s | – | 18 | 900 GB/s | Hopper, Nvidia Grace |
| NVLink 5.0 | 100 GT/s [10] | PAM4 (differential pair) | 50 GB/s | – | 18 | 1.8 TB/s | Blackwell, Nvidia Grace |
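
The effective payload rates in the PCIe rows follow from the transfer rate and the line-code efficiency. A small sketch reproducing the per-lane column (PCIe 6.0 is omitted since its FLIT/FEC framing adds overheads beyond the plain line code):

```python
# Effective payload per PCIe lane: transfer rate x line-code efficiency / 8.
def lane_payload_gbs(transfer_gts, payload_bits, total_bits):
    """GB/s per lane and direction."""
    return transfer_gts * payload_bits / total_bits / 8

print(f"PCIe 1.x: {lane_payload_gbs(2.5, 8, 10):.2f} GB/s")    # 0.25
print(f"PCIe 3.x: {lane_payload_gbs(8, 128, 130):.2f} GB/s")   # ~0.98
print(f"PCIe 4.0: {lane_payload_gbs(16, 128, 130):.2f} GB/s")  # ~1.97
print(f"PCIe 5.0: {lane_payload_gbs(32, 128, 130):.2f} GB/s")  # ~3.94
```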

The following table shows a comparison of relevant bus parameters for real-world semiconductors that all offer NVLink as one of their options:

| Semiconductor | Board/bus delivery variant | Interconnect | Transmission rate (per lane) | Lanes per sub-link (out + in) | Sub-link data rate (per direction) [b] | Sub-link or unit count | Total data rate (out + in) [b] | Total lanes (out + in) | Total data rate (out + in) [b] |
|---|---|---|---|---|---|---|---|---|---|
| Nvidia GP100 | P100 SXM, [11] P100 PCI-E [12] | PCIe 3.0 | 8 GT/s | 16 + 16 [c] | 128 Gbit/s = 16 GB/s | 1 | 16 + 16 GB/s [13] | 32 [d] | 32 GB/s |
| Nvidia GV100 | V100 SXM2, [14] V100 PCI-E [15] | PCIe 3.0 | 8 GT/s | 16 + 16 [c] | 128 Gbit/s = 16 GB/s | 1 | 16 + 16 GB/s | 32 [d] | 32 GB/s |
| Nvidia TU104 | GeForce RTX 2080, Quadro RTX 5000 | PCIe 3.0 | 8 GT/s | 16 + 16 [c] | 128 Gbit/s = 16 GB/s | 1 | 16 + 16 GB/s | 32 [d] | 32 GB/s |
| Nvidia TU102 | GeForce RTX 2080 Ti, Quadro RTX 6000/8000 | PCIe 3.0 | 8 GT/s | 16 + 16 [c] | 128 Gbit/s = 16 GB/s | 1 | 16 + 16 GB/s | 32 [d] | 32 GB/s |
| Nvidia GA100, [16] [17] Nvidia GA102 [18] | Ampere A100 (SXM4 & PCIe) [19] | PCIe 4.0 | 16 GT/s | 16 + 16 [c] | 256 Gbit/s = 32 GB/s | 1 | 32 + 32 GB/s | 32 [d] | 64 GB/s |
| Nvidia GP100 | P100 SXM (not available with P100 PCI-E) [20] | NVLink 1.0 | 20 GT/s | 8 + 8 [e] | 160 Gbit/s = 20 GB/s | 4 | 80 + 80 GB/s | 64 | 160 GB/s |
| Nvidia GV100 | V100 SXM2 [21] (not available with V100 PCI-E) | NVLink 2.0 | 25 GT/s | 8 + 8 [e] | 200 Gbit/s = 25 GB/s | 6 [22] | 150 + 150 GB/s | 96 | 300 GB/s |
| Nvidia TU104 | GeForce RTX 2080, Quadro RTX 5000 [23] | NVLink 2.0 | 25 GT/s | 8 + 8 [e] | 200 Gbit/s = 25 GB/s | 1 | 25 + 25 GB/s | 16 | 50 GB/s |
| Nvidia TU102 | GeForce RTX 2080 Ti, Quadro RTX 6000/8000 [23] | NVLink 2.0 | 25 GT/s | 8 + 8 [e] | 200 Gbit/s = 25 GB/s | 2 | 50 + 50 GB/s | 32 | 100 GB/s |
| Nvidia GA100 [16] [17] | Ampere A100 (SXM4 & PCIe) [19] | NVLink 3.0 | 50 GT/s | 4 + 4 [e] | 200 Gbit/s = 25 GB/s | 12 [24] | 300 + 300 GB/s | 96 | 600 GB/s |
| Nvidia GA102 [18] | GeForce RTX 3090, Quadro RTX A6000 | NVLink 3.0 | 28.125 GT/s | 4 + 4 [e] | 112.5 Gbit/s = 14.0625 GB/s | 4 | 56.25 + 56.25 GB/s | 16 | 112.5 GB/s |
| NVSwitch for Hopper [25] | (fully connected 64-port switch) | NVLink 4.0 | 106.25 GT/s | 9 + 9 [e] | 450 Gbit/s | 18 | 3600 + 3600 GB/s | 128 | 7200 GB/s |
| Nvidia Grace CPU [26] | Nvidia GH200 Superchip | PCIe-5 (4x, 16x) | – | – | – | – | 512 GB/s | – | – |
| Nvidia Grace CPU [26] | Nvidia GH200 Superchip | NVLink-C2C | – | – | – | – | 900 GB/s | – | – |
| Nvidia Hopper GPU [26] | Nvidia GH200 Superchip | NVLink-C2C | – | – | – | – | 900 GB/s | – | – |
| Nvidia Hopper GPU [26] | Nvidia GH200 Superchip | NVLink 4 (18x) | – | – | – | – | 900 GB/s | – | – |
  a. PCIe: incl. 5" for PCBs
  b. Data rate columns are maximum theoretical values.
  c. Sample value; other fractions for the PCIe lane usage should be possible.
  d. A single PCIe lane transfers data over a differential pair.
  e. Sample value; NVLink sub-link bundling should be possible.

Real-world performance can be estimated by applying the various encapsulation overheads as well as the link usage rate, which come from various sources.[ citation needed ]

Those physical limitations usually reduce the data rate to between 90 and 95% of the transfer rate.[ citation needed ] NVLink benchmarks show an achievable transfer rate of about 35.3 GB/s (host to device) for a 40 GB/s (two bundled uplink sub-links) NVLink connection towards a P100 GPU in a system driven by a set of IBM POWER8 CPUs. [27]
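
The efficiency implied by that benchmark is a one-line calculation; a sketch, assuming two bundled NVLink 1.0 sub-links at 20 GB/s each as in [27]:

```python
# Implied link efficiency of the POWER8 -> P100 benchmark cited above.
measured = 35.3      # GB/s, host to device, from [27]
nominal = 2 * 20.0   # GB/s, two bundled NVLink 1.0 sub-links
print(f"{measured / nominal:.1%}")  # roughly 88%, just below the typical 90-95%
```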

Usage with plug-in boards

For the various versions of plug-in boards that expose extra connectors for joining them into an NVLink group (as of yet, only a small number of high-end gaming and professional graphics boards have this feature), a similar number of slightly varying, relatively compact, PCB-based interconnection plugs exists. Typically only boards of the same type will mate, due to their physical and logical design. For some setups, two identical plugs must be applied to achieve the full data rate. The typical plug is U-shaped, with a fine-pitched edge connector on each of the end strokes of the shape, facing away from the viewer. The width of the plug determines how far apart the plug-in cards must be seated in the main board of the hosting computer system; known plug widths range from 3 to 5 slots and also depend on the board type. [28] [29] The interconnect is often referred to as Scalable Link Interface (SLI), after its 2004 predecessor, because of its structural design and appearance, even though the modern NVLink-based design is of a quite different technical nature, with different features at its basic levels. Reported real-world devices are: [30]

Service software and programming

For the Tesla, Quadro and Grid product lines, the NVML API (Nvidia Management Library API) offers a set of functions for programmatically controlling some aspects of NVLink interconnects on Windows and Linux systems, such as component evaluation and versions along with status/error querying and performance monitoring. [38] Further, the NCCL library (Nvidia Collective Communications Library) enables developers to build efficient implementations of, for example, artificial-intelligence and similarly computation-hungry workloads on top of NVLink. [39] The page "3D Settings" » "Configure SLI, Surround, PhysX" in the Nvidia Control Panel and the CUDA sample application "simpleP2P" use such APIs to provide their NVLink-related services. On the Linux platform, the sub-command "nvidia-smi nvlink" of the command-line application provides a similar set of advanced information and control. [30]
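
As an illustration of such programmatic access, the following minimal sketch queries NVLink state through NVML using the pynvml Python bindings. It assumes the nvidia-ml-py package and an NVLink-capable GPU; which queries succeed depends on driver version and GPU generation:

```python
# Query NVLink state and version per GPU and link via NVML (pynvml bindings).
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)
        for link in range(pynvml.NVML_NVLINK_MAX_LINKS):
            try:
                state = pynvml.nvmlDeviceGetNvLinkState(handle, link)
                version = pynvml.nvmlDeviceGetNvLinkVersion(handle, link)
            except pynvml.NVMLError:
                continue  # link not present or not supported on this GPU
            print(f"GPU {i} ({name}) link {link}: "
                  f"{'up' if state else 'down'}, NVLink v{version}")
finally:
    pynvml.nvmlShutdown()
```

Roughly the same information is available from the shell via "nvidia-smi nvlink --status".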

History

On 5 April 2016, Nvidia announced that NVLink would be implemented in the Pascal-microarchitecture-based GP100 GPU, as used, for example, in Nvidia Tesla P100 products. [40] With the introduction of the DGX-1 high-performance computer it became possible to have up to eight P100 modules in a single rack system connected to up to two host CPUs. The carrier board (...) allows for a dedicated board for routing the NVLink connections – each P100 requires 800 pins, 400 for PCIe + power, and another 400 for the NVLinks, adding up to nearly 1600 board traces for NVLinks alone (...). [41] Each CPU has a direct connection to four P100s via PCIe, and each P100 has one NVLink each to the three other P100s in the same CPU group plus one more NVLink to one P100 in the other CPU group. Each NVLink (link interface) offers a bidirectional 20 GB/s up and 20 GB/s down, with four links per GP100 GPU, for an aggregate bandwidth of 80 GB/s up and another 80 GB/s down. [42] NVLink supports routing, so that in the DGX-1 design every P100 can directly reach four of the other seven P100s, and the remaining three are reachable with only one hop. According to depictions in Nvidia's blog-based publications, from 2014 NVLink allows bundling of individual links for increased point-to-point performance, so that, for example, a design with two P100s and all links established between the two units would allow the full NVLink bandwidth of 80 GB/s between them. [43]
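
The described DGX-1 topology can be written down directly. A sketch of the eight-GPU arrangement (the cross-group pairing chosen here is illustrative; the actual board routing is Nvidia's hybrid cube-mesh):

```python
# DGX-1 (P100) NVLink topology as described above: two quads of GPUs,
# a full mesh within each quad plus one cross-quad link per GPU.
links = set()
for quad in ([0, 1, 2, 3], [4, 5, 6, 7]):
    for a in quad:
        for b in quad:
            if a < b:
                links.add((a, b))  # 3 in-quad links per GPU
for g in range(4):
    links.add((g, g + 4))          # 1 cross-quad link per GPU (pairing assumed)

for g in range(8):
    peers = {x for edge in links if g in edge for x in edge} - {g}
    assert len(peers) == 4  # each GPU uses exactly 4 NVLinks, as described
# 4 of the other 7 GPUs are reachable directly; the remaining 3 via one hop.
```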

At GTC 2017, Nvidia presented its Volta generation of GPUs and indicated the integration of a revised version 2.0 of NVLink that would allow total I/O data rates of 300 GB/s for a single chip. It also announced pre-orders, with a delivery promise for Q3 2017, of the DGX-1 and DGX Station high-performance computers, which would be equipped with GPU modules of type V100 and realize NVLink 2.0 in either a networked fashion (two groups of four V100 modules with inter-group connectivity) or a fully interconnected fashion (one group of four V100 modules).

In 2017–2018, IBM and Nvidia delivered the Summit and Sierra supercomputers for the US Department of Energy [44] which combine IBM's POWER9 family of CPUs and Nvidia's Volta architecture, using NVLink 2.0 for the CPU-GPU and GPU-GPU interconnects and InfiniBand EDR for the system interconnects. [45]

In 2020, Nvidia announced that it would no longer add new SLI driver profiles for the RTX 2000 series and older GPUs from January 1, 2021. [46]

Related Research Articles

Graphics card – Expansion card which generates a feed of output images to a display device

A graphics card is a computer expansion card that generates a feed of graphics output to a display device such as a monitor. Graphics cards are sometimes called discrete or dedicated graphics cards to emphasize their distinction from an integrated graphics processor on the motherboard or the central processing unit (CPU). A graphics processing unit (GPU) that performs the necessary computations is the main component in a graphics card, but the acronym "GPU" is sometimes also used to erroneously refer to the graphics card as a whole.

PCI Express – Computer expansion bus standard

PCI Express, officially abbreviated as PCIe or PCI-e, is a high-speed serial computer expansion bus standard, designed to replace the older PCI, PCI-X and AGP bus standards. It is the common motherboard interface for personal computers' graphics cards, capture cards, sound cards, hard disk drive host adapters, SSDs, Wi-Fi, and Ethernet hardware connections. PCIe has numerous improvements over the older standards, including higher maximum system bus throughput, lower I/O pin count and smaller physical footprint, better performance scaling for bus devices, a more detailed error detection and reporting mechanism, and native hot-swap functionality. More recent revisions of the PCIe standard provide hardware support for I/O virtualization.

GeForce – Brand of GPUs by Nvidia

GeForce is a brand of graphics processing units (GPUs) designed by Nvidia and marketed for the performance market. As of the GeForce 40 series, there have been eighteen iterations of the design. The first GeForce products were discrete GPUs designed for add-on graphics boards, intended for the high-margin PC gaming market, and later diversification of the product line covered all tiers of the PC graphics market, ranging from cost-sensitive GPUs integrated on motherboards, to mainstream add-in retail boards. Most recently, GeForce technology has been introduced into Nvidia's line of embedded application processors, designed for electronic handhelds and mobile handsets.

Graphics processing unit – Specialized electronic circuit; graphics accelerator

A graphics processing unit (GPU) is a specialized electronic circuit initially designed for digital image processing and to accelerate computer graphics, being present either as a discrete video card or embedded on motherboards, mobile phones, personal computers, workstations, and game consoles. After their initial design, GPUs were found to be useful for non-graphic calculations involving embarrassingly parallel problems due to their parallel structure. Other non-graphical uses include the training of neural networks and cryptocurrency mining.

Alienware – American computer hardware subsidiary of Dell Inc.

Alienware Corporation is an American computer hardware subsidiary brand of Dell. Their product range is dedicated to gaming computers and accessories and can be identified by their alien-themed designs. Alienware was founded in 1996 by Nelson Gonzalez and Alex Aguila. The development of the company is also associated with Frank Azor, Arthur Lewis, Joe Balerdi, and Michael S. Dell (CEO). The company's corporate headquarters is located in The Hammocks, Miami, Florida.

Scalable Link Interface – Brand name; multi-GPU technology by Nvidia

Scalable Link Interface (SLI) is the brand name for a now discontinued multi-GPU technology developed by Nvidia for linking two or more video cards together to produce a single output. SLI is a parallel processing algorithm for computer graphics, meant to increase the available processing power.

Quadro – Brand of Nvidia graphics cards used in workstations

Quadro was Nvidia's brand for graphics cards intended for use in workstations running professional computer-aided design (CAD), computer-generated imagery (CGI), digital content creation (DCC) applications, scientific calculations and machine learning from 2000 to 2020.

The GeForce 9 series is the ninth generation of Nvidia's GeForce line of graphics processing units, the first of which was released on February 21, 2008. The products are based on an updated Tesla microarchitecture, adding PCI Express 2.0 support, improved color and z-compression, and built on a 65 nm process, later using 55 nm process to reduce power consumption and die size.

ThinkStation – Line of professional workstations by Lenovo

ThinkStation is a brand of professional workstations from Lenovo announced in November 2007 and then released in January 2008. They are designed to be used for high-end computing and computer-aided design (CAD) tasks and primarily compete with other enterprise workstation lines, such as Dell's Precision, HP's Z line, Acer's Veriton K series, and Apple's Mac Pro line.

Nvidia Tesla – Nvidia's line of general-purpose GPUs

Nvidia Tesla is the former name for a line of products developed by Nvidia targeted at stream processing or general-purpose graphics processing units (GPGPU), named after pioneering electrical engineer Nikola Tesla. Its products began using GPUs from the G80 series, and have continued to accompany the release of new chips. They are programmable using the CUDA or OpenCL APIs.

Pascal (microarchitecture) – GPU microarchitecture by Nvidia

Pascal is the codename for a GPU microarchitecture developed by Nvidia, as the successor to the Maxwell architecture. The architecture was first introduced in April 2016 with the release of the Tesla P100 (GP100) on April 5, 2016, and is primarily used in the GeForce 10 series, starting with the GeForce GTX 1080 and GTX 1070, which were released on May 27, 2016, and June 10, 2016, respectively. Pascal was manufactured using TSMC's 16 nm FinFET process, and later Samsung's 14 nm FinFET process.

Volta (microarchitecture) – GPU microarchitecture by Nvidia

Volta is the codename, but not the trademark, for a GPU microarchitecture developed by Nvidia, succeeding Pascal. It was first announced on a roadmap in March 2013, although the first product was not announced until May 2017. The architecture is named after 18th–19th century Italian chemist and physicist Alessandro Volta. It was Nvidia's first chip to feature Tensor Cores, specially designed cores that have superior deep learning performance over regular CUDA cores. The architecture is produced with TSMC's 12 nm FinFET process. The Ampere microarchitecture is the successor to Volta.

High Bandwidth Memory (HBM) is a computer memory interface for 3D-stacked synchronous dynamic random-access memory (SDRAM) initially from Samsung, AMD and SK Hynix. It is used in conjunction with high-performance graphics accelerators, network devices, high-performance datacenter AI ASICs, as on-package cache in CPUs and on-package RAM in upcoming CPUs, and FPGAs and in some supercomputers. The first HBM memory chip was produced by SK Hynix in 2013, and the first devices to use HBM were the AMD Fiji GPUs in 2015.

Nvidia DGX – Line of Nvidia produced servers and workstations

The Nvidia DGX represents a series of servers and workstations designed by Nvidia, primarily geared towards enhancing deep learning applications through the use of general-purpose computing on graphics processing units (GPGPU). These systems typically come in a rackmount format featuring high-performance x86 server CPUs on the motherboard.

Coherent Accelerator Processor Interface (CAPI), is a high-speed processor expansion bus standard for use in large data center computers, initially designed to be layered on top of PCI Express, for directly connecting central processing units (CPUs) to external accelerators like graphics processing units (GPUs), ASICs, FPGAs or fast storage. It offers low latency, high speed, direct memory access connectivity between devices of different instruction set architectures.

Turing (microarchitecture) – GPU microarchitecture by Nvidia

Turing is the codename for a graphics processing unit (GPU) microarchitecture developed by Nvidia. It is named after the prominent mathematician and computer scientist Alan Turing. The architecture was first introduced in August 2018 at SIGGRAPH 2018 in the workstation-oriented Quadro RTX cards, and one week later at Gamescom in consumer GeForce 20 series graphics cards. Building on the preliminary work of Volta, its HPC-exclusive predecessor, the Turing architecture introduces the first consumer products capable of real-time ray tracing, a longstanding goal of the computer graphics industry. Key elements include dedicated artificial intelligence processors and dedicated ray tracing processors. Turing leverages DXR, OptiX, and Vulkan for access to ray tracing. In February 2019, Nvidia released the GeForce 16 series GPUs, which utilizes the new Turing design but lacks the RT and Tensor cores.

Ampere is the codename for a graphics processing unit (GPU) microarchitecture developed by Nvidia as the successor to both the Volta and Turing architectures. It was officially announced on May 14, 2020 and is named after French mathematician and physicist André-Marie Ampère.

Hopper (microarchitecture) – GPU microarchitecture designed by Nvidia

Hopper is a graphics processing unit (GPU) microarchitecture developed by Nvidia. It is designed for datacenters and is used alongside the Lovelace microarchitecture. It is the latest generation of the line of products formerly branded as Nvidia Tesla, now Nvidia Data Centre GPUs.

SXM (socket) – High performance computing socket

SXM is a high-bandwidth socket solution for connecting Nvidia compute accelerators to a system. Each generation of Nvidia Tesla since the P100 models, as well as the DGX computer series and the HGX boards, comes with an SXM socket type that provides high bandwidth, power delivery and more for the matching GPU daughter cards. Nvidia offers these combinations as end-user products, e.g. in their models of the DGX system series. Current socket generations are SXM for Pascal-based GPUs, SXM2 and SXM3 for Volta-based GPUs, SXM4 for Ampere-based GPUs, and SXM5 for Hopper-based GPUs. These sockets are used for specific models of these accelerators and offer higher performance per card than PCIe equivalents. The DGX-1 system was the first to be equipped with SXM sockets, and thus the first to carry the form-factor-compatible SXM modules with P100 GPUs; it was later shown to allow upgrading to SXM2 modules with V100 GPUs.

References

  1. "Nvidia NVLINK 2.0 arrives in IBM servers next year" by Jon Worrel, fudzilla.com, August 24, 2016.
  2. "NVIDIA DGX-1 With Tesla V100 System Architecture" (PDF).
  3. "What Is NVLink?". Nvidia. 2014-11-14.
  4. Ryan Smith (May 14, 2020). "NVIDIA Ampere Unleashed: NVIDIA Announces New GPU Architecture, A100 GPU, and Accelerator". AnandTech.
  5. Jacobs, Blair (2022-03-23). "Nvidia reveals next-gen Hopper GPU architecture". Club386. Retrieved 2022-05-04.
  6. "PCIe - PCI Express (1.1 / 2.0 / 3.0 / 4.0 / 5.0)". www.elektronik-kompendium.de.
  7. Alcorn, Paul (17 January 2019). "PCIe 5.0 Is Ready For Prime Time". Tom's Hardware.
  8. "The PCIe 6.0 Specification Webinar Q&A: A Deeper Dive into FLIT Mode, PAM4, and Forward Error Correction (FEC)". PCI-SIG. Retrieved 28 November 2024. "We considered various FLIT sizes and settled on 256 Bytes with 236 bytes of TLP payload and a TLP efficiency of 92%."
  9. "NVIDIA Blackwell Architecture Technical Overview". NVIDIA. p. 8. Retrieved 28 November 2024. "Fifth-generation NVLink doubles the performance of fourth-generation NVLink in NVIDIA Hopper. While the new NVLink in Blackwell GPUs also uses two high-speed differential pairs in each direction to form a single link as in the Hopper GPU, NVIDIA Blackwell doubles the effective bandwidth per link to 50 GB/sec in each direction."
  11. "NVIDIA Tesla P100 [SXM2], 16GB HBM2 (NVTP100-SXM)". geizhals.de (heise online price comparison).
  12. "PNY Tesla P100 [PCIe], 16GB HBM2 (TCSP100M-16GB-PB/NVTP100-16)". geizhals.de (heise online price comparison). 14 August 2023.
  13. "NVLink Takes GPU Acceleration To The Next Level" by Timothy Prickett Morgan, nextplatform.com, May 4, 2016.
  14. "NVIDIA Tesla V100 SXM2 16 GB Specs". TechPowerUp. 14 August 2023.
  15. "PNY Quadro GV100, 32GB HBM2, 4x DP (VCQGV100-PB)". geizhals.de (heise online price comparison). 14 August 2023.
  16. Morgan, Timothy Prickett (May 14, 2020). "Nvidia Unifies AI Compute With "Ampere" GPU". The Next Platform.
  17. "Data sheet" (PDF). www.nvidia.com. Retrieved 2020-09-15.
  18. "NVIDIA Ampere GA102 GPU Architecture Whitepaper" (PDF). nvidia.com. Retrieved 2 May 2023.
  19. "Tensor Core GPU" (PDF). nvidia.com. Retrieved 2 May 2023.
  20. Chris Williams (June 20, 2016). "All aboard the PCIe bus for Nvidia's Tesla P100 supercomputer grunt". theregister.co.uk.
  21. "Nvidia Tesla V100: PCIe-Steckkarte mit Volta-Grafikchip und 16 GByte Speicher angekündigt" [Nvidia Tesla V100: PCIe card with Volta graphics chip and 16 GB of memory announced]. heise online. 22 June 2017.
  22. GV100 block diagram in: Schilling, Andreas (May 10, 2017). "GTC17: NVIDIA präsentiert die nächste GPU-Architektur Volta – Tesla V100 mit 5.120 Shadereinheiten und 16 GB HBM2" [GTC17: NVIDIA presents the next GPU architecture Volta – Tesla V100 with 5,120 shader units and 16 GB HBM2]. hardwareluxx.de.
  23. Angelini, Chris (14 September 2018). "Nvidia's Turing Architecture Explored: Inside the GeForce RTX 2080". Tom's Hardware. p. 7. Retrieved 28 February 2019. "TU102 and TU104 are Nvidia's first desktop GPUs rocking the NVLink interconnect rather than a Multiple Input/Output (MIO) interface for SLI support. The former makes two x8 links available, while the latter is limited to one. Each link facilitates up to 50 GB/s of bidirectional bandwidth. So, GeForce RTX 2080 Ti is capable of up to 100 GB/s between cards and RTX 2080 can do half of that."
  24. Schilling, Andreas (22 June 2020). "A100 PCIe: NVIDIA GA100-GPU kommt auch als PCI-Express-Variante" [A100 PCIe: NVIDIA GA100 GPU is also coming as a PCI Express variant]. Hardwareluxx. Retrieved 2 May 2023.
  25. "NVLINK AND NVSWITCH". www.nvidia.com. Retrieved 2021-02-07.
  26. "A Big Memory Nvidia GH200 Next to Your Desk: Closer Than You Think". 23 February 2024.
  27. Eliot Eshelman (January 26, 2017). "Comparing NVLink vs PCI-E with NVIDIA Tesla P100 GPUs on OpenPOWER Servers". microway.com.
  28. "NVIDIA Quadro NVLink Grafikprozessor-Zusammenschaltung in Hochgeschwindigkeit" [NVIDIA Quadro NVLink high-speed GPU interconnect]. NVIDIA.
  29. "Grafik neu erfunden: NVIDIA GeForce RTX 2080 Ti-Grafikkarte" [Graphics reinvented: NVIDIA GeForce RTX 2080 Ti graphics card]. NVIDIA.
  30. "NVLink on NVIDIA GeForce RTX 2080 & 2080 Ti in Windows 10". Puget Systems. 5 October 2018.
  31. [ dead link ]
  32. Schilling, Andreas (5 February 2017). "NVIDIA präsentiert Quadro GP100 mit GP100-GPU und 16 GB HBM2" [NVIDIA presents Quadro GP100 with GP100 GPU and 16 GB HBM2]. Hardwareluxx.
  33. "NVIDIA GeForce RTX 2080 Founders Edition Graphics Card". NVIDIA.
  34. "NVIDIA Quadro Graphics Cards for Professional Design Workstations". NVIDIA.
  35. "NVIDIA Quadro RTX 6000 und RTX 5000 Ready für Pre-Order" [NVIDIA Quadro RTX 6000 and RTX 5000 ready for pre-order]. October 1, 2018.
  36. "NVLink | pny.com". www.pny.com.
  37. "NVIDIA Quadro RTX 8000 Specs". TechPowerUp. 14 August 2023.
  38. "NvLink Methods". docs.nvidia.com.
  39. "NVIDIA Collective Communications Library (NCCL)". NVIDIA Developer. May 10, 2017.
  40. "Inside Pascal: NVIDIA's Newest Computing Platform". 2016-04-05.
  41. Anandtech.com
  42. "NVIDIA Unveils the DGX-1 HPC Server: 8 Teslas, 3U, Q2 2016". anandtech.com. April 2016.
  43. "How NVLink Will Enable Faster, Easier Multi-GPU Computing" by Mark Harris, November 14, 2014.
  44. "Whitepaper: Summit and Sierra Supercomputers" (PDF). 2014-11-01.
  45. "Nvidia Volta, IBM POWER9 Land Contracts For New US Government Supercomputers". AnandTech. 2014-11-17.
  46. "RIP: Nvidia slams the final nail in SLI's coffin, no new profiles after 2020". PC World. 2020-09-18.