Tesla Dojo

Tesla Dojo is a supercomputer designed and built by Tesla for computer vision video processing and recognition. [1] It is used to train Tesla's machine learning models to improve its Full Self-Driving (FSD) advanced driver-assistance system. According to Tesla, it went into production in July 2023. [2]

Dojo's goal is to efficiently process millions of terabytes of video data captured from real-life driving situations from Tesla's 4+ million cars. [3] This goal led to a considerably different architecture than conventional supercomputer designs. [4] [5]

History

Tesla operates several massively parallel computing clusters for developing its Autopilot advanced driver assistance system. Its primary unnamed cluster, which uses 5,760 Nvidia A100 graphics processing units (GPUs), was touted by Andrej Karpathy in 2021 at the Conference on Computer Vision and Pattern Recognition (CVPR 2021) as "roughly the number five supercomputer in the world" [6] at approximately 81.6 petaflops, based on scaling the performance of the Nvidia Selene supercomputer, which uses similar components. [7] However, the performance of the primary Tesla GPU cluster has been disputed, as it was not clear if this was measured using single-precision or double-precision floating point numbers (FP32 or FP64). [8] Tesla also operates a second 4,032 GPU cluster for training and a third 1,752 GPU cluster for automatic labeling of objects. [9] [10]
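The 81.6 petaflops estimate can be reproduced with a simple linear scaling by GPU count. The sketch below assumes reference figures for Selene (4,480 A100 GPUs and a LINPACK result of 63.46 petaflops) that are not stated in this article; treat them as assumptions. Linear scaling also ignores interconnect and efficiency differences.

```python
# Assumed reference figures for Nvidia Selene (not stated in this article).
SELENE_GPUS = 4_480      # A100 GPUs (assumption)
SELENE_PFLOPS = 63.46    # LINPACK result in petaflops (assumption)

TESLA_GPUS = 5_760       # primary Tesla cluster, from the article


def scaled_pflops(gpus, ref_gpus=SELENE_GPUS, ref_pflops=SELENE_PFLOPS):
    """Scale a reference result linearly by GPU count."""
    return ref_pflops * gpus / ref_gpus


estimate = scaled_pflops(TESLA_GPUS)
print(f"{estimate:.1f} petaflops")
```

Under these assumptions the result rounds to the 81.6 petaflops quoted above, which suggests the figure is an extrapolation rather than a measured benchmark.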

The primary unnamed Tesla GPU cluster has been used for processing one million video clips, each ten seconds long, taken from Tesla Autopilot cameras operating in Tesla cars in the real world, running at 36 frames per second. Collectively, these video clips contained six billion object labels, with depth and velocity data; the total size of the data set was 1.5 petabytes. This data set was used for training a neural network intended to help Autopilot computers in Tesla cars understand roads. [6] By August 2022, Tesla had upgraded the primary GPU cluster to 7,360 GPUs. [11]

Dojo was first mentioned by Musk in April 2019 during Tesla's "Autonomy Investor Day". [12] In August 2020, [6] [13] Musk stated it was "about a year away" due to power and thermal issues. [14]

The defining goal of [Dojo] is scalability. We have de-emphasized several mechanisms that you find in typical CPUs, like coherency, virtual memory, and global lookup directories just because these mechanisms do not scale very well... Instead, we have relied on a very fast and very distributed SRAM [static random-access memory] storage throughout the mesh. And this is backed by an order of magnitude higher speed of interconnect than what you find in a typical distributed system.

  Emil Talpes, Tesla hardware engineer, 2022 The Next Platform article [5]

Dojo was officially announced at Tesla's Artificial Intelligence (AI) Day on August 19, 2021. [15] Tesla revealed details of the D1 chip and its plans for "Project Dojo", a datacenter that would house 3,000 D1 chips; [16] the first "Training Tile" had been completed and delivered the week before. [9] In October 2021, Tesla released a "Dojo Technology" whitepaper describing the Configurable Float8 (CFloat8) and Configurable Float16 (CFloat16) floating point formats and arithmetic operations as an extension of Institute of Electrical and Electronics Engineers (IEEE) standard 754. [17]

At the follow-up AI Day in September 2022, Tesla announced it had built several System Trays and one Cabinet. The company stated that, during a test, Project Dojo drew 2.3 megawatts (MW) of power before tripping a local San Jose, California power substation. [18] At the time, Tesla was assembling one Training Tile per day. [10]

In August 2023, Tesla powered on Dojo for production use, along with a new training cluster configured with 10,000 Nvidia H100 GPUs. [19]

In January 2024, Musk described Dojo as "a long shot worth taking because the payoff is potentially very high. But it's not something that is a high probability." [20]

Reception

Various analysts have stated Dojo "is impressive, but it won't transform supercomputing", [4] "is a game-changer because it has been developed completely in-house", [21] "will massively accelerate the development of autonomous vehicles", [22] and "could be a game changer for the future of Tesla FSD and for AI more broadly." [1]

On September 11, 2023, Morgan Stanley increased its target price for Tesla stock (TSLA) to US$400 from a prior target of $250 and called the stock its top pick in the electric vehicle sector, stating that Tesla’s Dojo supercomputer could fuel a $500 billion jump in Tesla’s market value. [23]

Technical architecture

The fundamental unit of the Dojo supercomputer is the D1 chip, [24] designed by a team at Tesla led by ex-AMD CPU designer Ganesh Venkataramanan, including Emil Talpes, Debjit Das Sarma, Douglas Williams, Bill Chang, and Rajiv Kurian. [5]

The D1 chip is manufactured by the Taiwan Semiconductor Manufacturing Company (TSMC) using 7 nanometer (nm) semiconductor nodes, has 50 billion transistors and a large die size of 645 mm2 (1.0 square inch). [25]

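In an update at AI Day in 2022, Tesla announced that Dojo would scale by deploying multiple ExaPODs, each built up hierarchically from D1 chips (354 usable cores each), Training Tiles (25 D1 chips each), System Trays (six Training Tiles each), and Cabinets, for a total of 120 Training Tiles per ExaPOD. [22]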

Tesla Dojo architecture overview
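The packaging hierarchy can be checked with a few multiplications. This is a minimal sketch that uses only counts quoted elsewhere in this article (354 usable cores per D1, 25 D1 chips per tile, six tiles per System Tray, 120 tiles per ExaPOD); the variable names are illustrative.

```python
# Counts taken from this article.
CORES_PER_D1 = 354        # usable cores per D1 chip
D1_PER_TILE = 25          # 5x5 array of D1 chips per Training Tile
TILES_PER_TRAY = 6        # Training Tiles per System Tray
TILES_PER_EXAPOD = 120    # Training Tiles per ExaPOD

cores_per_tile = CORES_PER_D1 * D1_PER_TILE
cores_per_tray = cores_per_tile * TILES_PER_TRAY
trays_per_exapod = TILES_PER_EXAPOD // TILES_PER_TRAY
d1_per_exapod = D1_PER_TILE * TILES_PER_EXAPOD
cores_per_exapod = cores_per_tile * TILES_PER_EXAPOD

print(cores_per_tile, cores_per_tray, trays_per_exapod,
      d1_per_exapod, cores_per_exapod)
```

The results agree with the 3,000 D1 chips announced for Project Dojo, the 53,100 cores per System Tray, and the 1,062,000 usable cores per ExaPOD.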

According to Venkataramanan, Tesla's senior director of Autopilot hardware, Dojo will have more than an exaflop (a million teraflops) of computing power. [26] For comparison, according to Nvidia, in August 2021, the (pre-Dojo) Tesla AI-training center used 720 nodes, each with eight Nvidia A100 Tensor Core GPUs for 5,760 GPUs in total, providing up to 1.8 exaflops of performance. [27]
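As a rough sanity check on the Nvidia comparison, the quoted totals can be broken down per GPU. The sketch below uses only the node count, GPUs per node, and 1.8 exaflops figure from the paragraph above. The precision behind the 1.8 exaflops figure is not stated in this article, so the per-GPU value is indicative only.

```python
NODES = 720
GPUS_PER_NODE = 8
TOTAL_TFLOPS = 1.8e6      # 1.8 exaflops expressed in teraflops

total_gpus = NODES * GPUS_PER_NODE
per_gpu_tflops = TOTAL_TFLOPS / total_gpus

print(total_gpus, per_gpu_tflops)
```

The per-GPU figure of roughly 312 teraflops is in line with a peak low-precision tensor rating for the A100 rather than with FP32 or FP64 throughput, which may explain why the same cluster is described as both 81.6 petaflops and 1.8 exaflops.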

D1 chip

Each node (computing core) of the D1 processing chip is a general purpose 64-bit CPU with a superscalar core. It supports internal instruction-level parallelism, and includes simultaneous multithreading (SMT). It doesn't support virtual memory and uses limited memory protection mechanisms. Dojo software/applications manage chip resources.

Microarchitecture of a node on the D1 chip

The D1 instruction set supports both 64-bit scalar and 64-byte single instruction, multiple data (SIMD) vector instructions. [28] The integer unit mixes instructions from the RISC-V reduced instruction set computer (RISC) architecture with custom instructions, supporting 8-, 16-, 32-, or 64-bit integers. The custom vector math unit is optimized for machine learning kernels and supports multiple data formats, with a mix of precisions and numerical ranges, many of which are compiler composable. [5] Up to 16 vector formats can be used simultaneously. [5]

Node

Each D1 node uses a 32-byte fetch window holding up to eight instructions. These instructions are fed to an eight-wide decoder which supports two threads per cycle, followed by a four-wide, four-way SMT scalar scheduler that has two integer units, two address units, and one register file per thread. Vector instructions are passed further down the pipeline to a dedicated vector scheduler with two-way SMT, which feeds either a 64-byte SIMD unit or four 8×8×4 matrix multiplication units. [28]

The network-on-chip (NoC) router links cores into a two-dimensional mesh network. In each clock cycle it can send one packet in and one packet out in each of the four directions to and from neighboring nodes, along with one 64-byte read and one 64-byte write to local SRAM. [28]
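To illustrate what a two-dimensional mesh implies for communication distance, the sketch below counts hops between two nodes and builds a path. It assumes simple dimension-order (column first, then row) routing, which is a common textbook scheme and is not confirmed by the source as Dojo's routing algorithm. The function names are hypothetical.

```python
def mesh_hops(src, dst):
    """Minimum hop count between two (row, col) nodes on a 2D mesh."""
    return abs(dst[0] - src[0]) + abs(dst[1] - src[1])


def xy_route(src, dst):
    """Dimension-order path: move along the row first, then the column.

    Assumed routing scheme for illustration only.
    """
    r, c = src
    path = [(r, c)]
    step = 1 if dst[1] >= c else -1
    while c != dst[1]:
        c += step
        path.append((r, c))
    step = 1 if dst[0] >= r else -1
    while r != dst[0]:
        r += step
        path.append((r, c))
    return path


# Corner to corner on an 18x20 node array.
corner_path = xy_route((0, 0), (17, 19))
print(mesh_hops((0, 0), (17, 19)), len(corner_path) - 1)
```

Because each hop reaches only a direct neighbor, distance grows with the sum of the row and column offsets, which is why long traversals of the mesh are comparatively expensive.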

Hardware-native operations transfer data, semaphores and barrier constraints across memories and CPUs. System-wide double data rate 4 (DDR4) synchronous dynamic random-access memory (SDRAM) works like bulk storage.

Memory

Each core has 1.25 megabytes (MB) of SRAM main memory. Load and store speeds reach 400 gigabytes (GB) per second and 270 GB/sec, respectively. The chip has explicit core-to-core data transfer instructions. Each SRAM has a unique list parser that feeds a pair of decoders and a gather engine that feeds the vector register file, which together can directly transfer information across nodes. [5]

Die

Twelve nodes (cores) are grouped into a local block. Nodes are arranged in an 18×20 array on a single die, of which 354 cores are available for applications. [5] The die runs at 2 gigahertz (GHz) and totals 440 MB of SRAM (354 cores × 1.25 MB/core ≈ 442.5 MB). [5] It reaches 376 teraflops using either 16-bit brain floating point (BF16) numbers or configurable 8-bit floating point (CFloat8) numbers, a format proposed by Tesla, [17] and 22 teraflops at FP32.
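The die-level figures can be cross-checked from the per-core numbers. This sketch uses only values from this section; the names are illustrative. It shows that the quoted 440 MB is closer to the total for the 354 usable cores than for all 360 nodes in the array.

```python
ROWS, COLS = 18, 20
USABLE_CORES = 354
SRAM_MB_PER_CORE = 1.25
BF16_TFLOPS = 376
FP32_TFLOPS = 22

total_nodes = ROWS * COLS
sram_usable_mb = USABLE_CORES * SRAM_MB_PER_CORE
sram_all_mb = total_nodes * SRAM_MB_PER_CORE
bf16_tflops_per_core = BF16_TFLOPS / USABLE_CORES
bf16_to_fp32_ratio = BF16_TFLOPS / FP32_TFLOPS

print(total_nodes, sram_usable_mb, sram_all_mb)
print(f"{bf16_tflops_per_core:.2f} TFLOPS/core, "
      f"{bf16_to_fp32_ratio:.1f}x BF16 over FP32")
```

The roughly 17-fold gap between BF16/CFloat8 and FP32 throughput reflects how heavily the design is weighted toward low-precision machine learning arithmetic.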

Each die comprises 576 bi-directional serializer/deserializer (SerDes) channels along the perimeter to link to other dies, and moves 8 TB/sec across all four die edges. [5] Each D1 chip has a thermal design power of approximately 400 watts. [29]
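Dividing the aggregate edge bandwidth by the channel count gives an approximate per-channel rate. The sketch assumes decimal units (1 TB = 1,000 GB) and an even split across all 576 channels; neither assumption is stated in the source.

```python
EDGE_BANDWIDTH_GB = 8_000   # 8 TB/sec across all four edges (decimal units assumed)
SERDES_CHANNELS = 576

per_channel_gbytes = EDGE_BANDWIDTH_GB / SERDES_CHANNELS
per_channel_gbits = per_channel_gbytes * 8

print(f"{per_channel_gbytes:.1f} GB/sec, about {per_channel_gbits:.0f} Gb/sec per channel")
```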

Training Tile

Tesla Dojo tile

The water-cooled Training Tile packages 25 D1 chips into a 5×5 array. [5] Each tile supports 36 TB/sec of aggregate bandwidth via 40 input/output (I/O) chips, half the bandwidth of the chip mesh network. Each tile supports 10 TB/sec of on-tile bandwidth. Each tile has 11 GB of SRAM (25 D1 chips × 360 cores/D1 × 1.25 MB/core). Each tile achieves 9 petaflops at BF16/CFloat8 precision (25 D1 chips × 376 TFLOP/D1). Each tile consumes 15 kilowatts, [5] or 288 amperes at 52 volts. [29]
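The per-tile totals follow from the per-chip figures. The sketch below repeats the article's own arithmetic (including its use of all 360 cores per D1 for the SRAM total) and adds the electrical power implied by the quoted current and voltage. The 400 W per-chip value is the approximate thermal design power given in the Die section.

```python
D1_PER_TILE = 25
CORES_PER_D1 = 360          # full array, as used in the article's SRAM total
SRAM_MB_PER_CORE = 1.25
BF16_TFLOPS_PER_D1 = 376
D1_TDP_WATTS = 400          # approximate, from the Die section
CURRENT_AMPS = 288
VOLTAGE_VOLTS = 52

tile_sram_gb = D1_PER_TILE * CORES_PER_D1 * SRAM_MB_PER_CORE / 1000
tile_pflops = D1_PER_TILE * BF16_TFLOPS_PER_D1 / 1000
tile_power_watts = CURRENT_AMPS * VOLTAGE_VOLTS
chips_power_watts = D1_PER_TILE * D1_TDP_WATTS

print(tile_sram_gb, tile_pflops, tile_power_watts, chips_power_watts)
```

The electrical figure of 14,976 W matches the quoted 15 kilowatts, and the D1 chips alone would account for roughly 10 kilowatts of it at their approximate thermal design power.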

System Tray

Six tiles are aggregated into a System Tray, which is integrated with a host interface. Each host interface includes 512 x86 cores, providing a Linux-based user environment. [18] The System Tray was previously known as the Training Matrix; it comprises six Training Tiles, 20 Dojo Interface Processor cards across four host servers, and Ethernet-linked adjunct servers, for a total of 53,100 D1 cores.

Dojo Interface Processor

Dojo Interface Processor (DIP) cards sit on the edges of the tile arrays and are hooked into the mesh network. Host systems power the DIPs and perform various system management functions. Each DIP acts as a memory and I/O co-processor and holds 32 GB of shared HBM (either HBM2e or HBM3), as well as Ethernet interfaces that sidestep the mesh network. Each DIP card has 2 I/O processors with 4 memory banks totaling 32 GB with 800 GB/sec of bandwidth.

The DIP plugs into a PCI-Express 4.0 x16 slot that offers 32 GB/sec of bandwidth per card. Five cards per tile edge offer 160 GB/sec of bandwidth to the host servers and 4.5 TB/sec to the tile.
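The host-side and tile-side bandwidth figures can be related per card. This sketch uses only the numbers in the paragraph above and assumes decimal units (1 TB = 1,000 GB).

```python
PCIE_GB_PER_CARD = 32       # PCI-Express 4.0 x16 bandwidth per DIP card
CARDS_PER_TILE_EDGE = 5
TILE_SIDE_TB_PER_EDGE = 4.5

host_gb_per_edge = PCIE_GB_PER_CARD * CARDS_PER_TILE_EDGE
tile_gb_per_card = TILE_SIDE_TB_PER_EDGE * 1000 / CARDS_PER_TILE_EDGE

print(host_gb_per_edge, tile_gb_per_card)
```

Each card therefore carries about 900 GB/sec toward the tile but only 32 GB/sec toward the host, so the host link is the narrower path by a wide margin.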

Tesla Transport Protocol

Tesla Transport Protocol (TTP) is a proprietary interconnect over PCI-Express. A 50 GB/sec TTP link runs over Ethernet to access either a single 400 Gb/sec port or a paired set of 200 Gb/sec ports. Crossing the entire two-dimensional mesh network might take 30 hops, while TTP over Ethernet takes only four hops (at lower bandwidth), reducing vertical latency.
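The link rate is quoted in gigabytes per second while the Ethernet ports are quoted in gigabits per second; the short sketch below shows that the two port options are equivalent to the 50 GB/sec figure. Protocol overhead is ignored, and the helper name is illustrative.

```python
BITS_PER_BYTE = 8


def gbit_to_gbyte(gbit_per_sec):
    """Convert a raw line rate from Gb/sec to GB/sec (overhead ignored)."""
    return gbit_per_sec / BITS_PER_BYTE


single_port_gb = gbit_to_gbyte(400)
paired_ports_gb = 2 * gbit_to_gbyte(200)

MESH_HOPS = 30              # worst-case mesh crossing, from the article
TTP_ETHERNET_HOPS = 4       # TTP over Ethernet, from the article
hop_reduction = MESH_HOPS / TTP_ETHERNET_HOPS

print(single_port_gb, paired_ports_gb, hop_reduction)
```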

Cabinet and ExaPOD

Dojo stacks tiles vertically in a cabinet to minimize the distance and communications time between them. The Dojo ExaPOD system includes 120 tiles, totaling 1,062,000 usable cores, reaching 1 exaflops at BF16 and CFloat8 formats. It has 1.3 TB of on-tile SRAM and 13 TB of dual in-line high bandwidth memory (HBM).
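The ExaPOD totals can be approximated from figures given in earlier sections. The sketch assumes that every System Tray carries the 20 DIP cards listed in the System Tray section and that each card contributes 32 GB of HBM; the source does not spell out how the 13 TB figure was derived.

```python
TILES = 120
D1_PER_TILE = 25
BF16_TFLOPS_PER_D1 = 376
CORES_PER_D1_FULL = 360     # full array, as used in the tile SRAM total
SRAM_MB_PER_CORE = 1.25
TILES_PER_TRAY = 6
DIPS_PER_TRAY = 20          # assumed uniform across all trays
HBM_GB_PER_DIP = 32

exaflops = TILES * D1_PER_TILE * BF16_TFLOPS_PER_D1 / 1e6
sram_tb = TILES * D1_PER_TILE * CORES_PER_D1_FULL * SRAM_MB_PER_CORE / 1e6
trays = TILES // TILES_PER_TRAY
hbm_tb = trays * DIPS_PER_TRAY * HBM_GB_PER_DIP / 1000

print(exaflops, sram_tb, trays, hbm_tb)
```

Under these assumptions the totals come to about 1.13 exaflops, 1.35 TB of SRAM, and 12.8 TB of HBM, consistent with the rounded values quoted above.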

Software

Dojo supports the PyTorch framework: "Nothing as low level as C or C++, nothing remotely like CUDA". [5] The SRAM presents as a single address space. [5]

Because FP32 has more precision and range than needed for AI tasks, and FP16 does not have enough, Tesla has devised 8- and 16-bit configurable floating point formats (CFloat8 and CFloat16, respectively) which allow the compiler to dynamically set mantissa and exponent precision, accepting lower precision in return for faster vector processing and reduced storage requirements. [5] [17]
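The idea of a configurable split between exponent and mantissa can be illustrated with a generic decoder. The sketch below is not Tesla's specification: the field widths, the bias parameter, the subnormal handling, and the absence of infinity or NaN encodings are all assumptions made for illustration, and the function name is hypothetical.

```python
def decode_cfloat(bits, exp_bits, man_bits, bias):
    """Decode a sign | exponent | mantissa bit pattern into a Python float.

    Generic sketch: subnormals are used when the exponent field is zero,
    and no bit patterns are reserved for infinity or NaN (assumptions).
    """
    total_bits = 1 + exp_bits + man_bits
    if not 0 <= bits < (1 << total_bits):
        raise ValueError("bit pattern does not fit the format")
    sign = -1.0 if (bits >> (exp_bits + man_bits)) & 1 else 1.0
    exponent = (bits >> man_bits) & ((1 << exp_bits) - 1)
    fraction = (bits & ((1 << man_bits) - 1)) / (1 << man_bits)
    if exponent == 0:
        # Subnormal: no implicit leading one.
        return sign * fraction * 2.0 ** (1 - bias)
    return sign * (1.0 + fraction) * 2.0 ** (exponent - bias)


pattern = 0b00111000
# Same 8 bits, three different interpretations.
print(decode_cfloat(pattern, exp_bits=4, man_bits=3, bias=7))
print(decode_cfloat(pattern, exp_bits=4, man_bits=3, bias=3))
print(decode_cfloat(pattern, exp_bits=5, man_bits=2, bias=15))
```

The same eight bits decode to different values depending on the chosen field split and bias, which is the trade-off between range and precision that a configurable format exposes to the compiler.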

References

  1. Bleakley, Daniel (2023-06-22). "Tesla to start building its FSD training supercomputer "Dojo" next month". The Driven. Retrieved 2023-06-30.
  2. "Tesla jumps as analyst predicts $600 billion value boost from Dojo". Reuters. 2023-09-11. Retrieved 2023-09-11.
  3. Dickens, Steven (September 11, 2023). "Tesla's Dojo Supercomputer: A Paradigm Shift In Supercomputing?". Forbes. Retrieved 2023-09-12.
  4. Vigliarolo, Brandon (August 25, 2021). "Tesla's Dojo is impressive, but it won't transform supercomputing". TechRepublic. Retrieved August 25, 2021.
  5. Morgan, Timothy Prickett (August 23, 2022). "Inside Tesla's innovative and homegrown 'Dojo' AI supercomputer". The Next Platform. Retrieved 12 April 2023.
  6. Peckham, Oliver (June 22, 2021). "Ahead of 'Dojo,' Tesla Reveals Its Massive Precursor Supercomputer". HPCwire.
  7. Swinhoe, Dan (June 23, 2021). "Tesla details pre-Dojo supercomputer, could be up to 80 petaflops". Data Center Dynamics. Retrieved 14 April 2023.
  8. Raden, Neil (September 28, 2021). "Tesla's Dojo supercomputer - sorting out fact from hype". diginomica. Retrieved 14 April 2023.
  9. Swinhoe, Dan (August 20, 2021). "Tesla details Dojo supercomputer, reveals Dojo D1 chip and training tile module". Data Center Dynamics. Retrieved 14 April 2023.
  10. "Tesla begins installing Dojo supercomputer cabinets, trips local substation". Data Center Dynamics. October 3, 2022. Retrieved 14 April 2023.
  11. Trader, Tiffany (August 16, 2022). "Tesla Bulks Up Its GPU-Powered AI Super — Is Dojo Next?". HPCwire. Retrieved 14 April 2023.
  12. Brown, Mike (August 19, 2020). "Tesla Dojo: Why Elon Musk says full self-driving is set for 'quantum leap'". Inverse. Archived from the original on February 25, 2021. Retrieved September 5, 2021.
  13. Elon Musk [@elonmusk] (August 14, 2020). "Tesla is developing a NN training computer called Dojo to process truly vast amounts of video data. It's a beast! Please consider joining our AI or computer/chip teams if this sounds interesting" (Tweet) via Twitter.
  14. Elon Musk [@elonmusk] (August 19, 2020). "Dojo V1.0 isn't done yet. About a year away. Not just about the chips. Power & cooling problem is hard" (Tweet) via Twitter.
  15. Jin, Hyunjoo (August 20, 2021). "Musk says Tesla likely to launch humanoid robot prototype next year". Reuters. Retrieved August 20, 2021.
  16. Morris, James (August 20, 2021). "Elon Musk Aims To End Employment As We Know It With A Robot Humanoid". Forbes. Retrieved 13 April 2023.
  17. "Tesla Dojo Technology: A Guide to Tesla's Configurable Floating Point Formats & Arithmetic" (PDF). Tesla, Inc. Archived from the original (PDF) on October 12, 2021.
  18. Lambert, Fred (October 1, 2022). "Tesla unveils new Dojo supercomputer so powerful it tripped the power grid". Electrek. Retrieved 13 April 2023.
  19. Shilov, Anton (2023-08-28). "Tesla's $300 Million AI Cluster Is Going Live Today". Tom's Hardware. Retrieved 2023-09-03.
  20. Kolodny, Lora (2024-03-21). "Elon Musk companies are gobbling up Nvidia hardware even as Tesla aims to build rival supercomputer". CNBC. Retrieved 2024-03-22.
  21. Mohanarangam, Kamalesh; Shetty, Amrita (2022-09-02). "Tesla's Dojo Supercomputer: A Game-Changer in the Quest for Fully Autonomous Vehicles". Frost & Sullivan. Retrieved 2023-06-30.
  22. Morris, James (October 6, 2022). "Tesla's Biggest News At AI Day Was The Dojo Supercomputer, Not The Optimus Robot". Forbes. Retrieved 13 April 2023.
  23. Thorbecke, Catherine (2023-09-11). "Tesla shares jump after Morgan Stanley predicts Dojo supercomputer could add $500 billion in market value". CNN. Retrieved 2023-09-12.
  24. Bellan, Rebecca; Alamalhodaei, Aria (August 20, 2021). "Top four highlights of Elon Musk's Tesla AI Day". techcrunch.com. Techcrunch. Retrieved August 20, 2021.
  25. Kostovic, Aleksandar (2021-08-20). "Tesla Packs 50 Billion Transistors Onto D1 Dojo Chip Designed to Conquer Artificial Intelligence Training". Tom's Hardware. Retrieved 2023-06-30.
  26. Novet, Jordan (August 20, 2021). "Tesla unveils chip to train A.I. models inside its data centers". cnbc.com. CNBC. Retrieved August 20, 2021.
  27. Shahan, Zachary (August 19, 2021). "NVIDIA: Tesla's AI-Training Supercomputers Powered By Our GPUs". CleanTechnica. Archived from the original on August 19, 2021.
  28. Talpes, Emil; Sarma, Debjit Das; Williams, Doug; Arora, Sahil; Kunjan, Thomas; Floering, Benjamin; Jalote, Ankit; Hsiong, Christopher; Poorna, Chandrasekhar; Samant, Vaidehi; Sicilia, John; Nivarti, Anantha Kumar; Ramachandran, Raghuvir; Fischer, Tim; Herzberg, Ben (2023-05-15). "The Microarchitecture of DOJO, Tesla's Exa-Scale Computer". IEEE Micro. 43 (3): 31–39. doi:10.1109/MM.2023.3258906. ISSN 0272-1732.
  29. Hamilton, James (August 2021). "Tesla Project Dojo Overview". Perspectives.