Tensor Processing Unit

Designer: Google
Introduced: 2015 [1]
Type: Neural network, machine learning

Tensor Processing Unit (TPU) is an AI accelerator application-specific integrated circuit (ASIC) developed by Google for neural network machine learning, using Google's own TensorFlow software. [2] Google began using TPUs internally in 2015, and in 2018 made them available for third-party use, both as part of its cloud infrastructure and by offering a smaller version of the chip for sale.

Comparison to CPUs and GPUs

Compared to a graphics processing unit, TPUs are designed for a high volume of low precision computation (e.g. as little as 8-bit precision) [3] with more input/output operations per joule, without hardware for rasterisation/texture mapping. [4] The TPU ASICs are mounted in a heatsink assembly, which can fit in a hard drive slot within a data center rack, according to Norman Jouppi. [5]

Different types of processors are suited for different types of machine learning models. TPUs are well suited for CNNs, while GPUs have benefits for some fully-connected neural networks, and CPUs can have advantages for RNNs. [6]

History

According to Jonathan Ross, one of the original TPU engineers, [1] and later the founder of Groq, three separate groups at Google were developing AI accelerators, with the TPU being the design that was ultimately selected. He was not aware of systolic arrays at the time and upon learning the term thought "Oh, that's called a systolic array? It just seemed to make sense." [7]

The tensor processing unit was announced in May 2016 at Google I/O, when the company said that the TPU had already been used inside their data centers for over a year. [5] [4] Google's 2017 paper describing its creation cites previous systolic matrix multipliers of similar architecture built in the 1990s. [8] The chip has been specifically designed for Google's TensorFlow framework, a symbolic math library which is used for machine learning applications such as neural networks. [9] However, as of 2017 Google still used CPUs and GPUs for other types of machine learning. [5] AI accelerator designs from other vendors are also appearing, aimed at the embedded and robotics markets.

Google's TPUs are proprietary. Some models are commercially available, and on February 12, 2018, The New York Times reported that Google "would allow other companies to buy access to those chips through its cloud-computing service." [10] Google has said that they were used in the AlphaGo versus Lee Sedol series of human-versus-machine Go games, [4] as well as in the AlphaZero system, which produced Chess, Shogi and Go playing programs from the game rules alone and went on to beat the leading programs in those games. [11] Google has also used TPUs for Google Street View text processing and was able to find all the text in the Street View database in less than five days. In Google Photos, an individual TPU can process over 100 million photos a day. [5] TPUs are also used in RankBrain, which Google uses to provide search results. [12]

Google provides third parties access to TPUs through its Cloud TPU service as part of the Google Cloud Platform [13] and through its notebook-based services Kaggle and Colaboratory. [14] [15]

Products

Tensor Processing Unit products [16] [17] [18]

| | TPUv1 | TPUv2 | TPUv3 | TPUv4 [17] [19] | TPUv5e [20] | TPUv5p [21] [22] | v6e (Trillium) [23] [24] |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Date introduced | 2015 | 2017 | 2018 | 2021 | 2023 | 2023 | 2024 |
| Process node | 28 nm | 16 nm | 16 nm | 7 nm | Unstated | Unstated | |
| Die size (mm²) | 331 | < 625 | < 700 | < 400 | 300–350 | Unstated | |
| On-chip memory (MiB) | 28 | 32 | 32 | 32 | 48 | 112 | |
| Clock speed (MHz) | 700 | 700 | 940 | 1050 | Unstated | 1750 | |
| Memory | 8 GiB DDR3 | 16 GiB HBM | 32 GiB HBM | 32 GiB HBM | 16 GB HBM | 95 GB HBM | 32 GB |
| Memory bandwidth | 34 GB/s | 600 GB/s | 900 GB/s | 1200 GB/s | 819 GB/s | 2765 GB/s | 1640 GB/s |
| TDP (W) | 75 | 280 | 220 | 170 | Not listed | Not listed | |
| TOPS (tera operations per second) | 23 | 45 | 123 | 275 | 197 (bf16), 393 (int8) | 459 (bf16), 918 (int8) | 918 (bf16), 1836 (int8) |
| TOPS/W | 0.31 | 0.16 | 0.56 | 1.62 | Not listed | Not listed | |
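The TOPS/W row can be reproduced by dividing each chip's TOPS figure by its TDP. The following is a minimal Python sketch of that arithmetic, restricted to the four generations for which the table lists both values; the names `CHIPS` and `tops_per_watt` are illustrative and do not come from any Google tool.

```python
# (TOPS, TDP in watts) as listed in the table above. Later generations are
# omitted because the table does not list a TDP for them.
CHIPS = {
    "TPUv1": (23, 75),
    "TPUv2": (45, 280),
    "TPUv3": (123, 220),
    "TPUv4": (275, 170),
}


def tops_per_watt(tops, tdp_watts):
    """Return efficiency in TOPS/W, rounded to two decimals like the table."""
    return round(tops / tdp_watts, 2)


efficiency = {name: tops_per_watt(*spec) for name, spec in CHIPS.items()}

for name, value in efficiency.items():
    print(f"{name}: {value} TOPS/W")
```

The computed values agree with the TOPS/W row for TPUv1 through TPUv4.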

First generation TPU

The first-generation TPU is an 8-bit matrix multiplication engine, driven with CISC instructions by the host processor across a PCIe 3.0 bus. It is manufactured on a 28 nm process with a die size ≤ 331 mm². The clock speed is 700 MHz and it has a thermal design power of 28–40 W. It has 28 MiB of on-chip memory, and 4 MiB of 32-bit accumulators taking the results of a 256×256 systolic array of 8-bit multipliers. [8] Within the TPU package is 8 GiB of dual-channel 2133 MHz DDR3 SDRAM offering 34 GB/s of bandwidth. [18] Instructions transfer data to or from the host, perform matrix multiplications or convolutions, and apply activation functions. [8]
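To illustrate how a systolic array of 8-bit multipliers feeds wider accumulators, the sketch below simulates a small grid in plain Python. It is a teaching model under stated assumptions, not a description of the TPU's actual dataflow: it uses an output-stationary arrangement (each cell owns one output element) and a tiny grid rather than 256×256, and the function name `systolic_matmul` is hypothetical.

```python
INT8_MIN, INT8_MAX = -128, 127
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def systolic_matmul(a, b):
    """Multiply a (M x K) by b (K x N) on a simulated M x N grid of cells.

    Rows of `a` enter from the left and columns of `b` from the top, each
    skewed by one cycle per row/column, so cell (i, j) sees the operand pair
    a[i][s], b[s][j] on cycle t = i + j + s. Returns (result, cycles).
    """
    m, k, n = len(a), len(b), len(b[0])
    assert all(len(row) == k for row in a), "inner dimensions must match"
    # Inputs are restricted to signed 8-bit values, as in an 8-bit multiplier.
    for row in a + b:
        assert all(INT8_MIN <= v <= INT8_MAX for v in row), "inputs must be int8"

    acc = [[0] * n for _ in range(m)]  # one accumulator per cell
    cycles = m + n + k - 2             # cycles until the last cell finishes
    for t in range(cycles):
        for i in range(m):
            for j in range(n):
                s = t - i - j          # which operand pair reaches (i, j) now
                if 0 <= s < k:
                    acc[i][j] += a[i][s] * b[s][j]
                    # Check the running sum still fits a 32-bit accumulator.
                    assert INT32_MIN <= acc[i][j] <= INT32_MAX
    return acc, cycles


result, cycles = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(result, cycles)
```

Every cell performs at most one multiply-accumulate per cycle, which is why the array needs no per-operation memory access once operands are flowing through it.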

Second generation TPU

The second-generation TPU was announced in May 2017. [25] Google stated that the first-generation TPU design was limited by memory bandwidth, and that using 16 GB of High Bandwidth Memory in the second-generation design increased bandwidth to 600 GB/s and performance to 45 teraFLOPS. [18] The TPUs are then arranged into four-chip modules with a performance of 180 teraFLOPS. [25] Then 64 of these modules are assembled into 256-chip pods with 11.5 petaFLOPS of performance. [25] Notably, while the first-generation TPUs were limited to integers, the second-generation TPUs can also calculate in floating point, introducing the bfloat16 format invented by Google Brain. This makes the second-generation TPUs useful for both training and inference of machine learning models. Google has stated these second-generation TPUs will be available on the Google Compute Engine for use in TensorFlow applications. [26]
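The bfloat16 format keeps the sign bit and 8 exponent bits of a 32-bit float and shortens the mantissa to 7 bits, so a bfloat16 value corresponds to the upper 16 bits of a float32. The Python sketch below shows that relationship. It converts by truncation for simplicity, which is an assumption of this example; hardware typically rounds instead. The function names are illustrative.

```python
import struct


def float_to_bfloat16_bits(x):
    """Return the 16-bit bfloat16 pattern for x by truncating its float32 encoding."""
    (bits32,) = struct.unpack(">I", struct.pack(">f", x))
    return bits32 >> 16  # keep sign, 8 exponent bits, top 7 mantissa bits


def bfloat16_bits_to_float(bits16):
    """Expand a 16-bit bfloat16 pattern back to a Python float."""
    (value,) = struct.unpack(">f", struct.pack(">I", (bits16 & 0xFFFF) << 16))
    return value


for sample in (1.0, 3.14159, 1e30):
    bits = float_to_bfloat16_bits(sample)
    print(f"{sample!r} -> 0x{bits:04X} -> {bfloat16_bits_to_float(bits)!r}")
```

Large magnitudes such as 1e30 survive the conversion with reduced precision, whereas they would overflow in float16.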

Third generation TPU

The third-generation TPU was announced on May 8, 2018. [27] Google announced that the processors themselves are twice as powerful as the second-generation TPUs, and would be deployed in pods with four times as many chips as the preceding generation. [28] [29] This results in an 8-fold increase in performance per pod (with up to 1,024 chips per pod) compared to the second-generation TPU deployment.
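The pod figures quoted for the second and third generations follow from simple multiplication. The sketch below works through them in Python, treating "twice as powerful" as exactly 2× and "four times as many chips" as exactly 4×, which is an assumption of this example; the variable names are illustrative.

```python
# Second-generation figures as stated in the text.
CHIP_TFLOPS_V2 = 45
CHIPS_PER_MODULE = 4
MODULES_PER_POD_V2 = 64

module_tflops_v2 = CHIP_TFLOPS_V2 * CHIPS_PER_MODULE        # per four-chip module
pod_chips_v2 = CHIPS_PER_MODULE * MODULES_PER_POD_V2        # chips in one pod
pod_pflops_v2 = module_tflops_v2 * MODULES_PER_POD_V2 / 1000

# Third-generation scaling factors as stated in the text.
PER_CHIP_GAIN_V3 = 2   # "twice as powerful"
POD_SIZE_GAIN_V3 = 4   # "four times as many chips"

pod_chips_v3 = pod_chips_v2 * POD_SIZE_GAIN_V3
pod_gain_v3 = PER_CHIP_GAIN_V3 * POD_SIZE_GAIN_V3

print(f"v2 module: {module_tflops_v2} teraFLOPS")
print(f"v2 pod: {pod_chips_v2} chips, {pod_pflops_v2} petaFLOPS")
print(f"v3 pod: {pod_chips_v3} chips, {pod_gain_v3}x the v2 pod")
```

The second-generation pod works out to 11.52 petaFLOPS, which the text rounds to 11.5.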

Fourth generation TPU

On May 18, 2021, Google CEO Sundar Pichai spoke about TPU v4 Tensor Processing Units during his keynote at the Google I/O virtual conference. TPU v4 improved performance by more than 2x over TPU v3 chips. Pichai said "A single v4 pod contains 4,096 v4 chips, and each pod has 10x the interconnect bandwidth per chip at scale, compared to any other networking technology." [30] An April 2023 paper by Google claims TPU v4 is 5–87% faster than an Nvidia A100 at machine learning benchmarks. [31]

There is also an "inference" version, called v4i, [32] that does not require liquid cooling. [33]

Fifth generation TPU

In 2021, Google revealed that the physical layout of TPU v5 was being designed with the assistance of a novel application of deep reinforcement learning. [34] Google claims TPU v5 is nearly twice as fast as TPU v4, [35] and based on that and the relative performance of TPU v4 over the A100, some speculate that TPU v5 is as fast as or faster than an H100. [36]

Just as the v4i is a lighter-weight version of the v4, the fifth generation has a "cost-efficient" [37] version called v5e. [20] In December 2023, Google announced TPU v5p, which is claimed to be competitive with the H100. [38]

Sixth generation TPU

In May 2024, at the Google I/O conference, Google announced TPU v6, which became available in preview in October 2024. [39] Google claimed a 4.7 times performance increase relative to TPU v5e, [40] via larger matrix multiplication units and an increased clock speed. High bandwidth memory (HBM) capacity and bandwidth have also doubled. A pod can contain up to 256 Trillium units. [41]

Edge TPU

In July 2018, Google announced the Edge TPU. The Edge TPU is Google's purpose-built ASIC chip designed to run machine learning (ML) models for edge computing, meaning it is much smaller and consumes far less power compared to the TPUs hosted in Google datacenters (also known as Cloud TPUs [42] ). In January 2019, Google made the Edge TPU available to developers with a line of products under the Coral brand. The Edge TPU is capable of 4 trillion operations per second with 2 W of electrical power. [43]

The product offerings include a single-board computer (SBC), a system on module (SoM), a USB accessory, a mini PCI-e card, and an M.2 card. The SBC Coral Dev Board and Coral SoM both run Mendel Linux OS – a derivative of Debian. [44] [45] The USB, PCI-e, and M.2 products function as add-ons to existing computer systems, and support Debian-based Linux systems on x86-64 and ARM64 hosts (including Raspberry Pi).

The machine learning runtime used to execute models on the Edge TPU is based on TensorFlow Lite. [46] The Edge TPU is only capable of accelerating forward-pass operations, which means it's primarily useful for performing inferences (although it is possible to perform lightweight transfer learning on the Edge TPU [47] ). The Edge TPU also only supports 8-bit math, meaning that for a network to be compatible with the Edge TPU, it must either be trained using the TensorFlow quantization-aware training technique or, since late 2019, be converted using post-training quantization.
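As a rough illustration of what mapping a network to 8-bit math involves, the sketch below applies a generic affine (scale and zero-point) quantization to a list of floating-point values in plain Python. It is a simplified model of the idea behind post-training quantization, not TensorFlow Lite's implementation or the Edge TPU compiler's behavior, and all function names are hypothetical.

```python
QMIN, QMAX = -128, 127  # signed 8-bit range


def choose_qparams(values):
    """Pick a scale and zero point so the observed range maps onto int8."""
    lo = min(min(values), 0.0)   # the range must include 0.0 so that
    hi = max(max(values), 0.0)   # zero is exactly representable
    scale = (hi - lo) / (QMAX - QMIN) or 1.0  # avoid a zero scale
    zero_point = int(round(QMIN - lo / scale))
    return scale, zero_point


def quantize(x, scale, zero_point):
    """Map a float to the nearest int8 code, clamping to the valid range."""
    q = int(round(x / scale)) + zero_point
    return max(QMIN, min(QMAX, q))


def dequantize(q, scale, zero_point):
    """Recover the approximate float value represented by an int8 code."""
    return (q - zero_point) * scale


weights = [-1.0, -0.25, 0.0, 0.5, 2.0]
scale, zero_point = choose_qparams(weights)
codes = [quantize(w, scale, zero_point) for w in weights]
restored = [dequantize(q, scale, zero_point) for q in codes]

for w, q, r in zip(weights, codes, restored):
    print(f"{w:+.4f} -> {q:4d} -> {r:+.4f}")
```

The round trip loses at most about one quantization step per value. Quantization-aware training aims to make a network tolerate this loss during training rather than after it.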

On November 12, 2019, Asus announced a pair of single-board computers (SBCs) featuring the Edge TPU: the Asus Tinker Edge T and Tinker Edge R, both designed for IoT and edge AI. The SBCs officially support Android and Debian operating systems. [48] [49] Asus has also demonstrated a mini PC called the Asus PN60T featuring the Edge TPU. [50]

On January 2, 2020, Google announced the Coral Accelerator Module and Coral Dev Board Mini, to be demonstrated at CES 2020 later the same month. The Coral Accelerator Module is a multi-chip module featuring the Edge TPU, PCIe and USB interfaces for easier integration. The Coral Dev Board Mini is a smaller SBC featuring the Coral Accelerator Module and MediaTek 8167s SoC. [51] [52]

Pixel Neural Core

On October 15, 2019, Google announced the Pixel 4 smartphone, which contains an Edge TPU called the Pixel Neural Core. Google describes it as "customized to meet the requirements of key camera features in Pixel 4", using a neural network search that sacrifices some accuracy in favor of minimizing latency and power use. [53]

Google Tensor

Google followed the Pixel Neural Core by integrating an Edge TPU into a custom system-on-chip named Google Tensor, which was released in 2021 with the Pixel 6 line of smartphones. [54] The Google Tensor SoC demonstrated "extremely large performance advantages over the competition" in machine learning-focused benchmarks; although instantaneous power consumption also was relatively high, the improved performance meant less energy was consumed due to shorter periods requiring peak performance. [55]

Lawsuit

In 2019, Singular Computing, founded in 2009 by Joseph Bates, a visiting professor at MIT, [56] filed suit against Google alleging patent infringement in TPU chips. [57] By 2020, Google had successfully lowered the number of claims the court would consider to just two: claim 53 of US 8407273, filed in 2012, and claim 7 of US 9218156, filed in 2013, both of which claim a dynamic range of 10⁻⁶ to 10⁶ for floating point numbers, which the standard float16 cannot do (without resorting to subnormal numbers) as it only has five bits for the exponent. In a 2023 court filing, Singular Computing specifically called out Google's use of bfloat16, as that exceeds the dynamic range of float16. [58] Singular claims non-standard floating point formats were non-obvious in 2009, but Google retorts that the VFLOAT [59] format, with a configurable number of exponent bits, existed as prior art in 2002. [60] By January 2024, subsequent lawsuits by Singular had brought the number of patents being litigated up to eight. Towards the end of the trial later that month, Google agreed to a settlement with undisclosed terms. [61] [62]
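The dynamic-range comparison at issue can be checked from the bit widths alone. The sketch below computes the smallest and largest normal values of a format from its exponent and mantissa widths, assuming an IEEE-style layout in which the all-ones exponent is reserved for infinities and NaNs. It ignores subnormals, in line with the parenthetical above, and the function names are illustrative.

```python
def normal_range(exponent_bits, mantissa_bits):
    """Return (smallest, largest) positive normal values for the format."""
    bias = 2 ** (exponent_bits - 1) - 1
    smallest = 2.0 ** (1 - bias)
    largest = (2 - 2.0 ** -mantissa_bits) * 2.0 ** bias
    return smallest, largest


def covers(fmt_range, low=1e-6, high=1e6):
    """True if the normal range spans at least [low, high]."""
    smallest, largest = fmt_range
    return smallest <= low and largest >= high


FLOAT16 = normal_range(exponent_bits=5, mantissa_bits=10)
BFLOAT16 = normal_range(exponent_bits=8, mantissa_bits=7)

for name, fmt in (("float16", FLOAT16), ("bfloat16", BFLOAT16)):
    print(f"{name}: {fmt[0]:.3e} to {fmt[1]:.3e}, covers 1e-6..1e6: {covers(fmt)}")
```

With five exponent bits, float16 normal values run from 2⁻¹⁴ to 65504, which falls short of the claimed range at both ends, while the eight exponent bits of bfloat16 cover it with a wide margin.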

References

  1. Jouppi, Norman; et al. (2017). "In-Datacenter Performance Analysis of a Tensor Processing Unit". Proceedings of the 44th annual international symposium on computer architecture. International Symposium on Computer Architecture. Toronto: Association for Computing Machinery. pp. 1–12. doi:10.1145/3079856.3080246.
  2. "Cloud Tensor Processing Units (TPUs)". Google Cloud. Retrieved 20 July 2020.
  3. Armasu, Lucian (2016-05-19). "Google's Big Chip Unveil For Machine Learning: Tensor Processing Unit With 10x Better Efficiency (Updated)". Tom's Hardware. Retrieved 2016-06-26.
  4. Jouppi, Norm (May 18, 2016). "Google supercharges machine learning tasks with TPU custom chip". Google Cloud Platform Blog. Retrieved 2017-01-22.
  5. "Google's Tensor Processing Unit explained: this is what the future of computing looks like". TechRadar. Retrieved 2017-01-19.
  6. Wang, Yu Emma; Wei, Gu-Yeon; Brooks, David (2019-07-01). "Benchmarking TPU, GPU, and CPU Platforms for Deep Learning". arXiv: 1907.10701 [cs.LG].
  7. Tensor Processing Unit on LinkedIn
  8. Jouppi, Norman P.; Young, Cliff; Patil, Nishant; Patterson, David; Agrawal, Gaurav; Bajwa, Raminder; Bates, Sarah; Bhatia, Suresh; Boden, Nan; Borchers, Al; Boyle, Rick; Cantin, Pierre-luc; Chao, Clifford; Clark, Chris; Coriell, Jeremy; Daley, Mike; Dau, Matt; Dean, Jeffrey; Gelb, Ben; Ghaemmaghami, Tara Vazir; Gottipati, Rajendra; Gulland, William; Hagmann, Robert; Ho, C. Richard; Hogberg, Doug; Hu, John; Hundt, Robert; Hurt, Dan; Ibarz, Julian; Jaffey, Aaron; Jaworski, Alek; Kaplan, Alexander; Khaitan, Harshit; Koch, Andy; Kumar, Naveen; Lacy, Steve; Laudon, James; Law, James; Le, Diemthu; Leary, Chris; Liu, Zhuyuan; Lucke, Kyle; Lundin, Alan; MacKean, Gordon; Maggiore, Adriana; Mahony, Maire; Miller, Kieran; Nagarajan, Rahul; Narayanaswami, Ravi; Ni, Ray; Nix, Kathy; Norrie, Thomas; Omernick, Mark; Penukonda, Narayana; Phelps, Andy; Ross, Jonathan; Ross, Matt; Salek, Amir; Samadiani, Emad; Severn, Chris; Sizikov, Gregory; Snelham, Matthew; Souter, Jed; Steinberg, Dan; Swing, Andy; Tan, Mercedes; Thorson, Gregory; Tian, Bo; Toma, Horia; Tuttle, Erick; Vasudevan, Vijay; Walter, Richard; Wang, Walter; Wilcox, Eric; Yoon, Doe Hyun (June 26, 2017). In-Datacenter Performance Analysis of a Tensor Processing Unit™. Toronto, Canada. arXiv: 1704.04760.
  9. "TensorFlow: Open source machine learning" "It is machine learning software being used for various kinds of perceptual and language understanding tasks" — Jeffrey Dean, minute 0:47 / 2:17 from Youtube clip
  10. Metz, Cade (12 February 2018). "Google Makes Its Special A.I. Chips Available to Others". The New York Times. Retrieved 2018-02-12.
  11. McGourty, Colin (6 December 2017). "DeepMind's AlphaZero crushes chess". chess24.com.
  12. "Google's Tensor Processing Unit could advance Moore's Law 7 years into the future". PCWorld. Retrieved 2017-01-19.
  13. "Frequently Asked Questions | Cloud TPU". Google Cloud. Retrieved 2021-01-14.
  14. "Google Colaboratory". colab.research.google.com. Retrieved 2021-05-15.
  15. "Use TPUs | TensorFlow Core". TensorFlow. Retrieved 2021-05-15.
  16. Jouppi, Norman P.; Yoon, Doe Hyun; Ashcraft, Matthew; Gottscho, Mark (June 14, 2021). Ten lessons from three generations that shaped Google's TPUv4i (PDF). International Symposium on Computer Architecture. Valencia, Spain. doi:10.1109/ISCA52012.2021.00010. ISBN 978-1-4503-9086-6.
  17. "System Architecture | Cloud TPU". Google Cloud. Retrieved 2022-12-11.
  18. Kennedy, Patrick (22 August 2017). "Case Study on the Google TPU and GDDR5 from Hot Chips 29". Serve The Home. Retrieved 23 August 2017.
  19. Stay tuned, more information on TPU v4 is coming soon, retrieved 2020-08-06.
  20. Cloud TPU v5e Inference Public Preview, retrieved 2023-11-06.
  21. Cloud TPU v5p Google Cloud. retrieved 2024-04-09
  22. Cloud TPU v5p Training, retrieved 2024-04-09.
  23. "Introducing Trillium, sixth-generation TPUs". Google Cloud Blog. Retrieved 2024-05-29.
  24. "TPU v6e". Google Cloud. Retrieved 2024-11-10.
  25. Bright, Peter (17 May 2017). "Google brings 45 teraflops tensor flow processors to its compute cloud". Ars Technica. Retrieved 30 May 2017.
  26. Kennedy, Patrick (17 May 2017). "Google Cloud TPU Details Revealed". Serve The Home. Retrieved 30 May 2017.
  27. Frumusanu, Andre (8 May 2018). "Google I/O Opening Keynote Live-Blog". Retrieved 9 May 2018.
  28. Feldman, Michael (11 May 2018). "Google Offers Glimpse of Third-Generation TPU Processor". Top 500. Retrieved 14 May 2018.
  29. Teich, Paul (10 May 2018). "Tearing Apart Google's TPU 3.0 AI Coprocessor". The Next Platform. Retrieved 14 May 2018.
  30. "Google Launches TPU v4 AI Chips". www.hpcwire.com. 20 May 2021. Retrieved June 7, 2021.
  31. Jouppi, Norman (2023-04-20). "TPU v4: An Optically Reconfigurable Supercomputer for Machine Learning with Hardware Support for Embeddings". arXiv: 2304.01433 [cs.AR].
  32. Kennedy, Patrick (2023-08-29). "Google Details TPUv4 and its Crazy Optically Reconfigurable AI Network". servethehome.com. Retrieved 2023-12-16.
  33. "Why did Google develop its own TPU chip? In-depth disclosure of team members". censtry.com. 2021-10-20. Retrieved 2023-12-16.
  34. Mirhoseini, Azalia; Goldie, Anna (2021-06-01). "A graph placement methodology for fast chip design" (PDF). Nature. 594 (7962): 207–212. doi:10.1038/s41586-022-04657-6. PMID 35361999. S2CID 247855593. Retrieved 2023-06-04.
  35. Vahdat, Amin (2023-12-06). "Enabling next-generation AI workloads: Announcing TPU v5p and AI Hypercomputer". Retrieved 2024-04-08.
  36. Afifi-Sabet, Keumars (2023-12-23). "Google is rapidly turning into a formidable opponent to BFF Nvidia — the TPU v5p AI chip powering its hypercomputer is faster and has more memory and bandwidth than ever before, beating even the mighty H100". TechRadar. Retrieved 2024-04-08.
  37. "Expanding our AI-optimized infrastructure portfolio: Introducing Cloud TPU v5e and announcing A3 GA". 2023-08-29. Retrieved 2023-12-16.
  38. "Enabling next-generation AI workloads: Announcing TPU v5p and AI Hypercomputer". 2023-12-06. Retrieved 2024-04-09.
  39. Lohmeyer, Mark (2024-10-30). "Powerful infrastructure innovations for your AI-first future".
  40. Velasco, Alan (2024-05-15). "Google Cloud Unveils Trillium, Its 6th-Gen TPU With A 4.7X AI Performance Leap". HotHardware. Retrieved 2024-05-15.
  41. "Introducing Trillium, sixth-generation TPUs". Google Cloud Blog. Retrieved 2024-05-17.
  42. "Cloud TPU". Google Cloud. Retrieved 2021-05-21.
  43. "Edge TPU performance benchmarks". Coral. Retrieved 2020-01-04.
  44. "Dev Board". Coral. Retrieved 2021-05-21.
  45. "System-on-Module (SoM)". Coral. Retrieved 2021-05-21.
  46. "Bringing intelligence to the edge with Cloud IoT". Google Blog. 2018-07-25. Retrieved 2018-07-25.
  47. "Retrain an image classification model on-device". Coral. Retrieved 2019-05-03.
  48. "組込み総合技術展&IoT総合技術展「ET & IoT Technology 2019」に出展することを発表". Asus.com (in Japanese). Retrieved 2019-11-13.
  49. Shilov, Anton. "ASUS & Google Team Up for 'Tinker Board' AI-Focused Credit-Card Sized Computers". Anandtech.com. Retrieved 2019-11-13.
  50. Aufranc, Jean-Luc (2019-05-29). "ASUS Tinker Edge T & CR1S-CM-A SBC to Feature Google Coral Edge TPU & NXP i.MX 8M Processor". CNX Software - Embedded Systems News. Retrieved 2019-11-14.
  51. "New Coral products for 2020". Google Developers Blog. Retrieved 2020-01-04.
  52. "Accelerator Module". Coral. Retrieved 2020-01-04.
  53. "Introducing the Next Generation of On-Device Vision Models: MobileNetV3 and MobileNetEdgeTPU". Google AI Blog. Retrieved 2020-04-16.
  54. Gupta, Suyog; White, Marie (November 8, 2021). "Improved On-Device ML on Pixel 6, with Neural Architecture Search". Google AI Blog. Retrieved 16 December 2022.
  55. Frumusanu, Andrei (November 2, 2021). "Google's Tensor inside of Pixel 6, Pixel 6 Pro: A Look into Performance & Efficiency | Google's IP: Tensor TPU/NPU". AnandTech. Retrieved 16 December 2022.
  56. Hardesty, Larry (2011-01-03). "The surprising usefulness of sloppy arithmetic". MIT. Retrieved 2024-01-10.
  57. Bray, Hiawatha (2024-01-10). "Local inventor challenges Google in billion-dollar patent fight". Boston Globe. Boston. Archived from the original on 2024-01-10. Retrieved 2024-01-10.
  58. "SINGULAR COMPUTING LLC, Plaintiff, v. GOOGLE LLC, Defendant: Amended Complaint for Patent Infringement" (PDF). rpxcorp.com. RPX Corporation. 2020-03-20. Retrieved 2024-01-10.
  59. Wang, Xiaojun; Leeser, Miriam (2010-09-01). "VFloat: A Variable Precision Fixed- and Floating-Point Library for Reconfigurable Hardware". ACM Transactions on Reconfigurable Technology and Systems. 3 (3): 1–34. doi:10.1145/1839480.1839486. Retrieved 2024-01-10.
  60. "Singular Computing LLC v. Google LLC". casetext.com. 2023-04-06. Retrieved 2024-01-10.
  61. Calkins, Laurel Brubaker (January 24, 2024). "Google Settles AI-Chip Suit That Had Sought Over $5 Billion". Bloomberg Law.
  62. Brittain, Blake; Raymond, Ray (January 24, 2024). "Google settles AI-related chip patent lawsuit that sought $1.67 bln". Reuters.