GP5 chip

Last updated
The GP5 chip GP5 chip next to pair of dice.png
The GP5 chip

The GP5 is a co-processor accelerator built to accelerate discrete belief propagation on factor graphs and other large-scale tensor product operations for machine learning. It is related to, and anticipated by a number of years, the Google Tensor Processing Unit

It is designed to run as a co-processor with another controller (such as a CPU (x86) or an ARM/MIPS/Tensilica core). It was developed as the culmination of DARPA's Analog Logic program [1]

The GP5 has a fairly exotic architecture, resembling neither a GPU nor a DSP, and leverages massive fine-grained and coarse-grained parallelism. It is deeply pipelined. The different algorithmic tasks involved in performing belief propagation updates are performed by independent, heterogeneous compute units. The performance of the chip is governed by the structure of the machine learning workload being evaluated. In typical cases, the GP5 is roughly 100 times faster and 100 times more energy efficient than a single core of a modern core i7 performing a comparable task. It is roughly 10 times faster and 1000 times more energy efficient than a state-of-the art GPU. It is roughly 1000 times faster and 10 times more energy efficient than a state-of-the-art ARM processor. It was benchmarked on typical machine learning and inference workloads that included protein side-chain folding, turbo error correction decoding, stereo vision, signal noise reduction, and others.

Analog Devices, Inc. acquired the intellectual property for the GP5 when it acquired Lyric Semiconductor, Inc. in 2011.

Related Research Articles

<span class="mw-page-title-main">Field-programmable gate array</span> Array of logic gates that are reprogrammable

A field-programmable gate array (FPGA) is a type of configurable integrated circuit that can be repeatedly programmed post manufacturing. FPGAs are a subset of logic devices referred to as programmable logic devices ("PLDs"). They consist of an array of programmable logic blocks with a connecting grid, that can be configured "in the field" to interconnect with other logic blocks to perform various digital functions. FPGAs are often used in limited (low) quantity production of custom-made products, and in research and development, where the higher cost of individual FPGAs is not as important, and where creating and manufacturing a custom circuit wouldn't be feasible. Other applications for FPGAs include the telecommunications, automotive, aerospace, and industrial sectors, which benefit from their flexibility, high signal processing speed, and parallel processing abilities.

<span class="mw-page-title-main">Microcontroller</span> Small computer on a single integrated circuit

A microcontroller or microcontroller unit (MCU) is a small computer on a single integrated circuit. A microcontroller contains one or more CPUs along with memory and programmable input/output peripherals. Program memory in the form of NOR flash, OTP ROM or ferroelectric RAM is also often included on chip, as well as a small amount of RAM. Microcontrollers are designed for embedded applications, in contrast to the microprocessors used in personal computers or other general purpose applications consisting of various discrete chips.

<span class="mw-page-title-main">System on a chip</span> Micro-electronic component

A system on a chip or system-on-chip is an integrated circuit that integrates most or all components of a computer or other electronic system. These components almost always include on-chip central processing unit (CPU), memory interfaces, input/output devices and interfaces, and secondary storage interfaces, often alongside other components such as radio modems and a graphics processing unit (GPU) – all on a single substrate or microchip. SoCs may contain digital and also analog, mixed-signal and often radio frequency signal processing functions.

<span class="mw-page-title-main">Semiconductor Research Corporation</span> American technology research consortium

Semiconductor Research Corporation (SRC), commonly known as SRC, is a high-technology research consortium active in the semiconductor industry. It is a leading semiconductor research consortium. Todd Younkin is the incumbent president and chief executive officer of the company.

In electronic design, a semiconductor intellectual property core, IP core or IP block is a reusable unit of logic, cell, or integrated circuit layout design that is the intellectual property of one party. IP cores can be licensed to another party or owned and used by a single party. The term comes from the licensing of the patent or source code copyright that exists in the design. Designers of system on chip (SoC), application-specific integrated circuits (ASIC) and systems of field-programmable gate array (FPGA) logic can use IP cores as building blocks.

<span class="mw-page-title-main">Hardware acceleration</span> Specialized computer hardware

{{For|hardware accelerators

<span class="mw-page-title-main">Cypress PSoC</span> Type of integrated circuit

PSoC is a family of microcontroller integrated circuits by Cypress Semiconductor. These chips include a CPU core and mixed-signal arrays of configurable integrated analog and digital peripherals.

The transistor count is the number of transistors in an electronic device. It is the most common measure of integrated circuit complexity. The rate at which MOS transistor counts have increased generally follows Moore's law, which observes that transistor count doubles approximately every two years. However, being directly proportional to the area of a chip, transistor count does not represent how advanced the corresponding manufacturing technology is: a better indication of this is transistor density.

Intrinsity was a privately held Austin, Texas based fabless semiconductor company; it was founded in 1997 as EVSX on the remnants of Exponential Technology and changed its name to Intrinsity in 2000. It had around 100 employees and supplied tools and services for highly efficient semiconductor logic design, enabling high performance microprocessors with fewer transistors and low power consumption. The acquisition of the firm by Apple Inc. was confirmed on April 27, 2010.

<span class="mw-page-title-main">Tegra</span> System on a chip by Nvidia

Tegra is a system on a chip (SoC) series developed by Nvidia for mobile devices such as smartphones, personal digital assistants, and mobile Internet devices. The Tegra integrates an ARM architecture central processing unit (CPU), graphics processing unit (GPU), northbridge, southbridge, and memory controller onto one package. Early Tegra SoCs are designed as efficient multimedia processors. The Tegra-line evolved to emphasize performance for gaming and machine learning applications without sacrificing power efficiency, before taking a drastic shift in direction towards platforms that provide vehicular automation with the applied "Nvidia Drive" brand name on reference boards and its semiconductors; and with the "Nvidia Jetson" brand name for boards adequate for AI applications within e.g. robots or drones, and for various smart high level automation purposes.

<span class="mw-page-title-main">Arm Holdings</span> British multinational semiconductor and software design company

Arm Holdings plc is a British semiconductor and software design company based in Cambridge, England, whose primary business is the design of central processing unit (CPU) cores that implement the ARM architecture family of instruction sets. It also designs other chips, provides software development tools under the DS-5, RealView and Keil brands, and provides systems and platforms, system-on-a-chip (SoC) infrastructure and software. As a "holding" company, it also holds shares of other companies. Since 2016, it has been majority owned by Japanese conglomerate SoftBank Group.

Zero ASIC Corporation, formerly Adapteva, Inc., is a fabless semiconductor company focusing on low power many core microprocessor design. The company was the second company to announce a design with 1,000 specialized processing cores on a single integrated circuit.

<span class="mw-page-title-main">Exynos</span> Family of ARM based system-on-a-chips made by Samsung

The Samsung Exynos, formerly Hummingbird (Korean: 엑시노스), is a series of ARM-based system-on-chips developed by Samsung Electronics' System LSI division and manufactured by Samsung Foundry. It is a continuation of Samsung's earlier S3C, S5L and S5P line of SoCs.

<span class="mw-page-title-main">Mali (processor)</span> Series of graphics processing units produced by ARM Holdings

The Mali and Immortalis series of graphics processing units (GPUs) and multimedia processors are semiconductor intellectual property cores produced by Arm Holdings for licensing in various ASIC designs by Arm partners.

In computing and computer science, a processor or processing unit is an electrical component that performs operations on an external data source, usually memory or some other data stream. It typically takes the form of a microprocessor, which can be implemented on a single or a few tightly integrated metal–oxide–semiconductor integrated circuit chips. In the past, processors were constructed using multiple individual vacuum tubes, multiple individual transistors, or multiple integrated circuits.

Heterogeneous computing refers to systems that use more than one kind of processor or core. These systems gain performance or energy efficiency not just by adding the same type of processors, but by adding dissimilar coprocessors, usually incorporating specialized processing capabilities to handle particular tasks.

<span class="mw-page-title-main">TensorFlow</span> Machine learning software library

TensorFlow is a free and open-source software library for machine learning and artificial intelligence. It can be used across a range of tasks but has a particular focus on training and inference of deep neural networks.

<span class="mw-page-title-main">Tensor Processing Unit</span> AI accelerator ASIC by Google

Tensor Processing Unit (TPU) is an AI accelerator application-specific integrated circuit (ASIC) developed by Google for neural network machine learning, using Google's own TensorFlow software. Google began using TPUs internally in 2015, and in 2018 made them available for third-party use, both as part of its cloud infrastructure and by offering a smaller version of the chip for sale.

An AI accelerator, deep learning processor, or neural processing unit (NPU) is a class of specialized hardware accelerator or computer system designed to accelerate artificial intelligence and machine learning applications, including artificial neural networks and machine vision. Typical applications include algorithms for robotics, Internet of Things, and other data-intensive or sensor-driven tasks. They are often manycore designs and generally focus on low-precision arithmetic, novel dataflow architectures or in-memory computing capability. As of 2024, a typical AI integrated circuit chip contains tens of billions of MOSFETs.

The Pixel Visual Core (PVC) is a series of ARM-based system in package (SiP) image processors designed by Google. The PVC is a fully programmable image, vision and AI multi-core domain-specific architecture (DSA) for mobile devices and in future for IoT. It first appeared in the Google Pixel 2 and 2 XL which were introduced on October 19, 2017. It has also appeared in the Google Pixel 3 and 3 XL. Starting with the Pixel 4, this chip was replaced with the Pixel Neural Core.

References

  1. DARPA FA8750-07-C-0231