Huang's law is the observation in computer science and engineering that advancements in graphics processing units (GPUs) are growing at a rate much faster than with traditional central processing units (CPUs). The observation is in contrast to Moore's law that predicted the number of transistors in a dense integrated circuit (IC) doubles about every two years. [1] Huang's law states that the performance of GPUs will more than double every two years. [2] The hypothesis is subject to questions about its validity.
The observation was made by Jensen Huang, the chief executive officer of Nvidia, at its 2018 GPU Technology Conference (GTC) held in San Jose, California. [3] He observed that Nvidia's GPUs were "25 times faster than five years ago" whereas Moore's law would have expected only a ten-fold increase. [2] As microchip components become smaller, it became harder for chip advancement to meet the speed of Moore's law. [4]
In 2006, Nvidia's GPU had a 4x performance advantage over other CPUs. In 2018 the Nvidia GPU was 20 times faster than a comparable CPU node: the GPUs were 1.7x faster each year. Moore's law would predict a doubling every two years, however Nvidia's GPU performance was more than tripled every two years, fulfilling Huang's law. [5]
Huang's law claims that a synergy between hardware, software, and artificial intelligence makes the new 'law' possible. [A] Huang said, "The innovation isn't just about chips," he said, "It's about the entire stack." He said that graphics processors especially are important to a new paradigm. [3] Elimination of bottlenecks can speed up the process and create advantages in getting to the goal. "Nvidia is a one trick pony," Huang has said. [7] According to Huang: "Accelerated computing is liberating, … Let’s say you have an airplane that has to deliver a package. It takes 12 hours to deliver it. Instead of making the plane go faster, concentrate on how to deliver the package faster, look at 3D printing at the destination." The object "… is to deliver the goal faster." [7]
For artificial intelligence tasks, Huang said that training the convolutional network AlexNet took six days on two of Nvidia's GTX 580 processors to complete the training process but only 18 minutes on a modern DGX-2 AI server, resulting in a speed-up factor of 500. Compared to Moore's law, which focuses purely on CPU transistors, Huang's law describes a combination of advances in architecture, interconnects, memory technology, and algorithms. [2] [6]
Bharath Ramsundar wrote that deep learning is being coupled with "[i]mprovements in custom architecture". For example, machine learning systems have been implemented in the blockchain world, where Bitmain assaulted "many cryptocurrencies by designing custom mining ASICs (application-specific integrated circuits)" which had been envisioned as undoable. "Nvidia's grand achievement however is in making the case that these improvement in architectures are not merely isolated victories for specific applications but perhaps broadly applicable to all of computer science." They have suggested that broad harnessing of GPUs and the GPU stack (cf., CPU stack) can deliver "dramatic growth in deep learning architecture." "The magic" of Huang's law promise is that as nascent deep learning powered software becomes more availed, the improvements from GPU scaling and more generally from architectural improvements" will concretely improve "performance and behavior of modern software stacks." [8]
There has been criticism. Journalist Joel Hruska writing in ExtremeTech in 2020 said "there is no such thing as Huang's Law", calling it an "illusion" that rests on the gains made possible by Moore's law; and that it is too soon to determine a law exists. [9] The research nonprofit Epoch has found that, between 2006 and 2021, GPU price performance (in terms of FLOPS/$) has tended to double approximately every 2.5 years, much slower than predicted by Huang's law. [10]
Advanced Micro Devices, Inc. (AMD) is an American multinational corporation and technology company headquartered in Santa Clara, California and maintains significant operations in Austin, Texas. AMD is a hardware and fabless company that designs and develops central processing units (CPUs), graphics processing units (GPUs), field-programmable gate arrays (FPGAs), system-on-chip (SoC), and high-performance compute solutions. AMD serves a wide range of business and consumer markets, including gaming, data centers, artificial intelligence (AI), and embedded systems.
Nvidia Corporation is an American multinational corporation and technology company headquartered in Santa Clara, California, and incorporated in Delaware. Founded in 1993 by Jensen Huang, Chris Malachowsky, and Curtis Priem, it is a software and fabless company which designs and supplies graphics processing units (GPUs), application programming interfaces (APIs) for data science and high-performance computing, as well as system on a chip units (SoCs) for mobile computing and the automotive market. Nvidia is also a dominant supplier of artificial intelligence (AI) hardware and software.
GeForce is a brand of graphics processing units (GPUs) designed by Nvidia and marketed for the performance market. As of the GeForce 40 series, there have been eighteen iterations of the design. The first GeForce products were discrete GPUs designed for add-on graphics boards, intended for the high-margin PC gaming market, and later diversification of the product line covered all tiers of the PC graphics market, ranging from cost-sensitive GPUs integrated on motherboards to mainstream add-in retail boards. Most recently, GeForce technology has been introduced into Nvidia's line of embedded application processors, designed for electronic handhelds and mobile handsets.
A coprocessor is a computer processor used to supplement the functions of the primary processor. Operations performed by the coprocessor may be floating-point arithmetic, graphics, signal processing, string processing, cryptography or I/O interfacing with peripheral devices. By offloading processor-intensive tasks from the main processor, coprocessors can accelerate system performance. Coprocessors allow a line of computers to be customized, so that customers who do not need the extra performance do not need to pay for it.
A graphics processing unit (GPU) is a specialized electronic circuit initially designed for digital image processing and to accelerate computer graphics, being present either as a discrete video card or embedded on motherboards, mobile phones, personal computers, workstations, and game consoles. After their initial design, GPUs were found to be useful for non-graphic calculations involving embarrassingly parallel problems due to their parallel structure. Other non-graphical uses include the training of neural networks and cryptocurrency mining.
In computing, a northbridge is a microchip that comprises the core logic chipset architecture on motherboards to handle high-performance tasks, especially for older personal computers. It is connected directly to a CPU via the front-side bus (FSB), and is usually used in conjunction with a slower southbridge to manage communication between the CPU and other parts of the motherboard.
The transistor count is the number of transistors in an electronic device. It is the most common measure of integrated circuit complexity. The rate at which MOS transistor counts have increased generally follows Moore's law, which observes that transistor count doubles approximately every two years. However, being directly proportional to the area of a die, transistor count does not represent how advanced the corresponding manufacturing technology is. A better indication of this is transistor density which is the ratio of a semiconductor's transistor count to its die area.
In computing, CUDA is a proprietary parallel computing platform and application programming interface (API) that allows software to use certain types of graphics processing units (GPUs) for accelerated general-purpose processing, an approach called general-purpose computing on GPUs. CUDA was created by Nvidia in 2006. When it was first introduced, the name was an acronym for Compute Unified Device Architecture, but Nvidia later dropped the common use of the acronym and now rarely expands it.
Larrabee is the codename for a cancelled GPGPU chip that Intel was developing separately from its current line of integrated graphics accelerators. It is named after either Mount Larrabee or Larrabee State Park in the state of Washington. The chip was to be released in 2010 as the core of a consumer 3D graphics card, but these plans were cancelled due to delays and disappointing early performance figures. The project to produce a GPU retail product directly from the Larrabee research project was terminated in May 2010 and its technology was passed on to the Xeon Phi. The Intel MIC multiprocessor architecture announced in 2010 inherited many design elements from the Larrabee project, but does not function as a graphics processing unit; the product is intended as a co-processor for high performance computing.
Tegra is a system on a chip (SoC) series developed by Nvidia for mobile devices such as smartphones, personal digital assistants, and mobile Internet devices. The Tegra integrates an ARM architecture central processing unit (CPU), graphics processing unit (GPU), northbridge, southbridge, and memory controller onto one package. Early Tegra SoCs are designed as efficient multimedia processors. The Tegra-line evolved to emphasize performance for gaming and machine learning applications without sacrificing power efficiency, before taking a drastic shift in direction towards platforms that provide vehicular automation with the applied "Nvidia Drive" brand name on reference boards and its semiconductors; and with the "Nvidia Jetson" brand name for boards adequate for AI applications within e.g. robots or drones, and for various smart high level automation purposes.
This list compares various amounts of computing power in instructions per second organized by order of magnitude in FLOPS.
Project Denver is the codename of a central processing unit designed by Nvidia that implements the ARMv8-A 64/32-bit instruction sets using a combination of simple hardware decoder and software-based binary translation where "Denver's binary translation layer runs in software, at a lower level than the operating system, and stores commonly accessed, already optimized code sequences in a 128 MB cache stored in main memory". Denver is a very wide in-order superscalar pipeline. Its design makes it suitable for integration with other SIPs cores into one die constituting a system on a chip (SoC).
Heterogeneous computing refers to systems that use more than one kind of processor or core. These systems gain performance or energy efficiency not just by adding the same type of processors, but by adding dissimilar coprocessors, usually incorporating specialized processing capabilities to handle particular tasks.
High Bandwidth Memory (HBM) is a computer memory interface for 3D-stacked synchronous dynamic random-access memory (SDRAM) initially from Samsung, AMD and SK Hynix. It is used in conjunction with high-performance graphics accelerators, network devices, high-performance datacenter AI ASICs, as on-package cache in CPUs and on-package RAM in upcoming CPUs, and FPGAs and in some supercomputers. The first HBM memory chip was produced by SK Hynix in 2013, and the first devices to use HBM were the AMD Fiji GPUs in 2015.
An AI accelerator, deep learning processor or neural processing unit (NPU) is a class of specialized hardware accelerator or computer system designed to accelerate artificial intelligence and machine learning applications, including artificial neural networks and computer vision. Typical applications include algorithms for robotics, Internet of Things, and other data-intensive or sensor-driven tasks. They are often manycore designs and generally focus on low-precision arithmetic, novel dataflow architectures or in-memory computing capability. As of 2024, a typical AI integrated circuit chip contains tens of billions of MOSFETs.
HEAVY.AI is an American-based software company, that uses graphics processing units (GPUs) and central processing units (CPUs) to query and visualize big data. The company was founded in 2013 by Todd Mostak and Thomas Graham and is headquartered in San Francisco, California.
Intel Xe, earlier known unofficially as Gen12, is a GPU architecture developed by Intel.
Hopper is a graphics processing unit (GPU) microarchitecture developed by Nvidia. It is designed for datacenters and is used alongside the Lovelace microarchitecture. It is the latest generation of the line of products formerly branded as Nvidia Tesla, now Nvidia Data Centre GPUs.
Nvidia GTC is a global artificial intelligence (AI) conference for developers that brings together developers, engineers, researchers, inventors, and IT professionals. Topics focus on AI, computer graphics, data science, machine learning and autonomous machines. Each conference begins with a keynote from Nvidia CEO and founder Jensen Huang, followed by a variety of sessions and talks with experts from around the world.
Graphics processors are on a supercharged development path that eclipses Moore's Law. … GPUs are also advancing more quickly than CPUs because they rely upon a parallel architecture, Jesse Clayton, an Nvidia senior manager, pointed out in another session."
... there are two dynamics controlling the computing industry today – the end of Moore's law and software that can write itself, artificial intelligence, or AI. ... We can study where bottlenecks are. New software systems make the application go faster, not just the chip.