In computing, performance per watt is a measure of the energy efficiency of a particular computer architecture or computer hardware. Literally, it measures the rate of computation that can be delivered by a computer for every watt of power consumed. When comparing computing systems, this rate is typically measured by performance on the LINPACK benchmark.
System designers building parallel computers, such as Google's hardware, select CPUs on the basis of their performance per watt of power, because the cost of powering a CPU can outweigh the cost of the CPU itself.
The performance and power consumption metrics used depend on the definition; reasonable measures of performance are FLOPS, MIPS, or the score for any performance benchmark. Several measures of power usage may be employed, depending on the purposes of the metric; for example, a metric might only consider the electrical power delivered to a machine directly, while another might include all power necessary to run a computer, such as cooling and monitoring systems. The power measurement is often the average power used while running the benchmark, but other measures of power usage may be employed (e.g. peak power, idle power).
For example, the early UNIVAC I computer performed approximately 0.015 operations per watt-second (performing 1,905 operations per second (OPS) while consuming 125 kW). The Fujitsu FR-V VLIW/vector processor system on a chip, in the 4-core FR550 variant released in 2005, performs 51 giga-OPS with 3 watts of power consumption, resulting in 17 billion operations per watt-second. This is an improvement of over a trillion times in 54 years.
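The arithmetic behind these figures can be checked directly: since 1 watt = 1 joule/second, operations per watt-second are simply operations per joule. A quick sketch in Python:

```python
# Sketch: reproduce the efficiency figures quoted above.
# Performance per watt = (operations per second) / watts = operations per joule.

univac_ops_per_joule = 1905 / 125_000   # UNIVAC I: 1,905 OPS at 125 kW
fr550_ops_per_joule = 51e9 / 3          # FR-V FR550 variant: 51 giga-OPS at 3 W

print(f"UNIVAC I: {univac_ops_per_joule:.5f} ops/joule")
print(f"FR550:    {fr550_ops_per_joule:.2e} ops/joule")
print(f"Improvement: {fr550_ops_per_joule / univac_ops_per_joule:.2e}x")
```

The ratio comes out just above 10^12, matching the "over a trillion times" claim.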
Most of the power a computer uses is converted into heat, so a system that takes fewer watts to do a job will require less cooling to maintain a given operating temperature. Reduced cooling demands make it easier to quiet a computer. Lower energy consumption can also make it less costly to run, and reduce the environmental impact of powering the computer (see green computing). If installed where there is limited climate control, a lower-power computer will operate at a lower temperature, which may make it more reliable. In a climate-controlled environment, reductions in direct power use may also create savings in climate control energy.
Computing energy consumption is sometimes also measured by reporting the energy required to run a particular benchmark, for instance EEMBC EnergyBench. Energy consumption figures for a standard workload may make it easier to judge the effect of an improvement in energy efficiency.
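Under this approach, the reported quantity is energy (joules) rather than power (watts). A minimal sketch, with purely illustrative sample data (not from any real EnergyBench run), that integrates timestamped power readings over a benchmark run:

```python
# Sketch: energy consumed by a benchmark run, from timestamped power samples.
# The sample data below is illustrative only.

def energy_joules(samples):
    """Trapezoidal integration of (time_s, power_w) samples -> joules."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += (t1 - t0) * (p0 + p1) / 2
    return total

run = [(0.0, 40.0), (1.0, 55.0), (2.0, 52.0), (3.0, 41.0)]  # a 3-second run
print(energy_joules(run))  # joules consumed by the whole run
```

Dividing the benchmark's work done by this energy figure recovers an operations-per-joule (i.e., performance-per-watt) number for the workload.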
Performance (in operations/second) per watt can also be written as operations/watt-second, or operations/joule, since 1 watt = 1 joule/second.
FLOPS per watt is a common measure. Like the FLOPS (Floating Point Operations Per Second) metric it is based on, the metric is usually applied to scientific computing and simulations involving many floating point calculations.
As of June 2016, the Green500 list rates the two most efficient supercomputers highest. Both are at RIKEN and are based on the same manycore PEZY-SC accelerator (a Japanese technology) in addition to Intel Xeon processors; the top system achieves 6673.8 MFLOPS/watt. Ranked third is the Chinese-technology Sunway TaihuLight (a much bigger machine, ranked 1st on TOP500; the other two are not on that list) at 6051.3 MFLOPS/watt.
In June 2012, the Green500 list rated BlueGene/Q, Power BQC 16C as the most efficient supercomputer on the TOP500 in terms of FLOPS per watt, running at 2,100.88 MFLOPS/watt.
On 9 June 2008, CNN reported that IBM's Roadrunner supercomputer achieved 376 MFLOPS/watt.
In November 2010, IBM's Blue Gene/Q achieved 1,684 MFLOPS/watt.
As part of Intel's Tera-Scale research project, the team produced an 80-core CPU that can achieve over 16,000 MFLOPS/watt. The future of that CPU is not certain.
Microwulf, a low-cost desktop Beowulf cluster of four dual-core Athlon 64 X2 3800+ computers, runs at 58 MFLOPS/watt.
Kalray has developed a 256-core VLIW CPU that achieves 25,000 MFLOPS/watt, and the next generation is expected to achieve 75,000 MFLOPS/watt. However, in 2019 their latest embedded chip had 80 cores and claimed up to 4 TFLOPS at 20 W.
Adapteva announced the Epiphany V, a 1024-core 64-bit RISC processor intended to achieve 75 GFLOPS/watt, though it later announced that the Epiphany V was "unlikely" to become available as a commercial product.
US Patent 10,020,436, granted in July 2018, claims three efficiency intervals of 100, 300, and 600 GFLOPS/watt.
The Green500 list ranks computers from the TOP500 list of supercomputers in terms of energy efficiency, typically measured as LINPACK FLOPS per watt.
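The ranking rule itself is simple division. A sketch with placeholder figures (not actual Green500 entries) illustrates how systems are ordered:

```python
# Sketch of the Green500 ranking rule: order systems by LINPACK Rmax per watt.
# The figures below are illustrative placeholders, not actual list entries.

systems = [
    # (name, Rmax in TFLOPS, total power in kW)
    ("A", 148_600, 10_096),
    ("B", 1_999, 118),
    ("C", 8_045, 510),
]

def gflops_per_watt(rmax_tflops, power_kw):
    """Convert TFLOPS and kW to GFLOPS per watt."""
    return (rmax_tflops * 1e3) / (power_kw * 1e3)

ranked = sorted(systems, key=lambda s: gflops_per_watt(s[1], s[2]), reverse=True)
for name, rmax, power in ranked:
    print(name, round(gflops_per_watt(rmax, power), 3))
```

Note that a small system can top the efficiency ranking while a far larger machine dominates raw Rmax, which is exactly the pattern seen between Green500 and TOP500 leaders below.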
As of November 2012, an Appro International, Inc. Xtreme-X supercomputer (Beacon) topped the Green500 list with 2499 LINPACK MFLOPS/W. Beacon is deployed by NICS of the University of Tennessee and is a GreenBlade GB824M, Xeon E5-2670-based, eight-core (8C), 2.6 GHz, InfiniBand FDR, Intel Xeon Phi 5110P computer.
As of June 2013, the Eurotech supercomputer Eurora at Cineca topped the Green500 list with 3208 LINPACK MFLOPS/W. The Cineca Eurora supercomputer is equipped with two Intel Xeon E5-2687W CPUs and two PCIe-connected NVIDIA Tesla K20 accelerators per node. Water cooling and electronics design allow very high densities to be reached, with a peak performance of 350 TFLOPS per rack.
As of November 2014, the L-CSC supercomputer of the Helmholtz Association at the GSI in Darmstadt, Germany topped the Green500 list with 5271 MFLOPS/W and was the first cluster to surpass an efficiency of 5 GFLOPS/W. It runs on Intel Xeon E5-2690 processors with the Intel Ivy Bridge architecture and AMD FirePro S9150 GPU accelerators, and uses in-rack water cooling and cooling towers to reduce the energy required for cooling.
As of August 2015, the Shoubu supercomputer of RIKEN outside Tokyo, Japan topped the Green500 list with 7032 MFLOPS/W. The then-top three supercomputers of the list used PEZY-SC accelerators (GPU-like accelerators programmed with OpenCL) by PEZY Computing, with 1,024 cores each and 6–7 GFLOPS/W efficiency.
As of June 2019, DGX SaturnV Volta, using "NVIDIA DGX-1 Volta36, Xeon E5-2698v4 20C 2.2GHz, Infiniband EDR, NVIDIA Tesla V100", tops the Green500 list with 15,113 MFLOPS/W, while ranking only 469th on TOP500. It is only slightly more efficient than the much bigger Summit, which ranks 2nd on the Green500 list but 1st on TOP500 with 14,719 MFLOPS/W, using IBM POWER9 CPUs together with Nvidia Tesla V100 GPUs.
Green500 top 10 list (efficiency in GFLOPS/watt; Rmax in PFLOPS):

Rank | GFLOPS/W | Name | System and configuration | Vendor | Site | Rmax
1 | 16.876 | A64FX prototype | Fujitsu A64FX 48C 2GHz, Tofu interconnect D | Fujitsu | Numazu | 1.999
2 | | | Xeon D-1571 16C 1.3GHz, Infiniband EDR, PEZY-SC2 700MHz | PEZY Computing K.K. | JAMSTEC Yokohama Institute for Earth Sciences, Yokohama | 1.303
3 | 15.771 | AiMOS | IBM Power System AC922; IBM POWER9 20C 3.45GHz, Dual-rail Mellanox EDR Infiniband, NVIDIA Volta GV100 | IBM | Rensselaer Polytechnic Institute, Troy | 8.045
4 | 15.574 | Satori | IBM Power System AC922; IBM POWER9 20C 2.4GHz, Infiniband EDR, NVIDIA Tesla V100 SXM2 | IBM | MIT/MGHPCC, Holyoke, Massachusetts | 1.464
5 | 14.719 | Summit | IBM Power System AC922; IBM POWER9 22C 3.07GHz, NVIDIA Volta GV100, Dual-rail Mellanox EDR Infiniband | IBM | Oak Ridge National Laboratory, Oak Ridge, Tennessee | 148.600
6 | 14.423 | AI Bridging Cloud Infrastructure (ABCI) | Primergy CX2570 M4; Xeon Gold, Tesla V100 SXM2, Infiniband EDR | Fujitsu | Joint Center for Advanced High Performance Computing, Kashiwa | 19.880
7 | 14.131 | MareNostrum P9 CTE | IBM Power System AC922; IBM POWER9 22C 3.1GHz, Dual-rail Mellanox EDR Infiniband, NVIDIA Tesla V100 | IBM | Barcelona Supercomputing Center, Barcelona | 1.145
8 | 13.704 | TSUBAME3.0 | SGI ICE XA IP139-SXM2; Xeon E5-2680v4 14C 2.4GHz, Intel Omni-Path, NVIDIA Tesla P100 SXM2 | Hewlett-Packard | Tokyo Institute of Technology, Tokyo | 8.045
9 | 13.065 | PANGEA III | IBM Power System AC922; IBM POWER9 18C 3.45GHz, Dual-rail Mellanox EDR Infiniband, NVIDIA Volta GV100 | IBM | Total S.A., Pau | 17.860
10 | 12.723 | Sierra | IBM Power System AC922; IBM POWER9 22C 3.1GHz, NVIDIA Volta GV100, Dual-rail Mellanox EDR Infiniband | IBM | Lawrence Livermore National Laboratory, Livermore | 94.640
Graphics processing units (GPUs) have continued to increase in energy usage, while CPU designers have recently focused on improving performance per watt. High-performance GPUs may draw large amounts of power, so intelligent techniques are required to manage GPU power consumption. Measures like the 3DMark2006 score per watt can help identify more efficient GPUs. However, that may not adequately incorporate efficiency in typical use, where much time is spent doing less demanding tasks.
With modern GPUs, energy usage is an important constraint on the maximum computational capabilities that can be achieved. GPU designs are usually highly scalable, allowing the manufacturer to put multiple chips on the same video card, or to use multiple video cards that work in parallel. Peak performance of any system is essentially limited by the amount of power it can draw and the amount of heat it can dissipate. Consequently, performance per watt of a GPU design translates directly into peak performance of a system that uses that design.
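This relationship can be made explicit: at a fixed power budget, the attainable peak throughput is simply efficiency multiplied by the budget. A sketch with hypothetical numbers:

```python
# Sketch: with power the binding constraint, attainable peak performance
# scales directly with performance per watt. Figures are hypothetical.

def peak_tflops(gflops_per_watt, power_budget_w):
    """Peak throughput (TFLOPS) a power budget allows at a given efficiency."""
    return gflops_per_watt * power_budget_w / 1e3

# The same 300 W board-power budget at two efficiency levels:
print(peak_tflops(50, 300))  # 15.0 TFLOPS
print(peak_tflops(75, 300))  # 22.5 TFLOPS
```

A 50% gain in performance per watt yields a 50% higher peak within the same power and cooling envelope, which is why GPU vendors treat efficiency as the primary lever on peak performance.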
Since GPUs may also be used for some general purpose computation, sometimes their performance is measured in terms also applied to CPUs, such as FLOPS per watt.
While performance per watt is useful, absolute power requirements are also important. Claims of improved performance per watt may be used to mask increasing power demands. For instance, though newer generation GPU architectures may provide better performance per watt, continued performance increases can negate the gains in efficiency, and the GPUs continue to consume large amounts of power.
Benchmarks that measure power under heavy load may not adequately reflect typical efficiency. For instance, 3DMark stresses the 3D performance of a GPU, but many computers spend most of their time doing less intense display tasks (idle, 2D tasks, displaying video). So the 2D or idle efficiency of the graphics system may be at least as significant for overall energy efficiency. Likewise, systems that spend much of their time in standby or soft off are not adequately characterized by just efficiency under load. To help address this some benchmarks, like SPECpower, include measurements at a series of load levels.
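A SPECpower-style aggregation can be sketched as follows; the load levels and figures here are illustrative, and the overall score is computed as total operations divided by total average power, so idle power counts against the result:

```python
# Sketch: efficiency aggregated across load levels, SPECpower-style.
# The load levels and measurements below are illustrative only.

levels = [
    # (load fraction, throughput in ops/s, average power in watts)
    (1.00, 1_000_000, 300.0),
    (0.50,   500_000, 200.0),
    (0.10,   100_000, 120.0),
    (0.00,         0,  90.0),  # active idle: power drawn with no work done
]

# Overall score: sum of throughputs divided by sum of average powers.
overall = sum(ops for _, ops, _ in levels) / sum(p for _, _, p in levels)
print(round(overall, 1))  # overall ops/s per watt across the load range
```

A system with excellent full-load efficiency but high idle power scores poorly on such a metric, capturing the typical-use concern described above.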
The efficiency of some electrical components, such as voltage regulators, decreases with increasing temperature, so the power used may increase with temperature. Power supplies, motherboards, and some video cards are some of the subsystems affected by this. So their power draw may depend on temperature, and the temperature or temperature dependence should be noted when measuring.
Performance per watt also typically does not include full life-cycle costs. Since computer manufacturing is energy intensive, and computers often have a relatively short lifespan, energy and materials involved in production, distribution, disposal and recycling often make up significant portions of their cost, energy use, and environmental impact.
Energy required for climate control of the computer's surroundings is often not counted in the wattage calculation, but it can be significant.
SWaP (space, wattage and performance) is a Sun Microsystems metric for data centers, incorporating energy and space:

SWaP = performance / (space × power)

where performance is measured by any appropriate benchmark, and space is the size of the computer.
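A minimal sketch of the metric, assuming performance as a benchmark score, space in rack units, and power in watts (the units and figures are illustrative):

```python
# Sketch of Sun's SWaP metric: performance divided by (space x power).
# Units assumed here: benchmark score, rack units (U), watts.

def swap(performance, space_u, power_w):
    """SWaP = performance / (space * power)."""
    return performance / (space_u * power_w)

# Two hypothetical servers with the same benchmark score:
print(swap(1000, 2, 400))  # 2U at 400 W -> 1.25
print(swap(1000, 1, 500))  # 1U at 500 W -> 2.0
```

The second server scores higher despite drawing more power, because the metric rewards density as well as efficiency.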
A supercomputer is a computer with a high level of performance as compared to a general-purpose computer. The performance of a supercomputer is commonly measured in floating-point operations per second (FLOPS) instead of million instructions per second (MIPS). Since 2017, there are supercomputers which can perform over a hundred quadrillion FLOPS (100 petaFLOPS). Since November 2017, all of the world's fastest 500 supercomputers run Linux-based operating systems. Additional research is being conducted in China, the United States, the European Union, Taiwan and Japan to build faster, more powerful and technologically superior exascale supercomputers.
In computing, floating point operations per second (FLOPS) is a measure of computer performance, useful in fields of scientific computation that require floating-point calculations. For such cases it is a more accurate measure than instructions per second.
The Texas Advanced Computing Center (TACC) at the University of Texas at Austin, United States, is an advanced computing research center that provides comprehensive advanced computing resources and support services to researchers in Texas and across the USA. The mission of TACC is to enable discoveries that advance science and society through the application of advanced computing technologies. Specializing in high performance computing, scientific visualization, data analysis & storage systems, software, research & development and portal interfaces, TACC deploys and operates advanced computational infrastructure to enable computational research activities of faculty, staff, and students of UT Austin. TACC also provides consulting, technical documentation, and training to support researchers who use these resources. TACC staff members conduct research and development in applications and algorithms, computing systems design/architecture, and programming tools and environments.
The TOP500 project ranks and details the 500 most powerful non-distributed computer systems in the world. The project was started in 1993 and publishes an updated list of the supercomputers twice a year. The first of these updates always coincides with the International Supercomputing Conference in June, and the second is presented at the ACM/IEEE Supercomputing Conference in November. The project aims to provide a reliable basis for tracking and detecting trends in high-performance computing and bases rankings on HPL, a portable implementation of the high-performance LINPACK benchmark written in Fortran for distributed-memory computers.
Manycore processors are specialist multi-core processors designed for a high degree of parallel processing, containing numerous simpler, independent processor cores. Manycore processors are used extensively in embedded computers and high-performance computing.
Tianhe-I, Tianhe-1, or TH-1 is a supercomputer capable of an Rmax of 2.5 petaFLOPS. Located at the National Supercomputing Center of Tianjin, China, it was the fastest computer in the world from October 2010 to June 2011 and is one of the few petascale supercomputers in the world.
Eurotech is a company dedicated to the research, development, production and marketing of miniature computers (NanoPCs) and high performance computers (HPCs).
Japan operates a number of centers for supercomputing which hold world records in speed, with the K computer becoming the world's fastest in June 2011.
Tsubame is a series of supercomputers that operates at the GSIC Center at the Tokyo Institute of Technology in Japan, designed by Satoshi Matsuoka.
Several centers for supercomputing exist across Europe, and distributed access to them is coordinated by European initiatives to facilitate high-performance computing. One such initiative, the HPC Europa project, fits within the Distributed European Infrastructure for Supercomputing Applications (DEISA), which was formed in 2002 as a consortium of eleven supercomputing centers from seven European countries. Operating within the CORDIS framework, HPC Europa aims to provide access to supercomputers across Europe.
The DEGIMA is a high performance computer cluster used for hierarchical N-body simulations at the Nagasaki Advanced Computing Center, Nagasaki University.
Xeon Phi is a series of x86 manycore processors designed and made by Intel. It is intended for use in supercomputers, servers, and high-end workstations. Its architecture allows use of standard programming languages and application programming interfaces (APIs) such as OpenMP.
Titan or OLCF-3 was a supercomputer built by Cray at Oak Ridge National Laboratory for use in a variety of science projects. Titan was an upgrade of Jaguar, a previous supercomputer at Oak Ridge, that used graphics processing units (GPUs) in addition to conventional central processing units (CPUs). Titan was the first such hybrid to perform over 10 petaFLOPS. The upgrade began in October 2011, commenced stability testing in October 2012, and the system became available to researchers in early 2013. The initial cost of the upgrade was US$60 million, funded primarily by the United States Department of Energy.
Appro was a developer of supercomputing supporting High Performance Computing (HPC) markets focused on medium- to large-scale deployments. Appro was based in Milpitas, California with a computing center in Houston, Texas, and a manufacturing and support subsidiary in South Korea and Japan.
Tianhe-2 or TH-2 is a 33.86-petaflops supercomputer located in the National Supercomputer Center in Guangzhou, China. It was developed by a team of 1,300 scientists and engineers.
XK7 is a supercomputing platform, produced by Cray, launched on October 29, 2012. XK7 is the second platform from Cray to use a combination of central processing units ("CPUs") and graphical processing units ("GPUs") for computing; the hybrid architecture requires a different approach to programming to that of CPU-only supercomputers. Laboratories that host XK7 machines host workshops to train researchers in the new programming languages needed for XK7 machines. The platform is used in Titan, the world's second fastest supercomputer in the November 2013 list as ranked by the TOP500 organization. Other customers include the Swiss National Supercomputing Centre, which has a 272-node machine, and Blue Waters, which has a machine with Cray XE6 and XK7 nodes that performs at approximately 1 petaFLOPS (10^15 floating-point operations per second).
QPACE 2 is a massively parallel and scalable supercomputer. It was designed for applications in lattice quantum chromodynamics but is also suitable for a wider range of applications.
Summit or OLCF-4 is a supercomputer developed by IBM for use at Oak Ridge National Laboratory, which as of November 2019 is the fastest supercomputer in the world, capable of 200 petaFLOPS. Its current LINPACK benchmark is clocked at 148.6 petaFLOPS. As of November 2019, the supercomputer is also the 3rd most energy efficient in the world with a measured power efficiency of 14.668 gigaFLOPS/watt. Summit is the first supercomputer to reach exaop speed, achieving 1.88 exaops during a genomic analysis and is expected to reach 3.3 exaops using mixed precision calculations.
Galileo is a 1.1 petaFLOPS supercomputer located at CINECA in Bologna, Italy.
Gyoukou is a supercomputer developed by ExaScaler and PEZY Computing.