Performance per watt


In computing, performance per watt is a measure of the energy efficiency of a particular computer architecture or computer hardware. Literally, it measures the rate of computation that a computer can deliver for every watt of power consumed. When comparing computing systems, this rate is typically measured by performance on the LINPACK benchmark; the Green500 list of supercomputers is one example. Performance per watt has been suggested to be a more sustainable measure of computing than Moore's Law. [1]


System designers building parallel computers, such as Google's hardware, pick CPUs based on their performance per watt, because the cost of powering a CPU outweighs the cost of the CPU itself. [2]

Spaceflight computers have hard limits on the maximum power available and also have hard requirements on minimum real-time performance. A ratio of processing speed to required electrical power is more useful than raw processing speed. [3]

Definition

The performance and power consumption metrics used depend on the definition; reasonable measures of performance are FLOPS, MIPS, or the score for any performance benchmark. Several measures of power usage may be employed, depending on the purposes of the metric; for example, a metric might only consider the electrical power delivered to a machine directly, while another might include all power necessary to run a computer, such as cooling and monitoring systems. The power measurement is often the average power used while running the benchmark, but other measures of power usage may be employed (e.g. peak power, idle power).

For example, the early UNIVAC I computer performed approximately 0.015 operations per watt-second (1,905 operations per second (OPS) while consuming 125 kW). The four-core FR550 variant of the Fujitsu FR-V VLIW/vector processor system on a chip, released in 2005, performs 51 giga-OPS while consuming 3 watts, or 17 billion operations per watt-second. [4] [5] This is an improvement of over a trillion times in 54 years.
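The arithmetic behind this comparison can be reproduced with a short Python sketch. It uses only the figures quoted above; the function and variable names are illustrative assumptions, not part of any standard tool.

```python
# Figures quoted in the text above.
univac_ops_per_second = 1_905
univac_power_watts = 125_000       # 125 kW

frv_ops_per_second = 51e9          # 51 giga-OPS
frv_power_watts = 3


def ops_per_watt_second(ops_per_second, power_watts):
    """Operations per watt-second (equivalently, operations per joule)."""
    return ops_per_second / power_watts


univac_eff = ops_per_watt_second(univac_ops_per_second, univac_power_watts)
frv_eff = ops_per_watt_second(frv_ops_per_second, frv_power_watts)
improvement = frv_eff / univac_eff

print(f"UNIVAC I: {univac_eff:.4f} operations per watt-second")
print(f"FR550:    {frv_eff:.3e} operations per watt-second")
print(f"Ratio:    {improvement:.3e}")
```

The ratio comes out slightly above 10^12, which is where the "over a trillion times" figure comes from.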

Most of the power a computer uses is converted into heat, so a system that takes fewer watts to do a job will require less cooling to maintain a given operating temperature. Reduced cooling demands make it easier to quiet a computer. Lower energy consumption can also make it less costly to run, and reduce the environmental impact of powering the computer (see green computing). If installed where there is limited climate control, a lower power computer will operate at a lower temperature, which may make it more reliable. In a climate controlled environment, reductions in direct power use may also create savings in climate control energy.

Computing energy consumption is sometimes also measured by reporting the energy required to run a particular benchmark, for instance EEMBC EnergyBench. Energy consumption figures for a standard workload may make it easier to judge the effect of an improvement in energy efficiency.

When performance is defined as operations/second, performance per watt can be written as operations/watt-second. Since a watt is one joule/second, performance per watt can also be written as operations/joule.
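The equivalence between operations per second per watt and operations per joule can be checked numerically. The following Python sketch uses a hypothetical benchmark run; the operation count, runtime and average power are made-up values chosen only for illustration.

```python
import math


def performance_per_watt(ops_per_second, avg_power_watts):
    """Rate of computation divided by average power draw."""
    return ops_per_second / avg_power_watts


def operations_per_joule(total_operations, energy_joules):
    """Total work done divided by total energy consumed."""
    return total_operations / energy_joules


# Hypothetical benchmark run (assumed values, not measurements).
total_operations = 6.0e12
runtime_seconds = 120.0
avg_power_watts = 250.0

# Energy is average power multiplied by time (1 W = 1 J/s).
energy_joules = avg_power_watts * runtime_seconds
ops_per_second = total_operations / runtime_seconds

per_watt = performance_per_watt(ops_per_second, avg_power_watts)
per_joule = operations_per_joule(total_operations, energy_joules)

print(per_watt, per_joule)
print(math.isclose(per_watt, per_joule))
```

Both routes give the same number because the runtime cancels: (operations / time) / power equals operations / (power × time).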

FLOPS per watt

Exponential growth of supercomputer performance per watt based on data from the Green500 list. The red crosses denote the most power-efficient computer, while the blue ones denote the computer ranked #500.

FLOPS per watt is a common measure. Like the FLOPS (Floating Point Operations Per Second) metric it is based on, the metric is usually applied to scientific computing and simulations involving many floating point calculations.

Examples

As of June 2016, the two most efficient supercomputers on the Green500 list are both at RIKEN and are both based on the same Japanese manycore accelerator technology, PEZY-SCnp, in addition to Intel Xeon processors; the top one achieves 6673.8 MFLOPS/watt. The third-ranked system is the Chinese-technology Sunway TaihuLight at 6051.3 MFLOPS/watt, a much bigger machine that is ranked 2nd on TOP500 (the other two are not on that list). [6]

In June 2012, the Green500 list rated BlueGene/Q, Power BQC 16C as the most efficient supercomputer on the TOP500 in terms of FLOPS per watt, running at 2,100.88 MFLOPS/watt. [7]

In November 2010, IBM's Blue Gene/Q achieved 1,684 MFLOPS/watt. [8] [9]

On 9 June 2008, CNN reported that IBM's Roadrunner supercomputer achieved 376 MFLOPS/watt. [10] [11]

As part of the Intel Tera-Scale research project, the team produced an 80-core CPU that can achieve over 16,000 MFLOPS/watt. [12] [13] The future of that CPU is not certain.

Microwulf, a low cost desktop Beowulf cluster of four dual-core Athlon 64 X2 3800+ computers, runs at 58 MFLOPS/watt. [14]

Kalray has developed a 256-core VLIW CPU that achieves 25,000 MFLOPS/watt. The next generation was expected to achieve 75,000 MFLOPS/watt. [15] However, in 2019 the company's latest chip for embedded use has 80 cores and is claimed to reach up to 4 TFLOPS at 20 W. [16]

Adapteva announced the Epiphany V, a 1024-core 64-bit RISC processor intended to achieve 75 GFLOPS/watt, [17] [18] although the company later announced that the Epiphany V was "unlikely" to become available as a commercial product.

US Patent 10,020,436, issued in July 2018, claims three intervals of 100, 300, and 600 GFLOPS/watt.
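The examples above mix MFLOPS/watt, GFLOPS/watt and a raw "TFLOPS at a given wattage" claim. A minimal Python sketch that puts them on a common GFLOPS/watt scale is given below. The labels are illustrative, and the figures come from different benchmarks, precisions and measurement conditions (some are design targets or vendor peak claims), so the resulting order is only a rough guide.

```python
# Efficiency figures quoted in the examples above, in MFLOPS per watt.
examples_mflops_per_watt = {
    "Green500 #1, June 2016 (PEZY-SCnp)": 6673.8,
    "Sunway TaihuLight, June 2016": 6051.3,
    "BlueGene/Q, June 2012": 2100.88,
    "Blue Gene/Q, November 2010": 1684.0,
    "IBM Roadrunner, 2008": 376.0,
    "Intel 80-core research chip": 16000.0,
    "Microwulf": 58.0,
    "Kalray 256-core": 25000.0,
}


def flops_per_watt(flops, watts):
    """Convert a 'FLOPS at a given power draw' claim to FLOPS per watt."""
    return flops / watts


# Kalray's 2019 claim: up to 4 TFLOPS at 20 W.
kalray_2019_gflops_per_watt = flops_per_watt(4e12, 20) / 1e9

# Convert everything to GFLOPS per watt (1 GFLOPS = 1000 MFLOPS).
gflops_per_watt = {
    name: value / 1000 for name, value in examples_mflops_per_watt.items()
}
gflops_per_watt["Kalray 80-core, 2019 (claimed peak)"] = kalray_2019_gflops_per_watt
gflops_per_watt["Epiphany V (design target)"] = 75.0

# Sort from least to most efficient.
ranked = sorted(gflops_per_watt.items(), key=lambda item: item[1])
for name, value in ranked:
    print(f"{value:10.3f} GFLOPS/W  {name}")
```

The Kalray 2019 claim works out to 200 GFLOPS/watt, which is why a common unit is useful before comparing it with the older MFLOPS/watt figures.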

GPU efficiency

Graphics processing units (GPUs) have continued to increase in energy usage, while CPU designers have focused on improving performance per watt. High-performance GPUs may draw large amounts of power, so intelligent techniques are required to manage GPU power consumption. Measures like 3DMark2006 score per watt can help identify more efficient GPUs. [19] However, such measures may not adequately reflect efficiency in typical use, where much time is spent doing less demanding tasks. [20]

With modern GPUs, energy usage is an important constraint on the maximum computational capabilities that can be achieved. GPU designs are usually highly scalable, allowing the manufacturer to put multiple chips on the same video card, or to use multiple video cards that work in parallel. Peak performance of any system is essentially limited by the amount of power it can draw and the amount of heat it can dissipate. Consequently, performance per watt of a GPU design translates directly into peak performance of a system that uses that design.
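The relationship between performance per watt and peak performance can be sketched in Python as follows. This is a simplified model under stated assumptions: efficiency is taken to be constant across the power range, essentially all drawn power is assumed to end up as heat, and the designs and limits are hypothetical.

```python
def usable_power_watts(power_supply_limit_watts, cooling_limit_watts):
    """Sustained power is capped by whichever limit is reached first."""
    return min(power_supply_limit_watts, cooling_limit_watts)


def peak_performance_gflops(gflops_per_watt, power_supply_limit_watts,
                            cooling_limit_watts):
    """Peak performance of a design scaled up to fill the power envelope."""
    budget = usable_power_watts(power_supply_limit_watts, cooling_limit_watts)
    return gflops_per_watt * budget


# Two hypothetical GPU designs in the same chassis:
# 300 W available from the power supply, 250 W of heat can be removed.
design_a = peak_performance_gflops(20.0, 300.0, 250.0)
design_b = peak_performance_gflops(30.0, 300.0, 250.0)

print(design_a, design_b, design_b / design_a)
```

In this model the cooling limit, not the power supply, sets the budget, and the design with 1.5 times the performance per watt reaches 1.5 times the peak performance.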

Since GPUs may also be used for some general purpose computation, sometimes their performance is measured in terms also applied to CPUs, such as FLOPS per watt.

Challenges

While performance per watt is useful, absolute power requirements are also important. Claims of improved performance per watt may be used to mask increasing power demands. For instance, though newer generation GPU architectures may provide better performance per watt, continued performance increases can negate the gains in efficiency, and the GPUs continue to consume large amounts of power. [22]

Benchmarks that measure power under heavy load may not adequately reflect typical efficiency. For instance, 3DMark stresses the 3D performance of a GPU, but many computers spend most of their time doing less intense display tasks (idle, 2D tasks, displaying video). So the 2D or idle efficiency of the graphics system may be at least as significant for overall energy efficiency. Likewise, systems that spend much of their time in standby or soft off are not adequately characterized by efficiency under load alone. To help address this, some benchmarks, like SPECpower, include measurements at a series of load levels. [23]
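The effect of measuring at several load levels can be illustrated with a small Python sketch. The measurements are hypothetical, and the aggregation used here (total operations divided by total power across all levels, including idle) is an assumption chosen for illustration rather than a description of how any particular benchmark computes its score.

```python
# Hypothetical measurements: (load fraction, operations per second, average watts).
measurements = [
    (1.0, 1_000_000, 300.0),
    (0.5, 500_000, 200.0),
    (0.1, 100_000, 130.0),
    (0.0, 0, 100.0),          # idle: no useful work, but power is still drawn
]


def efficiency_at_full_load(levels):
    """Operations per second per watt at the highest load level only."""
    _, ops, watts = max(levels, key=lambda level: level[0])
    return ops / watts


def efficiency_across_levels(levels):
    """Sum of throughput over all levels divided by sum of power over all levels."""
    total_ops = sum(ops for _, ops, _ in levels)
    total_watts = sum(watts for _, _, watts in levels)
    return total_ops / total_watts


peak_only = efficiency_at_full_load(measurements)
overall = efficiency_across_levels(measurements)

print(f"Full load only:    {peak_only:.2f} ops/s per watt")
print(f"Across all levels: {overall:.2f} ops/s per watt")
```

Because idle and low-load power does not fall in proportion to the work done, the figure across all levels is noticeably lower than the full-load figure in this example.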

The efficiency of some electrical components, such as voltage regulators, decreases with increasing temperature, so the power used may increase with temperature. Power supplies, motherboards, and some video cards are among the subsystems affected, so the temperature or temperature dependence should be noted when measuring their power draw. [24] [25]

Performance per watt also typically does not include full life-cycle costs. Since computer manufacturing is energy intensive, and computers often have a relatively short lifespan, energy and materials involved in production, distribution, disposal and recycling often make up significant portions of their cost, energy use, and environmental impact. [26] [27]

Energy required for climate control of the computer's surroundings is often not counted in the wattage calculation, but it can be significant. [28]

Other energy efficiency measures

SWaP (space, wattage and performance) is a Sun Microsystems metric for data centers, incorporating power and space:

SWaP = performance / (space × power)

where performance is measured by any appropriate benchmark, and space is the size of the computer. [29]
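A minimal Python sketch of a SWaP calculation follows, assuming the commonly cited form of the metric, performance divided by the product of space and power. The choice of rack units for space and watts for power, and the two servers compared, are assumptions made for illustration.

```python
def swap_score(performance, space_rack_units, power_watts):
    """SWaP = performance / (space * power); higher is better."""
    if space_rack_units <= 0 or power_watts <= 0:
        raise ValueError("space and power must be positive")
    return performance / (space_rack_units * power_watts)


# Two hypothetical servers scored on the same benchmark.
server_a = swap_score(performance=1000.0, space_rack_units=2, power_watts=400.0)
server_b = swap_score(performance=800.0, space_rack_units=1, power_watts=300.0)

print(f"Server A: {server_a:.3f}")
print(f"Server B: {server_b:.3f}")
```

In this example the slower server scores higher, because it delivers its performance in half the space and with less power.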

Reduction of power, mass, and volume is also important for spaceflight computers. [3]


Notes and references

  1. Aitken, Rob (12 July 2021). "Performance per Watt Is the New Moore's Law". Arm Blueprint. Retrieved 16 July 2021.
  2. Power could cost more than servers, Google warns, CNET, 2006
  3. D. J. Shirley; M. K. McLelland. "The Next-Generation SC-7 RISC Spaceflight Computer". pp. 1, 2.
  4. "Fujitsu Develops Multi-core Processor for High-Performance Digital Consumer Products" (Press release). Fujitsu. 7 February 2020. Archived from the original on 25 March 2019. Retrieved 8 August 2020.
  5. "FR-V Single-Chip Multicore Processor: FR1000". Fujitsu. Archived from the original on 2 April 2015.
  6. "Green500 List for June 2016".
  7. "The Green500 List". Green500. Archived from the original on 3 July 2012.
  8. "Top500 Supercomputing List Reveals Computing Trends". 20 July 2010. IBM... BlueGene/Q system .. setting a record in power efficiency with a value of 1,680 Mflops/watt, more than twice that of the next best system.
  9. "IBM Research A Clear Winner in Green 500". 18 November 2010.
  10. "Government unveils world's fastest computer". CNN. Archived from the original on 10 June 2008. performing 376 million calculations for every watt of electricity used.
  11. "IBM Roadrunner Takes the Gold in the Petaflop Race". Archived from the original on 13 June 2008.
  12. "Intel squeezes 1.8 TFlops out of one processor". TG Daily. Archived from the original on 3 December 2007.
  13. "Teraflops Research Chip". Intel Technology and Research.
  14. Joel Adams. "Microwulf: Power Efficiency". Microwulf: A Personal, Portable Beowulf Cluster.
  15. "MPPA MANYCORE - Many-core processors - KALRAY - Agile Performance".
  16. "Kalray announces the Tape-Out of Coolidge on TSMC 16NM process technology". Kalray. 31 July 2019. Retrieved 12 August 2019.
  17. Olofsson, Andreas. "Epiphany-V: A 1024-core 64-bit RISC processor". Retrieved 6 October 2016.
  18. Olofsson, Andreas. "Epiphany-V: A 1024 processor 64-bit RISC System-On-Chip" (PDF). Retrieved 6 October 2016.
  19. Atwood, Jeff (18 August 2006). "Video Card Power Consumption". Archived from the original on 8 September 2008. Retrieved 26 March 2008.
  20. "Video card power consumption". Xbit Labs. Archived from the original on 4 September 2011.
  21. "PSA: Performance Doesn't Scale Linearly with Wattage (Aka testing M1 versus a Zen 3 5600X at the same Power Draw)". 29 November 2020.
  22. Tim Smalley. "Performance per What?". Bit Tech. Retrieved 21 April 2008.
  23. "SPEC launches standardized energy efficiency benchmark". ZDNet. Archived from the original on 16 December 2007.
  24. Mike Chin. "Asus EN9600GT Silent Edition Graphics Card". Silent PC Review. p. 5. Retrieved 21 April 2008.
  25. Mike Chin (19 March 2008). "80 Plus expands podium for Bronze, Silver & Gold". Silent PC Review. Retrieved 21 April 2008.
  26. Mike Chin. "Life Cycle Analysis and Eco PC Review". Eco PC Review. Archived from the original on 4 March 2008.
  27. Eric Williams (2004). "Energy intensity of computer manufacturing: hybrid assessment combining process and economic input-output methods". Environ. Sci. Technol. 38 (22): 6166–74. Bibcode:2004EnST...38.6166W. doi:10.1021/es035152j. PMID 15573621.
  28. Wu-chun Feng (2005). "The Importance of Being Low Power in High Performance Computing". CT Watch Quarterly. 1 (5).
  29. Greenhill, David. "SWaP Space Watts and Power" (PDF). US EPA Energystar. Retrieved 14 November 2013.
