POWER7

IBM Power7 4 GHz 8-way CPU (de-lidded) from an IBM 9119
General information
Launched: 2010
Designed by: IBM
Performance
Max. CPU clock rate: 2.4 GHz to 4.25 GHz
Cache
L1 cache: 32+32 KB/core
L2 cache: 256 KB/core
L3 cache: 4 MB/core
Architecture and classification
Technology node: 45 nm
Instruction set: Power ISA (Power ISA v.2.06)
Physical specifications
Cores: 4, 6, 8
History
Predecessor(s): POWER6
Successor(s): POWER8

POWER7 is a family of superscalar multi-core microprocessors based on the Power ISA 2.06 instruction set architecture. Released in 2010, it succeeded the POWER6 and POWER6+. POWER7 was developed by IBM at several sites, including IBM's Rochester, MN; Austin, TX; Essex Junction, VT; T. J. Watson Research Center, NY; Bromont, QC [1] and IBM Deutschland Research & Development GmbH, Böblingen, Germany laboratories. IBM announced servers based on POWER7 on 8 February 2010. [2] [3]

IBM Power7 4 GHz 8-way CPU and IHS from an IBM 9119
IBM Power7 4 GHz 8-way CPU IHS top from an IBM 9119
IBM Power7 4 GHz 8-way CPU bottom from an IBM 9119
IBM Power7 4 GHz 8-way CPU removable interposer from an IBM 9119

History

IBM won a $244 million DARPA contract in November 2006 to develop a petascale supercomputer architecture before the end of 2010 in the HPCS project. The contract also states that the architecture shall be available commercially. IBM's proposal, PERCS (Productive, Easy-to-use, Reliable Computer System), which won them the contract, is based on the POWER7 processor, AIX operating system and General Parallel File System. [4]

One feature that IBM and DARPA collaborated on is modifying the addressing and page table hardware to support global shared memory space for POWER7 clusters. This enables research scientists to program a cluster as if it were a single system, without using message passing. From a productivity standpoint, this is essential since some scientists are not conversant with MPI or other parallel programming techniques used in clusters. [5]

Design

The POWER7 superscalar multi-core architecture was a substantial evolution from the POWER6 design, focusing more on power efficiency through multiple cores and simultaneous multithreading (SMT). [6] The POWER6 architecture was built from the ground up to maximize processor frequency at the cost of power efficiency, and it reached 5 GHz. While the POWER6 is a dual-core processor with each core capable of two-way SMT, the POWER7 processor has up to eight cores and four threads per core, for a total capacity of 32 simultaneous threads. [7]

IBM stated at ISCA 29 [8] that peak performance was achieved by high-frequency designs with 10–20 FO4 delays per pipeline stage, at the cost of power efficiency. However, the POWER6 binary floating-point unit achieves a "6-cycle, 13-FO4 pipeline". [9] The pipeline for the POWER7 CPU was therefore changed again, just as it was for the POWER5 and POWER6 designs. In some respects, this rework is similar to Intel's 2005 move away from the Pentium 4's seventh-generation x86 microarchitecture.

Specifications

The POWER7 is available with 4, 6, or 8 physical cores per chip, in 1- to 32-way designs with up to 1024 SMT threads, and with slightly different microarchitectures and interfaces to support extended or sub-specifications of the Power ISA and different system architectures. For example, in the Power 775 supercomputing (HPC) system it is packaged as a 32-way quad-chip module (QCM) with 256 physical cores and 1024 SMT threads. [10] A special TurboCore mode can turn off half of the cores of an eight-core processor; the remaining four cores keep access to all the memory controllers and L3 cache and run at increased clock speeds. This raises per-core performance, which matters for workloads that need the fastest sequential performance, at the cost of reduced parallel performance. TurboCore mode can reduce "software costs in half for those applications that are licensed per core, while increasing per core performance from that software." [11] The IBM Power 780 scalable high-end servers feature the TurboCore workload-optimizing mode and deliver up to double the per-core performance of POWER6-based systems. [11]
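The TurboCore trade-off can be illustrated with a small calculation. The following is a minimal sketch, not IBM's definition of the mode: it assumes the 4 MB/core L3 figure for an eight-core chip, uses the 3.92 GHz (MaxCore) and 4.14 GHz (TurboCore) clocks listed for the Power 780, and introduces a hypothetical per-core licence price. The names `ChipMode` and `LICENCE_COST_PER_CORE` are illustrative only.

```python
from dataclasses import dataclass

L3_PER_CORE_MB = 4               # L3 cache per physical core on an 8-core chip
PHYSICAL_CORES = 8               # cores on a full POWER7 chip
LICENCE_COST_PER_CORE = 1000.0   # hypothetical price, for illustration only


@dataclass(frozen=True)
class ChipMode:
    name: str
    active_cores: int
    clock_ghz: float

    @property
    def l3_per_active_core_mb(self) -> float:
        # All on-chip L3 stays available and is shared by the active cores.
        return L3_PER_CORE_MB * PHYSICAL_CORES / self.active_cores

    @property
    def licence_cost(self) -> float:
        # Per-core licensing scales with the number of active cores.
        return LICENCE_COST_PER_CORE * self.active_cores


max_core = ChipMode("MaxCore", active_cores=8, clock_ghz=3.92)
turbo_core = ChipMode("TurboCore", active_cores=4, clock_ghz=4.14)

for mode in (max_core, turbo_core):
    print(f"{mode.name}: {mode.active_cores} cores at {mode.clock_ghz} GHz, "
          f"{mode.l3_per_active_core_mb:.0f} MB L3 per active core, "
          f"licence cost {mode.licence_cost:.0f}")
```

Under these assumptions, halving the active cores doubles the L3 cache available to each remaining core and halves the per-core licence bill, which matches the quoted "software costs in half" claim.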

Each core is capable of four-way simultaneous multithreading (SMT). The POWER7 has approximately 1.2 billion transistors and a die area of 567 mm², and is fabricated on a 45 nm process. A notable difference from POWER6 is that the POWER7 executes instructions out-of-order instead of in-order. Despite the decrease in maximum frequency compared to POWER6 (4.25 GHz vs 5.0 GHz), each core has higher performance than a POWER6 core, while each processor has up to 4 times the number of cores.
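The thread counts quoted in this section follow directly from the core count and the SMT width. A minimal sketch of the arithmetic, in which the function name `hardware_threads` is an illustrative choice rather than any IBM API:

```python
SMT_THREADS_PER_CORE = 4  # four-way SMT per POWER7 core


def hardware_threads(chips: int, cores_per_chip: int) -> int:
    """Return the number of simultaneous hardware threads in a system."""
    return chips * cores_per_chip * SMT_THREADS_PER_CORE


single_chip = hardware_threads(chips=1, cores_per_chip=8)       # one 8-core chip
power_775_node = hardware_threads(chips=32, cores_per_chip=8)   # 32-way node

print(single_chip, power_775_node)  # prints "32 1024"
```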

POWER7 has these specifications: [12] [13]

The technical specification further specifies: [15]

Each POWER7 processor core implements aggressive out-of-order (OoO) instruction execution to drive high efficiency in the use of available execution paths. The POWER7 processor has an Instruction Sequence Unit that is capable of dispatching up to six instructions per cycle to a set of queues. Up to eight instructions per cycle can be issued to the Instruction Execution units.

This gives the following theoretical single-precision (SP) performance figures (based on a 4.14 GHz 8-core implementation):

Four 64-bit SIMD units per core, plus one 128-bit SIMD VMX unit per core, can do 12 multiply-adds per cycle, giving 24 SP FP ops per cycle. At 4.14 GHz, that gives 4.14 billion × 24 = 99.36 SP GFLOPS per core and, with 8 cores, 794.88 SP GFLOPS per chip.
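The arithmetic behind these figures can be checked with a short script. This is a sketch under one assumption that the text does not state explicitly: each 64-bit unit is taken to process two 32-bit SP values per cycle and the 128-bit VMX unit four, which is one reading consistent with the quoted 12 multiply-adds. The helper names are illustrative.

```python
SP_BITS = 32  # width of one single-precision value


def sp_fma_lanes(unit_width_bits: int, unit_count: int) -> int:
    """SP multiply-adds per cycle, assuming one per 32-bit lane (assumption)."""
    return (unit_width_bits // SP_BITS) * unit_count


def peak_gflops(clock_ghz: float, flops_per_cycle: int, cores: int = 1) -> float:
    """Theoretical peak: clock (GHz) x FLOPs per cycle x number of cores."""
    return clock_ghz * flops_per_cycle * cores


# Four 64-bit units (2 lanes each) plus one 128-bit VMX unit (4 lanes).
fma_per_cycle = sp_fma_lanes(64, 4) + sp_fma_lanes(128, 1)
flops_per_cycle = 2 * fma_per_cycle  # a multiply-add counts as two FP ops

per_core = peak_gflops(4.14, flops_per_cycle)
per_chip = peak_gflops(4.14, flops_per_cycle, cores=8)

print(f"{fma_per_cycle} multiply-adds/cycle, {flops_per_cycle} SP FLOPs/cycle")
print(f"{per_core:.2f} SP GFLOPS per core, {per_chip:.2f} SP GFLOPS per chip")
```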

Peak double precision (DP) performance is roughly half of peak SP performance.

For comparison, Intel's 2013 Haswell architecture CPUs can do 16 DP FLOPs or 32 SP FLOPs per cycle (8/16 DP/SP fused multiply-add spread across 2× 256-bit AVX2 FP vector units). [16] At 3.4 GHz (i7-4770) this translates into 108.8 SP GFLOPS per core and 435.2 SP GFLOPS peak performance across the 4-core chip, giving roughly similar levels of performance per core, without taking into account the effects or benefits of Intel's Turbo Boost technology.
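The comparison can be reproduced from the clock rates, per-cycle SP throughput and core counts given in the text. The sketch below uses an illustrative `Chip` type. It compares theoretical peaks only and, like the text, ignores Turbo Boost.

```python
from typing import NamedTuple


class Chip(NamedTuple):
    name: str
    clock_ghz: float
    sp_flops_per_cycle: int
    cores: int

    def per_core_gflops(self) -> float:
        return self.clock_ghz * self.sp_flops_per_cycle

    def per_chip_gflops(self) -> float:
        return self.per_core_gflops() * self.cores


# Figures as quoted in the text.
power7 = Chip("POWER7", clock_ghz=4.14, sp_flops_per_cycle=24, cores=8)
haswell = Chip("Core i7-4770", clock_ghz=3.4, sp_flops_per_cycle=32, cores=4)

for chip in (power7, haswell):
    print(f"{chip.name}: {chip.per_core_gflops():.2f} SP GFLOPS per core, "
          f"{chip.per_chip_gflops():.2f} SP GFLOPS per chip")

# Per-core peak of the i7-4770 relative to the POWER7.
ratio = haswell.per_core_gflops() / power7.per_core_gflops()
print(f"per-core ratio (i7-4770 / POWER7): {ratio:.3f}")
```

On these figures the i7-4770 has a per-core peak roughly 10% higher, while the eight-core POWER7 has the higher per-chip peak.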

This theoretical peak performance comparison holds in practice too, with the POWER7 and the i7-4770 obtaining similar scores in the SPEC CPU2006 floating point benchmarks (single-threaded): 71.5 [17] for POWER7 versus 74.0 [18] for i7-4770.

The POWER7 chip significantly outperformed the i7 (by 2× to 5×) in some benchmarks (bwaves, cactusADM, lbm) while being significantly slower (by 2× to 3×) in most others. This indicates major architectural differences between the two chips and their mainboards and memory systems: they were designed with different workloads in mind.

Overall, in a broad sense, the floating-point performance of the POWER7 is similar to that of the Haswell i7.

POWER7+

IBM introduced the POWER7+ processor at the Hot Chips 24 conference in August 2012. It is an updated version with higher speeds, more cache and integrated accelerators. It is manufactured on a 32 nm fabrication process. [19]

The first systems to ship with POWER7+ processors were the IBM Power 770 and 780 servers. The chips have up to 80 MB of L3 cache (10 MB/core), improved clock speeds (up to 4.4 GHz) and support for 20 LPARs per core. [20]

Products

As of October 2011, the range of POWER7-based systems included IBM Power Systems "Express" models (710, 720, 730, 740 and 750), Enterprise models (770, 780 and 795) and high-performance computing models (755 and 775). Enterprise models differ in having Capacity on Demand capabilities. Maximum specifications are shown in the table below.

IBM POWER7 and POWER7+ servers
Name | Number of sockets | Number of cores | CPU clock frequency
710 Express | 1 | 6 | 4.2 GHz
710 Express | 1 | 8 | 4.2 GHz
720 Express (8202-E4B, POWER7) [21] | 1 | 8 | 3.0 GHz
720 Express (8202-E4D, POWER7+) [22] | 1 | 8 | 3.6 GHz
730 Express | 2 | 12 | 4.2 GHz
730 Express | 2 | 16 | 3.6 GHz or 4.2 GHz
740 Express | 2 | 12 | 4.2 GHz
740 Express | 2 | 16 | 3.6 GHz or 4.2 GHz
750 Express | 4 | 24 | 3.72 GHz
750 Express | 4 | 32 | 3.22 GHz or 3.61 GHz
755 | 4 | 32 | 3.61 GHz
770 | 8 | 48 | 3.7 GHz
770 | 8 | 64 | 3.3 GHz
775 (Per Node) | 32 | 256 | 3.83 GHz
780 (MaxCore mode) | 8 | 64 | 3.92 GHz
780 (TurboCore mode) | 8 | 32 | 4.14 GHz
780 (4 Socket Node) | 16 | 96 | 3.44 GHz
795 | 32 | 192 | 3.72 GHz
795 (MaxCore mode) | 32 | 256 | 4.0 GHz
795 (TurboCore mode) | 32 | 128 | 4.25 GHz

IBM also offers five POWER7-based BladeCenter blade servers. [23] Specifications are shown in the table below.

IBM POWER7 blade servers
Name | Number of cores | CPU clock frequency | Blade slots required
BladeCenter PS700 | 4 | 3.0 GHz | 1
BladeCenter PS701 | 8 | 3.0 GHz | 1
BladeCenter PS702 | 16 | 3.0 GHz | 2
BladeCenter PS703 | 16 | 2.4 GHz | 1
BladeCenter PS704 | 32 | 2.4 GHz | 2

The following are supercomputer projects that use the POWER7 processor:

References

  1. Authier, Isabelle (17 February 2011). "IBM Bromont au coeur de Watson" [IBM Bromont at the heart of Watson]. Cyberpresse (in French). Archived from the original on 19 February 2011. Retrieved 17 February 2011.
  2. "IBM Unveils New POWER7 Systems To Manage Increasingly Data-Intensive Services". IBM. 8 February 2010. Retrieved 13 September 2010.
  3. "New POWER7 workload optimizing systems". YouTube. IBM. 5 February 2010. Archived from the original on 8 February 2011. Retrieved 22 February 2010.
  4. "Cray, IBM picked for U.S. petaflop computer effort". EE Times. 22 November 2006. Retrieved 13 November 2022.
  5. "Hot Chips XXI Preview". Real World Technologies. Retrieved 17 August 2009.
  6. Kanter, David. "New Information on POWER7". Retrieved 11 August 2011.
  7. Varhol, Peter (9 February 2010). "IBM Launches POWER 7 Processor February 9, 2010". Retrieved 11 August 2011.
  8. "ISCA 29 Conference Notes". Retrieved 11 August 2011.
  9. "IBM Tips Power6 Processor Architecture". Information Week. 6 February 2006. Retrieved 6 February 2006.
  10. "IBM Power Systems 775 HPC Solution" (PDF). Retrieved 28 April 2020.
  11. "IBM Unveils New POWER7 Systems To Manage Increasingly Data-Intensive Services". IBM.com. Retrieved 11 August 2011.
  12. "IBM in Education – Business & Technology Solutions". IBM. Archived from the original on 4 October 2012. Retrieved 8 July 2009.
  13. "IBM's 8-core POWER7: twice the muscle, half the transistors". Ars Technica. September 2009. Retrieved 1 September 2009.
  14. "Bluewater HW specifications". National Center for Supercomputing Applications. Archived from the original on 23 January 2010. Retrieved 31 December 2009.
  15. "IBM Power 770 and 780 Technical Overview and Introduction" (PDF). IBM. Retrieved 21 August 2011.
  16. Anand Lal Shimpi (5 October 2012). "Intel's Haswell Architecture Analyzed: Building a New PC and a New Intel". Anandtech.
  17. "SPEC CFP2006 Result, IBM Power 780 Server (3.86 GHz, 16 core)".
  18. "SPEC CFP2006 Result, Intel DH87MC Motherboard (Intel Core i7-4770)".
  19. "Hot Chips: Update für IBMs Power7". Archived from the original on 18 May 2015. Retrieved 30 August 2012.
  20. Morgan, Timothy Prickett (3 October 2012). "Power7+ chips debut in fat IBM midrange systems". The Register.
  21. "IBM Power 720 and 740 Technical Overview and Introduction" (PDF). IBM Redbooks. IBM. 3 December 2012. p. 9. Retrieved 13 May 2021.
  22. "IBM Power 720 and 740 Technical Overview and Introduction" (PDF). IBM Redbooks. IBM. 16 May 2013. p. 9. Retrieved 3 June 2021.
  23. "IBM Power Systems hardware - Blade servers". IBM. Retrieved 30 January 2012.