Power10

Power10 SCM
General information
Launched 2021
Designed by IBM, OpenPower partners
Common manufacturer Samsung
Performance
Max. CPU clock rate +3.5 GHz to +4 GHz
Cache
L1 cache 48+32 KB per core
L2 cache 2 MB per core
L3 cache 120 MB per chip
Architecture and classification
Technology node 7 nm
Microarchitecture P10
Instruction set Power ISA (Power ISA v.3.1)
Physical specifications
Cores
  • 15 SMT8 cores
    30 SMT4 cores
Package
  • OLGA SCM and DCM
Socket
  • 1–16
History
Predecessor POWER9

Power10 is a superscalar, multithreading, multi-core microprocessor family based on the open source Power ISA. It was announced in August 2020 at the Hot Chips conference. Systems with Power10 CPUs have been generally available since September 2021, starting with the IBM Power10 Enterprise E1080 server.

The processor is designed to have 15 cores available, but a spare core is included during manufacture to allow for yield issues cost-effectively.

Power10-based processors are manufactured by Samsung using a 7 nm process with 18 layers of metal, with 18 billion transistors on a 602 mm² silicon die. [1] [2] [3] [4]

The main features of Power10 are higher performance per watt and better memory and I/O architectures, with a focus on artificial intelligence (AI) workloads. [5]

Design

Each Power10 core doubles most functional units compared to its predecessor, POWER9. The core is eight-way multithreaded (SMT8) and has a 48 KB instruction and a 32 KB data L1 cache, a 2 MB L2 cache and a very large translation lookaside buffer (TLB) with 4096 entries. [3] Latency, in cycles, to the different cache stages and the TLB has been reduced significantly. Each core has eight execution slices, each with one floating-point unit (FPU), arithmetic logic unit (ALU), branch predictor, load–store unit and SIMD engine, and each able to be fed 128-bit (64+64) instructions from the new prefix/fuse instructions of Power ISA v.3.1. Each execution slice can handle 20 instructions, backed by a shared 512-entry instruction table, and feeds a 128-entry load queue (64 entries in single-threaded mode) and an 80-entry store queue (40 entries in single-threaded mode). Better branch prediction features have doubled the accuracy. A core has four matrix math assist (MMA) engines, [6] for better handling of SIMD code, especially matrix multiplication instructions, where AI inference workloads have a 20-fold performance increase. [7]

The processor has two "hemispheres" with eight cores each. The cores in a hemisphere share a 64 MB L3 cache, for a total of 16 cores and 128 MB of L3 cache per chip. Due to yield issues, at least one core is always disabled, which reduces the L3 cache by 8 MB and leaves a usable total of 15 cores and 120 MB of L3 cache. Each chip also has eight crypto accelerators that offload common algorithms such as AES and SHA-3.
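The cache figures in this paragraph can be checked with a small calculation. The sketch below assumes, for illustration only, that the L3 capacity is divided evenly among the physical cores; the function name is hypothetical.

```python
# Figures taken from the text: 16 physical cores and 128 MB of L3 per chip.
PHYSICAL_CORES = 16
TOTAL_L3_MB = 128

# Assumption for illustration: the L3 is split evenly, one region per core.
L3_PER_CORE_MB = TOTAL_L3_MB // PHYSICAL_CORES  # 8 MB


def usable_l3_mb(disabled_cores: int) -> int:
    """Return the usable L3 capacity in MB when some cores are disabled."""
    if not 0 <= disabled_cores <= PHYSICAL_CORES:
        raise ValueError("disabled_cores must be between 0 and 16")
    return (PHYSICAL_CORES - disabled_cores) * L3_PER_CORE_MB


print(usable_l3_mb(1))  # 120, matching the usable total quoted above
```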

Several changes allow the Power10 core to consume half the power of a POWER9 core: increased clock gating, a reworked microarchitecture at every stage, fuse/prefix instructions that enable more work with fewer work units, and a smarter cache with lower memory latencies and effective address tagging that reduces cache misses. Combined with improvements of up to 30% in the compute facilities, this makes the whole processor perform 2.6× better per watt than its predecessor. With two chips mounted on the same module, it is up to 3 times as fast in the same power budget.

As each core can act like eight logical processors, the 15-core processor looks like 120 cores to the operating system. On a dual-chip module, that becomes 240 simultaneous threads per socket.
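The thread counts follow from simple multiplication. A minimal sketch, using the 15 SMT8 and 30 SMT4 core configurations listed in the infobox; the helper function is hypothetical:

```python
def hardware_threads(cores: int, smt: int, chips_per_socket: int = 1) -> int:
    """Logical processors seen by the operating system for one socket."""
    return cores * smt * chips_per_socket


# Single-chip module: both core configurations expose the same thread count.
print(hardware_threads(cores=15, smt=8))  # 120
print(hardware_threads(cores=30, smt=4))  # 120

# Dual-chip module: two chips in one socket.
print(hardware_threads(cores=15, smt=8, chips_per_socket=2))  # 240
```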

I/O

The chips have completely reworked memory and I/O architectures, using the open Coherent Accelerator Processor Interface (OpenCAPI) and Open Memory Interface (OMI). Using serial memory communications to off-chip controllers reduces the number of signaling lanes to and from the chip, increases the bandwidth and allows the processor to be flexible in its memory technology. [4]

Power10 supports a wide range of memory types, including DDR3 through DDR5, GDDR, HBM, or Persistent Storage Memory. These configurations can be changed by the customer to best fit the use case intended for the system.

Power10 enables encrypting of data with no performance penalty at every stage from RAM, across accelerators and cluster nodes to data at rest.

Power10 comes with the PowerAXON facility, which provides chip-to-chip and system-to-system links as well as an OpenCAPI bus for accelerators, I/O and other high-performance cache-coherent peripherals. It manages the communications between nodes in a 16-socket single-chip module (SCM) cluster or a 4-socket dual-chip module (DCM) cluster. It also manages the memory semantics for clustering of systems, enabling load/store access from a core to up to 2 PB of RAM across the entire Power10 cluster. IBM calls this feature Memory Inception.

Both OMI and PowerAXON can handle 1 TB/s communications off the chip.

Power10 includes PCIe 5. The SCM has 32 and the DCM has 64 PCIe 5 lanes. NVLink support was removed from Power10 because the bandwidth of PCIe 5.0 made it obsolete for the use cases that Power10 was designed for. [3] On-chip NVLink support was previously a unique selling point of POWER8 and POWER9.
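To put the lane counts in perspective, the sketch below estimates the raw one-direction bandwidth of each module. It relies on two figures from the PCIe 5.0 specification that are not stated in this article, 32 GT/s per lane and 128b/130b line encoding, and it ignores protocol overhead, so the results are upper bounds rather than measured throughput.

```python
# Assumptions from the PCIe 5.0 specification, not from this article.
GIGATRANSFERS_PER_S = 32.0      # per lane, per direction
ENCODING_EFFICIENCY = 128 / 130  # 128b/130b line encoding


def pcie5_gb_per_s(lanes: int) -> float:
    """Raw one-direction bandwidth in GB/s, before protocol overhead."""
    return lanes * GIGATRANSFERS_PER_S * ENCODING_EFFICIENCY / 8


for name, lanes in (("SCM", 32), ("DCM", 64)):
    print(f"{name}: {lanes} lanes, about {pcie5_gb_per_s(lanes):.1f} GB/s")
```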

Variants

The Power10 chip is available in two variants, defined by firmware in the packaging. Although the chips are physically identical and the difference is set in firmware, the variant cannot be changed after manufacturing, either by the user or by IBM. [8]

Modules

The Power10 comes in three flip-chip plastic land grid array (FC-PLGA) packages: one single chip module (SCM) and two dual-chip modules (DCM and eSCM).

Systems

Enterprise

The IBM Power E1080, codename Denali, is IBM's top-end Power10 computer. It is made up of one to four Central Electronics Complex (CEC) nodes, each taking up 5U of rack space. Each node has four Power10 SCMs, configurable with 10, 12, or 15 SMT8 cores per processor, and up to 16 TB of OMI-DDR4 RAM. The Power E1080 natively runs PowerVM, which hosts AIX, IBM i and little-endian Linux. [12] An E1080 system also needs a 2U-high System Control Unit for monitoring and configuration.
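The system-level totals implied by these per-node figures can be derived as follows. This is a sketch only: `E1080Config` is a hypothetical helper type, not an IBM tool or API, and it assumes the 16 TB memory limit applies per node, as the sentence above reads.

```python
from dataclasses import dataclass

# Figures taken from the paragraph above.
SOCKETS_PER_NODE = 4         # Power10 SCMs per CEC node
THREADS_PER_CORE = 8         # SMT8
MAX_RAM_TB_PER_NODE = 16     # assumed to be a per-node limit
NODE_HEIGHT_U = 5
CONTROL_UNIT_HEIGHT_U = 2


@dataclass(frozen=True)
class E1080Config:
    """Hypothetical description of one E1080 configuration."""
    nodes: int
    cores_per_socket: int

    def __post_init__(self) -> None:
        if self.nodes not in range(1, 5):
            raise ValueError("an E1080 has 1 to 4 CEC nodes")
        if self.cores_per_socket not in (10, 12, 15):
            raise ValueError("cores per processor must be 10, 12 or 15")

    @property
    def sockets(self) -> int:
        return self.nodes * SOCKETS_PER_NODE

    @property
    def cores(self) -> int:
        return self.sockets * self.cores_per_socket

    @property
    def threads(self) -> int:
        return self.cores * THREADS_PER_CORE

    @property
    def max_ram_tb(self) -> int:
        return self.nodes * MAX_RAM_TB_PER_NODE

    @property
    def rack_units(self) -> int:
        # CEC nodes plus the mandatory System Control Unit.
        return self.nodes * NODE_HEIGHT_U + CONTROL_UNIT_HEIGHT_U


largest = E1080Config(nodes=4, cores_per_socket=15)
print(largest.sockets, largest.cores, largest.threads)  # 16 240 1920
print(largest.max_ram_tb, largest.rack_units)           # 64 22
```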

The Power E1080 also supports up to sixteen I/O expansion drawers, four per CEC node. Each expansion drawer is connected to its CEC node by two PCIe fanout modules and has twelve full-height, full-length (FHFL) PCIe slots. Four of these slots are PCIe 3.0 x16, while the remaining eight are PCIe 3.0 x8. In a maximum configuration, a 16-socket Power E1080 supports 192 single-slot PCIe cards in its expansion drawers. [13]
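The 192-slot figure is the product of the drawer and slot counts given above. A short sketch with a hypothetical helper function:

```python
# Figures taken from the paragraph above.
DRAWERS_PER_NODE = 4
X16_SLOTS_PER_DRAWER = 4
X8_SLOTS_PER_DRAWER = 8


def expansion_slots(nodes: int) -> dict[str, int]:
    """Count expansion drawers and PCIe 3.0 slots for a given node count."""
    drawers = nodes * DRAWERS_PER_NODE
    x16 = drawers * X16_SLOTS_PER_DRAWER
    x8 = drawers * X8_SLOTS_PER_DRAWER
    return {"drawers": drawers, "x16": x16, "x8": x8, "total": x16 + x8}


print(expansion_slots(4))
# {'drawers': 16, 'x16': 64, 'x8': 128, 'total': 192}
```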

Mid-range

Scale-out

The S-models can run Linux, IBM i and AIX. The L-models are made for Linux, but are allowed to run AIX and IBM i on up to 25% of available CPU cores. [10]

Operating system support

Comparison with earlier POWER CPUs

The change to a 7-nm fabrication process results in significantly higher performance per watt.

The PowerAXON facility now extends all the way to 2 PB of unified clustered memory space, shared across multiple cluster nodes, and includes support for PCIe 5.

New SIMD instructions and new data types, including bfloat16, INT4 (INTEGER) and INT8 (BIGINT), are aimed at improving AI workloads. [16] [17]

Unlike the earlier POWER9 and POWER8 CPUs, Power10 requires closed-source, third-party firmware in security-sensitive areas of the CPU module, along with additional closed-source, third-party firmware in the required off-module memory controller. [18]

Branding

Power10 is unusual in that its name is not written in all capitals, unlike POWER9 and all earlier POWER processors. This change is part of IBM's rebranding of its Power Systems offering, which, beginning with Power10, is now just "Power". Power10 also has a logo. [19]

References

  1. Dr. Cutress, Ian (August 17, 2020). "Hot Chips 2020 Live Blog: IBM's POWER10 Processor on Samsung 7nm". AnandTech.
  2. Quach, Katyanna (August 17, 2020). "IBM takes Power10 processors down to 7nm with Samsung, due to ship by end of 2021". The Register.
  3. Schilling, Andreas (August 17, 2020). "IBM Power10 offers 30 cores with SMT8, PCIe 5.0 and DDR5". Hardware LUXX (in German).
  4. Kennedy, Patrick (August 17, 2020). "IBM POWER10 Searching for the Holy Grail of Compute". ServeTheHome.
  5. "IBM Reveals Next-Generation IBM POWER10 Processor". IBM. August 17, 2020.
  6. Jose Moreira, Puneeth Bhat A H and Satish Kumar Sadasivam (April 15, 2021). Matrix-Multiply Assist Best Practices Guide.
  7. Russell, John (August 17, 2020). "IBM Debuts Power10; Touts New Memory Scheme, Security, and Inferencing". HPCwire.
  8. Prickett Morgan, Timothy (August 31, 2020). "IBM's Possible Designs For Power10 Systems". IT Jungle.
  9. Giuliano Anselmi, Marc Gregorutti, Stephen Lutz, Michael Malicdem, Guido Somers, Tsvetomir Spasov (July 11, 2022). "IBM Power E1050 Technical Overview and Introduction" (PDF).
  10. Giuliano Anselmi, Young Hoon Cho, Andrew Laidlaw, Armin Röll, Tsvetomir Spasov (July 19, 2022). "IBM Power S1014, S1022s, S1022, and S1024 Technical Overview and Introduction" (PDF).
  11. GitHub/OpenPower/Rainier source
  12. This is what the most powerful server in the world looks like
  13. Giuliano Anselmi, Manish Arora, Ivaylo Bozhinov, Dinil Das, Turgut Genc, Bartlomiej Grabowski, Madison Lee, Armin Röll (December 9, 2021). "IBM Power E1080 Technical Overview and Introduction" (PDF).
  14. Larabel, Michael (August 9, 2020). "Linux 5.9 Brings More IBM POWER10 Support, New/Faster SCV System Call ABI". Phoronix.
  15. Prickett Morgan, Timothy (August 6, 2019). "Talking High Bandwidth with IBM's POWER10 Architect". The Next Platform.
  16. Patrizio, Andy (August 18, 2020). "IBM details next-gen POWER10 processor". Network World.
  17. "Data type aliases". IBM. August 26, 2020.
  18. "It's not just OMI that's the trouble with POWER10". September 8, 2021.
  19. No More Shouting The Name "Power" (Well, Except In Our Title Here)