QPACE 2

QPACE 2 prototype at the University of Regensburg

QPACE 2 (QCD Parallel Computing Engine) is a massively parallel and scalable supercomputer. It was designed for applications in lattice quantum chromodynamics but is also suitable for a wider range of applications.


Overview

QPACE 2 is a follow-up to the QPACE supercomputer [1] and the iDataCool hot-water cooling project. [2] It is a joint effort of the particle physics group at the University of Regensburg and the Italian company Eurotech. The academic design team consisted of about 10 junior and senior physicists. Details of the project are described in [3].

QPACE 2 uses Intel Xeon Phi coprocessors (code-named Knights Corner, KNC), interconnected by a combination of PCI Express (abbreviated PCIe) and FDR InfiniBand. The main features of the QPACE 2 prototype installed at the University of Regensburg are described below.

The prototype is a one-rack installation that consists of 64 nodes with 15,872 physical cores in total and a peak performance of 310 TFlop/s. It was deployed in the summer of 2015 [4] and is used for simulations of lattice quantum chromodynamics. In November 2015, QPACE 2 was ranked #500 on the Top500 list of the most powerful supercomputers [5] and #15 on the Green500 list of the most energy-efficient supercomputers in the world. [6]
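
The headline figures can be checked with simple arithmetic. The following Python sketch assumes the nominal double-precision peak of about 1.208 TFlop/s per Xeon Phi 7120X card; it reproduces the core count and peak performance quoted above.

    # Back-of-the-envelope check of the QPACE 2 headline figures.
    nodes          = 64
    phis_per_node  = 4
    cores_per_phi  = 61      # physical cores per Xeon Phi 7120X
    cores_host_cpu = 4       # physical cores of the Haswell E3 host CPU
    tflops_per_phi = 1.208   # nominal double-precision peak per 7120X (assumed spec)

    physical_cores = nodes * (cores_host_cpu + phis_per_node * cores_per_phi)
    peak_tflops    = nodes * phis_per_node * tflops_per_phi

    print(physical_cores)      # 15872
    print(round(peak_tflops))  # 309, i.e. roughly the quoted 310 TFlop/s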

QPACE 2 was funded by the German Research Foundation (DFG) in the framework of SFB/TRR-55 and by Eurotech.

Architecture

Many current supercomputers are hybrid architectures that use accelerator cards with a PCIe interface to boost the compute performance. In general, server processors support only a limited number of accelerators due to the limited number of PCIe lanes (typically 40 for the Intel Haswell architecture). The common approach to integrating multiple accelerator cards into the host system is to arrange multiple server processors, typically two or four, as a distributed shared-memory system. This approach allows for a higher number of accelerators per compute node thanks to the higher total number of PCIe lanes, but it also comes with several disadvantages.

The QPACE 2 architecture addresses these disadvantages with a node design in which a single low-power Intel Haswell E3 host CPU accommodates four Xeon Phi 7120X accelerator cards for computational power and one dual-port FDR InfiniBand network interface card for external communication. To achieve this, the components within a node are interconnected by a PCIe switch with 96 lanes.

The QPACE 2 rack contains 64 compute nodes (and thus 256 Xeon Phi accelerators in total), with 32 nodes mounted on the front side and 32 on the back side of the rack. The power subsystem consists of 48 power supplies that deliver an aggregate peak power of 96 kW. QPACE 2 relies on a warm-water cooling solution to achieve this packaging and power density.

Compute node

QPACE 2 schematic node design

The QPACE 2 node consists of commodity hardware interconnected by PCIe. The midplane hosts a 96-lane PCIe switch (PEX8796 by Avago, formerly PLX Technology), provides six 16-lane PCIe Gen3 slots, and delivers power to all slots. One slot is used for the CPU card, which is a PCIe form factor card containing one Intel Haswell E3-1230L v3 server processor with 16 GB DDR3 memory as well as a microcontroller to monitor and control the node. Four slots are used for Xeon Phi 7120X cards with 16 GB GDDR5 each, and one slot for a dual-port FDR InfiniBand network interface card (Connect-IB by Mellanox).

The midplane and the CPU card were designed for the QPACE 2 project but can be reused for other projects or products.

The low-power Intel E3-1230L v3 server CPU is energy-efficient but weak in computational power compared to other server processors available around 2015 (and in particular weaker than most accelerator cards). The CPU does not contribute significantly to the compute power of the node; it merely runs the operating system and system-relevant drivers. Technically, the CPU serves as a root complex for the PCIe fabric. The PCIe switch extends the host CPU's limited number of PCIe lanes to a total of 80 lanes, thereby enabling a multitude of components (4x Xeon Phi and 1x InfiniBand, each x16 PCIe) to be connected to the CPU as PCIe endpoints. This architecture also allows the Xeon Phis to communicate with each other peer-to-peer via PCIe and to access the external network directly, without having to go through the host CPU. A rough lane accounting is sketched below.
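
The following sketch spells out that accounting under the assumption, stated above, that every slot of the switch runs 16 PCIe Gen3 lanes; it is an illustration of the lane budget, not a vendor configuration file.

    # Plausible lane budget of the 96-lane PCIe switch in a QPACE 2 node,
    # assuming 16 lanes per slot as described above.
    lanes_per_slot = 16
    slots = {
        "CPU card (root complex)": 1,
        "Xeon Phi 7120X":          4,
        "Connect-IB InfiniBand":   1,
    }

    total_lanes    = lanes_per_slot * sum(slots.values())  # 96 lanes on the switch
    endpoint_lanes = total_lanes - lanes_per_slot           # 80 lanes for the endpoints

    print(total_lanes, endpoint_lanes)  # 96 80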

Each QPACE 2 node comprises 248 physical cores (host CPU: 4, Xeon Phi: 61 each). Host processor and accelerators support multithreading. The number of logical cores per node is 984.
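
The logical core count follows from the usual hardware-thread counts of the two processor types, 2 threads per Haswell core and 4 threads per Xeon Phi core, as the following one-line Python check shows.

    # Logical cores per node: 2 hardware threads per host core, 4 per Xeon Phi core.
    print(4 * 2 + 4 * 61 * 4)  # 984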

The design of the node is not limited to the components used in QPACE 2. In principle, any card supporting PCIe, e.g., accelerators such as GPUs or network technologies other than InfiniBand, can be used as long as the form factor and power specifications are met.

Networks

8x8 hyper-crossbar. Each of the 64 nodes (with 2 ports each) is connected to one switch in the x (red) and one switch in the y (blue) direction. The switches (indicated by rectangles) are arranged in a 2x2 mesh.

The intra-node communication proceeds via the PCIe switch without host CPU involvement. The inter-node communication is based on FDR InfiniBand. The topology of the InfiniBand network is a two-dimensional hyper-crossbar. This means that a two-dimensional mesh of InfiniBand switches is built, and the two InfiniBand ports of a node are connected to one switch in each of the dimensions. The hyper-crossbar topology was first introduced by the Japanese CP-PACS collaboration of particle physicists. [7]
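
The switch assignment of a two-dimensional hyper-crossbar is easy to state: a node with grid coordinates (x, y) connects one port to the switch serving all nodes with the same y and the other port to the switch serving all nodes with the same x, so that any pair of nodes is at most two switch hops apart. The Python sketch below is an illustrative model of this scheme with hypothetical switch names; it does not reproduce the actual QPACE 2 cabling.

    # Illustrative 8x8 hyper-crossbar: node (x, y) attaches to the switch of its
    # row (same y) and the switch of its column (same x). Switch names are hypothetical.
    def switches(x, y):
        return {f"X-switch[{y}]", f"Y-switch[{x}]"}

    a, b = (0, 0), (5, 3)
    if switches(*a) & switches(*b):
        print("one switch hop")   # the two nodes share a switch
    else:
        # route via an intermediate node such as (5, 0), which shares a switch with both
        print("two switch hops")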

The InfiniBand network is also used for I/O to a Lustre file system.

The CPU card provides two Gigabit Ethernet interfaces that are used to control the nodes and to boot the operating system.

Cooling

Midplane with a single water-cooled Xeon Phi and five empty slots.

The nodes of the QPACE 2 supercomputer are cooled by water using an innovative concept based on roll-bond technology. [8] Water flows through a roll-bond plate made of aluminum, which is thermally coupled to the hot components via aluminum or copper interposers and thermal grease or other thermal interface materials. All components of the node are cooled in this way. The performance of the cooling concept allows for free cooling year-round.

The power consumption of a node was measured to be up to 1400 watts in synthetic benchmarks. Around 1000 watts are needed for typical lattice quantum chromodynamics computations.
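
Dividing the peak performance quoted above by the aggregate node power gives a rough, peak-based efficiency estimate; note that the Green500 ranking cited earlier is based on measured LINPACK performance and power, so the number below is only illustrative.

    # Rough, peak-based efficiency estimate from figures quoted in this article;
    # the Green500 ranking instead uses measured LINPACK performance and power.
    peak_tflops    = 310    # peak performance of the 64-node rack
    nodes          = 64
    watts_per_node = 1400   # maximum measured in synthetic benchmarks

    gflops_per_watt = peak_tflops * 1e3 / (nodes * watts_per_node)
    print(round(gflops_per_watt, 2))  # ~3.46 GFlop/s per watt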

System software

The diskless nodes are operated with a standard Linux distribution (CentOS 7), which is booted over the Ethernet network. The Xeon Phis run the freely available Intel Manycore Platform Software Stack (MPSS). The InfiniBand communication is based on the OFED stack, which is also freely available.

See also

QPACE
Xeon Phi
InfiniBand
Green500

References

  1. H. Baier et al., PoS LAT2009 (2009) 001, (arXiv:0911.2174)
  2. N. Meyer et al., Lecture Notes in Computer Science 7905 (2013) 383, (arXiv:1309.4887)
  3. P. Arts et al., PoS LAT2014 (2014) 021, (arXiv:1502.04025)
  4. Eurotech press release
  5. The Top500 list, November 2015, http://top500.org/system/178607
  6. The Green500 list, November 2015, http://green500.org/lists/green201511&green500from=1&green500to=100
  7. Y. Iwasaki, Nucl. Phys. Proc. Suppl. 34 (1994) 78, (arXiv:hep-lat/9401030)
  8. J. Beddoes and M. Bibby, Principles of Metal Manufacturing Processes, Elsevier Science (1999)