QPACE

QPACE (QCD Parallel Computing on the Cell Broadband Engine) is a massively parallel and scalable supercomputer designed for applications in lattice quantum chromodynamics.

Overview

The QPACE supercomputer is a research project carried out by several academic institutions in collaboration with the IBM Research and Development Laboratory in Böblingen, Germany, and other industrial partners including Eurotech, Knürr, and Xilinx. The academic design team of about 20 junior and senior scientists, mostly physicists, came from the University of Regensburg (project lead), the University of Wuppertal, DESY Zeuthen, Jülich Research Centre, and the University of Ferrara. The main goal was to design an application-optimized, scalable architecture that outperforms industrial products in compute performance, price-performance ratio, and energy efficiency. The project officially started in 2008. Two installations were deployed in the summer of 2009. The final design was completed in early 2010. QPACE has since been used for lattice QCD calculations. The system architecture is also suitable for other applications that rely mainly on nearest-neighbor communication, e.g., lattice Boltzmann methods. [1]

In November 2009 QPACE was the leading architecture on the Green500 list of the most energy-efficient supercomputers in the world. [2] The title was defended in June 2010, when the architecture achieved an energy signature of 773 MFLOPS per Watt in the Linpack benchmark. [3] In the Top500 list of most powerful supercomputers, QPACE ranked #110-#112 in November 2009, and #131-#133 in June 2010. [4] [5]

QPACE was funded by the German Research Foundation (DFG) in the framework of SFB/TRR-55 and by IBM. Additional contributions were made by Eurotech, Knürr, and Xilinx.

Architecture

In 2008 IBM released the PowerXCell 8i multi-core processor, an enhanced version of the IBM Cell Broadband Engine used, e.g., in the PlayStation 3. The processor received much attention in the scientific community due to its outstanding floating-point performance. [6] [7] [8] It is one of the building blocks of the IBM Roadrunner cluster, which was the first supercomputer architecture to break the PFLOPS barrier. Cluster architectures based on the PowerXCell 8i typically rely on IBM BladeCenter blade servers interconnected by industry-standard networks such as InfiniBand. For QPACE an entirely different approach was chosen: a custom-designed network co-processor implemented on Xilinx Virtex-5 FPGAs connects the compute nodes. FPGAs are re-programmable semiconductor devices whose functional behavior can be customized. The QPACE network processor is tightly coupled to the PowerXCell 8i via a Rambus-proprietary I/O interface.

The smallest building block of QPACE is the node card, which hosts the PowerXCell 8i and the FPGA. Node cards are mounted on backplanes, each of which can host up to 32 node cards. One QPACE rack houses up to eight backplanes, with four backplanes each mounted to the front and back side. The maximum number of node cards per rack is 256. QPACE relies on a water-cooling solution to achieve this packaging density.

Sixteen node cards are monitored and controlled by a separate administration card, called the root card. One more administration card per rack, called the superroot card, is used to monitor and control the power supplies. The root cards and superroot cards are also used for synchronization of the compute nodes.
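The packaging figures above can be checked with a few lines of arithmetic. The following is only an illustrative sketch: the constants are the numbers quoted in the text, while the function names are hypothetical and not part of any QPACE software.

```python
# Packaging figures quoted in the text.
NODE_CARDS_PER_BACKPLANE = 32
BACKPLANES_PER_RACK = 8
NODE_CARDS_PER_ROOT_CARD = 16


def rack_capacity(backplanes=BACKPLANES_PER_RACK):
    """Maximum number of node cards for a given number of backplanes."""
    return backplanes * NODE_CARDS_PER_BACKPLANE


def root_cards_needed(node_cards):
    """Root cards required if each one manages up to 16 node cards."""
    # Ceiling division without importing math.
    return -(-node_cards // NODE_CARDS_PER_ROOT_CARD)


if __name__ == "__main__":
    full_rack = rack_capacity()
    print(full_rack)                     # node cards in a full rack
    print(root_cards_needed(full_rack))  # root cards in a full rack
```

Under these figures a fully populated rack holds 256 node cards and needs 16 root cards, i.e. two root cards per backplane.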

Node card

The heart of QPACE is the IBM PowerXCell 8i multi-core processor. Each node card hosts one PowerXCell 8i, 4 GB of DDR2 SDRAM with ECC, one Xilinx Virtex-5 FPGA and seven network transceivers. A single 1 Gigabit Ethernet transceiver connects the node card to the I/O network. Six 10 Gigabit transceivers are used for passing messages between neighboring nodes in a three-dimensional toroidal mesh.
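To illustrate why six links per node suffice for a three-dimensional toroidal mesh, the sketch below computes the nearest neighbors of a node with periodic wrap-around. The function name and the 4 x 4 x 4 mesh size are assumptions chosen for illustration; the source does not specify the torus dimensions or any software interface.

```python
def torus_neighbors(coord, dims):
    """Return the six nearest neighbors of `coord` on a 3D torus.

    `coord` is an (x, y, z) tuple and `dims` the (nx, ny, nz) mesh size.
    Indices wrap around at the edges, so every node has exactly one
    neighbor in each of the +x, -x, +y, -y, +z, -z directions.
    """
    x, y, z = coord
    nx, ny, nz = dims
    return [
        ((x + 1) % nx, y, z),
        ((x - 1) % nx, y, z),
        (x, (y + 1) % ny, z),
        (x, (y - 1) % ny, z),
        (x, y, (z + 1) % nz),
        (x, y, (z - 1) % nz),
    ]


if __name__ == "__main__":
    # Hypothetical 4 x 4 x 4 mesh; the corner node wraps to the far side.
    for neighbor in torus_neighbors((0, 0, 0), (4, 4, 4)):
        print(neighbor)
```

Each of the six directions corresponds to one of the six 10 Gigabit transceivers on the node card.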

The QPACE network co-processor is implemented on a Xilinx Virtex-5 FPGA, which is directly connected to the I/O interface of the PowerXCell 8i. [9] [10] The functional behavior of the FPGA is defined by a hardware description language and can be changed at any time at the cost of rebooting the node card. Most entities of the QPACE network co-processor are coded in VHDL.

Networks

The QPACE network co-processor connects the PowerXCell 8i to three communications networks: [10] [11]

Cooling

The compute nodes of the QPACE supercomputer are cooled by water. Roughly 115 W has to be dissipated from each node card. [10] The cooling solution is based on a two-component design. Each node card is mounted to a thermal box, which acts as a large heat sink for heat-critical components. The thermal box interfaces to a coldplate, which is connected to the water-cooling circuit. One coldplate can remove the heat from up to 32 nodes. The node cards are mounted on both sides of the coldplate, i.e., 16 nodes each on the top and bottom. The cooling solution is efficient enough that the compute nodes can be cooled with warm water. The QPACE cooling solution also influenced other supercomputer designs such as SuperMUC. [12]
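The heat load implied by these figures can be estimated as follows. This is a rough sketch that counts node cards only, using the approximate 115 W per node card from the text and the 256 node cards per rack given in the Architecture section. It ignores power supplies and administration cards, and the function name is hypothetical.

```python
WATTS_PER_NODE_CARD = 115   # approximate figure quoted in the text
NODES_PER_COLDPLATE = 32
NODE_CARDS_PER_RACK = 256   # maximum, from the Architecture section


def heat_load_watts(node_cards):
    """Approximate heat to be removed for a given number of node cards."""
    return node_cards * WATTS_PER_NODE_CARD


if __name__ == "__main__":
    print(heat_load_watts(NODES_PER_COLDPLATE))  # per coldplate, in W
    print(heat_load_watts(NODE_CARDS_PER_RACK))  # per full rack, in W
```

With these numbers a fully loaded coldplate has to remove roughly 3.7 kW, and a full rack roughly 29 kW from the node cards alone.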

Installations

Two identical installations of QPACE with four racks have been operating since 2009:

The aggregate peak performance is about 200 TFLOPS in double precision, and 400 TFLOPS in single precision. The installations are operated by the University of Regensburg, Jülich Research Centre, and the University of Wuppertal.
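As a plausibility check, the aggregate figures can be broken down per node. The sketch below assumes that each of the two installations consists of four fully populated racks of 256 node cards. That reading of the text is an assumption, and the per-node values are derived from the rounded aggregate numbers rather than taken from a processor data sheet.

```python
INSTALLATIONS = 2
RACKS_PER_INSTALLATION = 4       # assumed: four racks in each installation
NODE_CARDS_PER_RACK = 256        # assumed: racks fully populated

AGGREGATE_DP_TFLOPS = 200        # "about 200 TFLOPS" from the text
AGGREGATE_SP_TFLOPS = 400        # "400 TFLOPS" from the text

total_nodes = INSTALLATIONS * RACKS_PER_INSTALLATION * NODE_CARDS_PER_RACK

# Convert TFLOPS to GFLOPS and divide by the node count.
per_node_dp_gflops = AGGREGATE_DP_TFLOPS * 1000 / total_nodes
per_node_sp_gflops = AGGREGATE_SP_TFLOPS * 1000 / total_nodes

if __name__ == "__main__":
    print(total_nodes)
    print(round(per_node_dp_gflops, 1))
    print(round(per_node_sp_gflops, 1))
```

Under these assumptions the 2048 node cards each contribute on the order of 100 GFLOPS in double precision and 200 GFLOPS in single precision.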

References

  1. L. Biferale et al., Lattice Boltzmann fluid-dynamics on the QPACE supercomputer, Procedia Computer Science 1 (2010) 1075
  2. The Green500 list, November 2009, http://www.green500.org/lists/green200911
  3. The Green500 list, June 2010, http://www.green500.org/lists/green201006
  4. The Top500 list, November 2009. Archived from the original on October 17, 2012. Retrieved January 17, 2013.
  5. The Top500 list, June 2010. Archived from the original on October 17, 2012. Retrieved January 17, 2013.
  6. G. Bilardi et al., The Potential of On-Chip Multiprocessing for QCD Machines, Lecture Notes in Computer Science 3769 (2005) 386
  7. S. Williams et al., The Potential of the Cell Processor for Scientific Computing, Proceedings of the 3rd conference on Computing frontiers (2006) 9
  8. G. Goldrian et al., QPACE: Quantum Chromodynamics Parallel Computing on the Cell Broadband Engine, Computing in Science and Engineering 10 (2008) 46
  9. I. Ouda, K. Schleupen, Application Note: FPGA to IBM Power Processor Interface Setup, IBM Research report, 2008
  10. H. Baier et al., QPACE - a QCD parallel computer based on Cell processors, Proceedings of Science (LAT2009), 001
  11. S. Solbrig, Synchronization in QPACE, STRONGnet Conference, Cyprus, 2010
  12. B. Michel et al., Aquasar: Der Weg zu optimal effizienten Rechenzentren, 2011