QCDOC

The QCDOC (quantum chromodynamics on a chip) is a supercomputer technology that uses relatively cheap, low-power processing elements to produce a massively parallel machine. The machine is custom-made to solve small but extremely demanding problems in quantum physics.

Overview

The computers were designed and built jointly by the University of Edinburgh (UKQCD), Columbia University, the RIKEN BNL Research Center and IBM. [1] The purpose of the collaboration was to exploit computing facilities for lattice field theory calculations, whose primary aim is to increase the predictive power of the Standard Model of elementary particle interactions through numerical simulation of quantum chromodynamics (QCD). The target was to build a massively parallel supercomputer with a peak performance of 10 Tflops and a sustained performance of 50% of peak.

There are three QCDOCs in service, each reaching a peak of 10 Tflops.

Around 23 UK academic staff, together with their postdocs and students, from seven universities belong to UKQCD. Costs were funded through a Joint Infrastructure Fund Award of £6.6 million. Staff costs (system support, physicist programmers and postdocs) are around £1 million per year; other computing and operating costs are around £0.2 million per year. [2]

QCDOC was designed to replace an earlier machine, QCDSP, which drew its power from connecting large numbers of DSPs together in a similar fashion. QCDSP connected 12,288 nodes in a 4D network and reached 1 Tflops in 1998.

QCDOC can be seen as a predecessor to the highly successful Blue Gene/L supercomputer. The two share many design traits, and the similarities go beyond superficial characteristics. Blue Gene is also a massively parallel supercomputer built from a large number of cheap, relatively weak PowerPC 440-based SoC nodes connected by a high-bandwidth multidimensional mesh. They differ, however, in that the computing nodes in BG/L are more powerful and are connected by a faster, more sophisticated network that scales up to several hundred thousand nodes per system.

Architecture

Logic schematic of the QCDOC ASIC

Computing node

The computing nodes are custom ASICs with about fifty million transistors each. They are mainly made up of existing building blocks from IBM. They are built around a 500 MHz PowerPC 440 core with 4 MB of on-chip DRAM, memory management for external DDR SDRAM, system I/O for internode communications, and dual Ethernet built in. Each computing node is capable of 1 Gflops in double precision. Each node has one DIMM socket capable of holding between 128 and 2048 MB of 333 MHz ECC DDR SDRAM.
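The per-node figure above can be checked with a short back-of-the-envelope sketch. The clock rate, the 1 Gflops node rate and the 10 Tflops target come from the text; the value of two floating-point operations per cycle (one fused multiply-add) is an assumption made here for illustration, and the function names are hypothetical.

```python
import math

CLOCK_HZ = 500e6       # 500 MHz core clock (from the text)
FLOPS_PER_CYCLE = 2    # ASSUMPTION: one fused multiply-add per cycle


def node_peak_gflops(clock_hz=CLOCK_HZ, flops_per_cycle=FLOPS_PER_CYCLE):
    """Peak double-precision rate of one node in Gflops."""
    return clock_hz * flops_per_cycle / 1e9


def nodes_for_peak(target_tflops, per_node_gflops):
    """Smallest number of nodes whose combined peak reaches the target."""
    return math.ceil(target_tflops * 1000 / per_node_gflops)


if __name__ == "__main__":
    per_node = node_peak_gflops()
    print("peak per node (Gflops):", per_node)
    print("nodes for 10 Tflops peak:", nodes_for_peak(10, per_node))
    # The stated design target was sustained performance at 50% of peak.
    print("sustained target (Tflops):", 10 * 0.5)
```

Under that assumption the sketch reproduces the quoted 1 Gflops per node, and a 10 Tflops peak corresponds to roughly ten thousand nodes.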

Internode communication

Each node can send data to and receive data from each of its twelve nearest neighbors in a six-dimensional mesh at a rate of 500 Mbit/s each. This provides a total off-node bandwidth of 12 Gbit/s. Each of these 24 channels has DMA to the neighboring nodes' on-chip DRAM or external SDRAM. In practice only four dimensions are used to form a communications sub-torus, while the remaining two dimensions are used to partition the system.
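The neighbor and bandwidth figures follow from the mesh geometry, as the following sketch illustrates. It assumes a periodic (torus) mesh; the mesh dimensions in the usage example and the function names are hypothetical, chosen only for illustration, and this is not the machine's actual routing code.

```python
LINK_RATE_MBIT = 500  # per-channel rate (from the text)


def torus_neighbors(coord, dims):
    """Return the nearest neighbors of `coord` on a periodic mesh.

    Each axis contributes two neighbors (one step down, one step up),
    so a six-dimensional mesh yields twelve.
    """
    neighbors = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % size  # wrap around at the edges
            neighbors.append(tuple(n))
    return neighbors


def off_node_bandwidth_gbit(num_neighbors, link_rate_mbit=LINK_RATE_MBIT):
    """Total off-node bandwidth: one send and one receive channel per neighbor."""
    channels = 2 * num_neighbors
    return channels * link_rate_mbit / 1000


if __name__ == "__main__":
    dims = (4, 4, 4, 4, 4, 4)  # hypothetical mesh size, for illustration only
    nb = torus_neighbors((0, 0, 0, 0, 0, 0), dims)
    print("neighbors:", len(nb))
    print("off-node bandwidth (Gbit/s):", off_node_bandwidth_gbit(len(nb)))
```

Twelve neighbors with a send and a receive channel each give the 24 channels mentioned above, and 24 channels at 500 Mbit/s give 12 Gbit/s.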

The operating system communicates with the computing nodes using the Ethernet network. This is also used for diagnostics, configuration and communications with disk storage.

Mechanical design

Two nodes are placed together on a daughter card with one DIMM socket and a 4:1 Ethernet hub for off-card communications. The daughter cards have two connectors, one carrying the internode communications network and one carrying power, Ethernet, clock and other housekeeping facilities.

Thirty-two daughter cards are placed in two rows on a motherboard that supports 800 Mbit/s off-board Ethernet communications. Eight motherboards are placed in a crate with two backplanes supporting four motherboards each. Each crate consists of 512 processor nodes and a 2^6 hypercube communications network. One node consumes about 5 W of power, and each crate is air- and water-cooled. A complete system can consist of any number of crates, for a total of up to several tens of thousands of nodes.
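The packaging hierarchy can be tallied in a few lines. The counts and the 5 W per-node figure come from the text; the function names and the 10,000-node example are hypothetical, and the power estimate covers the nodes only, not memory, Ethernet hardware or cooling.

```python
import math

NODES_PER_DAUGHTER_CARD = 2          # from the text
DAUGHTER_CARDS_PER_MOTHERBOARD = 32  # from the text
MOTHERBOARDS_PER_CRATE = 8           # from the text
WATTS_PER_NODE = 5                   # approximate, from the text


def nodes_per_crate():
    """Processor nodes in one fully populated crate."""
    return (NODES_PER_DAUGHTER_CARD
            * DAUGHTER_CARDS_PER_MOTHERBOARD
            * MOTHERBOARDS_PER_CRATE)


def crate_node_power_watts():
    """Power drawn by the nodes of one crate (nodes only)."""
    return nodes_per_crate() * WATTS_PER_NODE


def crates_needed(total_nodes):
    """Smallest number of crates that can hold `total_nodes` nodes."""
    return math.ceil(total_nodes / nodes_per_crate())


if __name__ == "__main__":
    print("nodes per crate:", nodes_per_crate())
    print("node power per crate (W):", crate_node_power_watts())
    # Hypothetical system size, for illustration only.
    print("crates for 10,000 nodes:", crates_needed(10_000))
```

The product 2 × 32 × 8 reproduces the 512 nodes per crate stated above, which at about 5 W per node comes to roughly 2.6 kW of node power per crate.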

Operating system

The QCDOC runs a custom-built operating system, QOS, which handles boot, runtime, monitoring, diagnostics and performance, and simplifies management of the large number of computing nodes. It uses a custom embedded kernel and provides single-process POSIX ("unix-like") compatibility using the Cygnus newlib library. The kernel includes a specially written UDP/IP stack and an NFS client for disk access.

The operating system also maintains system partitions so that several users can have access to separate parts of the system for different applications. Each partition runs only one client application at any given time. Any multitasking is scheduled by the host controller system, which is a regular computer with a large number of Ethernet ports connecting to the QCDOC.

References

  1. "RIKEN BNL Research Center Dedicates New Supercomputer for Physics Research".
  2. "Home - Science and Technology Facilities Council".