| Active | Operational 2024 |
|---|---|
| Sponsors | Swiss Confederation |
| Operators | Swiss National Supercomputing Centre (CSCS) |
| Location | Lugano-Cornaredo, Switzerland |
| Architecture | HPE Cray EX254n: Nvidia GH200 Grace Hopper superchips combining 72-core Grace CPUs (Arm Neoverse V2, ARMv9) with Hopper H100 Tensor Core GPUs (1,305,600 cores total) |
| Power | 10 MW under full load |
| Operating system | Linux |
| Memory | 144 terabytes (TB) |
| Speed | 270 PFLOPS (Rmax) |
| Ranking | TOP500: 6th, June 2024 |
| Website | cscs.ch |
| Sources | "Nvidia GH200 Grace Hopper Superchip" |
The Alps supercomputer is a high-performance computer funded by the Swiss Confederation through the ETH Domain, with its main location in Lugano. It is part of the Swiss National Supercomputing Centre (CSCS), which provides computing services for selected scientific customers. [1]
The Swiss National Supercomputing Centre (CSCS) was founded in 1991. The centre operates a user lab for computing services; past examples include the analysis of data from the Large Hadron Collider (LHC) at CERN, data storage for the X-ray laser SwissFEL of the Paul Scherrer Institute, and simulations for weather forecasts by MeteoSwiss. [2] These services have been provided over time by increasingly powerful computing systems. Since the commissioning of the high-performance computer HPE Cray EX in 2020, the name Alps has been used for the new systems.

On September 14, 2024, the latest supercomputer, Alps (HPE Cray EX254n), was inaugurated. Even before the inauguration, the planned performance of Alps was illustrated with the claim that it could train OpenAI's large language model GPT-3 in two days. [3] The supercomputer is based on GH200 Grace Hopper superchips from Nvidia [4] [5] and achieves a performance of 270 petaflops, i.e. 270 quadrillion floating-point operations per second. In 2024 it ranked 6th on the TOP500 list of the world's fastest computers, although the in-house systems of Meta, Microsoft, Alphabet Inc./Google LLC, and Oracle are likely more powerful; their performance figures are not public.

A panel of experts from various natural sciences decides who is allowed to use the new computer. Use by a research collaboration of EPFL and the Yale Institute for Global Health has already been approved; this group took an open-source AI model from Meta and trained it on Alps with health data from medical research. With Alps, scientists in Switzerland gain an infrastructure for exploiting many possibilities of artificial intelligence (AI). The new supercomputer is used as part of the Swiss AI Initiative of ETH Zurich and EPFL.
To suitably house and operate modern supercomputers, a new data center building and an adjacent office building were constructed in Lugano-Cornaredo. The data center building consists of three floors. The lowest floor houses the basic infrastructure, with primary power and water distribution as well as an emergency power supply via batteries. The computers, and in summer the buildings, are cooled with lake water from Lake Lugano: from a depth of 45 meters, 460 liters of cold lake water per second are supplied to the data center via 2.8 km long pipes, where a heat exchanger cools the computers' internal cooling circuit. [6] The secondary power distribution is done on the middle floor using power distribution units, which allow flexible installation of the computers above. The computers are located on the top floor. [7] The latest, highly parallel Alps supercomputer was delivered by Hewlett Packard Enterprise (HPE), which acquired the supercomputer specialist Cray as a subsidiary in 2019. It is installed on an area of 2,000 m². The total cost was about 100 million CHF.
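As a rough consistency check, the stated lake-water flow can be related to the machine's full-load power draw of about 10 MW: removing that much heat at 460 liters per second implies a water temperature rise of only a few kelvin. The sketch below is a back-of-the-envelope estimate assuming standard properties of water; the actual cooling design and operating temperatures are not given in the source.

```python
# Back-of-the-envelope check: temperature rise of the lake-water loop
# needed to carry away the machine's full-load heat output.
# Assumptions (not from the source): water density ~1 kg/L and
# specific heat capacity ~4186 J/(kg*K).

heat_load_w = 10e6          # ~10 MW full-load power, dissipated as heat
flow_l_per_s = 460          # lake-water flow stated for the data center
density_kg_per_l = 1.0      # assumed density of water
cp_j_per_kg_k = 4186        # assumed specific heat of water

mass_flow_kg_per_s = flow_l_per_s * density_kg_per_l
delta_t_k = heat_load_w / (mass_flow_kg_per_s * cp_j_per_kg_k)

print(f"Approximate water temperature rise: {delta_t_k:.1f} K")
# -> roughly 5 K, i.e. the stated flow is comfortably sized for a 10 MW load
```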
In order to achieve superior performance, central processors (CPUs) and graphics processors (GPUs), together with their associated memories (128 GB LPDDR5X RAM; 96 GB HBM3), [8] are placed in close proximity on a single module supplied by Nvidia. The CPU, called Grace, consists of 72 Arm Neoverse V2 cores (ARMv9), which are RISC cores. The GPU is a Hopper H100 Tensor Core GPU with 132 streaming multiprocessors. [9] The combination of one Grace CPU and one Hopper GPU integrated on such a module is called the GH200 Grace Hopper superchip, named after the computer scientist Grace Hopper. A total of 1,305,600 processor cores (CPU and GPU cores) are available on this Alps system. Data exchange between the 2,688 nodes takes place over an Ethernet-compatible network called Slingshot-11 at a rate of 200 Gbit/s. [10] [8] A single node is composed of four GH200 superchips, in a Quad GH200 configuration. Every Quad GH200 node acts as a single NUMA system, with 288 CPU cores and 4 GPUs. The Grace CPUs communicate through a cache-coherent interconnect, while the Hopper GPUs communicate through NVLink. [11]
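To make the node layout concrete, the following sketch shows how one might inspect a Quad GH200 node from a login shell: counting the CPU cores visible to the operating system and listing the GPUs and their NVLink connectivity. It assumes a Linux node with Nvidia drivers and the standard `nvidia-smi` tool installed; it is an illustrative check, not a procedure taken from the source.

```python
# Illustrative inspection of a Quad GH200 node's topology (assumes Linux
# with Nvidia drivers and the standard `nvidia-smi` utility available).
import os
import subprocess

# A Quad GH200 node exposes 4 x 72 = 288 CPU cores as one NUMA system.
print("CPU cores visible to the OS:", os.cpu_count())

# List the GPUs (expected: four H100-class devices).
subprocess.run(
    ["nvidia-smi", "--query-gpu=index,name,memory.total", "--format=csv"],
    check=True,
)

# Show the GPU-to-GPU interconnect matrix; the NVLink links between the
# four Hopper GPUs of the node should appear here.
subprocess.run(["nvidia-smi", "topo", "-m"], check=True)
```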
A team from CSCS develops specialized software for the various applications. The power consumption of the computer at full load is 10 MW. The electricity costs are estimated at around 15 million CHF per year.
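The quoted figures can be cross-checked with a simple calculation: at 10 MW of continuous draw, an annual bill of about 15 million CHF corresponds to an average electricity price of roughly 0.17 CHF per kilowatt-hour. The sketch below assumes continuous full-load operation, which is an upper bound rather than a figure from the source.

```python
# Rough cross-check of the quoted annual electricity cost.
# Assumption (not from the source): continuous full-load operation.

power_mw = 10
hours_per_year = 365 * 24                        # 8,760 h
energy_kwh = power_mw * 1000 * hours_per_year    # 87.6 million kWh

annual_cost_chf = 15e6                           # quoted estimate
implied_price = annual_cost_chf / energy_kwh

print(f"Energy per year: {energy_kwh / 1e6:.1f} million kWh")
print(f"Implied average price: {implied_price:.2f} CHF/kWh")  # ~0.17 CHF/kWh
```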
A supercomputer is a type of computer with a high level of performance as compared to a general-purpose computer. The performance of a supercomputer is commonly measured in floating-point operations per second (FLOPS) instead of million instructions per second (MIPS). Since 2022, supercomputers have existed which can perform over 10¹⁸ FLOPS, so called exascale supercomputers. For comparison, a desktop computer has performance in the range of hundreds of gigaFLOPS (10¹¹) to tens of teraFLOPS (10¹³). Since November 2017, all of the world's fastest 500 supercomputers run on Linux-based operating systems. Additional research is being conducted in the United States, the European Union, Taiwan, Japan, and China to build faster, more powerful and technologically superior exascale supercomputers.
Floating-point operations per second (FLOPS) is a measure of computer performance in computing, useful in fields of scientific computation that require floating-point calculations.
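To put these orders of magnitude side by side, the short sketch below compares the Rmax quoted for Alps with the exascale threshold and with a desktop-class machine; the 1 TFLOPS desktop figure is an assumed round number within the range mentioned above.

```python
# Orders of magnitude in FLOPS, using figures mentioned in this article.
# The desktop figure of 1 TFLOPS is an assumed round value for comparison.

alps_rmax = 270e15      # 270 PFLOPS (Rmax)
exascale = 1e18         # threshold for an exascale system
desktop = 1e12          # assumed ~1 TFLOPS desktop

print(f"Alps vs. exascale threshold: {alps_rmax / exascale:.2f} exaFLOPS")
seconds = alps_rmax / desktop
print(f"A 1 TFLOPS desktop needs ~{seconds / 86400:.1f} days "
      f"to match one second of Alps at Rmax")
```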
Cray Inc., a subsidiary of Hewlett Packard Enterprise, is an American supercomputer manufacturer headquartered in Seattle, Washington. It also manufactures systems for data storage and analytics. Several Cray supercomputer systems are listed in the TOP500, which ranks the most powerful supercomputers in the world.
The Oak Ridge Leadership Computing Facility (OLCF), formerly the National Leadership Computing Facility, is a designated user facility operated by Oak Ridge National Laboratory and the Department of Energy. It contains several supercomputers, the largest of which is an HPE OLCF-5 named Frontier, which was ranked 1st on the TOP500 list of world's fastest supercomputers as of June 2023. It is located in Oak Ridge, Tennessee.
The TOP500 project ranks and details the 500 most powerful non-distributed computer systems in the world. The project was started in 1993 and publishes an updated list of the supercomputers twice a year. The first of these updates always coincides with the International Supercomputing Conference in June, and the second is presented at the ACM/IEEE Supercomputing Conference in November. The project aims to provide a reliable basis for tracking and detecting trends in high-performance computing and bases rankings on HPL benchmarks, a portable implementation of the high-performance LINPACK benchmark written in Fortran for distributed-memory computers.
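For reference, the Rmax values on the TOP500 list are derived from the time HPL needs to solve a dense linear system of order N; the benchmark credits approximately 2/3·N³ + 2·N² floating-point operations for the solve. The sketch below shows that bookkeeping for an arbitrary, assumed problem size and run time; the numbers are illustrative only, not an actual Alps measurement.

```python
# How an HPL (LINPACK) result is converted into a FLOPS figure.
# HPL solves a dense N x N linear system and credits roughly
# 2/3*N^3 + 2*N^2 floating-point operations for the solve.

def hpl_gflops(n: int, seconds: float) -> float:
    """Sustained GFLOPS for an HPL run of order n taking `seconds`."""
    flop = (2.0 / 3.0) * n**3 + 2.0 * n**2
    return flop / seconds / 1e9

# Illustrative (assumed) values:
print(f"{hpl_gflops(n=100_000, seconds=3600):.1f} GFLOPS")
```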
The Green500 is a biannual ranking of supercomputers, from the TOP500 list of supercomputers, in terms of energy efficiency. The list measures performance per watt using the TOP500 measure of high performance LINPACK benchmarks at double-precision floating-point format.
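Using only the figures quoted above for Alps (270 PFLOPS Rmax at about 10 MW), the energy efficiency works out to roughly 27 GFLOPS per watt. Note that Green500 results use the power actually measured during the HPL run, which may differ from the nominal full-load figure assumed here. A minimal sketch of that calculation:

```python
# Performance per watt from the figures quoted for Alps in this article.
# Note: Green500 uses power measured during the HPL run, which may differ
# from the nominal 10 MW full-load figure used here.

rmax_flops = 270e15     # 270 PFLOPS (Rmax)
power_w = 10e6          # ~10 MW under full load

print(f"{rmax_flops / power_w / 1e9:.0f} GFLOPS/W")  # ~27 GFLOPS/W
```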
The National Center for Computational Sciences (NCCS) is a United States Department of Energy (DOE) Leadership Computing Facility that houses the Oak Ridge Leadership Computing Facility (OLCF), a DOE Office of Science User Facility charged with helping researchers solve challenging scientific problems of global interest with a combination of leading high-performance computing (HPC) resources and international expertise in scientific computing.
This list compares various amounts of computing power in instructions per second organized by order of magnitude in FLOPS.
The Swiss National Supercomputing Centre is the national high-performance computing centre of Switzerland. It was founded in Manno, canton Ticino, in 1991. In March 2012, the CSCS moved to its new location in Lugano-Cornaredo.
Several centers for supercomputing exist across Europe, and distributed access to them is coordinated by European initiatives to facilitate high-performance computing. One such initiative, the HPC Europa project, fits within the Distributed European Infrastructure for Supercomputing Applications (DEISA), which was formed in 2002 as a consortium of eleven supercomputing centers from seven European countries. Operating within the CORDIS framework, HPC Europa aims to provide access to supercomputers across Europe.
Titan or OLCF-3 was a supercomputer built by Cray at Oak Ridge National Laboratory for use in a variety of science projects. Titan was an upgrade of Jaguar, a previous supercomputer at Oak Ridge, and used graphics processing units (GPUs) in addition to conventional central processing units (CPUs). Titan was the first such hybrid to perform over 10 petaFLOPS. The upgrade began in October 2011, commenced stability testing in October 2012 and it became available to researchers in early 2013. The initial cost of the upgrade was US$60 million, funded primarily by the United States Department of Energy.
XK7 is a supercomputing platform, produced by Cray, launched on October 29, 2012. XK7 is the second platform from Cray to use a combination of central processing units ("CPUs") and graphical processing units ("GPUs") for computing; the hybrid architecture requires a different approach to programming to that of CPU-only supercomputers. Laboratories that host XK7 machines host workshops to train researchers in the new programming languages needed for XK7 machines. The platform is used in Titan, the world's second fastest supercomputer in the November 2013 list as ranked by the TOP500 organization. Other customers include the Swiss National Supercomputing Centre which has a 272 node machine and Blue Waters has a machine that has Cray XE6 and XK7 nodes that performs at approximately 1 petaFLOPS (10¹⁵ floating-point operations per second).
The Cray XC30 is a massively parallel multiprocessor supercomputer manufactured by Cray. It consists of Intel Xeon processors, with optional Nvidia Tesla or Xeon Phi accelerators, connected together by Cray's proprietary "Aries" interconnect, stored in air-cooled or liquid-cooled cabinets. Each liquid-cooled cabinet can contain up to 48 blades, each with eight CPU sockets, and uses 90 kW of power. The XC series supercomputers are available with the Cray DataWarp applications I/O accelerator technology.
The Cray XC40 is a massively parallel multiprocessor supercomputer manufactured by Cray. It consists of Intel Haswell Xeon processors, with optional Nvidia Tesla or Intel Xeon Phi accelerators, connected together by Cray's proprietary "Aries" interconnect, stored in air-cooled or liquid-cooled cabinets. The XC series supercomputers are available with the Cray DataWarp applications I/O accelerator technology.
The Nvidia DGX represents a series of servers and workstations designed by Nvidia, primarily geared towards enhancing deep learning applications through the use of general-purpose computing on graphics processing units (GPGPU). These systems typically come in a rackmount format featuring high-performance x86 server CPUs on the motherboard.
Piz Daint is a supercomputer in the Swiss National Supercomputing Centre, named after the mountain Piz Daint in the Swiss Alps.
Hewlett Packard Enterprise Frontier, or OLCF-5, is the world's first exascale supercomputer. It is hosted at the Oak Ridge Leadership Computing Facility (OLCF) in Tennessee, United States and became operational in 2022. As of December 2023, Frontier is the world's fastest supercomputer. It is based on the Cray EX and is the successor to Summit (OLCF-4). Frontier achieved an Rmax of 1.102 exaFLOPS, which is 1.102 quintillion floating-point operations per second, using AMD CPUs and GPUs.
Hopper is a graphics processing unit (GPU) microarchitecture developed by Nvidia. It is designed for datacenters and is used alongside the Lovelace microarchitecture. It is the latest generation of the line of products formerly branded as Nvidia Tesla, now Nvidia Data Centre GPUs.
LUMI is a petascale supercomputer located at the CSC data center in Kajaani, Finland. As of January 2023, the computer is the fastest supercomputer in Europe.
Selene is a supercomputer developed by Nvidia, capable of achieving 63.460 petaflops, ranking as the fifth-fastest supercomputer in the world when it entered the list. Selene is based on the Nvidia DGX system consisting of AMD CPUs, Nvidia A100 GPUs, and Mellanox HDR networking. Selene is based on the Nvidia DGX SuperPOD, which is a high performance turnkey supercomputer solution provided by Nvidia using DGX hardware. DGX SuperPOD is a tightly integrated system that combines high performance DGX compute nodes with fast storage and high bandwidth networking. It aims to provide a turnkey solution to high-demand machine learning workloads. Selene was built in three months and is the fastest industrial system in the US while being the second-most energy-efficient supercomputing system ever.