Summit (supercomputer)

Sponsors United States Department of Energy
Operators IBM
Architecture 9,216 POWER9 22-core CPUs
27,648 Nvidia Tesla V100 GPUs [1]
Power 13 MW [2]
Operating system Red Hat Enterprise Linux (RHEL) [3] [4]
Storage 250 PB
Speed 200 petaFLOPS (peak)
Ranking TOP500: 9 (1H2024)
Purpose Scientific research
Website www.olcf.ornl.gov/olcf-resources/compute-systems/summit/
Summit components
POWER9 wafer with TOP500 certificates for Summit and Sierra

Summit or OLCF-4 is a supercomputer developed by IBM for use at the Oak Ridge Leadership Computing Facility (OLCF), a facility at the Oak Ridge National Laboratory in the United States. As of June 2024, it is the 9th fastest supercomputer in the world on the TOP500 list. It held the number 1 position on this list from June 2018 to June 2020. [5] [6] Its LINPACK benchmark result is 148.6 petaFLOPS. [7]

As of November 2019, the supercomputer ranked as the 5th most energy-efficient in the world, with a measured power efficiency of 14.668 gigaFLOPS/watt. [8] Summit was the first supercomputer to reach exaflop speed (a quintillion operations per second), though on a non-standard metric: it achieved 1.88 exaflops during a genomic analysis and is expected to reach 3.3 exaflops using mixed-precision calculations. [9]
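The performance figures quoted so far can be cross-checked with simple arithmetic. The Python sketch below derives the fraction of peak achieved on LINPACK and the power draw implied by the efficiency figure. The function and variable names are illustrative only, and pairing the 14.668 gigaFLOPS/watt value with the 148.6 petaFLOPS result is an assumption, since the two numbers may come from different list editions.

```python
PETA = 1e15
GIGA = 1e9


def hpl_efficiency(rmax_flops: float, rpeak_flops: float) -> float:
    """Fraction of theoretical peak achieved on the LINPACK benchmark."""
    return rmax_flops / rpeak_flops


def implied_power_watts(rmax_flops: float, flops_per_watt: float) -> float:
    """Power draw implied by a sustained rate and an efficiency figure."""
    return rmax_flops / flops_per_watt


rmax = 148.6 * PETA        # LINPACK result quoted in the article
rpeak = 200 * PETA         # peak speed quoted in the infobox
efficiency = 14.668 * GIGA  # FLOPS per watt (Green500, November 2019)

print(f"HPL efficiency: {hpl_efficiency(rmax, rpeak):.1%}")
print(f"Implied power:  {implied_power_watts(rmax, efficiency) / 1e6:.2f} MW")
```

Under these assumptions the benchmark reaches roughly 74% of peak, and the implied draw is about 10 MW, which sits below the 13 MW listed for the system as a whole.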

History

The United States Department of Energy awarded a $325 million contract in November 2014 to IBM, Nvidia and Mellanox. The effort resulted in construction of Summit and Sierra. Summit is tasked with civilian scientific research and is located at the Oak Ridge National Laboratory in Tennessee. Sierra is designed for nuclear weapons simulations and is located at the Lawrence Livermore National Laboratory in California. [10]

Summit was estimated to cover 5,600 square feet (520 m²) [11] and to require 219 kilometres (136 mi) of cabling. [12] It was intended for research in diverse fields such as cosmology, medicine, and climatology. [13]

In 2015, the Collaboration of Oak Ridge, Argonne and Lawrence Livermore (CORAL) project included a third supercomputer, Aurora, which was planned for installation at Argonne National Laboratory. [14] By 2018, Aurora had been re-engineered as an exascale computing project with completion anticipated in 2021, and Frontier and El Capitan were to be completed shortly thereafter. [15] Aurora was completed in late 2022. [16]

Uses

The Summit supercomputer may be used for research in energy, artificial intelligence, human health, and other areas. [17] It has been used for earthquake simulation, extreme weather simulation, materials science, genomics, and predicting the lifetime of neutrinos. [18]

Design

Each of its 4,608 nodes consists of 2 IBM POWER9 CPUs and 6 Nvidia Tesla GPUs, [19] with over 600 GB of coherent memory (96 GB HBM2 plus 512 GB DDR4) that is addressable by all CPUs and GPUs, plus 800 GB of non-volatile RAM that can be used as a burst buffer or as extended memory. [20] The POWER9 CPUs and Nvidia Volta GPUs are connected using Nvidia's high-speed NVLink, which allows for a heterogeneous computing model. [21]
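The per-node figures can be multiplied out to recover the system totals. The following Python sketch does so; `NodeSpec` and its field names are hypothetical, chosen for illustration, and the petabyte conversion assumes decimal units (1 PB = 1,000,000 GB).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """Per-node resources as described in the article."""
    cpus: int = 2          # IBM POWER9
    gpus: int = 6          # Nvidia Tesla V100
    hbm2_gb: int = 96      # GPU memory
    ddr4_gb: int = 512     # CPU memory
    nvram_gb: int = 800    # non-volatile burst buffer

    @property
    def coherent_memory_gb(self) -> int:
        """Memory addressable by all CPUs and GPUs on the node."""
        return self.hbm2_gb + self.ddr4_gb


NODES = 4_608
node = NodeSpec()

total_cpus = NODES * node.cpus
total_gpus = NODES * node.gpus
total_coherent_pb = NODES * node.coherent_memory_gb / 1e6  # GB -> PB (decimal)

print(f"CPUs: {total_cpus:,}  GPUs: {total_gpus:,}")
print(f"Coherent memory per node: {node.coherent_memory_gb} GB")
print(f"Coherent memory, whole system: {total_coherent_pb:.2f} PB")
```

The totals of 9,216 CPUs and 27,648 GPUs agree with the architecture figures listed for the system, and 96 GB plus 512 GB gives the 608 GB behind "over 600 GB".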

To provide a high rate of data throughput, the nodes are connected in a non-blocking fat-tree topology using a dual-rail Mellanox EDR InfiniBand interconnect, which carries both storage and inter-process communication traffic. The interconnect delivers 200 Gbit/s of bandwidth between nodes as well as in-network computing acceleration for communication frameworks such as MPI and SHMEM/PGAS.
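The 200 Gbit/s figure follows from the dual-rail design: EDR InfiniBand has a nominal rate of 100 Gbit/s per link, and each node has two rails. The Python sketch below works through that arithmetic and estimates an idealized time to move one node's 608 GB of coherent memory (96 GB plus 512 GB) across the link. The helper names are hypothetical, and the estimate ignores protocol overhead, congestion, and encoding losses.

```python
EDR_GBIT_PER_RAIL = 100  # nominal InfiniBand EDR link rate, Gbit/s
RAILS = 2                # dual-rail configuration


def node_bandwidth_gbit(rails: int = RAILS,
                        per_rail: float = EDR_GBIT_PER_RAIL) -> float:
    """Aggregate nominal bandwidth per node in Gbit/s."""
    return rails * per_rail


def gbit_to_gbyte(gbit: float) -> float:
    """Convert gigabits to gigabytes (8 bits per byte)."""
    return gbit / 8


def ideal_transfer_seconds(size_gb: float, bandwidth_gbit: float) -> float:
    """Lower bound on transfer time, assuming the full nominal rate."""
    return size_gb / gbit_to_gbyte(bandwidth_gbit)


bw = node_bandwidth_gbit()
print(f"Per-node bandwidth: {bw:.0f} Gbit/s = {gbit_to_gbyte(bw):.0f} GB/s")
print(f"Ideal time to move 608 GB: {ideal_transfer_seconds(608, bw):.2f} s")
```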

The storage for Summit [22] has a fast in-system layer and a center-wide parallel file system layer. The in-system layer is optimized for fast storage with SSDs on each node, while the center-wide parallel file system provides easy access to data stored on hard drives. The two layers work together seamlessly so users do not have to differentiate their storage needs. The center-wide parallel file system is GPFS (IBM Storage Scale). It provides 250 PB of storage. The cluster delivers 2.5 TB/s of peak single-stream read throughput and 1 TB/s of 1M file throughput. It was one of the first supercomputers that also required extremely fast metadata performance to support AI/ML workloads, exemplified by the 2.6M 32k file creates per second it delivers.
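To give a sense of scale for these storage figures, the Python sketch below computes how long a full read of the file system would take at the quoted peak rate, and the data rate implied by the small-file create figure. The function names are hypothetical; decimal units are assumed for PB and TB, and "32k" is assumed to mean 32 KiB (32,768 bytes), which the text does not confirm.

```python
PB = 1e15   # bytes, decimal
TB = 1e12   # bytes, decimal
KIB = 1024  # bytes


def full_read_hours(capacity_bytes: float, throughput_bytes_per_s: float) -> float:
    """Hours needed to read the whole capacity at a sustained rate."""
    return capacity_bytes / throughput_bytes_per_s / 3600


def small_file_rate_gb_per_s(creates_per_s: float, file_bytes: int) -> float:
    """Data rate in GB/s implied by creating fixed-size files."""
    return creates_per_s * file_bytes / 1e9


hours = full_read_hours(250 * PB, 2.5 * TB)
rate = small_file_rate_gb_per_s(2.6e6, 32 * KIB)  # assumes 32k = 32 KiB

print(f"Full read of 250 PB at 2.5 TB/s: {hours:.1f} hours")
print(f"2.6M creates/s of 32 KiB files:  {rate:.1f} GB/s")
```

Even at peak throughput a full scan takes more than a day, which is one reason the design pairs the parallel file system with node-local SSDs.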

Related Research Articles

Supercomputer: Type of extremely powerful computer

A supercomputer is a type of computer with a high level of performance as compared to a general-purpose computer. The performance of a supercomputer is commonly measured in floating-point operations per second (FLOPS) instead of million instructions per second (MIPS). Since 2017, supercomputers have existed which can perform over 10¹⁷ FLOPS (a hundred quadrillion FLOPS, 100 petaFLOPS or 100 PFLOPS). For comparison, a desktop computer has performance in the range of hundreds of gigaFLOPS (10¹¹) to tens of teraFLOPS (10¹³). Since November 2017, all of the world's fastest 500 supercomputers run on Linux-based operating systems. Additional research is being conducted in the United States, the European Union, Taiwan, Japan, and China to build faster, more powerful and technologically superior exascale supercomputers.
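The unit prefixes used throughout this article (giga, tera, peta, exa) each step up by a factor of 1,000. As a minimal sketch, the hypothetical Python helper below picks the largest prefix that fits a raw FLOPS value; it is an illustration of the scale, not part of any benchmarking tool.

```python
# SI prefixes from largest to smallest, as (scale, name) pairs.
PREFIXES = [
    (1e18, "exa"),
    (1e15, "peta"),
    (1e12, "tera"),
    (1e9, "giga"),
    (1e6, "mega"),
]


def format_flops(value: float) -> str:
    """Render a raw FLOPS value using the largest SI prefix that fits."""
    for scale, name in PREFIXES:
        if value >= scale:
            return f"{value / scale:g} {name}FLOPS"
    return f"{value:g} FLOPS"


for raw in (1e11, 1e13, 1e17, 148.6e15):
    print(format_flops(raw))
```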

Floating point operations per second is a measure of computer performance in computing, useful in fields of scientific computations that require floating-point calculations.

MareNostrum: Supercomputer in the Barcelona Supercomputing Center

MareNostrum is the main supercomputer in the Barcelona Supercomputing Center. It is the most powerful supercomputer in Spain, one of thirteen supercomputers in the Spanish Supercomputing Network and one of the seven supercomputers of the European infrastructure PRACE.

GPFS is high-performance clustered file system software developed by IBM. It can be deployed in shared-disk or shared-nothing distributed parallel modes, or a combination of these. It is used by many of the world's largest commercial companies, as well as some of the supercomputers on the Top 500 List. For example, it is the filesystem of Summit at Oak Ridge National Laboratory, which was the fastest supercomputer in the world on the November 2019 Top 500 List. Summit is a 200 petaFLOPS system composed of more than 9,000 POWER9 processors and 27,000 NVIDIA Volta GPUs. The storage filesystem is called Alpine.

The Texas Advanced Computing Center (TACC) at the University of Texas at Austin, United States, is an advanced computing research center that is based on comprehensive advanced computing resources and supports services to researchers in Texas and across the U.S. The mission of TACC is to enable discoveries that advance science and society through the application of advanced computing technologies. Specializing in high-performance computing, scientific visualization, data analysis & storage systems, software, research & development, and portal interfaces, TACC deploys and operates advanced computational infrastructure to enable the research activities of faculty, staff, and students of UT Austin. TACC also provides consulting, technical documentation, and training to support researchers who use these resources. TACC staff members conduct research and development in applications and algorithms, computing systems design/architecture, and programming tools and environments.

TOP500: Database project devoted to the ranking of computers

The TOP500 project ranks and details the 500 most powerful non-distributed computer systems in the world. The project was started in 1993 and publishes an updated list of the supercomputers twice a year. The first of these updates always coincides with the International Supercomputing Conference in June, and the second is presented at the ACM/IEEE Supercomputing Conference in November. The project aims to provide a reliable basis for tracking and detecting trends in high-performance computing and bases rankings on HPL benchmarks, a portable implementation of the high-performance LINPACK benchmark written in Fortran for distributed-memory computers.

The Green500 is a biannual ranking of supercomputers, from the TOP500 list of supercomputers, in terms of energy efficiency. The list measures performance per watt using the TOP500 measure of high performance LINPACK benchmarks at double-precision floating-point format.

The National Center for Computational Sciences (NCCS) is a United States Department of Energy (DOE) Leadership Computing Facility that houses the Oak Ridge Leadership Computing Facility (OLCF), a DOE Office of Science User Facility charged with helping researchers solve challenging scientific problems of global interest with a combination of leading high-performance computing (HPC) resources and international expertise in scientific computing.

Jaguar (supercomputer): Cray supercomputer at Oak Ridge National Laboratory

Jaguar or OLCF-2 was a petascale supercomputer built by Cray at Oak Ridge National Laboratory (ORNL) in Oak Ridge, Tennessee. The massively parallel Jaguar had a peak performance of just over 1,750 teraFLOPS. It had 224,256 x86-based AMD Opteron processor cores, and operated with a version of Linux called the Cray Linux Environment. Jaguar was a Cray XT5 system, a development from the Cray XT4 supercomputer.

Exascale computing refers to computing systems capable of calculating at least "10¹⁸ IEEE 754 Double Precision (64-bit) operations (multiplications and/or additions) per second (exaFLOPS)"; it is a measure of supercomputer performance.

This list compares various amounts of computing power in instructions per second organized by order of magnitude in FLOPS.

Titan (supercomputer): American supercomputer

Titan or OLCF-3 was a supercomputer built by Cray at Oak Ridge National Laboratory for use in a variety of science projects. Titan was an upgrade of Jaguar, a previous supercomputer at Oak Ridge, that uses graphics processing units (GPUs) in addition to conventional central processing units (CPUs). Titan was the first such hybrid to perform over 10 petaFLOPS. The upgrade began in October 2011, commenced stability testing in October 2012 and it became available to researchers in early 2013. The initial cost of the upgrade was US$60 million, funded primarily by the United States Department of Energy.

XK7 is a supercomputing platform, produced by Cray, launched on October 29, 2012. XK7 is the second platform from Cray to use a combination of central processing units ("CPUs") and graphical processing units ("GPUs") for computing; the hybrid architecture requires a different approach to programming from that of CPU-only supercomputers. Laboratories that host XK7 machines hold workshops to train researchers in the new programming languages needed for XK7 machines. The platform is used in Titan, the world's second fastest supercomputer in the November 2013 list as ranked by the TOP500 organization. Other customers include the Swiss National Supercomputing Centre, which has a 272-node machine, and Blue Waters, which has a machine with Cray XE6 and XK7 nodes that performs at approximately 1 petaFLOPS (10¹⁵ floating-point operations per second).

POWER9: 2017 family of multi-core microprocessors by IBM

POWER9 is a family of superscalar, multithreading, multi-core microprocessors produced by IBM, based on the Power ISA. It was announced in August 2016. The POWER9-based processors are being manufactured using a 14 nm FinFET process, in 12- and 24-core versions, for scale out and scale up applications, and possibly other variations, since the POWER9 architecture is open for licensing and modification by the OpenPOWER Foundation members.

Nvidia DGX: Line of Nvidia produced servers and workstations

The Nvidia DGX represents a series of servers and workstations designed by Nvidia, primarily geared towards enhancing deep learning applications through the use of general-purpose computing on graphics processing units (GPGPU). These systems typically come in a rackmount format featuring high-performance x86 server CPUs on the motherboard.

Sierra (supercomputer): Supercomputer developed by IBM

Sierra or ATS-2 is a supercomputer built for the Lawrence Livermore National Laboratory for use by the National Nuclear Security Administration as the second Advanced Technology System. It is primarily used for predictive applications in nuclear weapon stockpile stewardship, helping to assure the safety, reliability, and effectiveness of the United States' nuclear weapons.

Frontier (supercomputer): American supercomputer

Hewlett Packard Enterprise Frontier, or OLCF-5, is the world's first exascale supercomputer. It is hosted at the Oak Ridge Leadership Computing Facility (OLCF) in Tennessee, United States and became operational in 2022. As of December 2023, Frontier is the world's fastest supercomputer. It is based on the Cray EX and is the successor to Summit (OLCF-4). Frontier achieved an Rmax of 1.102 exaFLOPS, which is 1.102 quintillion floating-point operations per second, using AMD CPUs and GPUs.

Fugaku (supercomputer): Japanese supercomputer

Fugaku (Japanese: 富岳) is a petascale supercomputer at the Riken Center for Computational Science in Kobe, Japan. It started development in 2014 as the successor to the K computer and made its debut in 2020. It is named after an alternative name for Mount Fuji.

Leonardo (supercomputer): Supercomputer in Italy

Leonardo is a petascale supercomputer located at the CINECA datacenter in Bologna, Italy. The system consists of an Atos BullSequana XH2000 computer, with close to 14,000 Nvidia Ampere GPUs and 200 Gbit/s Nvidia Mellanox HDR InfiniBand connectivity. Inaugurated in November 2022, Leonardo is capable of 250 petaflops, making it one of the top five fastest supercomputers in the world. It debuted on the TOP500 in November 2022 ranking fourth in the world, and second in Europe.

Taiwania 3: Supercomputer of Taiwan

Taiwania 3 is the newest of the supercomputers built by Taiwan and the third in the Taiwania series. It is housed at the National Center for High-performance Computing of NARLabs. It has 900 nodes with 50,400 cores in total, and each node contains two Intel Xeon Platinum 8280 2.4 GHz CPUs. The main memory capacity is 192 GB. The system runs CentOS x86_64 7.8 as its operating system and uses the Slurm Workload Manager to manage workloads. Its nodes are linked by an InfiniBand HDR100 100 Gbit/s high-speed interconnect. The full computing capability is 2.7 PFLOPS. Scientists and other users can access it for specific research after obtaining permission from Taiwan's National Center for High-performance Computing. It entered operation in November 2020, ahead of schedule, because of needs arising from COVID-19. It ranked number 227 on the TOP500 list of June 2021 and number 80 on the Green500 list. It was manufactured by Quanta Computer, Taiwan Fixed Network, and ASUS Cloud.

References

  1. "ORNL Launches Summit Supercomputer".
  2. Liu, Zhiye (26 June 2018). "US Dethrones China With IBM Summit Supercomputer". Tom's Hardware. Retrieved 19 July 2018.
  3. Kerner, Sean Michael (8 June 2018). "IBM Unveils Summit, the World's Fastest Supercomputer (For Now)". Server Watch. Retrieved 24 February 2020.
  4. Nestor, Marius (11 June 2018). "Meet IBM Summit, World's Fastest and Smartest Supercomputer Powered by Linux". Softpedia News. Retrieved 24 February 2020.
  5. Lohr, Steve (8 June 2018). "Move Over, China: U.S. Is Again Home to World's Speediest Supercomputer". The New York Times. Retrieved 19 July 2018.
  6. "Top 500 List - November 2022". TOP500. November 2022. Retrieved 13 April 2022.
  7. "November 2022 | TOP500 Supercomputer Sites". TOP500. Retrieved 13 April 2022.
  8. "Green500 List - November 2019". TOP500. Retrieved 7 April 2020.
  9. Holt, Kris (8 June 2018). "The US again has the world's most powerful supercomputer". Engadget. Retrieved 20 July 2018.
  10. Shankland, Steven (14 September 2015). "IBM, NVIDIA land $325M supercomputer deal". C|Net. Retrieved 29 December 2015.
  11. https://www.olcf.ornl.gov/wp-content/uploads/2018/06/Summit_bythenumbers_FIN-1.pdf
  12. Alcorn, Paul (20 November 2017). "Regaining America's Supercomputing Supremacy With The Summit Supercomputer". Tom's Hardware. Retrieved 20 November 2017.
  13. Noyes, Katherine (16 March 2015). "IBM, NVIDIA rev HPC engines in next-gen supercomputer push". PC World. Retrieved 29 December 2015.
  14. R. Johnson, Colin (15 April 2015). "IBM vs. Intel in Supercomputer Bout". EE Times. Retrieved 29 December 2015.
  15. Morgan, Timothy Prickett (9 April 2018). "Bidders Off And Running After $1.8 Billion DOE Exascale Super Deals". The Next Platform. Retrieved 20 July 2018.
  16. Hemsoth, Nicole (2021-09-23). "A Status Check on Global Exascale Ambitions". The Next Platform. Retrieved 2021-10-15.
  17. "Introducing Summit". Retrieved 24 December 2019.
  18. "Summit Supercomputer is Already Making its Mark on Science". 20 September 2018. Retrieved 5 August 2020.
  19. "The most powerful computers on the planet - Summit and Sierra". IBM. 6 June 2018. Retrieved 4 April 2019.
  20. Lilly, Paul (January 25, 2017). "NVIDIA 12nm FinFET Volta GPU Architecture Reportedly Replacing Pascal In 2017". HotHardware.
  21. "Summit and Sierra Supercomputers: An Inside Look at the U.S. Department of Energy's New Pre-Exascale Systems" (PDF). November 1, 2014.
  22. Oral, Sarp; Vazhkudai, Sudharshan; Wang, Feiyi; Zimmer, Christopher; Brumgard, Christopher; Hanley, Jesse; Markomanolis, George; Miller, Ross; Leverman, Dustin B. (2019-11-01). End-to-end I/O portfolio for the summit supercomputing ecosystem (Report). Oak Ridge National Laboratory (ORNL), Oak Ridge, TN (United States). OSTI 1619016.
Records
Preceded by Sunway TaihuLight (93.01 petaFLOPS)
World's most powerful supercomputer, June 2018 - June 2020 (148.6 petaFLOPS)
Succeeded by RIKEN Fugaku (0.54 exaFLOPS)