El Capitan (supercomputer)

El Capitan
Active
  • Deployment: 2H 2023
  • Completion: 2024
Sponsors U.S. Department of Energy
Operators Lawrence Livermore National Laboratory and U.S. Department of Energy
Location Livermore Computing Complex
Architecture HPE Cray Shasta
Power 40 MW (Proj)
Space TBA
Memory TBA
Storage TBA
Speed 2 exaFLOPS (Rmax) (Proj)
Cost US$600 million (estimated cost)
Purpose Scientific research and development, stockpile stewardship [1]

Hewlett Packard Enterprise El Capitan is an upcoming exascale supercomputer hosted at the Lawrence Livermore National Laboratory in Livermore, California, United States, and projected to become operational in 2024. It is based on the Cray EX Shasta architecture. When deployed, El Capitan is projected to displace Frontier as the world's fastest supercomputer.

Design

El Capitan will use an as-yet-undisclosed number of AMD Instinct MI300A accelerated processing units (APUs). [2] The MI300A integrates 24 AMD Zen 4 AMD64-based CPU cores and a CDNA 3-based GPU onto a single organic package, along with 128 GB of HBM3 RAM. [3]

The floor space and number of racks for El Capitan have not yet been announced.

Blades are interconnected by HPE Slingshot 64-port switches, each of which provides 12.8 terabits/second of bandwidth. Groups of blades are linked in a dragonfly topology with at most three hops between any two nodes. Cabling is either optical or copper, customized to minimize cable length. Total cabling runs 145 km (90 mi).
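The interconnect figures above can be illustrated with a minimal sketch. It assumes the 12.8 terabits/second is split evenly across the 64 ports, and it models an idealized dragonfly in which every switch in a group links directly to every other switch in that group and every pair of groups shares exactly one global link. The group count, the switches per group, and the `gateway` wiring rule are hypothetical values chosen for illustration, not El Capitan's actual configuration.

```python
from itertools import product

SWITCH_PORTS = 64
SWITCH_BANDWIDTH_GBPS = 12_800  # 12.8 terabits/second, from the text


def per_port_gbps(total_gbps: int, ports: int) -> float:
    """Bandwidth per port, assuming the total is split evenly across ports."""
    return total_gbps / ports


def gateway(group: int, peer_group: int, switches_per_group: int) -> int:
    """Hypothetical wiring rule: the switch in `group` that owns the
    global link to `peer_group`."""
    return peer_group % switches_per_group


def min_hops(src, dst, switches_per_group: int) -> int:
    """Switch-to-switch hops on a minimal route between (group, switch) pairs."""
    (src_group, src_switch), (dst_group, dst_switch) = src, dst
    if src == dst:
        return 0
    if src_group == dst_group:
        return 1  # switches inside a group are assumed fully connected
    hops = 1  # the single global link between the two groups
    if src_switch != gateway(src_group, dst_group, switches_per_group):
        hops += 1  # local hop to reach the outgoing gateway switch
    if dst_switch != gateway(dst_group, src_group, switches_per_group):
        hops += 1  # local hop from the incoming gateway switch
    return hops


def worst_case_hops(groups: int, switches_per_group: int) -> int:
    """Largest minimal-route hop count over all pairs of switches."""
    switches = list(product(range(groups), range(switches_per_group)))
    return max(min_hops(a, b, switches_per_group) for a in switches for b in switches)


if __name__ == "__main__":
    print(per_port_gbps(SWITCH_BANDWIDTH_GBPS, SWITCH_PORTS), "Gb/s per port")
    print(worst_case_hops(groups=5, switches_per_group=4), "hops in the worst case")
```

Under these assumptions the worst case is one local hop, one global hop, and one more local hop, which matches the three-hop bound stated above.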

El Capitan uses an APU architecture, where the CPU and GPU share an internal on-chip coherent interconnect.

History

El Capitan was ordered as part of the Department of Energy's CORAL-2 initiative and is intended to replace Sierra, an IBM/NVIDIA machine deployed in 2018. The original design envisioned hundreds of thousands of GPUs and 40 MW of power. [citation needed] LLNL partnered with HPE Cray and AMD to build the system. [4]

Three El Capitan prototypes – named rzVernal, Tioga, and Tenaya – were themselves powerful enough to rank among the top 200 systems on the TOP500 supercomputer list in June 2023. [5] rzVernal reached 4.1 petaflops. [6] In early July 2023, the first components of El Capitan were installed at Lawrence Livermore, with complete installation expected by mid-2024. [7]

Related Research Articles

AMD: American semiconductor company

Advanced Micro Devices, Inc. (AMD) is an American multinational semiconductor company based in Santa Clara, California, that develops computer processors and related technologies for business and consumer markets.

In computing, floating point operations per second is a measure of computer performance, useful in fields of scientific computations that require floating-point calculations. For such cases, it is a more accurate measure than measuring instructions per second.

Cray Inc., a subsidiary of Hewlett Packard Enterprise, is an American supercomputer manufacturer headquartered in Seattle, Washington. It also manufactures systems for data storage and analytics. Several Cray supercomputer systems are listed in the TOP500, which ranks the most powerful supercomputers in the world.

MareNostrum: Supercomputer in the Barcelona Supercomputing Center

MareNostrum is the main supercomputer in the Barcelona Supercomputing Center. It is the most powerful supercomputer in Spain, one of thirteen supercomputers in the Spanish Supercomputing Network and one of the seven supercomputers of the European infrastructure PRACE.

National Energy Research Scientific Computing Center: Supercomputer facility operated by the US Department of Energy in Berkeley, California

The National Energy Research Scientific Computing Center (NERSC) is a high-performance computing (supercomputer) National User Facility operated by Lawrence Berkeley National Laboratory for the United States Department of Energy Office of Science. As the mission computing center for the Office of Science, NERSC houses high-performance computing and data systems used by 9,000 scientists at national laboratories and universities around the country. Research at NERSC is focused on fundamental and applied research in energy efficiency, storage, and generation; Earth systems science; and understanding of the fundamental forces of nature and the universe. The largest research areas are High Energy Physics, Materials Science, Chemical Sciences, Climate and Environmental Sciences, Nuclear Physics, and Fusion Energy research. NERSC's newest and largest supercomputer is Perlmutter, which debuted in 2021 ranked 5th on the TOP500 list of the world's fastest supercomputers.

AMD APU: Marketing term by AMD

AMD Accelerated Processing Unit (APU), formerly known as Fusion, is a series of 64-bit microprocessors from Advanced Micro Devices (AMD), combining a general-purpose AMD64 central processing unit (CPU) and 3D integrated graphics processing unit (IGPU) on a single die.

The Oak Ridge Leadership Computing Facility (OLCF), formerly the National Leadership Computing Facility, is a designated user facility operated by Oak Ridge National Laboratory and the Department of Energy. It contains several supercomputers, the largest of which is an HPE OLCF-5 named Frontier, which was ranked 1st on the TOP500 list of world's fastest supercomputers as of June 2023. It is located in Oak Ridge, Tennessee.

TOP500: Database project devoted to the ranking of computers

The TOP500 project ranks and details the 500 most powerful non-distributed computer systems in the world. The project was started in 1993 and publishes an updated list of the supercomputers twice a year. The first of these updates always coincides with the International Supercomputing Conference in June, and the second is presented at the ACM/IEEE Supercomputing Conference in November. The project aims to provide a reliable basis for tracking and detecting trends in high-performance computing and bases rankings on HPL benchmarks, a portable implementation of the high-performance LINPACK benchmark written in Fortran for distributed-memory computers.

The Green500 is a biannual ranking of supercomputers, from the TOP500 list of supercomputers, in terms of energy efficiency. The list measures performance per watt using the TOP500 measure of high performance LINPACK benchmarks at double-precision floating-point format.
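As a rough illustration of the performance-per-watt metric, the sketch below applies it to the projected figures quoted for El Capitan in this article (2 exaFLOPS Rmax and 40 MW). Both inputs are projections rather than measurements, and the function name is an illustrative choice; an actual Green500 entry is based on power measured during the benchmark run.

```python
def gflops_per_watt(rmax_flops: float, power_watts: float) -> float:
    """Energy efficiency in gigaFLOPS per watt."""
    return rmax_flops / power_watts / 1e9


# Projected values from this article, not measured results.
PROJECTED_RMAX_FLOPS = 2e18    # 2 exaFLOPS
PROJECTED_POWER_WATTS = 40e6   # 40 MW

if __name__ == "__main__":
    efficiency = gflops_per_watt(PROJECTED_RMAX_FLOPS, PROJECTED_POWER_WATTS)
    print(f"{efficiency:.1f} GFLOPS/W")
```

With these projected inputs the ratio works out to about 50 gigaFLOPS per watt.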

Jaguar (supercomputer): Cray supercomputer at Oak Ridge National Laboratory

Jaguar or OLCF-2 was a petascale supercomputer built by Cray at Oak Ridge National Laboratory (ORNL) in Oak Ridge, Tennessee. The massively parallel Jaguar had a peak performance of just over 1,750 teraFLOPS. It had 224,256 x86-based AMD Opteron processor cores, and operated with a version of Linux called the Cray Linux Environment. Jaguar was a Cray XT5 system, a development from the Cray XT4 supercomputer.

Exascale computing refers to computing systems capable of calculating at least "10^18 IEEE 754 Double Precision (64-bit) operations (multiplications and/or additions) per second (exaFLOPS)"; it is a measure of supercomputer performance.
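The threshold in this definition can be checked mechanically. The following is a minimal sketch that compares Rmax figures quoted elsewhere in this article against 10^18 operations per second; the function and dictionary names are illustrative, and the El Capitan entry is a projection rather than a measured result.

```python
EXAFLOPS = 10**18  # double-precision operations per second


def is_exascale(rmax_flops: float) -> bool:
    """True if the sustained performance meets the exascale threshold."""
    return rmax_flops >= EXAFLOPS


# Rmax figures as quoted in this article.
SYSTEMS = {
    "Frontier": 1.102e18,               # 1.102 exaFLOPS
    "Summit": 148.6e15,                 # 148.6 petaFLOPS
    "El Capitan (projected)": 2e18,     # 2 exaFLOPS, projection only
}

if __name__ == "__main__":
    for name, rmax in SYSTEMS.items():
        label = "exascale" if is_exascale(rmax) else "below exascale"
        print(f"{name}: {label}")
```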

Xeon Phi: Series of x86 manycore processors from Intel

Xeon Phi was a series of x86 manycore processors designed and made by Intel. It was intended for use in supercomputers, servers, and high-end workstations. Its architecture allowed use of standard programming languages and application programming interfaces (APIs) such as OpenMP.

Titan (supercomputer): American supercomputer

Titan or OLCF-3 was a supercomputer built by Cray at Oak Ridge National Laboratory for use in a variety of science projects. Titan was an upgrade of Jaguar, a previous supercomputer at Oak Ridge, and used graphics processing units (GPUs) in addition to conventional central processing units (CPUs). Titan was the first such hybrid to perform over 10 petaFLOPS. The upgrade began in October 2011, stability testing commenced in October 2012, and the system became available to researchers in early 2013. The initial cost of the upgrade was US$60 million, funded primarily by the United States Department of Energy.

XK7 is a supercomputing platform, produced by Cray, launched on October 29, 2012. XK7 is the second platform from Cray to use a combination of central processing units (CPUs) and graphics processing units (GPUs) for computing; the hybrid architecture requires a different approach to programming from that of CPU-only supercomputers. Laboratories that host XK7 machines run workshops to train researchers in the new programming languages needed for them. The platform is used in Titan, the world's second fastest supercomputer in the November 2013 list as ranked by the TOP500 organization. Other customers include the Swiss National Supercomputing Centre, which has a 272-node machine, and Blue Waters, which combines Cray XE6 and XK7 nodes and performs at approximately 1 petaFLOPS (10^15 floating-point operations per second).

Summit (supercomputer): Supercomputer developed by IBM

Summit or OLCF-4 is a supercomputer developed by IBM for use at Oak Ridge Leadership Computing Facility (OLCF), a facility at the Oak Ridge National Laboratory, capable of 200 petaFLOPS thus making it the 5th fastest supercomputer in the world after Frontier (OLCF-5), Fugaku, LUMI, and Leonardo, with Frontier being the fastest. It held the number 1 position from November 2018 to June 2020. Its current LINPACK benchmark is clocked at 148.6 petaFLOPS.

AMD Instinct: Brand name by AMD; professional GPUs for high-performance computing and machine learning

AMD Instinct is AMD's brand of professional GPUs. It replaced AMD's FirePro S brand in 2016. Compared to the Radeon brand of mainstream consumer/gamer products, the Instinct product line is intended to accelerate deep learning, artificial neural network, and high-performance computing/GPGPU applications.

Frontier (supercomputer): American supercomputer

Hewlett Packard Enterprise Frontier, or OLCF-5, is the world's first and fastest exascale supercomputer, hosted at the Oak Ridge Leadership Computing Facility (OLCF) in Tennessee, United States and first operational in 2022. It is based on the Cray EX and is the successor to Summit (OLCF-4). As of December 2023, Frontier is the world's fastest supercomputer. Frontier achieved an Rmax of 1.102 exaFLOPS, which is 1.102 quintillion operations per second, using AMD CPUs and GPUs.

ROCm: Parallel computing platform: GPGPU libraries and application programming interface

ROCm is an Advanced Micro Devices (AMD) software stack for graphics processing unit (GPU) programming. ROCm spans several domains: general-purpose computing on graphics processing units (GPGPU), high performance computing (HPC), heterogeneous computing. It offers several programming models: HIP, OpenMP/Message Passing Interface (MPI), OpenCL.

Aurora (supercomputer): Planned supercomputer

Aurora is a planned supercomputer, originally contracted to be completed by 2018 but, after a series of delays at the prime contractor, Intel Corporation, now planned to be completed in 2023. It was originally planned to be the world's fastest supercomputer, with over 2 exaflops, but the delays have cast that into doubt. It is sponsored by the United States Department of Energy (DOE) and designed by Intel and Cray for the Argonne National Laboratory.

LUMI: Supercomputer in Finland

LUMI is a petascale supercomputer located at the CSC data center in Kajaani, Finland. As of January 2023, the computer is the fastest supercomputer in Europe.

References

  1. "Fiscal Year 2023 Stockpile Stewardship and Management Plan – Biennial Plan Summary Report to Congress" (PDF). United States Department of Energy. pp. 3–17. Retrieved May 27, 2023.
  2. Shilov, Anton (July 6, 2023). "El Capitan Installation Begins: First APU-based Exascale System Shaping Up For 2024". AnandTech. Retrieved July 15, 2023.
  3. Smith, Ryan (January 25, 2023). "CES 2023: AMD Instinct MI300 Data Center APU Silicon In Hand - 146B Transistors, Shipping H2'23". AnandTech. Retrieved February 13, 2023.
  4. Trader, Tiffany (August 13, 2019). "Cray Wins NNSA-Livermore "El Capitan" Exascale Contract". hpcwire.com. Retrieved February 13, 2023.
  5. "June 2023 list". TOP500.org. Retrieved October 10, 2023.
  6. Klotz, Aaron (June 3, 2022). "Trio of Prototype AMD-Based El Capitan Supercomputers Already Rank in Top 200". Tom's Hardware.
  7. Shilov, Anton (July 6, 2023). "El Capitan Installation Begins: First APU-based Exascale System Shaping Up For 2024". AnandTech.