Trinity (supercomputer)

Trinity
Operators: National Nuclear Security Administration
Location: Los Alamos National Laboratory
Cost: US$174 million [1]
Purpose: Primarily used to perform milestone weapons calculations
Website: lanl.gov/projects/trinity/

Trinity (or ATS-1) is a United States supercomputer built by the National Nuclear Security Administration (NNSA) for the Advanced Simulation and Computing Program (ASC). [2] The aim of the ASC program is to simulate, test, and maintain the United States nuclear stockpile.

History

Trinity was procured under a joint request for proposals with the NERSC-8 system, Cori. [3] In July 2014, the NNSA awarded Cray the contract to build the machine, valued at approximately US$174 million. [1] The system was installed at Los Alamos National Laboratory in two phases: the Haswell partition entered service in 2015, [5] and the Knights Landing partition was added and merged with it in July 2017. [4] [6] Trinity placed among the ten fastest systems in the world on successive TOP500 lists. [5] [7] [8] [9] Its successor in the NNSA's Advanced Technology Systems line, Crossroads (ATS-3), was announced in 2020. [2]

Trinity technical specifications

Trinity High-Level Technical Specifications [10]

Operational Lifetime: 2015 to 2020
Architecture: Cray XC40
Memory Capacity: 2.07 PiB
Peak Performance: 41.5 PF/s
Number of Compute Nodes: 19,420
Parallel File System Capacity: 78 PB (69 PiB)
Burst Buffer Capacity: 3.7 PB
Footprint: 4,606 sq ft
Power Requirement: 8.6 MW

Compute Tier

Trinity was built in two stages. The first stage incorporated the Intel Xeon Haswell processor, while the second stage added a significant performance increase using the Intel Xeon Phi Knights Landing processor. There are 301,952 Haswell and 678,912 Knights Landing processor cores in the combined system, yielding a total peak performance of over 40 PF/s (petaflops). [4]
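
These published figures are mutually consistent. The Python sketch below checks them; the per-node core counts (32 for a dual-socket Haswell node, 68 for a Knights Landing node) and per-core peak rates (2.3 GHz at 16 double-precision FLOP/cycle for Haswell, 1.4 GHz at 32 for Knights Landing) are assumptions about the parts used in XC40 systems, not values stated in this article.

    # Sanity check of Trinity's published core, node, and peak-FLOP figures.
    # Clock speeds and FLOP/cycle rates below are assumptions, not sourced.
    HASWELL_CORES = 301_952
    KNL_CORES = 678_912

    haswell_nodes = HASWELL_CORES // 32   # assumed 32 cores per Haswell node
    knl_nodes = KNL_CORES // 68           # assumed 68 cores per KNL node
    print(haswell_nodes + knl_nodes)      # 19420, matching the spec table

    haswell_pf = HASWELL_CORES * 2.3e9 * 16 / 1e15   # ~11.1 PF/s
    knl_pf = KNL_CORES * 1.4e9 * 32 / 1e15           # ~30.4 PF/s
    print(round(haswell_pf + knl_pf, 1))             # 41.5, matching the table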

Storage Tiers

There are five primary storage tiers: Memory, Burst Buffer, Parallel File System, Campaign Storage, and Archive. [11]

Memory

2 PiB of DDR4 DRAM provide the machine's main memory. The Knights Landing processors also carry high-bandwidth memory integrated on the processor package, providing additional capacity. Data in this tier is highly transient and typically resides for only a few seconds before being overwritten. [11]
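
The 2.07 PiB figure in the specification table can be reconstructed from the node counts derived above (9,436 Haswell and 9,984 Knights Landing nodes). In this sketch, 128 GiB of DDR4 per Haswell node and 96 GiB of DDR4 plus 16 GiB of on-package memory per Knights Landing node are assumptions; the article states only the totals.

    # Rough reconstruction of Trinity's aggregate memory capacity.
    GIB_PER_PIB = 1024 ** 2

    ddr4_gib = 9_436 * 128 + 9_984 * 96       # assumed per-node DDR4 sizes
    print(round(ddr4_gib / GIB_PER_PIB, 2))   # 2.07 PiB, matching the spec table

    mcdram_gib = 9_984 * 16                   # assumed on-package memory (KNL only)
    print(round(mcdram_gib / GIB_PER_PIB, 2)) # ~0.15 PiB of additional capacity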

Burst Buffer

Cray supplied the three hundred XC40 DataWarp blades, each containing two burst buffer nodes and four SSDs. The tier holds a total of 3.78 PB of storage and can move data at rates of up to 2 TB/s. Data in this tier is typically resident for a few hours and is overwritten on roughly the same time scale. [11]
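
The per-device capacities implied by these totals, and the time the tier would take to absorb a checkpoint of all of main memory (one assumed use of a burst buffer; the article does not describe Trinity's checkpoint strategy), work out as follows:

    # Arithmetic implied by the burst buffer figures above.
    TOTAL_BYTES = 3.78e15        # 3.78 PB across the tier
    BLADES = 300                 # DataWarp blades, 4 SSDs each

    per_blade = TOTAL_BYTES / BLADES
    print(per_blade / 1e12)          # 12.6 TB per blade
    print(per_blade / 4 / 1e12)      # ~3.15 TB per SSD

    memory_bytes = 2.07 * 1024**5    # full DDR4 capacity, from the spec table
    print(memory_bytes / 2e12 / 60)  # ~19 minutes to absorb at 2 TB/s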

Parallel File System

Trinity uses a Sonexion-based Lustre file system with a total capacity of 78 PB. Throughput on this tier is about 1.8 TB/s (1.6 TiB/s). It is used to stage data in preparation for HPC operations. Data residence in this tier is typically several weeks.
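
At the quoted rate, moving tier-sized volumes of data is a matter of hours, consistent with staging data over weeks rather than seconds. A short sketch, using only the figures quoted above:

    # Transfer times implied by the parallel file system figures.
    PFS_BYTES = 78e15      # 78 PB capacity
    PFS_RATE = 1.8e12      # ~1.8 TB/s aggregate throughput

    print(PFS_BYTES / PFS_RATE / 3600)   # ~12 hours to write the tier end to end
    print(3.78e15 / PFS_RATE / 60)       # ~35 minutes to drain a full burst buffer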

Campaign Storage

The MarFS filesystem fills the Campaign Storage tier and combines properties of the POSIX and object storage models. The capacity of this tier is growing at a rate of about 30 PB/year, with a current capacity of over 100 PB. In testing, LANL scientists were able to create 968 billion files in a single directory at a rate of 835 million file creations per second. This storage is designed to be more robust than typical object storage, at the cost of some of the end-user functionality of a full POSIX filesystem. Throughput of this tier is between 100 and 300 GB/s. Data residence in this tier is longer term, typically lasting several months.
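
Expressed as elapsed time and sustained rates, the quoted metadata benchmark and growth figures look like this (arithmetic only, from the numbers above):

    # The campaign storage figures above, as elapsed time and sustained rate.
    print(968e9 / 835e6 / 60)            # ~19 minutes to create 968 billion files

    YEAR_SECONDS = 365 * 86_400
    print(30e15 / YEAR_SECONDS / 1e9)    # ~0.95 GB/s average ingest at 30 PB/year,
                                         # well under the 100-300 GB/s peak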

Key Design Goals

  • Transparency
  • Data protection
  • Recoverability
  • Ease of administration

MarFS is an open-source filesystem. [12]

Archive

The final tier of storage is the Archive, an HPSS-based tape file system that holds approximately 100 PB of data.

[Infographic: Trinity's file storage tiers.]

See also

  • Advanced Simulation and Computing Program
  • Appro
  • Arctic Region Supercomputing Center
  • Aurora (supercomputer)
  • BeeGFS
  • Blue Waters
  • Cray
  • Cray XC40
  • Cray XC50
  • GPFS
  • Irish Centre for High-End Computing
  • Jaguar (supercomputer)
  • Leonardo (supercomputer)
  • Lustre (file system)
  • National Center for Computational Sciences
  • National Energy Research Scientific Computing Center
  • Sequoia (supercomputer)
  • Supercomputing in Europe
  • Texas Advanced Computing Center
  • TOP500

References

  1. 1 2 "Cray Awarded $174 Million Supercomputer Contract From the National Nuclear Security Administration" (Press release). July 10, 2014. Archived from the original on 2017-10-18. Retrieved 2014-08-24.
  2. Morgan, Timothy Prickett (1 October 2020). "With "Crossroads" Supercomputer, HPE Notches Another DOE Win". The Next Platform. Retrieved 5 November 2020.
  3. "Trinity / NERSC-8 RFP". Archived from the original on 2018-11-26. Retrieved 2018-11-26.
  4. 1 2 3 "Trinity Supercomputer's Haswell and KNL Partitions Are Merged". 19 July 2017.
  5. "Novermber [sic] 2015 | TOP500".
  6. "LANL Adds Capacity to Trinity Supercomputer for Stockpile Stewardship". 24 July 2017.
  7. "November 2016 | TOP500".
  8. "November 2018 | TOP500".
  9. "NNSA supercomputers recognized worldwide for speed and performance". Energy.gov. December 3, 2020. Retrieved 2023-11-13.
  10. "Technical Specifications".
  11. 1 2 3 Grider, Gary (2018). "Storage Lessons from HPC: A Multi-Decadal Struggle" (PDF). Storage Developer Conference. www.snia.org.
  12. "MarFS". github.com. Retrieved 2024-07-22.