Trinity (supercomputer)

Trinity
Operators: National Nuclear Security Administration
Location: Los Alamos National Laboratory
Cost: US$174 million [1]
Purpose: Primarily used to perform milestone weapons calculations
Website: lanl.gov/projects/trinity/

Trinity (or ATS-1) is a United States supercomputer built by Cray for the National Nuclear Security Administration (NNSA) under the Advanced Simulation and Computing (ASC) program. [2] The aim of the ASC program is to simulate, test, and maintain the United States nuclear stockpile.


History

Trinity technical specifications

Trinity High-Level Technical Specifications [12]
Operational lifetime: 2015 to 2020
Architecture: Cray XC40
Memory capacity: 2.07 PiB
Peak performance: 41.5 PF/s
Compute nodes: 19,420
Parallel file system capacity: 78 PB (69 PiB)
Burst buffer capacity: 3.7 PB
Footprint: 4,606 sq ft
Power requirement: 8.6 MW

Compute Tier

Trinity was built in two stages. The first stage used the Intel Xeon Haswell processor; the second added a significant performance increase with the Intel Xeon Phi Knights Landing processor. The combined system contains 301,952 Haswell cores and 678,912 Knights Landing cores, yielding a total peak performance of over 40 PF/s (petaflops). [13]
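
Those core counts line up with the quoted peak. A minimal back-of-the-envelope check, assuming Xeon E5-2698 v3 (Haswell) and Xeon Phi 7250 (Knights Landing) parts at their usual base clocks and FMA widths; the clock rates and flops-per-cycle figures are assumptions, not article figures:

```python
# Rough reconstruction of Trinity's ~41.5 PF/s peak from per-core throughput.
haswell = {
    "cores": 301_952,        # from the article
    "clock_ghz": 2.3,        # assumed Xeon E5-2698 v3 base clock
    "flops_per_cycle": 16,   # assumed: 2 AVX2 FMA units, double precision
}
knl = {
    "cores": 678_912,        # from the article
    "clock_ghz": 1.4,        # assumed Xeon Phi 7250 base clock
    "flops_per_cycle": 32,   # assumed: 2 AVX-512 FMA units, double precision
}

def peak_pflops(part):
    # cores * cycles/second * flops/cycle, expressed in petaflops
    return part["cores"] * part["clock_ghz"] * 1e9 * part["flops_per_cycle"] / 1e15

print(f"Haswell: {peak_pflops(haswell):.1f} PF/s")                      # ~11.1
print(f"KNL:     {peak_pflops(knl):.1f} PF/s")                          # ~30.4
print(f"Total:   {peak_pflops(haswell) + peak_pflops(knl):.1f} PF/s")   # ~41.5
```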

Storage Tiers

There are five primary storage tiers: Memory, Burst Buffer, Parallel File System, Campaign Storage, and Archive. [14]
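
The tiers trade capacity against bandwidth, and each tier's capacity-to-bandwidth ratio roughly tracks the ordering of the residence times quoted in the sections below. A small sketch using only figures from this article (the Campaign Storage rate takes the upper end of its quoted range; no aggregate bandwidth is quoted here for Memory or Archive):

```python
# name, capacity in bytes, aggregate bandwidth in bytes/s (None = not quoted)
TIERS = [
    ("Memory",               2.07 * 2**50, None),
    ("Burst Buffer",         3.78e15,      2.0e12),
    ("Parallel File System", 78e15,        1.8e12),
    ("Campaign Storage",     100e15,       0.3e12),
    ("Archive",              100e15,       None),
]

for name, cap, bw in TIERS:
    # time to write the tier full at its quoted aggregate rate
    fill = f"{cap / bw / 3600:6.1f} h" if bw else "   n/a"
    print(f"{name:22s} {cap / 1e15:6.1f} PB   time to fill: {fill}")
```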

Memory

Just over 2 PiB of DDR4 DRAM provide the machine's main memory. Each Knights Landing processor also carries 16 GB of on-package MCDRAM, providing additional high-bandwidth capacity. The data in this tier is highly transient and is typically in residence for only a few seconds, being overwritten continuously. [15]
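
Spread across the machine's 19,420 compute nodes, the DRAM total works out to roughly 112 GiB per node on average; actual per-node amounts differ between the two partitions, so this is only a mean:

```python
# Average main memory per compute node, from the article's own totals.
total_dram = 2.07 * 2**50   # 2.07 PiB in bytes
nodes = 19_420

print(f"~{total_dram / nodes / 2**30:.0f} GiB of DRAM per node on average")  # ~112
```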

Burst Buffer

Cray supplied the 300 XC40 DataWarp blades in this tier, each containing two burst buffer nodes and four SSDs. The tier holds a total of 3.78 PB of storage and can move data at up to 2 TB/s. Data here is typically resident for a few hours, being overwritten on roughly the same time scale. [16]
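
One way to read these numbers is the classic burst buffer sizing check: draining a full-memory checkpoint into this tier at its peak rate takes on the order of twenty minutes, comfortably inside the few-hour residence window. Treating the peak rate as sustained is an assumption:

```python
# Time to drain a full-memory checkpoint into the burst buffer.
memory_bytes = 2.07 * 2**50   # total DRAM from the specifications table
bb_rate = 2.0e12              # burst buffer peak rate, 2 TB/s

print(f"Full-memory checkpoint drain: ~{memory_bytes / bb_rate / 60:.0f} minutes")  # ~19
```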

Parallel File System

Trinity uses a Sonexion-based Lustre file system with a total capacity of 78 PB. Throughput of this tier is about 1.8 TB/s (1.6 TiB/s). It is used to stage data in preparation for HPC operations. Data residence in this tier is typically several weeks.

Campaign Storage

The Campaign Storage tier uses the MarFS filesystem, which combines properties of the POSIX and object storage models. The capacity of this tier is growing at a rate of about 30 PB per year, with a current capacity of over 100 PB. In testing, LANL scientists were able to create 968 billion files in a single directory at a rate of 835 million file creations per second. The storage is designed to be more robust than typical object storage, while sacrificing some of the end-user functionality a POSIX system would provide. Throughput of this tier is between 100 and 300 GB/s. Data residence in this tier is longer term, typically lasting several months.
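
As a quick consistency check on that metadata test, the two quoted figures imply a run of roughly twenty minutes:

```python
# Duration implied by the reported MarFS file-creation test.
files_created = 968e9   # 968 billion files in a single directory
create_rate = 835e6     # 835 million file creations per second

print(f"Test duration at that rate: ~{files_created / create_rate / 60:.0f} minutes")  # ~19
```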

Key Design Goals

  • Transparency
  • Data protection
  • Recoverability
  • Ease of administration

MarFS is an open source filesystem and is available at https://github.com/mar-file-system/marfs

Archive

The final layer of storage is the Archive, an HPSS-based tape file system that holds approximately 100 PB of data.

[Infographic: Trinity's file storage system.]

See also

Related Research Articles

Supercomputer

A supercomputer is a type of computer with a high level of performance as compared to a general-purpose computer. The performance of a supercomputer is commonly measured in floating-point operations per second (FLOPS) instead of million instructions per second (MIPS). Since 2017, supercomputers have existed which can perform over 10^17 FLOPS (a hundred quadrillion FLOPS, 100 petaFLOPS or 100 PFLOPS). For comparison, a desktop computer has performance in the range of hundreds of gigaFLOPS (10^11) to tens of teraFLOPS (10^13). Since November 2017, all of the world's fastest 500 supercomputers run on Linux-based operating systems. Additional research is being conducted in the United States, the European Union, Taiwan, Japan, and China to build faster, more powerful and technologically superior exascale supercomputers.

Cray Inc., a subsidiary of Hewlett Packard Enterprise, is an American supercomputer manufacturer headquartered in Seattle, Washington. It also manufactures systems for data storage and analytics. Several Cray supercomputer systems are listed in the TOP500, which ranks the most powerful supercomputers in the world.

The Texas Advanced Computing Center (TACC) at the University of Texas at Austin, United States, is an advanced computing research center that is based on comprehensive advanced computing resources and supports services to researchers in Texas and across the U.S. The mission of TACC is to enable discoveries that advance science and society through the application of advanced computing technologies. Specializing in high-performance computing, scientific visualization, data analysis & storage systems, software, research & development, and portal interfaces, TACC deploys and operates advanced computational infrastructure to enable the research activities of faculty, staff, and students of UT Austin. TACC also provides consulting, technical documentation, and training to support researchers who use these resources. TACC staff members conduct research and development in applications and algorithms, computing systems design/architecture, and programming tools and environments.

National Energy Research Scientific Computing Center

The National Energy Research Scientific Computing Center (NERSC) is a high-performance computing (supercomputer) National User Facility operated by Lawrence Berkeley National Laboratory for the United States Department of Energy Office of Science. As the mission computing center for the Office of Science, NERSC houses high-performance computing and data systems used by 9,000 scientists at national laboratories and universities around the country. Research at NERSC is focused on fundamental and applied research in energy efficiency, storage, and generation; Earth systems science; and understanding of fundamental forces of nature and the universe. The largest research areas are in High Energy Physics, Materials Science, Chemical Sciences, Climate and Environmental Sciences, Nuclear Physics, and Fusion Energy research. NERSC's newest and largest supercomputer is Perlmutter, which debuted in 2021 ranked 5th on the TOP500 list of the world's fastest supercomputers.

TOP500

The TOP500 project ranks and details the 500 most powerful non-distributed computer systems in the world. The project was started in 1993 and publishes an updated list of the supercomputers twice a year. The first of these updates always coincides with the International Supercomputing Conference in June, and the second is presented at the ACM/IEEE Supercomputing Conference in November. The project aims to provide a reliable basis for tracking and detecting trends in high-performance computing, and bases its rankings on HPL, a portable implementation of the high-performance LINPACK benchmark written in Fortran for distributed-memory computers.

Irish Centre for High-End Computing

The Irish Centre for High-End Computing (ICHEC) is the national high-performance computing centre in Ireland. It was established in 2005 and provides supercomputing resources, support, training and related services. ICHEC is involved in education and training, including providing courses for researchers.

The National Center for Computational Sciences (NCCS) is a United States Department of Energy (DOE) Leadership Computing Facility that houses the Oak Ridge Leadership Computing Facility (OLCF), a DOE Office of Science User Facility charged with helping researchers solve challenging scientific problems of global interest with a combination of leading high-performance computing (HPC) resources and international expertise in scientific computing.

Jaguar (supercomputer)

Jaguar or OLCF-2 was a petascale supercomputer built by Cray at Oak Ridge National Laboratory (ORNL) in Oak Ridge, Tennessee. The massively parallel Jaguar had a peak performance of just over 1,750 teraFLOPS. It had 224,256 x86-based AMD Opteron processor cores, and operated with a version of Linux called the Cray Linux Environment. Jaguar was a Cray XT5 system, a development from the Cray XT4 supercomputer.

Supercomputing in Europe

Several centers for supercomputing exist across Europe, and distributed access to them is coordinated by European initiatives to facilitate high-performance computing. One such initiative, the HPC Europa project, fits within the Distributed European Infrastructure for Supercomputing Applications (DEISA), which was formed in 2002 as a consortium of eleven supercomputing centers from seven European countries. Operating within the CORDIS framework, HPC Europa aims to provide access to supercomputers across Europe.

Appro

Appro was a developer of supercomputers serving high-performance computing (HPC) markets, focused on medium- to large-scale deployments. Appro was based in Milpitas, California, with a computing center in Houston, Texas, and a manufacturing and support subsidiary in South Korea and Japan.

NCAR-Wyoming Supercomputing Center

The NCAR-Wyoming Supercomputing Center (NWSC) is a high-performance computing (HPC) and data archival facility located in Cheyenne, Wyoming, that provides advanced computing services to researchers in the Earth system sciences.

The Cray XC30 is a massively parallel multiprocessor supercomputer manufactured by Cray. It consists of Intel Xeon processors, with optional Nvidia Tesla or Xeon Phi accelerators, connected together by Cray's proprietary "Aries" interconnect, stored in air-cooled or liquid-cooled cabinets. Each liquid-cooled cabinet can contain up to 48 blades, each with eight CPU sockets, and uses 90 kW of power. The XC series supercomputers are available with the Cray DataWarp applications I/O accelerator technology.

Cray XC40

The Cray XC40 is a massively parallel multiprocessor supercomputer manufactured by Cray. It consists of Intel Haswell Xeon processors, with optional Nvidia Tesla or Intel Xeon Phi accelerators, connected together by Cray's proprietary "Aries" interconnect, stored in air-cooled or liquid-cooled cabinets. The XC series supercomputers are available with the Cray DataWarp applications I/O accelerator technology.

The Cheyenne supercomputer at the NCAR-Wyoming Supercomputing Center (NWSC) in Cheyenne, Wyoming, operated from 2017 to 2024 as one of the world's most powerful and energy-efficient computers. Ranked by TOP500 as the 20th most powerful computer in the world in November 2016 and 160th in November 2023, the 5.34-petaflops system was capable of more than triple the scientific computing performed by NCAR's previous supercomputer, Yellowstone. It was also three times more energy efficient than Yellowstone, with a peak computation rate of more than 3 billion calculations per second for every watt of energy consumed.

The Cray XC50 is a massively parallel multiprocessor supercomputer manufactured by Cray. The machine can support Intel Xeon processors, as well as Cavium ThunderX2 processors, Xeon Phi processors and NVIDIA Tesla P100 GPUs. The processors are connected by Cray's proprietary "Aries" interconnect, in a dragonfly network topology. The XC50 is an evolution of the XC40, with the main difference being the support of Tesla P100 processors and the use of Cray software release CLE 6 or 7.

Sierra (supercomputer)

Sierra or ATS-2 is a supercomputer built for the Lawrence Livermore National Laboratory for use by the National Nuclear Security Administration as the second Advanced Technology System. It is primarily used for predictive applications in nuclear weapon stockpile stewardship, helping to assure the safety, reliability, and effectiveness of the United States' nuclear weapons.

Aurora (supercomputer)

Aurora is an exascale supercomputer that was sponsored by the United States Department of Energy (DOE) and designed by Intel and Cray for the Argonne National Laboratory. It has been the second fastest supercomputer in the world since 2023. It is expected that after optimizing its performance it will exceed 2 ExaFLOPS, making it the fastest computer ever.

LUMI

LUMI is a petascale supercomputer located at the CSC data center in Kajaani, Finland. As of January 2023, the computer is the fastest supercomputer in Europe.

Leonardo (supercomputer)

Leonardo is a petascale supercomputer located at the CINECA datacenter in Bologna, Italy. The system consists of an Atos BullSequana XH2000 computer, with close to 14,000 Nvidia Ampere GPUs and 200 Gbit/s Nvidia Mellanox HDR InfiniBand connectivity. Inaugurated in November 2022, Leonardo is capable of 250 petaflops, making it one of the top five fastest supercomputers in the world. It debuted on the TOP500 in November 2022 ranking fourth in the world, and second in Europe.

The Tri-Lab Operating System Stack (TOSS) is a Linux distribution based on Red Hat Enterprise Linux (RHEL) that was created to provide a software stack for high performance computing (HPC) clusters for laboratories within the National Nuclear Security Administration (NNSA). The operating system allows multiple smaller systems to emulate a high-performance computing (HPC) platform.

References

  1. "Cray Awarded $174 Million Supercomputer Contract From the National Nuclear Security Administration". Archived from the original on 2017-10-18. Retrieved 2014-08-24.
  2. Morgan, Timothy Prickett (1 October 2020). "With "Crossroads" Supercomputer, HPE Notches Another DOE Win". The Next Platform. Retrieved 5 November 2020.
  3. "Trinity / NERSC-8 RFP". Archived from the original on 2018-11-26. Retrieved 2018-11-26.
  4. "Cray Awarded $174 Million Supercomputer Contract From the National Nuclear Security Administration". Archived from the original on 2018-07-09. Retrieved 2018-11-26.
  5. "Trinity Supercomputer's Haswell and KNL Partitions Are Merged". 19 July 2017.
  6. "Novermber [sic] 2015 | TOP500".
  7. "LANL Adds Capacity to Trinity Supercomputer for Stockpile Stewardship". 24 July 2017.
  8. "November 2016 | TOP500".
  9. "Trinity Supercomputer's Haswell and KNL Partitions Are Merged". 19 July 2017.
  10. "November 2018 | TOP500".
  11. "NNSA supercomputers recognized worldwide for speed and performance". Energy.gov. Retrieved 2023-11-13.
  12. "Technical Specifications".
  13. "Trinity Supercomputer's Haswell and KNL Partitions Are Merged". 19 July 2017.
  14. Grider, Gary (2018). "Storage Lessons from HPC: A Multi-Decadal Struggle" (PDF). SNIA Storage Developer Conference. https://www.snia.org/sites/default/files/SDC/2018/presentations/General_Session/Grider_Gary_Storage_Lessons_from_HPC_A_Multi-Decadal_Struggle.pdf
  15. Grider, Gary (2018). "Storage Lessons from HPC: A Multi-Decadal Struggle" (PDF). SNIA Storage Developer Conference. https://www.snia.org/sites/default/files/SDC/2018/presentations/General_Session/Grider_Gary_Storage_Lessons_from_HPC_A_Multi-Decadal_Struggle.pdf
  16. Grider, Gary (2018). "Storage Lessons from HPC: A Multi-Decadal Struggle" (PDF). SNIA Storage Developer Conference. https://www.snia.org/sites/default/files/SDC/2018/presentations/General_Session/Grider_Gary_Storage_Lessons_from_HPC_A_Multi-Decadal_Struggle.pdf