High Performance Storage System

Developer(s)	HPSS Collaboration (IBM, LANL, LBNL, LLNL, ORNL, SNL)
Stable release	10.3 / September 2023
Operating system	Linux
Type	Hierarchical Storage Management
License	Proprietary
Website	hpss-collaboration

Last updated October 27, 2023

High Performance Storage System (HPSS) is a flexible, scalable, policy-based, software-defined hierarchical storage management (HSM) product developed by the HPSS Collaboration. It provides HSM, archive, and file system services, using cluster, LAN, and SAN technologies to aggregate the capacity and performance of many computers, disks, disk systems, tape drives, and tape libraries.^[1]
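The general idea behind policy-based hierarchical storage management can be illustrated with a toy sketch. This is not HPSS code and does not use any HPSS API: the `FileRecord` class, the tier names, and the idle-time rule are all hypothetical, chosen only to show how a policy might move rarely used data from a fast tier (disk) to a high-capacity tier (tape) and bring it back on access.

```python
from dataclasses import dataclass


@dataclass
class FileRecord:
    """Hypothetical catalog entry for one archived file."""
    name: str
    size_bytes: int
    last_access_day: int  # day number of the most recent access
    tier: str = "disk"    # "disk" (fast) or "tape" (high capacity)


def migrate_cold_files(files, today, max_idle_days):
    """Move files idle for more than max_idle_days from disk to tape.

    Returns the names of the files that were migrated.
    """
    moved = []
    for f in files:
        if f.tier == "disk" and today - f.last_access_day > max_idle_days:
            f.tier = "tape"
            moved.append(f.name)
    return moved


def recall(f, today):
    """Bring a file back to the disk tier when it is accessed again."""
    f.tier = "disk"
    f.last_access_day = today
    return f


catalog = [
    FileRecord("run-001.dat", 4_000_000_000, last_access_day=10),
    FileRecord("run-002.dat", 2_500_000_000, last_access_day=95),
]

# On day 100 with a 30-day idle limit, only the first file is cold.
moved = migrate_cold_files(catalog, today=100, max_idle_days=30)
print(moved)
```

A production HSM applies far richer policies (file size, storage class, number of copies, and so on); the sketch only shows the tiering principle.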

Architecture

HPSS supports several methods for creating and accessing data, including FTP, parallel FTP, FUSE (on Linux), and a client API with support for parallel I/O.
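Because a FUSE mount presents storage as an ordinary directory tree, a client can in principle use standard file operations against it. The sketch below assumes a mount point exists; the `archive_file` helper and all paths are hypothetical, and a temporary directory stands in for the mount so the example runs anywhere. It does not use the HPSS client API.

```python
import shutil
import tempfile
from pathlib import Path


def archive_file(src: Path, mount_root: Path, rel_dest: str) -> Path:
    """Copy src to rel_dest under mount_root using ordinary file I/O."""
    dest = mount_root / rel_dest
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src, dest)
    return dest


with tempfile.TemporaryDirectory() as tmp:
    tmp_path = Path(tmp)

    # Stand-in for a FUSE mount point (assumption: a real site would
    # substitute its own mount path here).
    mount_root = tmp_path / "hpss_mount"
    mount_root.mkdir()

    src = tmp_path / "results.txt"
    src.write_text("simulation output\n")

    dest = archive_file(src, mount_root, "project/run1/results.txt")
    round_trip = dest.read_text()
    print(round_trip, end="")
```

On a real archive, reading a file that has migrated to tape may take much longer than a local read, so callers would typically plan for that latency.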

As of version 7.5, HPSS is fully supported on Linux. The HPSS client API is supported on AIX, Linux, and Solaris.^[1]

The implementation is built around IBM's Db2, a scalable relational database management system.

The HPSS Collaboration

In early 1992, several United States Department of Energy (DOE) national laboratories, namely Lawrence Livermore (LLNL), Los Alamos (LANL), Oak Ridge (ORNL), and Sandia (SNL), joined with IBM to form the National Storage Laboratory (NSL).^[2] The NSL's purpose was to commercialize software and hardware technologies that would overcome computing and data storage bottlenecks.^[3] The NSL's research on data storage gave rise to the collaboration that produces HPSS. This collaboration began in the fall of 1992^[4] and involved IBM's Houston Global Services and five DOE national laboratories (Lawrence Berkeley [LBL], LLNL, LANL, ORNL, and SNL).^[1] At that time, the HPSS design team at the DOE national laboratories and IBM recognized that a data storage explosion was coming: as computing power rose to teraflops and petaflops, data stored in HSMs would need to grow to petabytes and beyond, data transfer rates to gigabytes per second and higher, and daily throughput to tens of terabytes per day. The collaboration therefore set out to design and deploy a system that would scale by a factor of 1,000 or more and evolve toward these expected targets and beyond.^[5]
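To give a sense of what these targets mean, the short calculation below converts a daily throughput figure into the sustained transfer rate it implies. The 10 TB/day input is only an illustrative value within "tens of terabytes per day", decimal units (1 TB = 10^12 bytes) are assumed, and the function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60
TB = 10**12  # decimal terabyte, in bytes (assumption)


def sustained_rate_bytes_per_s(bytes_per_day):
    """Average rate needed to move bytes_per_day in 24 hours."""
    return bytes_per_day / SECONDS_PER_DAY


rate = sustained_rate_bytes_per_s(10 * TB)
print(f"{rate / 10**6:.1f} MB/s")  # 115.7 MB/s

# Scaling the same workload by a factor of 1,000 scales the rate equally.
scaled_rate = sustained_rate_bytes_per_s(1000 * 10 * TB)
print(f"{scaled_rate / 10**9:.1f} GB/s")  # 115.7 GB/s
```

These are round-the-clock averages; peak rates during busy periods would need to be higher.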

The HPSS Collaboration is based on the premise that no single organization has the experience and resources to meet all the challenges posed by the growing imbalance between computing power and data collection capabilities on one side and storage system I/O, capacity, and functionality on the other. More than twenty organizations worldwide have contributed to various aspects of this effort, including industry, the US Department of Energy (DOE), other federal laboratories, universities, National Science Foundation (NSF) supercomputer centers, the French Commissariat à l'Énergie Atomique (CEA), and Gleicher Enterprises.

As of 2022, the primary HPSS development team consists of:

IBM Global Business Services (Houston, TX)
Los Alamos National Laboratory (Los Alamos, NM)
Lawrence Livermore National Laboratory (Livermore, CA)
Lawrence Berkeley National Energy Research Scientific Computing Center (Berkeley, CA)
Oak Ridge National Laboratory (Oak Ridge, TN)
Sandia National Laboratories (Albuquerque, NM)

Notable achievements

Two of the larger HPSS sites, ECMWF and UK Met Office, had 217 and 99 petabytes of data stored within a single HPSS instance and namespace as of December 7, 2016.^[5]
On November 14, 2007, the San Diego Supercomputer Center along with IBM, DataDirect, and Brocade demonstrated a "Billion File" test which successfully backed up a billion files from GPFS into HPSS.^[6]
In May 2013, a 380-petabyte HPSS installation entered service at the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign.^[7]

References

[1] "Official HPSS Collaboration Website". IBM.
[2] "High Performance Storage System Taking the Long View". str.llnl.gov. Retrieved 2023-03-29.
[3] Watson, R.W.; Coyne, R.A. (June 1994). "The National Storage Laboratory (NSL): Overview and status". Proceedings Thirteenth IEEE Symposium on Mass Storage Systems. Toward Distributed Storage and Data Management Systems. pp. 39–43. doi:10.1109/MASS.1994.373025. ISBN 0-8186-5580-1. S2CID 206444692.
[4] "HPSS at LLNL". LLNL.
[5] "Largest HPSS Sites 1+ petabytes".
[6] HPCWire. November 15, 2007. Archived November 17, 2007, at the Wayback Machine.
[7] "NCSA puts world's largest High Performance Storage System into production". 2013-05-30. Retrieved 2014-08-30.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
