NAS Parallel Benchmarks

Original author(s) NASA Numerical Aerodynamic Simulation Program
Developer(s) NASA Advanced Supercomputing Division
Initial release 1991
Stable release 3.4
Website nas.nasa.gov/Software/NPB/

NAS Parallel Benchmarks (NPB) are a set of benchmarks targeting performance evaluation of highly parallel supercomputers. They are developed and maintained by the NASA Advanced Supercomputing (NAS) Division (formerly the NASA Numerical Aerodynamic Simulation Program) based at the NASA Ames Research Center. NAS solicits performance results for NPB from all sources. [1]

History

Motivation

Traditional benchmarks that existed before NPB, such as the Livermore loops, the LINPACK Benchmark and the NAS Kernel Benchmark Program, were usually specialized for vector computers. They generally suffered from inadequacies including parallelism-impeding tuning restrictions and insufficient problem sizes, which rendered them inappropriate for highly parallel systems. Equally unsuitable were full-scale application benchmarks due to high porting cost and unavailability of automatic software parallelization tools. [2] As a result, NPB were developed in 1991 [3] and released in 1992 [4] to address the ensuing lack of benchmarks applicable to highly parallel machines.

NPB 1

The first specification of NPB recognized that the benchmarks should feature

In light of these guidelines, the only approach deemed viable was a collection of "paper-and-pencil" benchmarks that specified a set of problems only algorithmically and left most implementation details to the implementer's discretion, within certain necessary limits.

NPB 1 defined eight benchmarks, each in two problem sizes dubbed Class A and Class B. Sample codes written in Fortran 77 were supplied. They used a small problem size Class S and were not intended for benchmarking purposes. [2]

NPB 2

After its release, NPB 1 displayed three major weaknesses. Firstly, due to its "paper-and-pencil" specification, computer vendors usually tuned their implementations so heavily that the resulting performance was difficult for scientific programmers to attain. Secondly, many of these implementations were proprietary and not publicly available, effectively concealing their optimization techniques. Thirdly, the problem sizes of NPB 1 lagged behind the development of supercomputers as the latter continued to evolve. [3]

NPB 2, released in 1996, [5] [6] came with source code implementations for five out of eight benchmarks defined in NPB 1 to supplement but not replace NPB 1. It extended the benchmarks with an up-to-date problem size Class C. It also amended the rules for submitting benchmarking results. The new rules included explicit requests for output files as well as modified source files and build scripts to ensure public availability of the modifications and reproducibility of the results. [3]

NPB 2.2 contained implementations of two more benchmarks. [5] NPB 2.3 of 1997 was the first complete implementation in MPI. [4] It shipped with serial versions of the benchmarks consistent with the parallel versions and defined a problem size Class W for small-memory systems. [7] NPB 2.4 of 2002 offered a new MPI implementation and introduced another still larger problem size Class D. [6] It also augmented one benchmark with I/O-intensive subtypes. [4]

NPB 3

NPB 3 retained the MPI implementation from NPB 2 and came in more flavors, namely OpenMP, [8] Java [9] and High Performance Fortran. [10] These new parallel implementations were derived from the serial codes in NPB 2.3 with additional optimizations. [7] NPB 3.1 and NPB 3.2 added three more benchmarks, [11] [12] which, however, were not available across all implementations; NPB 3.3 introduced a Class E problem size. [7] Based on the single-zone NPB 3, a set of multi-zone benchmarks taking advantage of the MPI/OpenMP hybrid programming model was released under the name NPB-Multi-Zone (NPB-MZ) for "testing the effectiveness of multi-level and hybrid parallelization paradigms and tools". [1] [13]

The benchmarks

As of NPB 3.3, eleven benchmarks are defined as summarized in the following table.

| Benchmark | Name derived from [2] | Available since | Description [2] | Remarks |
|---|---|---|---|---|
| MG | MultiGrid | NPB 1 [2] | Approximate the solution to a three-dimensional discrete Poisson equation using the V-cycle multigrid method | |
| CG | Conjugate Gradient | NPB 1 [2] | Estimate the smallest eigenvalue of a large sparse symmetric positive-definite matrix using inverse iteration, with the conjugate gradient method as a subroutine for solving systems of linear equations | |
| FT | Fast Fourier Transform | NPB 1 [2] | Solve a three-dimensional partial differential equation (PDE) using the fast Fourier transform (FFT) | |
| IS | Integer Sort | NPB 1 [2] | Sort small integers using the bucket sort [5] | |
| EP | Embarrassingly Parallel | NPB 1 [2] | Generate independent Gaussian random variates using the Marsaglia polar method | |
| BT | Block Tridiagonal | NPB 1 [2] | Solve a synthetic system of nonlinear PDEs using a block tridiagonal solver kernel | Has I/O-intensive subtypes [4]; has a multi-zone version [13] |
| SP | Scalar Pentadiagonal [6] | NPB 1 [2] | Solve a synthetic system of nonlinear PDEs using a scalar pentadiagonal solver kernel | Has a multi-zone version [13] |
| LU | Lower-Upper symmetric Gauss-Seidel [6] | NPB 1 [2] | Solve a synthetic system of nonlinear PDEs using a symmetric successive over-relaxation (SSOR) solver kernel | Has a multi-zone version [13] |
| UA | Unstructured Adaptive [11] | NPB 3.1 [7] | Solve a heat equation with convection and diffusion from a moving ball. The mesh is adaptive and recomputed at every fifth step. | |
| DC | Data Cube operator [12] | NPB 3.1 [7] | | |
| DT | Data Traffic [7] | NPB 3.2 [7] | | |
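As a rough illustration of the EP and IS rows in the table, the following Python sketch generates Gaussian variates with the Marsaglia polar method and sorts small integers with a bucket sort. It is not the NPB reference code: NPB prescribes its own pseudorandom number generator, problem classes and verification rules, whereas this sketch assumes Python's standard `random` module. The function names `gaussian_pairs` and `bucket_sort`, the `seed` argument and the `n_buckets` default are hypothetical choices made for the example.

```python
import math
import random


def gaussian_pairs(n_pairs, seed=0):
    """Return n_pairs pairs of independent standard Gaussian variates
    produced by the Marsaglia polar method."""
    rng = random.Random(seed)  # assumption: NPB uses its own generator
    pairs = []
    while len(pairs) < n_pairs:
        # Draw a point uniformly from the square [-1, 1) x [-1, 1).
        x = 2.0 * rng.random() - 1.0
        y = 2.0 * rng.random() - 1.0
        s = x * x + y * y
        # Keep only points strictly inside the unit circle, excluding the origin.
        if 0.0 < s < 1.0:
            factor = math.sqrt(-2.0 * math.log(s) / s)
            pairs.append((x * factor, y * factor))
    return pairs


def bucket_sort(keys, max_key, n_buckets=8):
    """Sort integers drawn from the range [0, max_key) with a bucket sort."""
    buckets = [[] for _ in range(n_buckets)]
    for k in keys:
        # Each bucket covers a contiguous key range, so bucket order is key order.
        buckets[k * n_buckets // max_key].append(k)
    result = []
    for bucket in buckets:
        result.extend(sorted(bucket))  # buckets can be sorted independently
    return result
```

Each accepted point in `gaussian_pairs` and each bucket in `bucket_sort` is processed independently of the others, which hints at why these kernels are natural candidates for parallel execution.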

References

  1. "NAS Parallel Benchmarks Changes". NASA Advanced Supercomputing Division. Retrieved 2009-02-23.
  2. Bailey, D.; Barszcz, E.; Barton, J.; Browning, D.; Carter, R.; Dagum, L.; Fatoohi, R.; Fineberg, S.; Frederickson, P.; Weeratunga, S. (March 1994), "The NAS Parallel Benchmarks" (PDF), NAS Technical Report RNR-94-007, NASA Ames Research Center, Moffett Field, CA
  3. Bailey, D.; Harris, T.; Saphir, W.; van der Wijngaart, R.; Woo, A.; Yarrow, M. (December 1995), "The NAS Parallel Benchmarks 2.0" (PDF), NAS Technical Report NAS-95-020, NASA Ames Research Center, Moffett Field, CA
  4. Wong, P.; van der Wijngaart, R. (January 2003), "NAS Parallel Benchmarks I/O Version 2.4" (PDF), NAS Technical Report NAS-03-002, NASA Ames Research Center, Moffett Field, CA
  5. Saphir, W.; van der Wijngaart, R.; Woo, A.; Yarrow, M., "New Implementations and Results for the NAS Parallel Benchmarks 2" (PDF), NASA Ames Research Center, Moffett Field, CA
  6. van der Wijngaart, R. (October 2002), "NAS Parallel Benchmarks Version 2.4" (PDF), NAS Technical Report NAS-02-007, NASA Ames Research Center, Moffett Field, CA
  7. "NAS Parallel Benchmarks Changes". NASA Advanced Supercomputing Division. Retrieved 2009-03-17.
  8. Jin, H.; Frumkin, M.; Yan, J. (October 1999), "The OpenMP Implementation of NAS Parallel Benchmarks and Its Performance" (PDF), NAS Technical Report NAS-99-011, NASA Ames Research Center, Moffett Field, CA
  9. Frumkin, M.; Schultz, M.; Jin, H.; Yan, J., "Implementation of the NAS Parallel Benchmarks in Java" (PDF), NAS Technical Report NAS-02-009, NASA Ames Research Center, Moffett Field, CA
  10. Frumkin, M.; Jin, H.; Yan, J. (September 1998), "Implementation of NAS Parallel Benchmarks in High Performance Fortran" (PDF), NAS Technical Report NAS-98-009, NASA Ames Research Center, Moffett Field, CA
  11. Feng, H.; van der Wijngaart, R.; Biswas, R.; Mavriplis, C. (July 2004), "Unstructured Adaptive (UA) NAS Parallel Benchmark, Version 1.0" (PDF), NAS Technical Report NAS-04-006, NASA Ames Research Center, Moffett Field, CA
  12. Frumkin, M.; Shabanov, L. (September 2004), "Benchmarking Memory Performance with the Data Cube Operator" (PDF), NAS Technical Report NAS-04-013, NASA Ames Research Center, Moffett Field, CA
  13. van der Wijngaart, R.; Jin, H. (July 2003), "NAS Parallel Benchmarks, Multi-Zone Versions" (PDF), NAS Technical Report NAS-03-010, NASA Ames Research Center, Moffett Field, CA