HPC Challenge Benchmark

Original author(s)	Innovative Computing Laboratory, University of Tennessee
Initial release	2003
Stable release	1.5.0 / March 18, 2016^[1]
Platform	Cross-platform
License	BSD
Website	icl.cs.utk.edu/hpcc/

Last updated June 26, 2022

The HPC Challenge Benchmark combines several benchmarks to test a number of independent performance attributes of high-performance computing (HPC) systems. The project has been co-sponsored by the DARPA High Productivity Computing Systems program, the United States Department of Energy, and the National Science Foundation.^[2]

Context

The performance of complex applications on HPC systems can depend on a variety of independent performance attributes of the hardware. The HPC Challenge Benchmark is an effort to improve visibility into this multidimensional space by combining the measurement of several of these attributes into a single program.

Although the performance attributes of interest are not specific to any particular computer architecture, the reference implementation of the HPC Challenge Benchmark, written in C and MPI, assumes that the system under test is a cluster of shared-memory multiprocessor systems connected by a network. Because of this assumed hierarchical structure, most of the tests are run in several modes of operation. Following the notation used in the benchmark reports, results labeled "single" mean that the test was run on one randomly chosen processor in the system; results labeled "star" mean that an independent copy of the test was run concurrently on each processor; and results labeled "global" mean that all processors worked in coordination to solve a single problem, with data distributed across the nodes of the system.

Components

The benchmark currently consists of seven tests (with the modes of operation indicated for each):

HPL^[3] (High Performance LINPACK) – measures performance of a solver for a dense system of linear equations (global).
DGEMM – measures the floating-point performance of double-precision matrix-matrix multiplication (single, star).
STREAM^[4] – measures sustainable memory bandwidth (single, star).
PTRANS – measures the rate at which the system can transpose a large array (global).
RandomAccess – measures the rate of 64-bit updates to randomly selected elements of a large table (single, star, global).
FFT – performs a Fast Fourier Transform on a large one-dimensional vector using the generalized Cooley–Tukey algorithm (single, star, global).
Communication Bandwidth and Latency – MPI-centric performance measurements based on the b_eff^[5] bandwidth/latency benchmark.
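
As a rough illustration of what the RandomAccess test measures, the following Python sketch applies XOR updates to pseudo-randomly chosen slots of a power-of-two-sized table and converts an update count and elapsed time into giga-updates per second (GUPS). It is a simplified, single-process sketch and not the reference C/MPI code: the names `next_random`, `random_access`, and `gups`, the feedback constant `POLY`, and the default seed are assumptions made for this example.

```python
MASK64 = (1 << 64) - 1
POLY = 0x7  # feedback constant for the illustrative generator (an assumption)


def next_random(x):
    """Advance a 64-bit shift-and-XOR pseudo-random sequence by one step."""
    high_bit = x >> 63
    return ((x << 1) & MASK64) ^ (POLY if high_bit else 0)


def random_access(table, n_updates, seed=0x123456789ABCDEF):
    """XOR n_updates pseudo-random 64-bit values into pseudo-random table slots."""
    size = len(table)
    if size == 0 or size & (size - 1) != 0:
        raise ValueError("table size must be a power of two")
    x = seed
    for _ in range(n_updates):
        x = next_random(x)
        # The low bits of the random value select the slot to update.
        table[x & (size - 1)] ^= x
    return table


def gups(n_updates, seconds):
    """Convert an update count and elapsed time to giga-updates per second."""
    return n_updates / seconds / 1e9
```

Because XOR is its own inverse, replaying the same update sequence a second time restores the original table, which gives a cheap correctness check for a sketch like this one.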

Performance attributes

At a high level, the tests are intended to provide coverage of four important attributes of performance: double-precision floating-point arithmetic (DGEMM and HPL), local memory bandwidth (STREAM), network bandwidth for "large" messages (PTRANS, RandomAccess, FFT, b_eff), and network bandwidth for "small" messages (RandomAccess, b_eff). Some of the codes are more complex than others and can have additional performance sensitivities. For example, in some systems HPL performance can be limited by network bandwidth and/or network latency.
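
The local memory bandwidth attribute can be made concrete with a small sketch of a STREAM-style Triad kernel, `a[i] = b[i] + scalar * c[i]`. The Python below is only an illustration under stated assumptions: the function names, the array initial values, and the scalar are hypothetical, and a pure-Python loop is limited by the interpreter rather than by memory hardware, so the resulting rate says nothing about a real system. What it does show is the usual accounting: each element involves two reads and one write of an 8-byte double, so one pass moves 24 bytes per element.

```python
import time
from array import array


def triad(a, b, c, scalar):
    """STREAM-style Triad kernel: a[i] = b[i] + scalar * c[i]."""
    for i in range(len(a)):
        a[i] = b[i] + scalar * c[i]


def triad_bandwidth(n, scalar=3.0, word_bytes=8):
    """Time one Triad pass over n doubles.

    Returns (bytes_moved, seconds, a), where a is the result array.
    """
    a = array("d", [0.0]) * n
    b = array("d", [1.0]) * n
    c = array("d", [2.0]) * n
    start = time.perf_counter()
    triad(a, b, c, scalar)
    elapsed = time.perf_counter() - start
    # Two arrays are read and one is written for every element.
    bytes_moved = 3 * word_bytes * n
    return bytes_moved, elapsed, a
```

Dividing `bytes_moved` by `seconds` gives a bandwidth in bytes per second; in "star" mode, as described above, one such independent copy would run on every processor at the same time.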

Competition

The annual HPC Challenge Award Competition at the Supercomputing Conference focuses on four of the most challenging benchmarks in the suite:

Global HPL
Global RandomAccess (or the BSS Random Access Benchmark)
EP STREAM (Triad) per system
Global FFT

There are two classes of awards:

Class 1: Best performance on a base or optimized run submitted to the HPC Challenge website.^[6]
Class 2: Most "elegant" implementation of four or five computational kernels including three or more of the HPC Challenge benchmarks.^[7]

References

[1] "Releases · icl-utk-edu/hpcc". github.com. Retrieved 2021-04-12.
[2] "Cray X1 Supercomputer Has Highest Reported Scores on Government-Sponsored HPC Challenge Benchmark Tests". 2004-06-14. Archived from the original on 2009-03-30. Retrieved 2010-01-22.
[3] "HPL – A Portable Implementation of the High-Performance Linpack Benchmark for Distributed-Memory Computers". Innovative Computing Laboratory, University of Tennessee at Knoxville. Retrieved 2015-06-10.
[4] "STREAM: Sustainable Memory Bandwidth in High Performance Computers". Retrieved 2015-06-10.
[5] "Effective Bandwidth (b_eff) Benchmark". High Performance Computing Center Stuttgart. Retrieved 2015-06-10.
[6] The benchmark is designed to allow replacement of a limited set of functions with more highly optimized versions while remaining a "base" run. Additional (but still limited) modifications are allowed under the category of "optimized" runs.
[7] "HPC Challenge Award Competition". DARPA HPCS Program. Retrieved 2010-01-23.

External links

HPC Challenge Benchmark Official Website
HPC Challenge Award Competition Official Website
BSS Random Access Benchmark: "Performance Evaluation and Optimization of Random Memory Access on Multicores with High Productivity" (Best Paper Award) at ACM/IEEE HiPC 2010

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
