Princeton Application Repository for Shared-Memory Computers

PARSEC Benchmark Suite
Original author(s): Princeton University and Intel
Developer(s): Christian Bienia
Initial release: January 25, 2008
Stable release: 2.1 / August 13, 2009
Written in: C/C++
Operating system: Linux, OpenSolaris
Type: Benchmark
License: 3-clause BSD
Website: parsec.cs.princeton.edu

Princeton Application Repository for Shared-Memory Computers (PARSEC) is a benchmark suite of emerging multi-threaded workloads used to evaluate and develop next-generation chip-multiprocessors. It was created collaboratively by Intel and Princeton University to drive research on future computer systems. [1] [2] Since its inception the suite has grown into a community project that continues to be improved by a broad range of research institutions. [3] PARSEC is freely available and is used for both academic and non-academic research. [4] [5] [6]

Background

The introduction of chip-multiprocessors meant that, for the first time, mainstream software had to be rewritten to take advantage of parallel processing capabilities. [2] [7] At that time parallel software existed only in very specialized areas. Before chip-multiprocessors became commonly available, however, software developers were unwilling to rewrite mainstream programs, which meant that hardware manufacturers had no programs for test and development purposes that accurately represented expected real-world program behavior. This posed a chicken-and-egg problem that motivated a new type of benchmark suite with parallel programs that could take full advantage of chip-multiprocessors.

PARSEC was created to break this circular dependency. It was designed to fulfill the following five objectives: [8]

  1. It focuses on multithreaded applications
  2. It includes emerging workloads
  3. It offers a diverse selection of programs
  4. Its workloads employ state-of-the-art techniques
  5. It supports research

Traditional benchmark suites that were publicly available before PARSEC were generally limited in the application domains they covered, or were available only in unparallelized, serial versions. Parallel programs were prevalent only in the domain of High-Performance Computing and, on a much smaller scale, in business environments. [9] Chip-multiprocessors, however, were expected to be used heavily in all areas of computing, including parallelized consumer applications.

Workloads

The PARSEC Benchmark Suite is available in version 2.1, which includes the following 13 workloads: [10]

  1. blackscholes – option pricing with the Black-Scholes partial differential equation (financial analysis)
  2. bodytrack – tracking of a human body through an image sequence (computer vision)
  3. canneal – simulated annealing to minimize the routing cost of a chip design (engineering)
  4. dedup – compression of a data stream with a combination of global and local compression, known as deduplication (enterprise storage)
  5. facesim – physics simulation of the motions of a human face (animation)
  6. ferret – content-based similarity search of feature-rich data (similarity search)
  7. fluidanimate – fluid dynamics for animation with the smoothed particle hydrodynamics method (animation)
  8. freqmine – frequent itemset mining (data mining)
  9. raytrace – real-time raytracing (visualization)
  10. streamcluster – online clustering of an input stream (data mining)
  11. swaptions – pricing of a portfolio of swaptions with Monte Carlo simulation (financial analysis)
  12. vips – image processing (media processing)
  13. x264 – H.264 video encoding (media processing)
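
As an illustration of the kind of data-parallel kernel these workloads contain, the sketch below mimics the structure of the blackscholes benchmark: a shared array of option parameters is partitioned statically across POSIX threads, and each thread prices its slice independently. This is a minimal, hypothetical sketch, not code from the suite; the problem size NOPTIONS, the thread count NTHREADS, and the synthetic inputs are illustrative assumptions, and only the standard Black-Scholes call-option formula is used.

    /* Minimal sketch of a blackscholes-style data-parallel kernel.
     * Illustrative only -- not code from the PARSEC suite.
     * Build with: cc sketch.c -lpthread -lm */
    #include <math.h>
    #include <pthread.h>
    #include <stdio.h>

    #define NOPTIONS 1024   /* assumed problem size */
    #define NTHREADS 4      /* assumed thread count */

    typedef struct { double S, K, r, v, T, price; } Option;

    static Option options[NOPTIONS];

    /* Cumulative distribution function of the standard normal. */
    static double cndf(double x) { return 0.5 * erfc(-x / sqrt(2.0)); }

    /* Price one European call option with the Black-Scholes formula. */
    static double black_scholes_call(const Option *o)
    {
        double d1 = (log(o->S / o->K) + (o->r + 0.5 * o->v * o->v) * o->T)
                    / (o->v * sqrt(o->T));
        double d2 = d1 - o->v * sqrt(o->T);
        return o->S * cndf(d1) - o->K * exp(-o->r * o->T) * cndf(d2);
    }

    /* Each thread prices a contiguous slice of the shared option array. */
    static void *worker(void *arg)
    {
        long tid = (long)arg;
        long chunk = NOPTIONS / NTHREADS;
        long begin = tid * chunk;
        long end = (tid == NTHREADS - 1) ? NOPTIONS : begin + chunk;
        for (long i = begin; i < end; i++)
            options[i].price = black_scholes_call(&options[i]);
        return NULL;
    }

    int main(void)
    {
        pthread_t threads[NTHREADS];
        for (long i = 0; i < NOPTIONS; i++)  /* synthetic input data */
            options[i] = (Option){ 100.0, 90.0 + i % 20, 0.02, 0.3, 1.0, 0.0 };
        for (long t = 0; t < NTHREADS; t++)
            pthread_create(&threads[t], NULL, worker, (void *)t);
        for (long t = 0; t < NTHREADS; t++)
            pthread_join(threads[t], NULL);
        printf("option 0 priced at %.4f\n", options[0].price);
        return 0;
    }

In the suite itself, most workloads can be built with more than one parallelization model; blackscholes, for instance, ships with pthreads, OpenMP, and Intel TBB versions, and the bundled parsecmgmt script builds and runs the workloads with predefined input sets.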

References


  1. "Intel Teams with Universities on Multicore Software Suite". EDN. Archived from the original on 2013-01-23. Retrieved 2006-08-22.
  2. 1 2 "Designing future computers with future workloads". Research@Intel. Retrieved 2008-02-26.[ dead link ]
  3. "Intel CTO looks into the future: Measuring the value and need for multi-core". Gabe on EDA. Retrieved 2006-08-31.[ dead link ]
  4. "The PARSEC Benchmark Suite". Princeton University. Retrieved 2008-01-05.
  5. Bhadauria, Major; Weaver, Vincent M.; McKee, Sally A. (October 2009), "Understanding PARSEC Performance on Contemporary CMPs", Proceedings of the 2009 IEEE International Symposium on Workload Characterization, IEEE
  6. Barrow-Williams, Nick; Fensch, Christian; Moore, Simon (October 2009), "A Communication Characterization of SPLASH-2 and PARSEC", Proceedings of the 2009 IEEE International Symposium on Workload Characterization, IEEE
  7. Rabaey, Jan M.; Burke, Daniel; Lutz, Ken; Wawrzynek, John (July–August 2008), "Workloads of the Future" (PDF), IEEE Design & Test of Computers, IEEE[ dead link ]
  8. Bienia, Christian; Kumar, Sanjeev; Singh, Jaswinder Pal; Li, Kai (October 2008), "The PARSEC Benchmark Suite: Characterization and Architectural Implications", Proceedings of the 17th international conference on Parallel architectures and compilation techniques, Association for Computing Machinery, New York, NY, USA
  9. Bienia, C.; Kumar, S.; Kai Li (2008). "PARSEC vs. SPLASH-2: A quantitative comparison of two multithreaded benchmark suites on Chip-Multiprocessors". 2008 IEEE International Symposium on Workload Characterization. p. 47. doi:10.1109/IISWC.2008.4636090. ISBN   978-1-4244-2777-2.
  10. Bienia, Christian; Li, Kai (June 2009), "PARSEC 2.0: A New Benchmark Suite for Chip-Multiprocessors", Proceedings of the 5th Annual Workshop on Modeling, Benchmarking and Simulation, Association for Computing Machinery, New York, NY, USA