Arm MAP

Developer(s): Arm Holdings (formerly Allinea Software Ltd.)
Initial release: 2013
Stable release: 20.2 / November 2020
Operating system: Linux (Windows and macOS for remote client)
Platform: x86-64, Arm, PowerPC, Intel Xeon Phi
Available in: English
Type: Profiler
Licence: Proprietary commercial software
Website: www.arm.com/products/development-tools/server-and-hpc/forge/map

Arm MAP is an application profiler produced by Allinea Software, now part of Arm,[1][2] of Warwick, United Kingdom, for profiling the performance of C, C++, Fortran 90 and Python software. It is widely used for its multithreaded and multiprocess capabilities, such as profiling parallel Message Passing Interface (MPI) or OpenMP applications, including those running on clusters of Linux machines, and it also handles scalar (sequential) code.[3]

Profiler

It was one of the first profilers able to both analyze and visually display performance when running at high scale (including many thousands of cores). Arm MAP is also used to examine applications that are preparing to scale to 1 exaFLOP/s.[4]

The profiler uses adaptive sampling methods to identify process counters and activities, and combines data from multiple processes that may be running on multiple compute server nodes. It analyzes performance and the causes of bottlenecks, which enables developers to identify hotspots and areas of potential improvement. A sketch of a typical profiling invocation follows.
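The sketch below shows how such a profiling run might be launched from the command line, based on the map launcher documented for Arm Forge; the program name ./myapp and the process count are placeholders, not part of the tool.

    # Interactive profiling of an 8-process MPI run:
    map mpirun -n 8 ./myapp

    # Non-interactive profiling; writes a .map results file
    # that can be opened in the GUI later:
    map --profile mpirun -n 8 ./myapp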

The tool is scalable: it merges performance data using the scalable architecture first used in Arm DDT to debug at petascale (typically over 100,000 processes). Arm MAP shares a common user interface with Arm DDT; together they make up the Arm Forge tool suite, which is widely used by research scientists and developers of parallel scientific applications.

The Arm MAP profiler is used on large supercomputers as well as smaller clusters and workstations. Sites with installations include the National Energy Research Scientific Computing Center (NERSC), the University of Cambridge and Los Alamos National Laboratory.

Related Research Articles

<span class="mw-page-title-main">Supercomputer</span> Type of extremely powerful computer

A supercomputer is a computer with a high level of performance as compared to a general-purpose computer. The performance of a supercomputer is commonly measured in floating-point operations per second (FLOPS) instead of million instructions per second (MIPS). Since 2017, there have existed supercomputers which can perform over 10^17 FLOPS (a hundred quadrillion FLOPS, 100 petaFLOPS or 100 PFLOPS). For comparison, a desktop computer has performance in the range of hundreds of gigaFLOPS (10^11) to tens of teraFLOPS (10^13). Since November 2017, all of the world's fastest 500 supercomputers run on Linux-based operating systems. Additional research is being conducted in the United States, the European Union, Taiwan, Japan, and China to build faster, more powerful and technologically superior exascale supercomputers.

<span class="mw-page-title-main">Beowulf cluster</span> Type of computing cluster

A Beowulf cluster is a computer cluster of what are normally identical, commodity-grade computers networked into a small local area network with libraries and programs installed which allow processing to be shared among them. The result is a high-performance parallel computing cluster from inexpensive personal computer hardware.

Message Passing Interface (MPI) is a standardized and portable message-passing system designed to function on parallel computing architectures. The MPI standard defines the syntax and semantics of library routines that are useful to a wide range of users writing portable message-passing programs in C, C++, and Fortran. There are several open-source MPI implementations, which fostered the development of a parallel software industry and encouraged the development of portable and scalable large-scale parallel applications.
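As an illustration of the kind of routine the standard defines, the following minimal C program prints each process's rank; it assumes an MPI implementation providing the usual mpicc compiler wrapper and mpirun launcher.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);                /* start the MPI runtime */

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's id */
        MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total process count */

        printf("Hello from rank %d of %d\n", rank, size);

        MPI_Finalize();                        /* shut the runtime down */
        return 0;
    }

Compiled with mpicc hello.c -o hello and launched with mpirun -n 4 ./hello, each of the four processes prints its own rank.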

<span class="mw-page-title-main">Quadrics (company)</span>

Quadrics was a supercomputer company formed in 1996 as a joint venture between Alenia Spazio and the technical team from Meiko Scientific. It produced hardware and software for clustering commodity computer systems into massively parallel systems. Its high point was in June 2003, when six of the ten fastest supercomputers in the world were based on Quadrics' interconnect. The company officially closed on June 29, 2009.

<span class="mw-page-title-main">OpenMP</span> Open standard for parallelizing

OpenMP is an application programming interface (API) that supports multi-platform shared-memory multiprocessing programming in C, C++, and Fortran, on many platforms, instruction-set architectures and operating systems, including Solaris, AIX, FreeBSD, HP-UX, Linux, macOS, and Windows. It consists of a set of compiler directives, library routines, and environment variables that influence run-time behavior.
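A minimal C sketch of those three ingredients: a compiler directive parallelizes the loop, a library routine reports the thread count, and the OMP_NUM_THREADS environment variable (set before launch) influences how many threads run.

    #include <omp.h>
    #include <stdio.h>

    int main(void) {
        long sum = 0;
        /* The directive splits the loop iterations across a team of
           threads; reduction(+:sum) combines the per-thread partial sums. */
        #pragma omp parallel for reduction(+:sum)
        for (long i = 0; i < 1000000; i++) {
            sum += i;
        }
        printf("sum = %ld using up to %d threads\n",
               sum, omp_get_max_threads());
        return 0;
    }

Built with a flag such as gcc -fopenmp, and run as OMP_NUM_THREADS=4 ./a.out to use four threads.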

Checkpointing is a technique that provides fault tolerance for computing systems. It basically consists of saving a snapshot of the application's state, so that applications can restart from that point in case of failure. This is particularly important for long running applications that are executed in failure-prone computing systems.
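A minimal application-level sketch in C, under the simplifying assumption that the entire application state is one loop counter and that checkpoint.dat (an illustrative file name) survives the failure:

    #include <stdio.h>

    int main(void) {
        long i = 0;
        FILE *f = fopen("checkpoint.dat", "rb");
        if (f) {                          /* checkpoint found: restore state */
            if (fread(&i, sizeof i, 1, f) != 1) i = 0;
            fclose(f);
        }
        for (; i < 1000000; i++) {
            /* ... one unit of work ... */
            if (i % 10000 == 0) {         /* periodically save a snapshot */
                f = fopen("checkpoint.dat", "wb");
                fwrite(&i, sizeof i, 1, f);
                fclose(f);
            }
        }
        remove("checkpoint.dat");         /* finished: discard the checkpoint */
        return 0;
    }

Real checkpointing systems snapshot far more state (heap, communication buffers, file offsets), often transparently, but the restart-from-last-snapshot logic is the same.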

<span class="mw-page-title-main">High-performance computing</span> Computing with supercomputers and clusters

High-performance computing (HPC) uses supercomputers and computer clusters to solve advanced computation problems.

Arm DDT is a commercial C, C++ and Fortran 90 debugger produced by Allinea Software, now part of Arm, of Warwick, United Kingdom. It is widely used for debugging parallel Message Passing Interface (MPI) and threaded programs, including those running on clusters of Linux machines.

<span class="mw-page-title-main">Open MPI</span> Message Passing Interface software library

Open MPI is a Message Passing Interface (MPI) library project combining technologies and resources from several other projects. It is used by many TOP500 supercomputers including Roadrunner, which was the world's fastest supercomputer from June 2008 to November 2009, and K computer, the fastest supercomputer from June 2011 to June 2012.

<span class="mw-page-title-main">Rogue Wave Software</span> American software company

Rogue Wave Software was an American software development company based in Louisville, Colorado. It provided cross-platform software development tools and embedded components for parallel, data-intensive, and other high-performance computing (HPC) applications.

<span class="mw-page-title-main">Computer cluster</span> Set of computers configured in a distributed computing system

A computer cluster is a set of computers that work together so that they can be viewed as a single system. Unlike grid computers, computer clusters have each node set to perform the same task, controlled and scheduled by software.

Intel Parallel Studio XE was a software development product developed by Intel that facilitated native code development on Windows, macOS and Linux in C++ and Fortran for parallel computing. Parallel programming enables software programs to take advantage of multi-core processors from Intel and other processor vendors.

Windows HPC Server 2008, released by Microsoft on 22 September 2008, is the successor product to Windows Compute Cluster Server 2003. Like WCCS, Windows HPC Server 2008 is designed for high-end applications that require high performance computing clusters. This version of the server software is claimed to efficiently scale to thousands of cores. It includes features unique to HPC workloads: a new high-speed NetworkDirect RDMA, highly efficient and scalable cluster management tools, a service-oriented architecture (SOA) job scheduler, an MPI library based on open-source MPICH2, and cluster interoperability through standards such as the High Performance Computing Basic Profile (HPCBP) specification produced by the Open Grid Forum (OGF).

Many-task computing (MTC) in computational science is an approach to parallel computing that aims to bridge the gap between two computing paradigms: high-throughput computing (HTC) and high-performance computing (HPC).

A lightweight kernel (LWK) operating system is one used in a large computer with many processor cores, termed a parallel computer.

<span class="mw-page-title-main">Slurm Workload Manager</span> Free and open-source job scheduler for Linux and similar computers

The Slurm Workload Manager, formerly known as Simple Linux Utility for Resource Management (SLURM), or simply Slurm, is a free and open-source job scheduler for Linux and Unix-like kernels, used by many of the world's supercomputers and computer clusters.
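An illustrative batch script using common Slurm directives; the job name, resource counts and ./myapp are placeholders, and site-specific options such as the partition or account are omitted:

    #!/bin/bash
    #SBATCH --job-name=myapp        # name shown in the queue
    #SBATCH --nodes=2               # number of compute nodes
    #SBATCH --ntasks-per-node=16    # processes to start per node
    #SBATCH --time=00:30:00         # wall-clock limit (HH:MM:SS)

    srun ./myapp                    # launch the tasks across the nodes

Submitted with sbatch job.sh, Slurm queues the job and runs it when the requested nodes become free.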

<span class="mw-page-title-main">BeeGFS</span> Distributed file system

BeeGFS is a parallel file system developed and optimized for high-performance computing. BeeGFS includes a distributed metadata architecture for scalability and flexibility. It is best known for its data throughput.

<span class="mw-page-title-main">Message passing in computer clusters</span> Aspect of computer clusters

Message passing is an inherent element of all computer clusters. All computer clusters, ranging from homemade Beowulfs to some of the fastest supercomputers in the world, rely on message passing to coordinate the activities of the many nodes they encompass. Message passing in computer clusters built with commodity servers and switches is used by virtually every internet service.

Intel Advisor is a design assistance and analysis tool for SIMD vectorization, threading, memory use, and GPU offload optimization. The tool supports the C, C++, Data Parallel C++ (DPC++), Fortran and Python languages. It is available on Windows and Linux as a standalone GUI tool, a Microsoft Visual Studio plug-in or a command-line interface, and it supports OpenMP. The Intel Advisor user interface is also available on macOS.

Programming with Big Data in R (pbdR) is a series of R packages and an environment for statistical computing with big data using high-performance statistical computation. pbdR uses the same programming language as R, with S3/S4 classes and methods, which is used among statisticians and data miners for developing statistical software. The significant difference between pbdR and ordinary R code is that pbdR mainly focuses on distributed-memory systems, where data are distributed across several processors and analyzed in batch mode, with communication between processors based on MPI, which is easily used on large high-performance computing (HPC) systems. The R system itself mainly focuses on single multi-core machines, where data are analyzed interactively, for example through a GUI.

References

  1. "Stephen Hawking COSMOS consortium deploys new supercomputer software". Computerworld UK. 26 June 2013.
  2. "How iVEC will use Arm MAP as a Secret Weapon in the SC13 Student Cluster Competition". Radio HPC. 21 October 2013.
  3. "Arm MAPs out new performance analysis tool - crowd sources design". InsideHPC. 15 November 2012.
  4. "When Applications Go Exascale". 14 February 2014.