Intel Advisor

Developer(s): Intel Developer Products
Stable release: 2021.4 / October 1, 2021 [1]
Operating system: Windows and Linux (UI-only on macOS)
Type: Profiler
License: Freeware; optional paid commercial support
Website: software.intel.com/content/www/us/en/develop/tools/oneapi/components/advisor.html

Intel Advisor (also known as "Advisor XE", "Vectorization Advisor" or "Threading Advisor") is a design-assistance and analysis tool for SIMD vectorization, threading, memory use, and GPU offload optimization. It supports the C, C++, Data Parallel C++ (DPC++), Fortran and Python languages, as well as OpenMP (and use with MPI). On Windows and Linux it is available as a standalone GUI tool, a Microsoft Visual Studio plug-in, or a command-line interface; [2] on macOS only the user interface is available.


Intel Advisor is available for free as a stand-alone tool or as part of the Intel oneAPI Base Toolkit. Optional paid commercial support is available for the oneAPI Base Toolkit.

Features

Vectorization optimization

Vectorization is the application of Single Instruction, Multiple Data (SIMD) instructions (such as Intel Advanced Vector Extensions and Intel Advanced Vector Extensions 512) to multiple data objects in parallel within a single CPU core. This can greatly increase performance by reducing loop overhead and making better use of the multiple math units in each core.
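As a concrete illustration (a generic example, not taken from Advisor's documentation), a loop of this shape is a classic auto-vectorization candidate: it has no cross-iteration dependences and unit-stride memory access, so an AVX2-capable compiler can process eight floats per instruction.

```c
#include <stddef.h>

/* saxpy: y[i] = a * x[i] + y[i].
 * Each iteration is independent, so the compiler can replace the
 * scalar loop body with SIMD instructions operating on whole
 * vector registers at a time. */
void saxpy(float a, const float *x, float *y, size_t n) {
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

Advisor's Survey and Trip Counts analyses report whether a loop like this was actually vectorized and at what vector length.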

Intel Advisor helps find the loops that will benefit most from better vectorization and identify where it is safe to force compiler vectorization. [3] It supports analysis of scalar, SSE, AVX, AVX2 and AVX-512-enabled code produced by the auto-vectorizers of the Intel, GNU and Microsoft compilers. It also supports analysis of "explicitly" vectorized code that uses OpenMP 4.x and newer, as well as code written with C vector intrinsics or assembly language. [4] [5]
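The "explicit" style mentioned above can be sketched with the standard OpenMP 4.x `simd` construct: the programmer asserts that iterations are independent, so the compiler vectorizes without having to prove safety itself (the pragma is simply ignored when OpenMP SIMD support is disabled).

```c
#include <stddef.h>

/* Explicit vectorization via OpenMP 4.x: the simd pragma tells the
 * compiler the loop is dependence-free and should be vectorized. */
void scale(float *v, float s, size_t n) {
    #pragma omp simd
    for (size_t i = 0; i < n; ++i)
        v[i] *= s;
}
```

With GCC this takes effect under `-fopenmp` or `-fopenmp-simd`; Advisor's Dependencies analysis can be used first to verify the independence claim is actually true.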

Automated Roofline analysis

Intel Advisor automates the Roofline performance model, first proposed at Berkeley [6] and extended at the University of Lisbon. [7]

[Figure: Roofline Performance Model automation integrated with other features in Intel Advisor. Each circle corresponds to one loop or function.]

Advisor's "Roofline Analysis" helps identify whether a given loop or function is memory-bound or compute-bound. It also flags under-optimized loops that could have a high impact on performance if improved. [8] [9] [10] [11]

Intel Advisor also provides an automated memory-level roofline implementation that is closer to the classical Roofline model. The classical Roofline is especially instrumental for high-performance computing applications that are DRAM-bound. Advisor's memory-level roofline analyzes cache data and evaluates data transactions between the different memory levels to provide guidance for improvement. [12]
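The model behind both variants is simple enough to state in a few lines: attainable performance is capped by either the machine's peak compute rate or by memory bandwidth times the kernel's arithmetic intensity (FLOPs per byte moved). A minimal sketch of that bound, with made-up machine numbers for illustration:

```c
/* Roofline bound: a kernel with arithmetic intensity `ai` (FLOPs per
 * byte) on a machine with `peak_gflops` compute peak and `bw_gbs`
 * memory bandwidth can sustain at most min(peak, bw * ai) GFLOP/s.
 * Kernels below the ridge point are memory-bound; above it,
 * compute-bound. */
double roofline_gflops(double peak_gflops, double bw_gbs, double ai) {
    double memory_bound = bw_gbs * ai;
    return memory_bound < peak_gflops ? memory_bound : peak_gflops;
}
```

For example, on a hypothetical machine with a 100 GFLOP/s peak and 50 GB/s of bandwidth, a kernel with intensity 0.5 is capped at 25 GFLOP/s (memory-bound), while one with intensity 4 reaches the 100 GFLOP/s compute roof. Advisor measures the intensity and performance of each loop and plots them against these roofs automatically.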

Intel Advisor's roofline analysis supports code running on CPUs or GPUs. [13] [14] It also supports integer-based applications, which are common in machine learning, big-data, database, and financial domains such as cryptocurrencies. [15]

Threading prototyping

Software architects add code annotations describing a proposed threading design; the annotations are understood by Advisor but ignored by the compiler. Advisor then projects the scalability of the design and checks for synchronization errors. The Threading "Suitability" feature predicts and compares the parallel SMP scalability and performance losses of different candidate threading designs; a typical Suitability report is shown in the screenshot on the right. Suitability also models dataset size (iteration space) and breaks down performance penalties, exposing the negative impact of load imbalance, parallel runtime overhead, and lock contention. [16]

[Figure: Suitability "CPU model" report.]
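A sketch of the annotation style (macro names as in Advisor's `advisor-annotate.h` header; stubbed here as no-ops so the sketch compiles standalone, which matches their behavior outside an Advisor run):

```c
/* Fallback no-op definitions; the real macros come from
 * advisor-annotate.h and are observed by Advisor's Suitability
 * and Dependencies analyses. */
#ifndef ANNOTATE_SITE_BEGIN
#define ANNOTATE_SITE_BEGIN(name)
#define ANNOTATE_SITE_END()
#define ANNOTATE_ITERATION_TASK(name)
#endif

double sum_squares(const double *a, int n) {
    double sum = 0.0;
    ANNOTATE_SITE_BEGIN(candidate_region);   /* proposed parallel site */
    for (int i = 0; i < n; ++i) {
        ANNOTATE_ITERATION_TASK(one_chunk);  /* each iteration = one task */
        sum += a[i] * a[i];                  /* shared accumulator: Advisor
                                                would report this dependence */
    }
    ANNOTATE_SITE_END();
    return sum;
}
```

Running Suitability on such an annotated site projects the speedup of parallelizing it, while Dependencies would flag `sum` as a cross-iteration conflict needing a reduction or lock.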

Offload modelling

Advisor added a GPU offload performance-modeling feature in the 2021 release. It collects application performance characteristics on a baseline platform and builds an analytical performance model for the target (modelled) platform.

This yields estimated speedups on the target GPU, estimates of the offload, data-transfer and scheduling overheads of executing a region, and pinpoints performance bottlenecks. [17] [18] [19] The information helps in choosing an offload strategy: selecting which regions to offload and anticipating the code restructuring needed to make them GPU-ready.
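The trade-off such modeling captures can be illustrated with a deliberately simplified formula (this is not Advisor's actual analytical model): offloading a region pays off only when the GPU kernel time plus transfer and scheduling overheads beats the baseline CPU time.

```c
/* Illustrative only: projected speedup of offloading one region,
 * given measured CPU time and modelled GPU kernel, data-transfer,
 * and scheduling times (all in seconds). A result below 1.0 means
 * the region is not worth offloading. */
double offload_speedup(double cpu_s, double gpu_kernel_s,
                       double transfer_s, double sched_s) {
    return cpu_s / (gpu_kernel_s + transfer_s + sched_s);
}
```

For example, a region taking 10 s on the CPU, modelled at 1 s on the GPU with 0.5 s of transfers and 0.5 s of scheduling, projects a 5x speedup; a small kernel dominated by transfer cost would project below 1x and be excluded from the offload set.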

Customer usage

Intel Advisor is used for design and parallel-algorithm research by Schlumberger, [20] Sandia National Laboratories, and others, [21] and its Vectorization Advisor capabilities are known to be used by LRZ, ICHEC, [22] Daresbury Laboratory, [23] and Pexip. [24]

The step-by-step workflow is used by academia for educational purposes. [25]


References

  1. "Intel® Advisor Release Notes and New Features".
  2. "Command Line Use Cases". Intel. Retrieved 2021-01-05.
  3. "Optimize Vectorization Aspects of a Real-Time 3D Cardiac..." Intel. Retrieved 2021-01-07.
  4. "HPC Code Modernization Tools" (PDF).
  5. "Новый инструмент анализа SIMD программ — Vectorization Advisor" [A new tool for analyzing SIMD programs — Vectorization Advisor]. habr.com (in Russian). Retrieved 2021-01-05.
  6. Williams, Samuel (April 2009). "Roofline: An Insightful Visual Performance Model for Multicore Architectures" (PDF). University of California, Berkeley.
  7. Ilic, Aleksandar. "Cache-aware Roofline model: Upgrading the loft" (PDF). Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa.
  8. "Roofline Analysis in Intel Advisor 2017: youtube how-to video". YouTube.
  9. "Intel Advisor Roofline step-by-step Tutorial".
  10. "Using Roofline Model and Intel Advisor, presented by Sam Williams, Roofline performance model author".
  11. "Case Study: SimYog Improves a Simulation Tool Performance by 2x with..." Intel. Retrieved 2021-01-07.
  12. "Memory-Level Roofline Model with Intel® Advisor". Intel. Retrieved 2021-01-05.
  13. "CPU / Memory Roofline Insights Perspective". Intel. Retrieved 2021-01-05.
  14. "GPU Roofline Insights Perspective". Intel. Retrieved 2021-01-05.
  15. "Integer Roofline Modeling in Intel® Advisor". Intel. Retrieved 2021-01-05.
  16. "How to model suitability using Advisor XE 2015?".
  17. "Offload Modeling Resources for Intel® Advisor Users". Intel. Retrieved 2021-01-05.
  18. "Identify Code Regions to Offload to GPU and Visualize GPU Usage (Beta)". Intel. Retrieved 2021-01-05.
  19. "Offload Modeling Perspective". Intel. Retrieved 2021-01-05.
  20. "Schlumberger* - Parallelize Oil and Gas software with Intel Software products" (PDF).
  21. ""Leading design" company Advisor XE case study" (PDF).
  22. "Design Code for Parallelism and Offloading with Intel® Advisor".
  23. "Computer-Aided Formulation case study: getting helping hand from the Vectorization Advisor".
  24. "Pexip Speeds Enterprise-Grade Videoconferencing" (PDF).
  25. "Supercomputing'2012 HPC educator with Slippery Rock University".