OpenACC

Developer(s): The OpenACC Organization
Stable release: 2.7 / November 2018
Written in: C, C++, and Fortran
Operating system: Cross-platform
Platform: Cross-platform
Type: API
Website: www.openacc.org

OpenACC (for open accelerators) is a programming standard for parallel computing developed by Cray, CAPS, Nvidia and PGI. The standard is designed to simplify parallel programming of heterogeneous CPU/GPU systems.[1]

As in OpenMP, the programmer can annotate C, C++, and Fortran source code with compiler directives and additional functions to identify the areas that should be accelerated.[2] Like OpenMP 4.0 and newer, OpenACC can target both CPU and GPU architectures and launch computational code on them.

Members of OpenACC have worked within the OpenMP standards group to create a common specification that extends OpenMP to support accelerators in a future release of OpenMP.[3][4] These efforts resulted in a technical report[5] released for comment and discussion, timed to coincide with the annual Supercomputing Conference (November 2012, Salt Lake City) and to address non-Nvidia accelerator support with input from hardware vendors who participate in OpenMP.[6]

At ISC’12, OpenACC was demonstrated to work on Nvidia, AMD and Intel accelerators, though without performance data.[7]

On November 12, 2012, at the SC12 conference, a draft of the OpenACC version 2.0 specification was presented.[8] Proposed capabilities included new controls over data movement (such as better handling of unstructured data and improved support for non-contiguous memory) and support for explicit function calls and separate compilation (allowing the creation and reuse of libraries of accelerated code). OpenACC 2.0 was officially released in June 2013.[9]

Version 2.5 of the specification was released in October 2015,[10] while version 2.6 was released in November 2017.[11] The latest version of the specification, version 2.7, was released in November 2018.[12]

Compiler support

Support for OpenACC is available in commercial compilers from PGI (from version 12.6) and, for Cray hardware only, from Cray.[7][13]

OpenUH[14] is an Open64-based open source OpenACC compiler supporting C and FORTRAN, developed by the HPCTools group at the University of Houston.

OpenARC[15] is an open source C compiler developed at Oak Ridge National Laboratory to support all features in the OpenACC 1.0 specification. An experimental[16] open source compiler, accULL, was developed by the University of La Laguna (C language only).[17]

Omni Compiler[18][19] is an open source compiler developed at the HPCS Laboratory of the University of Tsukuba and the Programming Environment Research Team of the RIKEN Center for Computational Science, Japan. It supports OpenACC, XcalableMP, and XcalableACC, the last of which combines XcalableMP and OpenACC.

IPMACC[20] is an open source C compiler developed at the University of Victoria that translates OpenACC to CUDA, OpenCL, and ISPC. Currently, only the following directives are supported: data, kernels, loop, and cache.

GCC support for OpenACC was slow in coming.[21] A GPU-targeting implementation from Samsung was announced in September 2013; this translated OpenACC 1.1-annotated code to OpenCL.[16] The announcement of a "real" implementation followed two months later, this time from Nvidia and based on OpenACC 2.0.[22] This sparked some controversy, as the implementation would target only Nvidia's own PTX assembly language, for which no open source assembler or runtime was available.[23][24] Experimental support for OpenACC/PTX did end up in GCC as of version 5.1. The GCC 6 and GCC 7 release series include a much improved implementation of the OpenACC 2.0a specification.[25][26] GCC 9.1 offers nearly complete OpenACC 2.5 support.[27]

Usage

In a way similar to OpenMP 3.x on homogeneous systems or the earlier OpenHMPP, the primary mode of programming in OpenACC is through directives.[28] The specifications also include a runtime library defining several support functions. To use them, the programmer should include "openacc.h" in C or "openacc_lib.h" in Fortran,[29] and then call the acc_init() function, as in the sketch below.
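A minimal sketch of this initialization pattern in C (the offloaded computation itself is omitted; acc_device_default is the generic device-type constant defined by the standard):

#include <openacc.h>

int main(void)
{
    /* Initialize the runtime for the default device type. */
    acc_init(acc_device_default);

    /* ... directive-annotated compute regions would go here ... */

    /* Release the device before exiting. */
    acc_shutdown(acc_device_default);
    return 0;
}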

Directives

OpenACC defines an extensive list of pragmas (directives),[30] for example:

#pragma acc parallel
#pragma acc kernels

Both are used to define parallel computation kernels to be executed on the accelerator, using distinct semantics.[31][32]

#pragma acc data

This is the main directive for defining and copying data to and from the accelerator.

#pragma acc loop

This is used to define the type of parallelism in a parallel or kernels region.

#pragma acc cache
#pragma acc update
#pragma acc declare
#pragma acc wait
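
A short illustrative sketch combining the data, parallel, and loop directives (the vecadd function and its array arguments are hypothetical):

void vecadd(int n, const float *b, const float *c, float *restrict a)
{
    /* Copy b and c to the accelerator; copy a back afterwards. */
    #pragma acc data copyin(b[0:n], c[0:n]) copyout(a[0:n])
    {
        /* Distribute the loop iterations across the accelerator. */
        #pragma acc parallel loop
        for (int i = 0; i < n; ++i)
            a[i] = b[i] + c[i];
    }
}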

Runtime API

A number of runtime API functions are also defined: acc_get_num_devices(), acc_set_device_type(), acc_get_device_type(), acc_set_device_num(), acc_get_device_num(), acc_async_test(), acc_async_test_all(), acc_async_wait(), acc_async_wait_all(), acc_init(), acc_shutdown(), acc_on_device(), acc_malloc(), acc_free().
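
For example, a program might use these functions to enumerate and select a device before offloading (a sketch with error handling omitted; acc_device_nvidia is a vendor-specific device-type constant provided by implementations targeting Nvidia GPUs, not part of the core standard):

#include <stdio.h>
#include <openacc.h>

int main(void)
{
    /* Count the Nvidia devices visible to the runtime
       (acc_device_nvidia is vendor-specific). */
    int n = acc_get_num_devices(acc_device_nvidia);
    printf("Nvidia accelerators available: %d\n", n);

    if (n > 0) {
        /* Select the first device and initialize it. */
        acc_set_device_num(0, acc_device_nvidia);
        acc_init(acc_device_nvidia);
    }
    return 0;
}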

OpenACC generally takes care of organising work for the target device; however, this can be overridden through the use of gangs and workers, as sketched below. A gang consists of workers and operates over a number of processing elements (as with a workgroup in OpenCL).
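
For instance, explicit clauses can fix the number of gangs and workers and assign loop levels to them (the gang and worker counts and the rows, cols, and m names here are arbitrary illustrations):

/* Request 32 gangs of 8 workers each for this region. */
#pragma acc parallel num_gangs(32) num_workers(8)
{
    /* Outer iterations are spread across gangs... */
    #pragma acc loop gang
    for (int i = 0; i < rows; ++i) {
        /* ...and inner iterations across the workers of each gang. */
        #pragma acc loop worker
        for (int j = 0; j < cols; ++j)
            m[i * cols + j] *= 2.0f;
    }
}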

See also

OpenMP
OpenCL
OpenHMPP
CUDA
SYCL
ROCm
oneAPI
Heterogeneous System Architecture

References

  1. "Nvidia, Cray, PGI, and CAPS launch 'OpenACC' programming standard for parallel computing". The Inquirer. 4 November 2011. Archived from the original on November 17, 2011.{{cite web}}: CS1 maint: unfit URL (link)
  2. "OpenACC standard version 2.5" (PDF). OpenACC.org. Retrieved 2 June 2017.
  3. "How does the OpenACC API relate to the OpenMP API?". OpenACC.org. Retrieved 14 January 2014.
  4. "How did the OpenACC specifications originate?". OpenACC.org. Retrieved 14 January 2014.
  5. "The OpenMP Consortium Releases First Technical Report". OpenMP.org. 5 November 2012. Retrieved 14 January 2014.
  6. "OpenMP at SC12". OpenMP.org. 29 August 2012. Retrieved 14 January 2014.
  7. 1 2 "OpenACC Group Reports Expanding Support for Accelerator Programming Standard". HPCwire. 20 June 2012. Archived from the original on 23 June 2012. Retrieved 14 January 2014.
  8. "OpenACC Version 2.0 Posted for Comment". OpenACC.org. 12 November 2012. Retrieved 14 January 2014.
  9. "OpenACC 2.0 Spec | www.openacc.org". www.openacc.org. Archived from the original on 2016-04-04. Retrieved 2016-03-23.
  10. "OpenACC Standards Group Announces Release of the 2.5 Specification; Member Vendors Add Support for ARM & x86 as Parallel Devices | www.openacc.org". www.openacc.org. Archived from the original on 2016-07-26. Retrieved 2016-03-22.
  11. "What's new in OpenACC 2.6? | OpenACC". www.openacc.org. Retrieved 2018-05-01.
  12. "What's new in OpenACC 2.7! | OpenACC". www.openacc.org. Retrieved 2019-01-07.
  13. "OpenACC Standard to Help Developers to Take Advantage of GPU Compute Accelerators". Xbit laboratories. 16 November 2011. Archived from the original on 16 January 2014. Retrieved 14 January 2014.
  14. "OpenUH Compiler". Archived from the original on 25 January 2014. Retrieved 4 March 2014.
  15. "OpenARC Compiler" . Retrieved 4 November 2014.
  16. 1 2 Larabel, Michael (30 September 2013). "GCC Support Published For OpenACC On The GPU". Phoronix .
  17. "accULL The OpenACC research implementation" . Retrieved 14 January 2014.
  18. "Omni Compiler". omni-compiler.org. Retrieved 2019-11-18.
  19. Omni Compiler for C and Fortran programs with XcalableMP and OpenACC directives: omni-compiler/omni-compiler, omni-compiler, 2019-10-17, retrieved 2019-11-17
  20. "IPMACC Compiler" . Retrieved 31 January 2017.
  21. Larabel, Michael (4 December 2012). "OpenACC Still Not Loved By Open Compilers". Phoronix .
  22. Larabel, Michael (14 November 2013). "OpenACC 2.0 With GPU Support Coming To GCC". Phoronix .
  23. Larabel, Michael (15 November 2013). "NVIDIA, Mentor Graphics May Harm GCC". Phoronix .
  24. Larabel, Michael (21 November 2013). "In-Fighting Continues Over OpenACC In GCC". Phoronix .
  25. "OpenACC - GCC Wiki".
  26. Schwinge, Thomas (15 January 2015). "Merge current set of OpenACC changes from gomp-4_0-branch". gcc (Mailing list). gcc.gnu.org. Retrieved 15 January 2015.
  27. Jelinek, Jakub (3 May 2019). "GCC 9.1 Released". LWN.net.
  28. "Easy GPU Parallelism with OpenACC". Dr.Dobb's. 11 June 2012. Retrieved 14 January 2014.
  29. "OpenACC API QuickReference Card, version 1.0" (PDF). NVidia. November 2011. Retrieved 14 January 2014.
  30. "OpenACC standard version 2.0" (PDF). OpenACC.org. Retrieved 14 January 2014.
  31. "OpenACC Kernels and Parallel Constructs". PGI insider. August 2012. Retrieved 14 January 2014.
  32. "OpenACC parallel section VS kernels". CAPS entreprise Knowledge Base. 3 January 2013. Archived from the original on 16 January 2014. Retrieved 14 January 2014.