OneAPI (compute acceleration)

Last updated
oneAPI
Repository github.com/oneapi-src
Operating system Cross-platform
Platform Cross-platform
Type Open-source software specification for parallel programming
Website www.oneapi.io OOjs UI icon edit-ltr-progressive.svg

oneAPI is an open standard, adopted by Intel, [1] for a unified application programming interface (API) intended to be used across different computing accelerator (coprocessor) architectures, including GPUs, AI accelerators and field-programmable gate arrays. It is intended to eliminate the need for developers to maintain separate code bases, multiple programming languages, tools, and workflows for each architecture. [2] [3] [4] [5]

Contents

oneAPI competes with other GPU computing stacks: CUDA by Nvidia and ROCm by AMD.

Specification

The oneAPI specification extends existing developer programming models to enable multiple hardware architectures through a data-parallel language, a set of library APIs, and a low-level hardware interface to support cross-architecture programming. It builds upon industry standards and provides an open, cross-platform developer stack. [6] [7]

Data Parallel C++

DPC++ [8] [9] is a programming language implementation of oneAPI, built upon the ISO C++ and Khronos Group SYCL standards. [10] DPC++ is an implementation of SYCL with extensions that are proposed for inclusion in future revisions of the SYCL standard, including: unified shared memory, group algorithms, and sub-groups. [11] [12] [13]

Libraries

The set of APIs [6] spans several domains, including libraries for linear algebra, deep learning, machine learning, video processing, and others.

Library NameShort

Name

Description
oneAPI DPC++ LibraryoneDPLAlgorithms and functions to speed DPC++ kernel programming
oneAPI Math Kernel Library oneMKLMath routines including matrix algebra, FFT, and vector math
oneAPI Data Analytics Library oneDALMachine learning and data analytics functions
oneAPI Deep Neural Network LibraryoneDNNNeural networks functions for deep learning training and inference
oneAPI Collective Communications LibraryoneCCLCommunication patterns for distributed deep learning
oneAPI Threading Building Blocks oneTBBThreading and memory management template library
oneAPI Video Processing LibraryoneVPLReal-time video encode, decode, transcode, and processing

The source code of parts of the above libraries is available on GitHub. [14]

The oneAPI documentation also lists the "Level Zero" API defining the low-level direct-to-metal interfaces and a set of ray tracing components with its own APIs. [6]

Hardware abstraction layer

oneAPI Level Zero, [15] [16] [17] the low-level hardware interface, defines a set of capabilities and services that a hardware accelerator needs to interface with compiler runtimes and other developer tools.

Implementations

Intel has released oneAPI production toolkits that implement the specification and add CUDA code migration, analysis, and debug tools. [18] [19] [20] These include the Intel oneAPI DPC++/C++ Compiler, [21] Intel Fortran Compiler, Intel VTune Profiler [22] and multiple performance libraries.

Codeplay has released an open-source layer [23] [24] [25] to allow oneAPI and SYCL/DPC++ to run atop Nvidia GPUs via CUDA.

University of Heidelberg has developed a SYCL/DPC++ implementation for both AMD and Nvidia GPUs. [26]

Huawei released a DPC++ compiler for their Ascend AI Chipset [27]

Fujitsu has created an open-source ARM version of the oneAPI Deep Neural Network Library (oneDNN) [28] for their Fugaku CPU.

Unified Acceleration Foundation (UXL) and the future for oneAPI

Unified Acceleration Foundation (UXL) is a new technology consortium that are working on the contiuation of the OneAPI initiative, with the goal to create a new open standard accelerator software ecosystem, related open standards and specification projects through Working Groups and Special Interest Groups (SIGs). The goal will compete with Nvidia's CUDA. The main companies behind it are Intel, Google, ARM, Qualcomm, Samsung, Imagination, and VMware. [29]

Related Research Articles

<span class="mw-page-title-main">OpenGL</span> Cross-platform graphics API

OpenGL is a cross-language, cross-platform application programming interface (API) for rendering 2D and 3D vector graphics. The API is typically used to interact with a graphics processing unit (GPU), to achieve hardware-accelerated rendering.

<span class="mw-page-title-main">Graphics processing unit</span> Specialized electronic circuit; graphics accelerator

A graphics processing unit (GPU) is a specialized electronic circuit initially designed for digital image processing and to accelerate computer graphics, being present either as a discrete video card or embedded on motherboards, mobile phones, personal computers, workstations, and game consoles. After their initial design, GPUs were found to be useful for non-graphic calculations involving embarrassingly parallel problems due to their parallel structure. Other non-graphical uses include the training of neural networks and cryptocurrency mining.

The Khronos Group, Inc. is an open, non-profit, member-driven consortium of 170 organizations developing, publishing and maintaining royalty-free interoperability standards for 3D graphics, virtual reality, augmented reality, parallel computation, vision acceleration and machine learning. The open standards and associated conformance tests enable software applications and middleware to effectively harness authoring and accelerated playback of dynamic media across a wide variety of platforms and devices. The group is based in Beaverton, Oregon.

<span class="mw-page-title-main">OpenGL ES</span> Subset of the OpenGL API for embedded systems

OpenGL for Embedded Systems is a subset of the OpenGL computer graphics rendering application programming interface (API) for rendering 2D and 3D computer graphics such as those used by video games, typically hardware-accelerated using a graphics processing unit (GPU). It is designed for embedded systems like smartphones, tablet computers, video game consoles and PDAs. OpenGL ES is the "most widely deployed 3D graphics API in history".

<span class="mw-page-title-main">Mesa (computer graphics)</span> Free and open-source library for 3D graphics rendering

Mesa, also called Mesa3D and The Mesa 3D Graphics Library, is an open source implementation of OpenGL, Vulkan, and other graphics API specifications. Mesa translates these specifications to vendor-specific graphics hardware drivers.

VTune Profiler is a performance analysis tool for x86-based machines running Linux or Microsoft Windows operating systems. Many features work on both Intel and AMD hardware, but the advanced hardware-based sampling features require an Intel-manufactured CPU.

<span class="mw-page-title-main">CUDA</span> Parallel computing platform and programming model

In computing, CUDA is a proprietary parallel computing platform and application programming interface (API) that allows software to use certain types of graphics processing units (GPUs) for accelerated general-purpose processing, an approach called general-purpose computing on GPUs (GPGPU). CUDA API and its runtime: The CUDA API is an extension of the C programming language that adds the ability to specify thread-level parallelism in C and also to specify GPU device specific operations. CUDA is a software layer that gives direct access to the GPU's virtual instruction set and parallel computational elements for the execution of compute kernels. In addition to drivers and runtime kernels, the CUDA platform includes compilers, libraries and developer tools to help programmers accelerate their applications.

Intel oneAPI DPC++/C++ Compiler and Intel C++ Compiler Classic are Intel’s C, C++, SYCL, and Data Parallel C++ (DPC++) compilers for Intel processor-based systems, available for Windows, Linux, and macOS operating systems.

Video Acceleration API (VA-API) is an open source application programming interface that allows applications such as VLC media player or GStreamer to use hardware video acceleration capabilities, usually provided by the graphics processing unit (GPU). It is implemented by the free and open-source library libva, combined with a hardware-specific driver, usually provided together with the GPU driver.

<span class="mw-page-title-main">OpenCL</span> Open standard for programming heterogenous computing systems, such as CPUs or GPUs

OpenCL is a framework for writing programs that execute across heterogeneous platforms consisting of central processing units (CPUs), graphics processing units (GPUs), digital signal processors (DSPs), field-programmable gate arrays (FPGAs) and other processors or hardware accelerators. OpenCL specifies a programming language for programming these devices and application programming interfaces (APIs) to control the platform and execute programs on the compute devices. OpenCL provides a standard interface for parallel computing using task- and data-based parallelism.

Intel Parallel Studio XE was a software development product developed by Intel that facilitated native code development on Windows, macOS and Linux in C++ and Fortran for parallel computing. Parallel programming enables software programs to take advantage of multi-core processors from Intel and other processor vendors.

Video Decode and Presentation API for Unix (VDPAU) is a royalty-free application programming interface (API) as well as its implementation as free and open-source library distributed under the MIT License. VDPAU is also supported by Nvidia.

<span class="mw-page-title-main">Intel Graphics Technology</span> Series of integrated graphics processors by Intel

Intel Graphics Technology (GT) is the collective name for a series of integrated graphics processors (IGPs) produced by Intel that are manufactured on the same package or die as the central processing unit (CPU). It was first introduced in 2010 as Intel HD Graphics and renamed in 2017 as Intel UHD Graphics.

OpenACC is a programming standard for parallel computing developed by Cray, CAPS, Nvidia and PGI. The standard is designed to simplify parallel programming of heterogeneous CPU/GPU systems.

<span class="mw-page-title-main">Metal (API)</span> iOS, macOS, and tvOS graphics rendering API

Metal is a low-level, low-overhead hardware-accelerated 3D graphic and compute shader API created by Apple, debuting in iOS 8. Metal combines functions similar to OpenGL and OpenCL in one API. It is intended to improve performance by offering low-level access to the GPU hardware for apps on iOS, iPadOS, macOS, and tvOS. It can be compared to low-level APIs on other platforms such as Vulkan and DirectX 12.

OpenVX is an open, royalty-free standard for cross-platform acceleration of computer vision applications. It is designed by the Khronos Group to facilitate portable, optimized and power-efficient processing of methods for vision algorithms. This is aimed for embedded and real-time programs within computer vision and related scenarios. It uses a connected graph representation of operations.

Vulkan is a low-level, low-overhead cross-platform API and open standard for 3D graphics and computing. It was intended to address the shortcomings of OpenGL, and allow developers more control over the GPU. It is designed to support a wide variety of GPUs, CPUs and operating systems, and it is also designed to work with modern multi-core CPUs.

<span class="mw-page-title-main">SYCL</span> Higher-level programming standard for heterogeneous computing

SYCL is a higher-level programming model to improve programming productivity on various hardware accelerators. It is a single-source embedded domain-specific language (eDSL) based on pure C++17. It is a standard developed by Khronos Group, announced in March 2014.

Nvidia NVDEC is a feature in its graphics cards that performs video decoding, offloading this compute-intensive task from the CPU. NVDEC is a successor of PureVideo and is available in Kepler and later NVIDIA GPUs.

<span class="mw-page-title-main">ROCm</span> Parallel computing platform: GPGPU libraries and application programming interface

ROCm is an Advanced Micro Devices (AMD) software stack for graphics processing unit (GPU) programming. ROCm spans several domains: general-purpose computing on graphics processing units (GPGPU), high performance computing (HPC), heterogeneous computing. It offers several programming models: HIP, OpenMP, and OpenCL.

References

  1. Fortenberry & Tomov 2022, p. 22.
  2. "Intel Expands its Silicon Portfolio, and oneAPI Software Initiative for Next-Generation HPC". HPCwire. 2019-12-09. Retrieved 2020-02-11.
  3. "Intel Debuts New GPU – Ponte Vecchio – and Outlines Aspirations for oneAPI". HPCwire. 2019-11-18. Retrieved 2020-02-11.
  4. "SC19: Intel Unveils New GPU Stack, oneAPI Development Effort - ExtremeTech". www.extremetech.com. Retrieved 2020-02-11.
  5. Kennedy, Patrick (2018-12-24). "Intel One API to Rule Them All Is Much Needed to Expand TAM". ServeTheHome. Retrieved 2020-02-11.
  6. 1 2 3 "oneAPI Specification". oneAPI.
  7. "Preparing for the Arrival of Intel's Discrete High-Performance GPUs". HPCwire. 2021-03-23. Retrieved 2021-03-29.
  8. "Data Parallel C++: Mastering DPC++ for Programming of Heterogeneous Systems Using C++ and SYCL". Apress.
  9. Team, Editorial (2019-12-16). "Heterogeneous Computing Programming: oneAPI and Data Parallel C++". insideBIGDATA. Retrieved 2020-02-11.
  10. "The Khronos Group". The Khronos Group. 2020-02-11. Retrieved 2020-02-11.
  11. "Khronos Steps Towards Widespread Deployment of SYCL with Release of SYCL 2020 Provisional Specification". The Khronos Group. 2020-06-30. Retrieved 2020-07-06.
  12. staff (2020-06-30). "New, Open DPC++ Extensions Complement SYCL and C++". insideHPC. Retrieved 2020-07-06.
  13. "SYCL 2020 Launches with New Name, New Features, and High Ambition". HPCwire. 2021-02-09. Retrieved 2021-02-16.
  14. "oneAPI-SRC". GitHub.
  15. Verheyde 2019-12-08T16:11:19Z, Arne (8 December 2019). "Intel Releases Bare-Metal oneAPI Level Zero Specification". Tom's Hardware. Retrieved 2020-02-11.{{cite web}}: CS1 maint: numeric names: authors list (link)
  16. "Intel's Compute Runtime Adds oneAPI Level Zero Support - Phoronix". www.phoronix.com. Retrieved 2020-03-10.
  17. "Initial Benchmarks With Intel oneAPI Level Zero Performance - Phoronix". www.phoronix.com. Retrieved 2020-04-13.
  18. "Intel Champions XPU Vision With oneAPI, Data Center GPUs - SDxCentral". SDxCentral. 2020-11-11. Retrieved 2020-11-11.
  19. "Intel Debuts oneAPI Gold and Provides More Details on GPU Roadmap". HPCwire. 2020-11-11. Retrieved 2020-11-11.
  20. Moorhead, Patrick. "Intel Announces Gold Release Of OneAPI Toolkits And New Intel Server GPU". Forbes. Retrieved 2020-12-08.
  21. "Data Parallel C++ for Cross-Architecture Applications". Intel. Retrieved 2021-10-07.
  22. "Fix Performance Bottlenecks with Intel® VTune™ Profiler". Intel. Retrieved 2021-10-07.
  23. "Codeplay Open Sources a Version of DPC++ for Nvidia GPUs". HPCwire. 2020-02-05. Retrieved 2020-02-12.
  24. "Intel's oneAPI / DPC++ / SYCL Will Run Atop NVIDIA GPUs With Open-Source Layer - Phoronix". www.phoronix.com. Retrieved 2019-12-06.
  25. "Codeplay - Codeplay contribution to DPC++ brings SYCL support for NVIDIA GPUs". www.codeplay.com. Retrieved 2020-02-11.
  26. Salter, Jim (2020-09-30). "Intel, Heidelberg University team up to bring Radeon GPU support to AI". Ars Technica. Retrieved 2021-10-07.
  27. Extending DPC++ with Support for Huawei Ascend AI Chipset, 27 April 2021, retrieved 2021-10-07
  28. fltech (19 November 2020). "A Deep Dive into a Deep Learning Library for the A64FX Fugaku CPU - The Development Story in the Developer's Own Words". fltech - 富士通研究所の技術ブログ (in Japanese). Retrieved 2021-02-10.
  29. "Exclusive: Behind the plot to break Nvidia's grip on AI by targeting software". Reuters . Retrieved 2024-04-05.

Sources