Repository | github |
---|---|
Operating system | Cross-platform |
Platform | Cross-platform |
Type | Open-source software specification for parallel programming |
Website | www |
oneAPI is an open standard, adopted by Intel, [1] for a unified application programming interface (API) intended to be used across different computing accelerator (coprocessor) architectures, including GPUs, AI accelerators and field-programmable gate arrays. It is intended to eliminate the need for developers to maintain separate code bases, multiple programming languages, tools, and workflows for each architecture. [2] [3] [4] [5]
oneAPI competes with other GPU computing stacks: CUDA by Nvidia and ROCm by AMD.
The oneAPI specification extends existing developer programming models to enable multiple hardware architectures through a data-parallel language, a set of library APIs, and a low-level hardware interface to support cross-architecture programming. It builds upon industry standards and provides an open, cross-platform developer stack. [6] [7]
DPC++ [8] [9] is a programming language implementation of oneAPI, built upon the ISO C++ and Khronos Group SYCL standards. [10] DPC++ is an implementation of SYCL with extensions that are proposed for inclusion in future revisions of the SYCL standard, including: unified shared memory, group algorithms, and sub-groups. [11] [12] [13]
The set of APIs [6] spans several domains, including libraries for linear algebra, deep learning, machine learning, video processing, and others.
Library Name | Short Name | Description |
---|---|---|
oneAPI DPC++ Library | oneDPL | Algorithms and functions to speed DPC++ kernel programming |
oneAPI Math Kernel Library | oneMKL | Math routines including matrix algebra, FFT, and vector math |
oneAPI Data Analytics Library | oneDAL | Machine learning and data analytics functions |
oneAPI Deep Neural Network Library | oneDNN | Neural networks functions for deep learning training and inference |
oneAPI Collective Communications Library | oneCCL | Communication patterns for distributed deep learning |
oneAPI Threading Building Blocks | oneTBB | Threading and memory management template library |
oneAPI Video Processing Library | oneVPL | Real-time video encode, decode, transcode, and processing |
The source code of parts of the above libraries is available on GitHub. [14]
The oneAPI documentation also lists the "Level Zero" API defining the low-level direct-to-metal interfaces and a set of ray tracing components with its own APIs. [6]
oneAPI Level Zero, [15] [16] [17] the low-level hardware interface, defines a set of capabilities and services that a hardware accelerator needs to interface with compiler runtimes and other developer tools.
Intel has released oneAPI production toolkits that implement the specification and add CUDA code migration, analysis, and debug tools. [18] [19] [20] These include the Intel oneAPI DPC++/C++ Compiler, [21] Intel Fortran Compiler, Intel VTune Profiler [22] and multiple performance libraries.
Codeplay has released an open-source layer [23] [24] [25] to allow oneAPI and SYCL/DPC++ to run atop Nvidia GPUs via CUDA.
University of Heidelberg has developed a SYCL/DPC++ implementation for both AMD and Nvidia GPUs. [26]
Huawei released a DPC++ compiler for their Ascend AI Chipset [27]
Fujitsu has created an open-source ARM version of the oneAPI Deep Neural Network Library (oneDNN) [28] for their Fugaku CPU.
Unified Acceleration Foundation (UXL) is a new technology consortium that are working on the contiuation of the OneAPI initiative, with to goal to create a new open standard accelerator software ecosystem, related open standards and specification projects through Working Groups and Special Interest Groups (SIGs). The goal will compete with Nvidia's CUDA. The main companies behind it are Intel, Google, ARM, Qualcomm, Samsung, Imagination, and VMware. [29]
OpenGL is a cross-language, cross-platform application programming interface (API) for rendering 2D and 3D vector graphics. The API is typically used to interact with a graphics processing unit (GPU), to achieve hardware-accelerated rendering.
A graphics processing unit (GPU) is a specialized electronic circuit initially designed to accelerate computer graphics and image processing. After their initial design, GPUs were found to be useful for non-graphic calculations involving embarrassingly parallel problems due to their parallel structure. Other non-graphical uses include the training of neural networks and cryptocurrency mining.
OpenMAX, often shortened as "OMX", is a non-proprietary and royalty-free cross-platform set of C-language programming interfaces. It provides abstractions for routines that are especially useful for processing of audio, video, and still images. It is intended for low power and embedded system devices that need to efficiently process large amounts of multimedia data in predictable ways, such as video codecs, graphics libraries, and other functions for video, image, audio, voice and speech.
Mesa, also called Mesa3D and The Mesa 3D Graphics Library, is an open source implementation of OpenGL, Vulkan, and other graphics API specifications. Mesa translates these specifications to vendor-specific graphics hardware drivers.
A free and open-source graphics device driver is a software stack which controls computer-graphics hardware and supports graphics-rendering application programming interfaces (APIs) and is released under a free and open-source software license. Graphics device drivers are written for specific hardware to work within a specific operating system kernel and to support a range of APIs used by applications to access the graphics hardware. They may also control output to the display if the display driver is part of the graphics hardware. Most free and open-source graphics device drivers are developed by the Mesa project. The driver is made up of a compiler, a rendering API, and software which manages access to the graphics hardware.
VTune Profiler is a performance analysis tool for x86-based machines running Linux or Microsoft Windows operating systems. Many features work on both Intel and AMD hardware, but the advanced hardware-based sampling features require an Intel-manufactured CPU.
Compute Unified Device Architecture (CUDA) is a proprietary parallel computing platform and application programming interface (API) that allows software to use certain types of graphics processing units (GPUs) for accelerated general-purpose processing, an approach called general-purpose computing on GPUs (GPGPU). CUDA API and its runtime: The CUDA API is an extension of the C programming language that adds the ability to specify thread-level parallelism in C and also to specify GPU device specific operations (like moving data between the CPU and the GPU). CUDA is a software layer that gives direct access to the GPU's virtual instruction set and parallel computational elements for the execution of compute kernels. In addition to drivers and runtime kernels, the CUDA platform includes compilers, libraries and developer tools to help programmers accelerate their applications.
Intel oneAPI DPC++/C++ Compiler and Intel C++ Compiler Classic are Intel’s C, C++, SYCL, and Data Parallel C++ (DPC++) compilers for Intel processor-based systems, available for Windows, Linux, and macOS operating systems.
nouveau is a free and open-source graphics device driver for Nvidia video cards and the Tegra family of SoCs written by independent software engineers, with minor help from Nvidia employees.
Video Acceleration API (VA-API) is an open source application programming interface that allows applications such as VLC media player or GStreamer to use hardware video acceleration capabilities, usually provided by the graphics processing unit (GPU). It is implemented by the free and open-source library libva, combined with a hardware-specific driver, usually provided together with the GPU driver.
OpenCL is a framework for writing programs that execute across heterogeneous platforms consisting of central processing units (CPUs), graphics processing units (GPUs), digital signal processors (DSPs), field-programmable gate arrays (FPGAs) and other processors or hardware accelerators. OpenCL specifies programming languages for programming these devices and application programming interfaces (APIs) to control the platform and execute programs on the compute devices. OpenCL provides a standard interface for parallel computing using task- and data-based parallelism.
Intel Parallel Studio XE was a software development product developed by Intel that facilitated native code development on Windows, macOS and Linux in C++ and Fortran for parallel computing. Parallel programming enables software programs to take advantage of multi-core processors from Intel and other processor vendors.
Video Decode and Presentation API for Unix (VDPAU) is a royalty-free application programming interface (API) as well as its implementation as free and open-source library distributed under the MIT License. VDPAU is also supported by Nvidia.
Intel Graphics Technology (GT) is the collective name for a series of integrated graphics processors (IGPs) produced by Intel that are manufactured on the same package or die as the central processing unit (CPU). It was first introduced in 2010 as Intel HD Graphics and renamed in 2017 as Intel UHD Graphics.
OpenACC is a programming standard for parallel computing developed by Cray, CAPS, Nvidia and PGI. The standard is designed to simplify parallel programming of heterogeneous CPU/GPU systems.
OpenVX is an open, royalty-free standard for cross-platform acceleration of computer vision applications. It is designed by the Khronos Group to facilitate portable, optimized and power-efficient processing of methods for vision algorithms. This is aimed for embedded and real-time programs within computer vision and related scenarios. It uses a connected graph representation of operations.
Vulkan is a low-level, low-overhead cross-platform API and open standard for 3D graphics and computing. It was intended to address the shortcomings of OpenGL, and allow developers more control over the GPU. It is designed to support a wide variety of GPUs, CPUs and operating systems, it is also designed to work with modern multi-core CPUs.
SYCL is a higher-level programming model to improve programming productivity on various hardware accelerators. It is a single-source embedded domain-specific language (eDSL) based on pure C++17. It is a standard developed by Khronos Group, announced in March 2014.
Nvidia NVDEC is a feature in its graphics cards that performs video decoding, offloading this compute-intensive task from the CPU. NVDEC is a successor of PureVideo and is available in Kepler and later NVIDIA GPUs.
ROCm is an Advanced Micro Devices (AMD) software stack for graphics processing unit (GPU) programming. ROCm spans several domains: general-purpose computing on graphics processing units (GPGPU), high performance computing (HPC), heterogeneous computing. It offers several programming models: HIP, OpenMP/Message Passing Interface (MPI), and OpenCL.
{{cite web}}
: CS1 maint: numeric names: authors list (link)