SYCL

Original author(s): Khronos Group
Developer(s): Khronos Group
Initial release: March 2014
Stable release: SYCL 2020 revision 8 / 19 October 2023 [1]
Operating system: Cross-platform
Platform: Cross-platform
Type: High-level programming language
Website: www.khronos.org/sycl/, sycl.tech

SYCL (pronounced "sickle") is a higher-level programming model intended to improve programming productivity on various hardware accelerators. It is a single-source embedded domain-specific language (eDSL) based on pure C++17. It is a standard developed by the Khronos Group, announced in March 2014.


Origin of the name

SYCL (pronounced ‘sickle’) originally stood for SYstem-wide Compute Language, [2] but since 2020 SYCL developers have stated that SYCL is a name and have made clear that it is no longer an acronym and contains no reference to OpenCL. [3]

Purpose

SYCL is a royalty-free, cross-platform abstraction layer inspired by the underlying concepts, portability, and efficiency of OpenCL. It enables code for heterogeneous processors to be written in a "single-source" style using completely standard C++: C++ template functions can contain both host and device code, so complex algorithms that use hardware accelerators can be constructed once and then reused throughout the source code on different types of data.

The SYCL standard started as the higher-level programming model sub-group of the OpenCL working group and was originally developed for use with OpenCL and SPIR, but SYCL has been a Khronos Group workgroup independent of the OpenCL working group since September 20, 2019. Starting with SYCL 2020, SYCL has been generalized into a broader heterogeneous framework able to target other systems. This is made possible by the concept of a generic backend, which can target any acceleration API while enabling full interoperability with that API, for example by using existing native libraries to reach maximum performance while simplifying the programming effort. For example, the Open SYCL implementation targets ROCm and CUDA via AMD's cross-vendor HIP.

Versions

SYCL was introduced at GDC in March 2014 as provisional version 1.2; [4] the SYCL 1.2 final version was then introduced at IWOCL 2015 in May 2015. [5]

The latest version of the previous SYCL 1.2.1 series is SYCL 1.2.1 revision 7, published on April 27, 2020 (the first version was published on December 6, 2017 [6]).

SYCL 2.2 provisional was introduced at IWOCL 2016 in May 2016, [7] targeting C++14 and OpenCL 2.2. However, the SYCL committee chose not to finalize this version and instead moved towards a more flexible SYCL specification to address the increasing diversity of hardware accelerators, including artificial intelligence engines, which led to SYCL 2020.

The latest version is SYCL 2020 revision 6, published on November 13, 2022, an evolution of the initial revision 2 release published on February 9, 2021, [8] which took into account feedback from users and implementors on the SYCL 2020 provisional specification revision 1, published on June 30, 2020. [9] C++17 and OpenCL 3.0 support are main targets of this release. Unified shared memory (USM) is one of its main features for GPUs with OpenCL and CUDA support.

At IWOCL 2021 a roadmap was presented. DPC++, ComputeCpp, Open SYCL, triSYCL and neoSYCL are the main implementations of SYCL. The next target in development is support for C++20 in a future SYCL 202x version. [10]

Implementations

Extensions

SYCL safety critical

In March 2023, the Khronos Group announced the creation of the SYCL SC Working Group, [26] with the objective of creating a high-level heterogeneous computing framework for safety-critical systems. These systems span various fields, including the avionics, automotive, industrial, and medical sectors.

The SYCL Safety Critical framework will comply with several industry standards to ensure its reliability and safety: MISRA C++ 202X, [27] which provides guidelines for the use of C++ in critical systems; RTCA DO-178C / EASA ED-12C, [28] the standards for software considerations in airborne systems and equipment certification; ISO 26262/21448, [29] which pertains to the functional safety of road vehicles; IEC 61508, which covers the functional safety of electrical/electronic/programmable electronic safety-related systems; and IEC 62304, which relates to the lifecycle requirements for medical device software. [26]

Software

Notable software fields that make use of SYCL include computational drug discovery and molecular docking, [31] [32] large-language-model inference (for example, the SYCL backend of llama.cpp), [33] and cosmology simulations such as CRK-HACC. [35]

Resources

Khronos maintains a list of SYCL resources. [36] Codeplay Software also provides tutorials on the website sycl.tech, along with other information and news on the SYCL ecosystem.

License

The source files for building the specification, such as Makefiles and some scripts, the SYCL headers and the SYCL code samples are under the Apache 2.0 license. [37]

Comparison with other tools

The open standards SYCL and OpenCL are similar to the programming models of the proprietary stack CUDA from Nvidia and HIP from the open-source stack ROCm, supported by AMD. [38]

In the Khronos Group realm, OpenCL and Vulkan are the low-level non-single source APIs, providing fine-grained control over hardware resources and operations. OpenCL is widely used for parallel programming across various hardware types, while Vulkan primarily focuses on high-performance graphics and computing tasks. [39]

SYCL, on the other hand, is the high-level single-source C++ embedded domain-specific language (eDSL). It enables developers to write code for heterogeneous computing systems, including CPUs, GPUs, and other accelerators, using a single-source approach. This means that both host and device code can be written in the same C++ source file. [40]

CUDA

By comparison, the single-source C++ embedded domain-specific language version of CUDA, named the "CUDA Runtime API," is somewhat similar to SYCL. In fact, Intel released a tool called SYCLomatic that automatically translates code from CUDA to SYCL. [41] However, there is a less known non-single-source version of CUDA, called the "CUDA Driver API," which is similar to OpenCL and is used, for example, by the implementation of the CUDA Runtime API itself. [38]

SYCL extends the C++ AMP features, relieving the programmer from explicitly transferring data between the host and devices by using buffers and accessors. This is in contrast to CUDA (prior to the introduction of Unified Memory in CUDA 6), where explicit data transfers were required. Starting with SYCL 2020, it is also possible to use USM instead of buffers and accessors, providing a lower-level programming model similar to Unified Memory in CUDA. [42]

SYCL is higher-level than C++ AMP and CUDA in that it does not require building an explicit dependency graph between all the kernels, and it provides automatic asynchronous scheduling of the kernels with communication and computation overlap. This is all done by using the concept of accessors, without requiring any compiler support. [43]

Unlike C++ AMP and CUDA, SYCL is a pure C++ eDSL without any C++ extension. This allows for a basic CPU implementation that relies on pure runtime without any specific compiler. [40]

Both DPC++ [44] and AdaptiveCpp [45] compilers provide a backend to NVIDIA GPUs, similar to how CUDA does. This allows SYCL code to be compiled and run on NVIDIA hardware, allowing developers to leverage SYCL's high-level abstractions on CUDA-capable GPUs. [44] [45]

ROCm HIP

ROCm HIP targets Nvidia GPUs, AMD GPUs, and x86 CPUs. HIP is a lower-level API that closely resembles CUDA's APIs; [46] for example, AMD released a tool called HIPIFY that can automatically translate CUDA code to HIP. [47] Therefore, many of the points in the comparison between CUDA and SYCL also apply to the comparison between HIP and SYCL. [48]

ROCm HIP has some similarities to SYCL in the sense that it can target various vendors (AMD and Nvidia) and accelerator types (GPU and CPU). [49] However, SYCL can target a broader range of accelerators and vendors. SYCL supports multiple types of accelerators simultaneously within a single application through the concept of backends. Additionally, SYCL is written in pure C++, whereas HIP, like CUDA, uses some language extensions. These extensions prevent HIP from being compiled with a standard C++ compiler. [48]

Both DPC++ [44] and AdaptiveCpp [45] compilers provide backends for NVIDIA and AMD GPUs, similar to how HIP does. This enables SYCL code to be compiled and executed on hardware from these vendors, offering developers the flexibility to leverage SYCL's high-level abstractions across a diverse range of devices and platforms. [45] [44]

Kokkos

SYCL has many similarities to the Kokkos programming model, [50] including the use of opaque multi-dimensional array objects (SYCL buffers and Kokkos arrays), multi-dimensional ranges for parallel execution, and reductions (added in SYCL 2020). [51] Numerous features in SYCL 2020 were added in response to feedback from the Kokkos community.

SYCL focuses more on heterogeneous systems; thanks to its integration with OpenCL, it can be adopted on a wide range of devices. Kokkos, on the other hand, targets most HPC platforms, [52] and is thus more oriented towards HPC performance.

As of 2024, the Kokkos team is developing a SYCL backend, [53] which enables Kokkos to target Intel hardware in addition to the platforms it already supports. This development broadens the applicability of Kokkos and allows for greater flexibility in leveraging different hardware architectures within HPC applications. [50]

RAJA

RAJA [54] [55] is a library of C++ software abstractions that enable architecture and programming-model portability for HPC applications.

Like SYCL, it provides portable code across heterogeneous platforms. However, unlike SYCL, RAJA introduces an abstraction layer over other programming models such as CUDA, HIP, and OpenMP. This allows developers to write their code once and run it on various backends without modifying the core logic. RAJA is maintained and developed at Lawrence Livermore National Laboratory (LLNL), whereas SYCL is an open standard maintained by the Khronos Group. [39]

Similar to Kokkos, RAJA is more tailored for HPC use cases, focusing on performance and scalability in high-performance computing environments. In contrast, SYCL supports a broader range of devices, making it more versatile for different types of applications beyond HPC. [55]

As of 2024, the RAJA team is developing a SYCL backend, [56] which will enable RAJA to also target Intel hardware. This development will enhance RAJA's portability and flexibility, allowing it to leverage SYCL's capabilities and expand its applicability across a wider array of hardware platforms. [39]

OpenMP

OpenMP targets computational offloading to external accelerators, [57] primarily focusing on multi-core architectures and GPUs. SYCL, on the other hand, is oriented towards a broader range of devices due to its integration with OpenCL, which enables support for various types of hardware accelerators. [58]

OpenMP uses a pragma-based approach, where the programmer annotates the code with directives, and the compiler handles the complexity of parallel execution and memory management. This high-level abstraction makes it easier for developers to parallelize their applications without dealing with the intricate details of memory transfers and synchronization. [59]
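The pragma style can be sketched as follows (the SAXPY loop and data are illustrative; with a compiler that does not support OpenMP offload, the directive is simply ignored and the loop runs serially on the host):

```cpp
#include <cstddef>
#include <vector>

// OpenMP target offload: the directive asks the compiler to run the loop on
// an accelerator; the map() clauses describe the data movement, which the
// compiler and runtime manage rather than the programmer.
std::vector<float> saxpy(float a, const std::vector<float>& x,
                         std::vector<float> y) {
    const float* xp = x.data();
    float* yp = y.data();
    std::size_t n = y.size();
    #pragma omp target teams distribute parallel for \
        map(to: xp[0:n]) map(tofrom: yp[0:n])
    for (std::size_t i = 0; i < n; ++i)
        yp[i] = a * xp[i] + yp[i];
    return y;
}
```

Compared with the SYCL examples above, the kernel here is an ordinary annotated loop rather than a lambda submitted to a queue.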

Both OpenMP and SYCL support C++ and are standardized. OpenMP is standardized by the OpenMP Architecture Review Board (ARB), while SYCL is standardized by the Khronos Group. [39]

OpenMP has wide support from various compilers, like GCC and Clang. [60]

std::par

std::par, more precisely the parallel execution policy std::execution::par, is part of the C++17 standard [61] and is designed to facilitate the parallel execution of standard algorithms on C++ standard containers. It provides a standard way to take advantage of parallel hardware by allowing developers to specify an execution policy for operations such as std::for_each, std::transform, and std::reduce. This enables efficient use of multi-core processors and other parallel hardware without requiring significant changes to the code. [62]
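A minimal example of these execution policies with a standard algorithm (pure C++17; on GCC's libstdc++, parallel execution additionally requires linking against Intel TBB):

```cpp
#include <execution>
#include <numeric>
#include <vector>

// std::execution::par permits the implementation to run the algorithm in
// parallel; substituting std::execution::seq would force sequential execution
// with no other code changes.
double sum_of_squares(const std::vector<double>& v) {
    return std::transform_reduce(std::execution::par,
                                 v.begin(), v.end(),
                                 0.0,                     // initial value
                                 std::plus<>{},           // reduction
                                 [](double x) { return x * x; });
}
```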

SYCL can be used as a backend for std::par, enabling the execution of standard algorithms on a wide range of external accelerators, including GPUs from Intel, AMD, and NVIDIA, as well as other types of accelerators. [63] By leveraging SYCL's capabilities, developers can write standard C++ code that seamlessly executes on heterogeneous computing environments. This integration allows for greater flexibility and performance optimization across different hardware platforms. [63]

The use of SYCL as a backend for std::par is compiler-dependent, meaning it requires a compiler that supports both SYCL and the parallel execution policies introduced in C++17. [63] Examples of such compilers include DPC++ and other SYCL-compliant compilers. With these compilers, developers can take advantage of SYCL's abstractions for memory management and parallel execution while still using the familiar C++ standard algorithms and execution policies. [44]


References

  1. "Khronos SYCL Registry - the Khronos Group Inc".
  2. Keryell, Ronan (17 November 2019). "SYCL: A Single-Source C++ Standard for Heterogeneous Computing" (PDF). Khronos.org. Retrieved 26 September 2023.
  3. Keryell, Ronan. "Meaning of SYCL". GitHub. Retrieved 5 February 2021.
  4. Khronos Group (19 March 2014). "Khronos Releases SYCL 1.2 Provisional Specification". Khronos. Retrieved 20 August 2017.
  5. Khronos Group (11 May 2015). "Khronos Releases SYCL 1.2 Final Specification". Khronos. Retrieved 20 August 2017.
  6. Khronos Group (6 December 2017). "The Khronos Group Releases Finalized SYCL 1.2.1". Khronos. Retrieved 12 December 2017.
  7. Khronos Group (18 April 2016). "Khronos Releases OpenCL 2.2 Provisional Specification with OpenCL C++ Kernel Language". Khronos. Retrieved 18 September 2017.
  8. Khronos Group (9 February 2021). "Khronos Releases SYCL 2020 Specification". Khronos. Retrieved 22 February 2021.
  9. Khronos Group (30 June 2020). "Khronos Steps Towards Widespread Deployment of SYCL with Release of SYCL 2020 Provisional Specification". Khronos. Retrieved 4 December 2020.
  10. https://www.iwocl.org/wp-content/uploads/k04-iwocl-syclcon-2021-wong-slides.pdf
  11. https://www.iwocl.org/wp-content/uploads/k01-iwocl-syclcon-2021-reinders-slides.pdf
  12. "Compile Cross-Architecture: Intel® oneAPI DPC++/C++ Compiler".
  13. "Home - ComputeCpp CE - Products - Codeplay Developer".
  14. "Guides - ComputeCpp CE - Products - Codeplay Developer".
  15. "The Future of ComputeCpp". www.codeplay.com. Retrieved 2023-12-09.
  16. "AdaptiveCpp feature support". GitHub . 4 July 2023.
  17. "AdaptiveCpp/doc/compilation.md at develop · AdaptiveCpp/AdaptiveCpp". GitHub.
  18. "AdaptiveCpp (formerly known as hipSYCL / Open SYCL)". GitHub . 4 July 2023.
  19. "triSYCL". GitHub . 6 January 2022.
  20. Ke, Yinan; Agung, Mulya; Takizawa, Hiroyuki (2021). "NeoSYCL: A SYCL implementation for SX-Aurora TSUBASA". The International Conference on High Performance Computing in Asia-Pacific Region. pp. 50–57. doi:10.1145/3432261.3432268. ISBN   9781450388429. S2CID   231597238.
  21. Ke, Yinan; Agung, Mulya; Takizawa, Hiroyuki (2021). "NeoSYCL: A SYCL implementation for SX-Aurora TSUBASA". The International Conference on High Performance Computing in Asia-Pacific Region. pp. 50–57. doi:10.1145/3432261.3432268. ISBN   9781450388429. S2CID   231597238.
  22. "Sycl-GTX". GitHub . 10 April 2021.
  23. https://www.iwocl.org/wp-content/uploads/14-iwocl-syclcon-2021-thoman-slides.pdf
  24. "Polygeist". GitHub . 25 February 2022.
  25. "Inteon". 25 February 2022.
  26. "Khronos to Create SYCL SC Open Standard for Safety-Critical C++ Based Heterogeneous Compute". The Khronos Group. 2023-03-15. Retrieved 2024-07-10.
  27. "MISRA" . Retrieved 2024-07-11.
  28. "ED-12C Aviation Software Standards Training - Airborne". Eurocae. Retrieved 2024-07-11.
  29. "SOTIF – practical training". www.kuglermaag.com. Retrieved 2024-07-11.
  30. https://www.iwocl.org/wp-content/uploads/k03-iwocl-syclcon-2021-trevett-updated.mp4.pdf
  31. Crisci, Luigi; Salimi Beni, Majid; Cosenza, Biagio; Scipione, Nicolò; Gadioli, Davide; Vitali, Emanuele; Palermo, Gianluca; Beccari, Andrea (2022-05-10). "Towards a Portable Drug Discovery Pipeline with SYCL 2020". International Workshop on OpenCL. IWOCL '22. New York, NY, USA: Association for Computing Machinery. pp. 1–2. doi:10.1145/3529538.3529688. ISBN   978-1-4503-9658-5.
  32. Solis-Vasquez, Leonardo; Mascarenhas, Edward; Koch, Andreas (2023-04-18). "Experiences Migrating CUDA to SYCL: A Molecular Docking Case Study". International Workshop on OpenCL. IWOCL '23. New York, NY, USA: Association for Computing Machinery. pp. 1–11. doi:10.1145/3585341.3585372. ISBN   979-8-4007-0745-2.
  33. https://github.com/ggerganov/llama.cpp/blob/master/docs/backend/SYCL.md
  34. https://www.iwocl.org/wp-content/uploads/20-iwocl-syclcon-2021-rudkin-slides.pdf
  35. Rangel, Esteban Miguel; Pennycook, Simon John; Pope, Adrian; Frontiere, Nicholas; Ma, Zhiqiang; Madananth, Varsha (2023-11-12). "A Performance-Portable SYCL Implementation of CRK-HACC for Exascale". Proceedings of the SC '23 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis. SC-W '23. New York, NY, USA: Association for Computing Machinery. pp. 1114–1125. arXiv: 2310.16122 . doi:10.1145/3624062.3624187. ISBN   979-8-4007-0785-8.
  36. "SYCL Resources". khronos.org. Khronos group. 20 January 2014.
  37. "SYCL Open Source Specification". GitHub . 10 January 2022.
  38. Breyer, Marcel; Van Craen, Alexander; Pflüger, Dirk (2022-05-10). "A Comparison of SYCL, OpenCL, CUDA, and OpenMP for Massively Parallel Support Vector Machine Classification on Multi-Vendor Hardware". International Workshop on OpenCL. IWOCL '22. New York, NY, USA: Association for Computing Machinery. pp. 1–12. doi:10.1145/3529538.3529980. ISBN 978-1-4503-9658-5.
  39. "SYCL - C++ Single-source Heterogeneous Programming for Acceleration Offload". The Khronos Group. 2014-01-20. Retrieved 2024-07-12.
  40. "SYCL™ 2020 Specification (revision 8)". registry.khronos.org. Retrieved 2024-07-12.
  41. oneapi-src/SYCLomatic, oneAPI-SRC, 2024-07-11, retrieved 2024-07-11
  42. Chen, Jolly; Dessole, Monica; Varbanescu, Ana Lucia (2024-01-24), Lessons Learned Migrating CUDA to SYCL: A HEP Case Study with ROOT RDataFrame, arXiv: 2401.13310 , retrieved 2024-07-12
  43. "Buffer Accessor Modes". Intel. Retrieved 2024-07-11.
  44. "DPC++ Documentation — oneAPI DPC++ Compiler documentation". intel.github.io. Retrieved 2024-07-11.
  45. "AdaptiveCpp/doc/sycl-ecosystem.md at develop · AdaptiveCpp/AdaptiveCpp". GitHub. Retrieved 2024-07-11.
  46. ROCm/HIP, AMD ROCm™ Software, 2024-07-11, retrieved 2024-07-11
  47. "HIPIFY/README.md at amd-staging · ROCm/HIPIFY". GitHub. Retrieved 2024-07-11.
  48. Jin, Zheming; Vetter, Jeffrey S. (November 2022). "Evaluating Nonuniform Reduction in HIP and SYCL on GPUs". 2022 IEEE/ACM 8th International Workshop on Data Analysis and Reduction for Big Scientific Data (DRBSD). IEEE. pp. 37–43. doi:10.1109/DRBSD56682.2022.00010. ISBN 978-1-6654-6337-9. OSTI 1996715.
  49. Reguly, Istvan Z. (2023-11-12). "Evaluating the performance portability of SYCL across CPUs and GPUs on bandwidth-bound applications". Proceedings of the SC '23 Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis. SC-W '23. New York, NY, USA: Association for Computing Machinery. pp. 1038–1047. arXiv: 2309.10075 . doi:10.1145/3624062.3624180. ISBN   979-8-4007-0785-8.
  50. Hammond, Jeff R.; Kinsner, Michael; Brodman, James (2019). "A comparative analysis of Kokkos and SYCL as heterogeneous, parallel programming models for C++ applications". Proceedings of the International Workshop on OpenCL. pp. 1–2. doi:10.1145/3318170.3318193. ISBN 9781450362306. S2CID 195777149.
  51. Dufek, Amanda S.; Gayatri, Rahulkumar; Mehta, Neil; Doerfler, Douglas; Cook, Brandon; Ghadar, Yasaman; DeTar, Carleton (November 2021). "Case Study of Using Kokkos and SYCL as Performance-Portable Frameworks for Milc-Dslash Benchmark on NVIDIA, AMD and Intel GPUs". 2021 International Workshop on Performance, Portability and Productivity in HPC (P3HPC). IEEE. pp. 57–67. doi:10.1109/P3HPC54578.2021.00009. ISBN   978-1-6654-2439-4.
  52. Trott, Christian R.; Lebrun-Grandié, Damien; Arndt, Daniel; Ciesko, Jan; Dang, Vinh; Ellingwood, Nathan; Gayatri, Rahulkumar; Harvey, Evan; Hollman, Daisy S. (2022), Kokkos 3: Programming Model Extensions for the Exascale Era , retrieved 2024-07-10
  53. Arndt, Daniel; Lebrun-Grandie, Damien; Trott, Christian (2024-04-08). "Experiences with implementing Kokkos' SYCL backend". Proceedings of the 12th International Workshop on OpenCL and SYCL. IWOCL '24. New York, NY, USA: Association for Computing Machinery. pp. 1–11. doi:10.1145/3648115.3648118. ISBN   979-8-4007-1790-1. OSTI   2336667.
  54. LLNL/RAJA, Lawrence Livermore National Laboratory, 2024-07-08, retrieved 2024-07-10
  55. Beckingsale, David A.; Scogland, Thomas RW; Burmark, Jason; Hornung, Rich; Jones, Holger; Killian, William; Kunen, Adam J.; Pearce, Olga; Robinson, Peter; Ryujin, Brian S. (November 2019). "RAJA: Portable Performance for Large-Scale Scientific Applications". 2019 IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC). IEEE. pp. 71–81. doi:10.1109/P3HPC49587.2019.00012. ISBN 978-1-7281-6003-0. OSTI 1488819.
  56. Homerding, Brian; Vargas, Arturo; Scogland, Tom; Chen, Robert; Davis, Mike; Hornung, Rich (2024-04-08). "Enabling RAJA on Intel GPUs with SYCL". Proceedings of the 12th International Workshop on OpenCL and SYCL. IWOCL '24. New York, NY, USA: Association for Computing Machinery. pp. 1–10. doi:10.1145/3648115.3648131. ISBN   979-8-4007-1790-1.
  57. tim.lewis. "Home". OpenMP. Retrieved 2024-07-10.
  58. "OpenCL - The Open Standard for Parallel Programming of Heterogeneous Systems". The Khronos Group. 2013-07-21. Retrieved 2024-07-12.
  59. Friedman, Richard. "Reference Guides". OpenMP. Retrieved 2024-07-12.
  60. "OpenMP Compilers & Tools".
  61. "std::execution::seq, std::execution::par, std::execution::par_unseq, std::execution::unseq - cppreference.com". en.cppreference.com. Retrieved 2024-07-10.
  62. "Accelerating Standard C++ with GPUs Using stdpar". NVIDIA Technical Blog. 2020-08-04. Retrieved 2024-07-10.
  63. Alpay, Aksel; Heuveline, Vincent (2024-04-08). "AdaptiveCpp Stdpar: C++ Standard Parallelism Integrated into a SYCL Compiler". Proceedings of the 12th International Workshop on OpenCL and SYCL. IWOCL '24. New York, NY, USA: Association for Computing Machinery. pp. 1–12. doi:10.1145/3648115.3648117. ISBN 979-8-4007-1790-1.