Developer(s) | AMD |
---|---|
Initial release | November 14, 2016 |
Stable release | 6.2.2 / September 27, 2024 [1] |
Repository | Meta-repository github |
Written in | C, C++, Python, Fortran, Julia |
Middleware | HIP |
Engine | AMDgpu kernel driver, HIPCC, an LLVM-based compiler |
Operating system | Linux, Windows [2] |
Platform | Supported GPUs |
Predecessor | Close to metal, Stream, HSA |
Size | <2 GiB |
Type | GPGPU libraries and APIs |
License | MIT License |
Website | www |
ROCm [3] is an Advanced Micro Devices (AMD) software stack for graphics processing unit (GPU) programming. ROCm spans several domains: general-purpose computing on graphics processing units (GPGPU), high-performance computing (HPC), and heterogeneous computing. It offers several programming models: HIP (GPU-kernel-based programming), OpenMP (directive-based programming), and OpenCL.
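To give a feel for what "GPU-kernel-based programming" means, the following is a minimal CPU-only sketch in Python. It is not HIP code and does not use any ROCm API: the names `vector_add_kernel` and `launch` are hypothetical stand-ins, and the "threads" simply run one after another. It only illustrates the usual pattern in which every thread derives a global index from its block and thread coordinates and processes one element.

```python
def vector_add_kernel(block_idx, block_dim, thread_idx, a, b, out):
    """Body executed by one emulated thread: handles at most one element."""
    i = block_idx * block_dim + thread_idx  # global index of this thread
    if i < len(out):  # guard: the grid may be larger than the data
        out[i] = a[i] + b[i]


def launch(kernel, grid_dim, block_dim, *args):
    """Hypothetical stand-in for a kernel launch: runs every thread in turn."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)


a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [10.0, 20.0, 30.0, 40.0, 50.0]
out = [0.0] * len(a)
block_dim = 4
grid_dim = (len(a) + block_dim - 1) // block_dim  # ceiling division: 2 blocks
launch(vector_add_kernel, grid_dim, block_dim, a, b, out)
print(out)  # [11.0, 22.0, 33.0, 44.0, 55.0]
```

On a real GPU the threads of a grid run in parallel rather than in a loop, which is why the bounds check inside the kernel matters: the last block usually contains threads with no element to process.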
ROCm is free, libre and open-source software (except the GPU firmware blobs [4] ), and it is distributed under various licenses. ROCm initially stood for Radeon Open Compute platform; however, because Open Compute is a registered trademark, ROCm is no longer an acronym: it is simply AMD's open-source stack designed for GPU compute.
The first GPGPU software stack from ATI/AMD was Close to Metal, which became Stream.
ROCm was launched around 2016 [5] with the Boltzmann Initiative. [6] The ROCm stack builds upon previous AMD GPU stacks; some tools trace back to GPUOpen and others to the Heterogeneous System Architecture (HSA).
HSAIL [7] was aimed at producing a middle-level, hardware-agnostic intermediate representation that could be JIT-compiled to the eventual hardware (GPU, FPGA...) using the appropriate finalizer. This approach was dropped for ROCm, which now builds only GPU code using LLVM and its upstreamed AMDGPU backend, [8] although there is still research on such enhanced modularity with LLVM MLIR. [9]
ROCm as a stack ranges from the kernel driver to the end-user applications. AMD has introductory videos about AMD GCN hardware, [10] and ROCm programming [11] via its learning portal. [12]
To date, one of the best technical introductions to the stack and to ROCm/HIP programming is still to be found on Reddit. [13]
ROCm is primarily targeted at discrete professional GPUs, [14] but unofficial support includes the Vega family and RDNA 2 consumer GPUs.
Accelerated Processing Units (APUs) are "enabled" but not officially supported, and getting ROCm to work on them is involved. [15]
AMD Instinct accelerators are the first-class ROCm citizens, alongside the prosumer Radeon Pro GPU series: they mostly see full support.
The only consumer-grade GPU that has relatively equal support is, as of January 2022, the Radeon VII (GCN 5 - Vega).
Name of GPU series | Southern Islands | Sea Islands | Volcanic Islands | Arctic Islands/Polaris | Vega | Navi 1X | Navi 2X | |
---|---|---|---|---|---|---|---|---|
Released | Jan 2012 | Sep 2013 | Jun 2015 | Jun 2016 | Jun 2017 | Jul 2019 | Nov 2020 | |
Marketing Name | Radeon HD 7000 | Radeon Rx 200 | Radeon Rx 300 | Radeon RX 400/500 | Radeon RX Vega/Radeon VII(7 nm) | Radeon RX 5000 | Radeon RX 6000 | |
AMD support | ||||||||
Instruction set | GCN instruction set | RDNA instruction set | ||||||
Microarchitecture | GCN 1st gen | GCN 2nd gen | GCN 3rd gen | GCN 4th gen | GCN 5th gen | RDNA | RDNA 2 | |
Type | Unified shader model | |||||||
ROCm [16] | [17] | [18] | ||||||
OpenCL | 1.2 (on Linux: 1.1 (no Image support) with Mesa 3D) | 2.0 (Adrenalin driver on Win7+) (on Linux: 1.1 (no Image support) with Mesa 3D, 2.0 with AMD drivers or AMD ROCm) | 2.0 | 2.1 [19] | ||||
Vulkan | 1.0 (Win 7+ or Mesa 17+) | 1.2 (Adrenalin 20.1, Linux Mesa 3D 20.0) | ||||||
Shader model | 5.1 | 5.1 6.3 | 6.4 | 6.5 | ||||
OpenGL | 4.6 (on Linux: 4.6 (Mesa 3D 20.0)) | |||||||
Direct3D | 11 (11_1) 12 (11_1) | 11 (12_0) 12 (12_0) | 11 (12_1) 12 (12_1) | 11 (12_1) 12 (12_2) | ||||
/drm/amdgpu [lower-alpha 1] | Experimental [20] |
AMD ROCm product manager Terry Deem gave a tour of the stack. [21]
The main consumers of the stack are machine learning and high-performance computing/GPGPU applications.
Various deep learning frameworks have a ROCm backend: [22]
ROCm is gaining significant traction in the TOP500. [24] ROCm is used with the exascale supercomputers El Capitan [25] [26] and Frontier.
Some related software can be found at AMD Infinity Hub.
As of version 3.0, Blender can use HIP compute kernels for its Cycles renderer. [27]
Julia has the AMDGPU.jl package, [28] which integrates with LLVM and select components of the ROCm stack. Instead of compiling code through HIP, AMDGPU.jl uses Julia's compiler to generate LLVM IR directly, which is later consumed by LLVM to generate native device code. AMDGPU.jl uses ROCr's HSA implementation to upload native code onto the device and execute it, similar to how HIP loads its own generated device code.
AMDGPU.jl also supports integration with ROCm's rocBLAS (for BLAS), rocRAND (for random number generation), and rocFFT (for FFTs). Future integration with rocALUTION, rocSOLVER, MIOpen, and certain other ROCm libraries is planned.
Installation instructions are provided for Linux and Windows in the official AMD ROCm documentation. ROCm software is currently spread across several public GitHub repositories. Within the main public meta-repository, there is an XML manifest for each official release: using git-repo, a version control tool built on top of Git, is the recommended way to synchronize with the stack locally. [29]
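As a rough illustration of what such a manifest contains, the sketch below parses a git-repo style XML manifest with Python's standard library and lists the projects it would check out. The manifest text is entirely hypothetical (the remote URL, revision and project names are placeholders, not taken from a real ROCm release), and the helper `list_projects` is an illustrative name, not part of git-repo.

```python
import xml.etree.ElementTree as ET

# Hypothetical manifest in the git-repo XML format. Every value below is a
# placeholder chosen for illustration, not copied from a ROCm release.
MANIFEST = """
<manifest>
  <remote name="origin" fetch="https://example.invalid/rocm/" />
  <default remote="origin" revision="refs/tags/example-release" />
  <project name="component-a" path="src/component-a" />
  <project name="component-b" />
</manifest>
"""


def list_projects(manifest_xml):
    """Return (name, checkout path, revision) for each <project> entry."""
    root = ET.fromstring(manifest_xml.strip())
    default = root.find("default")
    default_rev = default.get("revision") if default is not None else None
    projects = []
    for project in root.findall("project"):
        name = project.get("name")
        path = project.get("path", name)  # path falls back to the project name
        revision = project.get("revision", default_rev)
        projects.append((name, path, revision))
    return projects


for name, path, revision in list_projects(MANIFEST):
    print(f"{name} -> {path} @ {revision}")
```

Pinning every project to one revision in a single file is what lets a release of a many-repository stack be reproduced with one synchronization step.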
AMD has started distributing containerized applications for ROCm, notably scientific research applications gathered under AMD Infinity Hub. [30]
AMD itself distributes packages tailored to various Linux distributions.
There is a growing third-party ecosystem packaging ROCm.
Several Linux distributions officially package ROCm natively, with varying degrees of progress: Arch Linux, [31] Gentoo, [32] Debian, Fedora, [33] GNU Guix, and NixOS.
Spack packages are also available. [34]
There is one kernel-space component, ROCk; the rest of the stack, roughly a hundred components, is made up of user-space modules.
The unofficial typographic convention is to write low-level libraries with an uppercase ROC followed by lowercase letters (e.g. ROCt), and user-facing libraries the other way round (e.g. rocBLAS). [35]
AMD is actively developing with the LLVM community, but upstreaming is not instantaneous and, as of January 2022, is still lagging. [36] AMD still officially packages various LLVM forks [37] [38] [9] for parts that are not yet upstreamed: compiler optimizations destined to remain proprietary, debug support, OpenMP offloading, etc.
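The naming convention can be expressed as a small classifier. This is only a sketch of the informal rule as stated here, with the hypothetical helper `classify_component`; it is not an official tool, and names that follow neither pattern are simply reported as unclassified.

```python
import re

LOW_LEVEL = re.compile(r"ROC[a-z]+")    # uppercase ROC, lowercase rest (ROCt)
USER_FACING = re.compile(r"roc[A-Z]+")  # lowercase roc, uppercase rest (rocBLAS)


def classify_component(name):
    """Classify a component name according to the informal casing rule."""
    if LOW_LEVEL.fullmatch(name):
        return "low-level"
    if USER_FACING.fullmatch(name):
        return "user-facing"
    return "unclassified"


for name in ["ROCt", "ROCk", "rocBLAS", "rocFFT", "MIOpen"]:
    print(name, "->", classify_component(name))
```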
These support libraries are implemented as LLVM bitcode. They provide various utilities and functions for math operations, atomics, queries for launch parameters, on-device kernel launch, etc.
The thunk is responsible for all the thinking and queuing that goes into the stack.
The ROC runtime is a set of APIs/libraries that allows the launch of compute kernels by host applications. It is AMD's implementation of the HSA runtime API. [39] It is different from the ROC Common Language Runtime.
ROCm code object manager is in charge of interacting with LLVM intermediate representation.
The common language runtime is an indirection layer adapting calls to ROCr on Linux and PAL on Windows. It used to be able to route between different compilers, like the HSAIL-compiler. It is now being absorbed by the upper indirection layers (HIP and OpenCL).
ROCm ships its installable client driver (ICD) loader and an OpenCL [40] implementation bundled together. As of January 2022, ROCm 4.5.2 ships OpenCL 2.2, and is lagging behind the competition. [41]
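For readers unfamiliar with the ICD mechanism: on Linux, an ICD loader conventionally discovers OpenCL implementations through small `.icd` text files, typically placed under `/etc/OpenCL/vendors`, each naming a driver library to load. The sketch below assumes that convention and mimics only the discovery step; the helper `discover_icds` and all file and library names are hypothetical, and the demo runs against a temporary directory so that no OpenCL installation is needed.

```python
import tempfile
from pathlib import Path


def discover_icds(vendors_dir):
    """Map each *.icd file name to the driver library named inside it."""
    found = {}
    for icd_file in sorted(Path(vendors_dir).glob("*.icd")):
        library = icd_file.read_text().strip()
        if library:  # skip empty registration files
            found[icd_file.name] = library
    return found


# Demo with made-up vendor files in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "vendor_a.icd").write_text("libvendor_a_opencl.so\n")
    Path(tmp, "vendor_b.icd").write_text("/opt/vendor_b/lib/libopencl_b.so\n")
    Path(tmp, "notes.txt").write_text("not an ICD registration\n")
    demo = discover_icds(tmp)

print(demo)
```

A real loader would go on to open each listed library and forward OpenCL calls to it, which is what lets several vendors' implementations coexist on one system.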
The AMD implementation for its GPUs is called HIPAMD. There is also a CPU implementation mostly for demonstration purposes.
HIP builds a `HIPCC` compiler that either wraps Clang and compiles with LLVM's open AMDGPU backend, or redirects to the NVIDIA compiler. [42]
HIPIFY is a source-to-source compilation tool. It translates CUDA to HIP and the reverse, using either a Clang-based tool or a sed-like Perl script.
Like HIPIFY, GPUFORT is a tool that compiles source code into other third-generation-language sources, allowing users to migrate from CUDA Fortran to HIP Fortran. It is also, even more than HIPIFY, in the repertoire of research projects. [43]
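The sed-like approach amounts to textual renaming of identifiers. The following Python sketch imitates that idea with a handful of well-known CUDA-to-HIP renames; it is not the HIPIFY implementation, the table is a tiny illustrative subset, and `hipify_text` is a hypothetical name.

```python
import re

# Small illustrative subset of CUDA -> HIP renames; real tools cover far more.
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

# Whole-word match, longest names first, so that cudaMemcpy is not rewritten
# inside cudaMemcpyHostToDevice or inside unrelated identifiers.
_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(k) for k in sorted(CUDA_TO_HIP, key=len, reverse=True))
    + r")\b"
)


def hipify_text(source):
    """Rename the known CUDA identifiers in a piece of source text."""
    return _PATTERN.sub(lambda m: CUDA_TO_HIP[m.group(1)], source)


cuda_src = """#include <cuda_runtime.h>
cudaMalloc(&d_x, n * sizeof(float));
cudaMemcpy(d_x, h_x, n * sizeof(float), cudaMemcpyHostToDevice);
cudaFree(d_x);
"""
print(hipify_text(cuda_src))
```

Purely textual renaming cannot understand context such as macros or templates, which is the usual motivation for offering a compiler-based variant alongside a script.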
ROCm high-level libraries are usually consumed directly by application software, such as machine learning frameworks. Most of the following libraries fall into the General Matrix Multiply (GEMM) category, at which GPU architectures excel.
Most of these user-facing libraries come in two forms: hip for the indirection layer that can route to Nvidia hardware, and roc for the AMD implementation. [44]
rocBLAS and hipBLAS are central among the high-level libraries: they are the AMD implementation of Basic Linear Algebra Subprograms (BLAS). rocBLAS uses the Tensile library internally.
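For reference, GEMM in the BLAS sense computes C ← αAB + βC. The plain-Python sketch below spells out that definition for small matrices stored as lists of rows. It is a readable reference only, without transposes or any of the tiling and parallelism a GPU library would use, and the function name `gemm` and its argument order are choices made for this example rather than any library's API.

```python
def gemm(alpha, a, b, beta, c):
    """Return alpha * (a @ b) + beta * c for matrices stored as lists of rows."""
    m, k = len(a), len(b)
    n = len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of a and b do not match")
    if len(c) != m or any(len(row) != n for row in c):
        raise ValueError("c must have shape (rows of a) x (columns of b)")
    return [
        [
            alpha * sum(a[i][p] * b[p][j] for p in range(k)) + beta * c[i][j]
            for j in range(n)
        ]
        for i in range(m)
    ]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c = [[1, 1], [1, 1]]
result = gemm(2, a, b, 3, c)
print(result)  # [[41, 47], [89, 103]]
```

Each output element is an independent dot product, so the work divides naturally across many threads, which is the property that makes this operation a good fit for GPUs.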
This pair of libraries constitutes the LAPACK implementation for ROCm and is strongly coupled to rocBLAS.
ROCm competes with other GPU computing stacks: Nvidia CUDA and Intel oneAPI.
Nvidia's CUDA is closed-source, whereas AMD ROCm is open source. There is open-source software built on top of the closed-source CUDA, for instance RAPIDS.
CUDA is able to run on consumer GPUs, whereas ROCm support is mostly offered for professional hardware such as AMD Instinct and AMD Radeon Pro.
Nvidia provides a C/C++-centered frontend and its Parallel Thread Execution (PTX) LLVM GPU backend as the Nvidia CUDA Compiler (NVCC).
Like ROCm, oneAPI is open source, and all the corresponding libraries are published on its GitHub page.
The Unified Acceleration Foundation (UXL) is a new technology consortium working on the continuation of the oneAPI initiative, with the goal of creating a new open-standard accelerator software ecosystem, with related open standards and specification projects, through Working Groups and Special Interest Groups (SIGs). The aim is to compete with Nvidia's CUDA. The main companies behind it are Intel, Google, ARM, Qualcomm, Samsung, Imagination, and VMware. [45]