NVIDIA CUDA Compiler

Original author(s): Nvidia
Type: Compiler
License: Proprietary software
Website: docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/#introduction

Nvidia CUDA Compiler (NVCC) is a proprietary compiler by Nvidia intended for use with CUDA. CUDA code runs on both the CPU and the GPU. NVCC separates these two parts: it sends the host code (the part that runs on the CPU) to a C/C++ compiler such as GCC, Intel C++ Compiler (ICC), or Microsoft Visual C++ (MSVC), and compiles the device code (the part that runs on the GPU) itself. NVCC is based on LLVM.[1] According to Nvidia's documentation, nvcc version 7.0 supports many language constructs defined by the C++11 standard, as well as a few C99 features. Version 9.0 adds support for several more constructs from the C++14 standard.[2]
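
To make the host/device split concrete, the following is a minimal sketch of a .cu source file; the file name, kernel name, and launch configuration are illustrative rather than taken from Nvidia's documentation. When such a file is given to nvcc, the __global__ kernel is compiled as device code for the GPU, while main() is forwarded to the host C/C++ compiler.

    // add.cu: illustrative example of host and device code in one source file
    #include <cstdio>
    #include <cuda_runtime.h>

    // Device code: compiled by nvcc for the GPU
    __global__ void add(const float* a, const float* b, float* c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
        if (i < n) c[i] = a[i] + b[i];
    }

    // Host code: forwarded by nvcc to the host C/C++ compiler
    int main() {
        const int n = 1024;
        float *a, *b, *c;
        cudaMallocManaged(&a, n * sizeof(float));  // unified (managed) memory
        cudaMallocManaged(&b, n * sizeof(float));
        cudaMallocManaged(&c, n * sizeof(float));
        for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

        add<<<(n + 255) / 256, 256>>>(a, b, c, n);  // launch the kernel on the GPU
        cudaDeviceSynchronize();                    // wait for the GPU to finish

        printf("c[0] = %f\n", c[0]);                // expected output: 3.000000
        cudaFree(a); cudaFree(b); cudaFree(c);
        return 0;
    }

Compiling this file with a command such as "nvcc add.cu -o add" produces a single executable in which the compiled device code is embedded alongside the host code.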

Any source file containing CUDA language extensions (.cu) must be compiled with nvcc. NVCC is a compiler driver: it works by invoking all the necessary tools and compilers, such as cudacc, g++, and cl. NVCC can output C code (CPU code) that must then be compiled with the rest of the application using another tool, or it can emit PTX or object code directly. An executable containing CUDA code requires the CUDA core library (cuda) and the CUDA runtime library (cudart).
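
As a rough sketch of typical driver invocations (file names are illustrative; the authoritative option list is in Nvidia's nvcc documentation), the commands below correspond to the kinds of output described above:

    nvcc -ptx add.cu           # emit PTX intermediate code (add.ptx)
    nvcc -c add.cu -o add.o    # emit an object file for linking into a larger application
    nvcc add.cu -o add         # compile and link in one step, linking against cudart
    nvcc --dryrun add.cu       # list the tool invocations without running them

The --dryrun option in particular makes the driver behaviour visible: it prints the sequence of host compiler and CUDA tool invocations that nvcc would perform.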

Other widely used libraries include cuBLAS (dense linear algebra), cuFFT (fast Fourier transforms), and CUDPP (CUDA Data Parallel Primitives), all of which run on the GPU.

Related Research Articles

LLVM – Compiler backend for multiple programming languages

The LLVM compiler infrastructure project is a set of compiler and toolchain technologies, which can be used to develop a front end for any programming language and a back end for any instruction set architecture. LLVM is designed around a language-independent intermediate representation (IR) that serves as a portable, high-level assembly language that can be optimized with a variety of transformations over multiple passes.

General-purpose computing on graphics processing units is the use of a graphics processing unit (GPU), which typically handles computation only for computer graphics, to perform computation in applications traditionally handled by the central processing unit (CPU). The use of multiple video cards in one computer, or large numbers of graphics chips, further parallelizes the already parallel nature of graphics processing. In addition, even a single GPU-CPU framework provides advantages that multiple CPUs on their own do not offer due to the specialization in each chip.

Mesa, also called Mesa3D and The Mesa 3D Graphics Library, is an open source software implementation of OpenGL, Vulkan, and other graphics API specifications. Mesa translates these specifications to vendor-specific graphics hardware drivers.

Free and open-source graphics device driver

A free and open-source graphics device driver is a software stack which controls computer-graphics hardware and supports graphics-rendering application programming interfaces (APIs) and is released under a free and open-source software license. Graphics device drivers are written for specific hardware to work within a specific operating system kernel and to support a range of APIs used by applications to access the graphics hardware. They may also control output to the display if the display driver is part of the graphics hardware. Most free and open-source graphics device drivers are developed by the Mesa project. The driver is made up of a compiler, a rendering API, and software which manages access to the graphics hardware.

Wen-mei Hwu is the Walter J. Sanders III-AMD Endowed Chair professor in Electrical and Computer Engineering in the Coordinated Science Laboratory at the University of Illinois at Urbana-Champaign. His research is on compiler design, computer architecture, computer microarchitecture, and parallel processing. He is a principal investigator for the petascale Blue Waters supercomputer, is co-director of the Universal Parallel Computing Research Center (UPCRC), and is principal investigator for the first NVIDIA CUDA Center of Excellence at UIUC. At the Illinois Coordinated Science Lab, Hwu leads the IMPACT Research Group and is director of the OpenIMPACT project – which has delivered new compiler and computer architecture technologies to the computer industry since 1987. From 1997 to 1999, Hwu served as the chairman of the Computer Engineering Program at Illinois. Since 2009, Hwu has served as chief technology officer at MulticoreWare Inc., leading the development of compiler tools for heterogeneous platforms. The OpenCL compilers developed by his team at MulticoreWare are based on the LLVM framework and have been deployed by leading semiconductor companies.

CUDA – Parallel computing platform and programming model

CUDA is a parallel computing platform and application programming interface (API) model created by Nvidia. It allows software developers and software engineers to use a CUDA-enabled graphics processing unit (GPU) for general purpose processing – an approach termed GPGPU. The CUDA platform is a software layer that gives direct access to the GPU's virtual instruction set and parallel computational elements, for the execution of compute kernels.

nouveau (software) – Open-source software driver for Nvidia GPUs

nouveau is a free and open-source graphics device driver for Nvidia video cards and the Tegra family of SoCs written by independent software engineers, with minor help from Nvidia employees.

The Portland Group – Company that produced a set of commercially available Fortran, C and C++ compilers

PGI was a company that produced a set of commercially available Fortran, C and C++ compilers for high-performance computing systems. On July 29, 2013, NVIDIA Corporation acquired The Portland Group, Inc. As of August 5, 2020, the "PGI Compilers and Tools" technology is a part of the NVIDIA HPC SDK product available as a free download from NVIDIA.

Clang – Compiler front end

Clang is a compiler front end for the C, C++, Objective-C and Objective-C++ programming languages, as well as the OpenMP, OpenCL, RenderScript, CUDA and HIP frameworks. It uses the LLVM compiler infrastructure as its back end and has been part of the LLVM release cycle since LLVM 2.6.

OpenCL – Open standard for programming heterogeneous computing systems, such as CPUs or GPUs

OpenCL is a framework for writing programs that execute across heterogeneous platforms consisting of central processing units (CPUs), graphics processing units (GPUs), digital signal processors (DSPs), field-programmable gate arrays (FPGAs) and other processors or hardware accelerators. OpenCL specifies programming languages for programming these devices and application programming interfaces (APIs) to control the platform and execute programs on the compute devices. OpenCL provides a standard interface for parallel computing using task- and data-based parallelism.

AccelerEyes, doing business as ArrayFire, is an American software company that develops programming tools for parallel computing and graphics on graphics processing unit (GPU) chipsets. Its products are particularly popular in the defense industry.

Parallel Thread Execution is a low-level parallel thread execution virtual machine and instruction set architecture used in Nvidia's CUDA programming environment. The nvcc compiler translates code written in CUDA, a C++-like language, into PTX instructions, and the graphics driver contains a compiler which translates the PTX instructions into a binary code which can be run on the processing cores of Nvidia GPUs. The GNU Compiler Collection also has basic ability for PTX generation in the context of OpenMP offloading. Inline PTX assembly can be used in CUDA.
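
As an illustration of the last point, the device function below (the function name is hypothetical, and the snippet follows the syntax described in Nvidia's inline PTX documentation) embeds a single PTX add.s32 instruction in CUDA code:

    // CUDA device function using inline PTX assembly (illustrative sketch)
    __device__ int add_ints(int a, int b) {
        int result;
        // PTX: result = a + b; "=r" and "r" bind the C++ variables to PTX registers
        asm("add.s32 %0, %1, %2;" : "=r"(result) : "r"(a), "r"(b));
        return result;
    }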

RenderScript is a component of the Android operating system for mobile devices that offers an API for acceleration that takes advantage of heterogeneous hardware. It allows developers to increase the performance of their applications at the cost of writing more complex (lower-level) code.

OpenACC is a programming standard for parallel computing developed by Cray, CAPS, Nvidia and PGI. The standard is designed to simplify parallel programming of heterogeneous CPU/GPU systems.

C++ Accelerated Massive Parallelism is a native programming model that contains elements that span the C++ programming language and its runtime library. It provides an easy way to write programs that compile and execute on data-parallel hardware, such as graphics cards (GPUs).

Heterogeneous System Architecture (HSA) is a cross-vendor set of specifications that allow for the integration of central processing units and graphics processors on the same bus, with shared memory and tasks. The HSA is being developed by the HSA Foundation, which includes AMD and ARM. The platform's stated aim is to reduce communication latency between CPUs, GPUs and other compute devices, and make these various devices more compatible from a programmer's perspective, relieving the programmer of the task of planning the moving of data between devices' disjoint memories.

Numba is an open-source JIT compiler that translates a subset of Python and NumPy into fast machine code using LLVM, via the llvmlite Python package. It offers a range of options for parallelising Python code for CPUs and GPUs, often with only minor code changes.

MulticoreWare

MulticoreWare Inc is a software development company, offering products and services related to (i) HEVC video compression, (ii) machine learning, (iii) compilers for heterogeneous computing, and (iv) software performance optimization. MulticoreWare's customers include Amazon, AMD, ARM, Microsoft, Google, Telestream and BBright Technologies. MulticoreWare was founded in 2009 and today has offices in 3 countries – USA, China and India.

GPUOpen – Software suite

GPUOpen is a middleware software suite originally developed by AMD's Radeon Technologies Group that offers advanced visual effects for computer games. It was released in 2016. GPUOpen serves as an alternative to, and a direct competitor of Nvidia GameWorks. GPUOpen is similar to GameWorks in that it encompasses several different graphics technologies as its main components that were previously independent and separate from one another. However, GPUOpen is entirely open source software, unlike GameWorks which was heavily criticized for its proprietary and closed nature.

SYCL – Higher-level programming model for OpenCL

SYCL is a higher-level programming model for OpenCL as a single-source domain specific embedded language (DSEL) based on pure C++11 for SYCL 1.2.1 to improve programming productivity. This is a standard developed by Khronos Group, announced in March 2014.

References

  1. "CUDA LLVM Compiler". NVIDIA Corporation. 7 May 2012. Retrieved Apr 6, 2016.
  2. "CUDA C Programming Guide". docs.nvidia.com. Retrieved 2019-06-28.
  1. David B. Kirk, and Wen-mei W. Hwu. Programming massively parallel processors: a hands-on approach. Morgan Kaufmann, 2010.
  2. Nvidia Documentation on nvcc. https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/
  3. CUDPP. https://web.archive.org/web/20181117222643/http://gpgpu.org/developer/cudpp