Original author(s) | Seiya Tokui |
---|---|
Developer(s) | Community, Preferred Networks, Inc. |
Initial release | September 2, 2015 . [1] |
Stable release | |
Preview release | |
Repository | github |
Written in | Python, Cython, CUDA |
Operating system | Linux, Windows |
Platform | Cross-platform |
Type | Numerical analysis |
License | MIT |
Website | cupy |
CuPy is an open source library for GPU-accelerated computing with Python programming language, providing support for multi-dimensional arrays, sparse matrices, and a variety of numerical algorithms implemented on top of them. [3] CuPy shares the same API set as NumPy and SciPy, allowing it to be a drop-in replacement to run NumPy/SciPy code on GPU. CuPy supports NVIDIA CUDA GPU platform, and AMD ROCm GPU platform starting in v9.0. [4] [5]
CuPy has been initially developed as a backend of Chainer deep learning framework, and later established as an independent project in 2017. [6]
CuPy is a part of the NumPy ecosystem array libraries [7] and is widely adopted to utilize GPU with Python, [8] especially in high-performance computing environments such as Summit, [9] Perlmutter, [10] EULER, [11] and ABCI. [12]
CuPy implements NumPy/SciPy-compatible APIs, as well as features to write user-defined GPU kernels or access low-level APIs. [14] [15]
The same set of APIs defined in the NumPy package (numpy.*
) are available under cupy.*
package.
cupy.ndarray
) for boolean, integer, float, and complex data typesThe same set of APIs defined in the SciPy package (scipy.*
) are available under cupyx.scipy.*
package.
cupyx.scipy.sparse.*_matrix
) of CSR, COO, CSC, and DIA formatcupyx.distributed
), providing collective and peer-to-peer primitives>>> importcupyascp>>> x=cp.array([1,2,3])>>> xarray([1, 2, 3])>>> y=cp.arange(10)>>> yarray([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> importcupyascp>>> x=cp.arange(12).reshape(3,4).astype(cp.float32)>>> xarray([[ 0., 1., 2., 3.], [ 4., 5., 6., 7.], [ 8., 9., 10., 11.]], dtype=float32)>>> x.sum(axis=1)array([ 6., 22., 38.], dtype=float32)
>>> importcupyascp>>> kern=cp.RawKernel(r'''... extern "C" __global__... void multiply_elemwise(const float* in1, const float* in2, float* out) {... int tid = blockDim.x * blockIdx.x + threadIdx.x;... out[tid] = in1[tid] * in2[tid];... }... ''','multiply_elemwise')>>> in1=cp.arange(16,dtype=cp.float32).reshape(4,4)>>> in2=cp.arange(16,dtype=cp.float32).reshape(4,4)>>> out=cp.zeros((4,4),dtype=cp.float32)>>> kern((4,),(4,),(in1,in2,out))# grid, block and arguments>>> outarray([[ 0., 1., 4., 9.], [ 16., 25., 36., 49.], [ 64., 81., 100., 121.], [144., 169., 196., 225.]], dtype=float32)
SciPy is a free and open-source Python library used for scientific computing and technical computing.
NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. The predecessor of NumPy, Numeric, was originally created by Jim Hugunin with contributions from several other developers. In 2005, Travis Oliphant created NumPy by incorporating features of the competing Numarray into Numeric, with extensive modifications. NumPy is open-source software and has many contributors. NumPy is a NumFOCUS fiscally sponsored project.
General-purpose computing on graphics processing units is the use of a graphics processing unit (GPU), which typically handles computation only for computer graphics, to perform computation in applications traditionally handled by the central processing unit (CPU). The use of multiple video cards in one computer, or large numbers of graphics chips, further parallelizes the already parallel nature of graphics processing.
CUDA is a parallel computing platform and application programming interface (API) that allows software to use certain types of graphics processing units (GPUs) for general purpose processing, an approach called general-purpose computing on GPUs (GPGPU). CUDA is a software layer that gives direct access to the GPU's virtual instruction set and parallel computational elements, for the execution of compute kernels.
nouveau is a free and open-source graphics device driver for Nvidia video cards and the Tegra family of SoCs written by independent software engineers, with minor help from Nvidia employees.
IPython is a command shell for interactive computing in multiple programming languages, originally developed for the Python programming language, that offers introspection, rich media, shell syntax, tab completion, and history. IPython provides the following features:
OpenCL is a framework for writing programs that execute across heterogeneous platforms consisting of central processing units (CPUs), graphics processing units (GPUs), digital signal processors (DSPs), field-programmable gate arrays (FPGAs) and other processors or hardware accelerators. OpenCL specifies programming languages for programming these devices and application programming interfaces (APIs) to control the platform and execute programs on the compute devices. OpenCL provides a standard interface for parallel computing using task- and data-based parallelism.
Theano is a Python library and optimizing compiler for manipulating and evaluating mathematical expressions, especially matrix-valued ones. In Theano, computations are expressed using a NumPy-esque syntax and compiled to run efficiently on either CPU or GPU architectures.
GPULib is discontinued and unsupported software library developed by Tech-X Corporation for accelerating general-purpose scientific computations from within the Interactive Data Language (IDL) using Nvidia's CUDA platform for programming its graphics processing units (GPUs). GPULib provides basic arithmetic, array indexing, special functions, Fast Fourier Transforms (FFT), interpolation, BLAS matrix operations as well as LAPACK routines provided by MAGMA, and some image processing operations. All numeric data types provided by IDL are supported. GPULib is used in medical imaging, optics, astronomy, earth science, remote sensing, and other scientific areas.
Numba is an open-source JIT compiler that translates a subset of Python and NumPy into fast machine code using LLVM, via the llvmlite Python package. It offers a range of options for parallelising Python code for CPUs and GPUs, often with only minor code changes.
GPUOpen is a middleware software suite originally developed by AMD's Radeon Technologies Group that offers advanced visual effects for computer games. It was released in 2016. GPUOpen serves as an alternative to, and a direct competitor of Nvidia GameWorks. GPUOpen is similar to GameWorks in that it encompasses several different graphics technologies as its main components that were previously independent and separate from one another. However, GPUOpen is entirely open source software, unlike GameWorks which is proprietary and closed.
The following table compares notable software frameworks, libraries and computer programs for deep learning.
SYCL is a higher-level programming model to improve programming productivity on various hardware accelerators. It is a single-source embedded domain-specific language (eDSL) based on pure C++17. It is a standard developed by Khronos Group, announced in March 2014.
PyTorch is a machine learning framework based on the Torch library, used for applications such as computer vision and natural language processing, originally developed by Meta AI and now part of the Linux Foundation umbrella. It is free and open-source software released under the modified BSD license. Although the Python interface is more polished and the primary focus of development, PyTorch also has a C++ interface.
Nvidia NVDEC is a feature in its graphics cards that performs video decoding, offloading this compute-intensive task from the CPU.
ROCm is an Advanced Micro Devices (AMD) software stack for graphics processing unit (GPU) programming. ROCm spans several domains: general-purpose computing on graphics processing units (GPGPU), high performance computing (HPC), heterogeneous computing. It offers several programming models: HIP, OpenMP/Message Passing Interface (MPI), OpenCL.
Dask is an open-source Python library for parallel computing. Dask scales Python code from multi-core local machines to large distributed clusters in the cloud. Dask provides a familiar user interface by mirroring the APIs of other libraries in the PyData ecosystem including: Pandas, scikit-learn and NumPy. It also exposes low-level APIs that help programmers run custom algorithms in parallel.
oneAPI is an open standard, trademarked by Intel, for a unified application programming interface intended to be used across different compute accelerator (coprocessor) architectures, including GPUs, AI accelerators and field-programmable gate arrays. It is intended to eliminate the need for developers to maintain separate code bases, multiple programming languages, and different tools and workflows for each architecture.
Google JAX is a machine learning framework for transforming numerical functions. It is described as bringing together a modified version of autograd and TensorFlow's XLA. It is designed to follow the structure and workflow of NumPy as closely as possible and works with various existing frameworks such as TensorFlow and PyTorch. The primary functions of JAX are:
Most recently, CuPy, an open-source array library with Python, has expanded its traditional GPU support with the introduction of version 9.0 that now offers support for the ROCm stack for GPU-accelerated computing.