CuPy | |
---|---|
Original author(s) | Seiya Tokui |
Developer(s) | Community, Preferred Networks, Inc. |
Initial release | September 2, 2015 [1] |
Stable release | |
Repository | github |
Written in | Python, Cython, CUDA |
Operating system | Linux, Windows |
Platform | Cross-platform |
Type | Numerical analysis |
License | MIT |
Website | cupy |
CuPy is an open-source library for GPU-accelerated computing with the Python programming language, providing support for multi-dimensional arrays, sparse matrices, and a variety of numerical algorithms implemented on top of them. [3] CuPy shares the same API set as NumPy and SciPy, allowing it to serve as a drop-in replacement for running NumPy/SciPy code on a GPU. CuPy supports the Nvidia CUDA GPU platform and, starting with v9.0, the AMD ROCm GPU platform. [4] [5]
CuPy was initially developed as the backend of the Chainer deep learning framework and was later established as an independent project in 2017. [6]
CuPy is part of the NumPy ecosystem of array libraries [7] and is widely adopted for using GPUs with Python, [8] especially in high-performance computing environments such as Summit, [9] Perlmutter, [10] EULER, [11] and ABCI. [12]
CuPy implements NumPy/SciPy-compatible APIs, as well as features to write user-defined GPU kernels or access low-level APIs. [14] [15]
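Because the cupy.* namespace mirrors numpy.*, a function can often be written once against a module alias and run on either library. The following is a minimal sketch of that pattern, not code from the CuPy project: the alias `xp` and the helper `normalize_rows` are illustrative names, and the fallback to NumPy when no usable GPU is found is an assumption made so the snippet also runs on CPU-only machines.

```python
import numpy as np

try:
    import cupy as cp
    cp.zeros(1)  # touch the device; raises if no usable GPU is present
    xp = cp
except Exception:  # CuPy missing or no GPU: fall back to NumPy (assumption)
    xp = np


def normalize_rows(a):
    """Scale each row of a 2-D array to unit L2 norm (hypothetical helper)."""
    norms = xp.linalg.norm(a, axis=1, keepdims=True)
    return a / norms


# The same calls work whether xp is cupy or numpy.
x = xp.arange(12, dtype=xp.float32).reshape(3, 4) + 1
y = normalize_rows(x)
row_norms = xp.linalg.norm(y, axis=1)

print(xp.__name__, y.shape)
```

When `xp` is CuPy, `x`, `y`, and `row_norms` live in GPU memory; `cp.asnumpy` can be used to copy a result back to the host.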
The same set of APIs defined in the NumPy package (numpy.*) is available under the cupy.* package, including a multi-dimensional array (cupy.ndarray) for boolean, integer, float, and complex data types.
The same set of APIs defined in the SciPy package (scipy.*) is available under the cupyx.scipy.* package, including sparse matrices (cupyx.scipy.sparse.*_matrix) in CSR, COO, CSC, and DIA formats.
Distributed communication is provided by a dedicated package (cupyx.distributed) with collective and peer-to-peer primitives.

```python
import cupy as cp
from cupy.typing import NDArray

# cp.integer is the abstract integer scalar type; the concrete dtype is platform-dependent.
x: NDArray[cp.integer] = cp.array([1, 2, 3])
print(x)  # prints [1 2 3]

y: NDArray[cp.integer] = cp.arange(10)
print(y)  # prints [0 1 2 3 4 5 6 7 8 9]
```
```python
import cupy as cp
from cupy.typing import NDArray

# Build a 3x4 float32 array on the GPU.
x: NDArray[cp.float32] = cp.arange(12).reshape(3, 4).astype(cp.float32)
print(x)
# prints:
# [[ 0.  1.  2.  3.]
#  [ 4.  5.  6.  7.]
#  [ 8.  9. 10. 11.]]

# Sum along each row.
print(x.sum(axis=1))  # prints [ 6. 22. 38.]
```
```python
import cupy as cp
from cupy.typing import NDArray

# Compile a CUDA C kernel that multiplies two float arrays element by element.
kern: cp.RawKernel = cp.RawKernel(r'''
extern "C" __global__
void multiply_elemwise(const float* in1, const float* in2, float* out) {
    int tid = blockDim.x * blockIdx.x + threadIdx.x;
    out[tid] = in1[tid] * in2[tid];
}
''', 'multiply_elemwise')

in1: NDArray[cp.float32] = cp.arange(16, dtype=cp.float32).reshape(4, 4)
in2: NDArray[cp.float32] = cp.arange(16, dtype=cp.float32).reshape(4, 4)
out: NDArray[cp.float32] = cp.zeros((4, 4), dtype=cp.float32)

# Launch 4 blocks of 4 threads: one thread per element of the 4x4 arrays.
kern((4,), (4,), (in1, in2, out))  # grid, block and arguments
print(out)
# prints:
# [[  0.   1.   4.   9.]
#  [ 16.  25.  36.  49.]
#  [ 64.  81. 100. 121.]
#  [144. 169. 196. 225.]]
```
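The raw kernel example launches exactly one thread per array element. When the array size is not a multiple of the block size, a common CUDA pattern is to round the grid size up and guard the write with a bounds check. The following is a minimal sketch of that pattern and is not taken from the CuPy documentation: `launch_config`, `multiply`, the kernel name `multiply_elemwise_checked`, and the default of 256 threads per block are all illustrative assumptions.

```python
KERNEL_SOURCE = r'''
extern "C" __global__
void multiply_elemwise_checked(const float* in1, const float* in2,
                               float* out, const int n) {
    int tid = blockDim.x * blockIdx.x + threadIdx.x;
    if (tid < n) {  // threads past the end of the arrays do nothing
        out[tid] = in1[tid] * in2[tid];
    }
}
'''


def launch_config(n, threads_per_block=256):
    """Return (grid, block) tuples with enough threads to cover n elements."""
    blocks = (n + threads_per_block - 1) // threads_per_block  # ceiling division
    return (blocks,), (threads_per_block,)


def multiply(a, b, threads_per_block=256):
    """Element-wise product of two same-shaped CuPy arrays (hypothetical helper)."""
    import cupy as cp  # imported lazily so launch_config works without a GPU

    if a.shape != b.shape:
        raise ValueError("inputs must have the same shape")
    a = cp.ascontiguousarray(a, dtype=cp.float32)
    b = cp.ascontiguousarray(b, dtype=cp.float32)
    out = cp.empty_like(a)
    if a.size == 0:
        return out  # nothing to compute; a zero-sized grid is not a valid launch
    kernel = cp.RawKernel(KERNEL_SOURCE, 'multiply_elemwise_checked')
    grid, block = launch_config(a.size, threads_per_block)
    # NumPy scalars are passed by value, so cp.int32 matches the kernel's `int n`.
    kernel(grid, block, (a, b, out, cp.int32(a.size)))
    return out


print(launch_config(16, 4))  # prints ((4,), (4,))
print(launch_config(1000))   # prints ((4,), (256,))
```

On a machine with a supported GPU, calling `multiply` on two copies of `cp.arange(10, dtype=cp.float32)` should yield the squares of 0 through 9, with the two surplus threads of a 4-thread block layout skipped by the bounds check.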
With version 9.0, CuPy, an open-source array library for Python, extended its traditional GPU support by adding support for the ROCm stack for GPU-accelerated computing.