Numba

Last updated
Original author(s) Continuum Analytics
Developer(s) Community project
Initial release15 August 2012;11 years ago (2012-08-15)
Stable release
0.59.0 [1]   OOjs UI icon edit-ltr-progressive.svg / 7 February 2024;13 days ago (7 February 2024)
Repository
Written in Python, C
Operating system Cross-platform
Type Technical computing
License BSD 2-clause
Website numba.pydata.org

Numba is an open-source JIT compiler that translates a subset of Python and NumPy into fast machine code using LLVM, via the llvmlite Python package. It offers a range of options for parallelising Python code for CPUs and GPUs, often with only minor code changes.

Contents

Numba was started by Travis Oliphant in 2012 and has since been under active development at its repository in GitHub with frequent releases. The project is driven by developers at Anaconda, Inc., with support by DARPA, the Gordon and Betty Moore Foundation, Intel, Nvidia and AMD, and a community of contributors on GitHub.

Example

Numba can be used by simply applying the numba.jit decorator to a Python function that does numerical computations:

importnumbaimportrandom@numba.jitdefmonte_carlo_pi(n_samples:int)->float:"""Monte Carlo"""acc=0foriinrange(n_samples):x=random.random()y=random.random()if(x**2+y**2)<1.0:acc+=1return4.0*acc/n_samples

The just-in-time compilation happens transparently when the function is called:

>>> monte_carlo_pi(1000000)3.14

Numba's website contains many more examples, as well as information on how to get good performance from Numba.

GPU support

Numba can compile Python functions to GPU code. Initially two backends are available:

Since release 0.56.4, [2] AMD ROCm HSA has been officially moved to unmaintained status and a separate repository stub has been created for it.

Alternative approaches

Numba is one approach to make Python fast, by compiling specific functions that contain Python and Numpy code. Many alternative approaches for fast numeric computing with Python exist, such as Cython, Pythran, and PyPy.

Related Research Articles

<span class="mw-page-title-main">NumPy</span> Python library for numerical programming

NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. The predecessor of NumPy, Numeric, was originally created by Jim Hugunin with contributions from several other developers. In 2005, Travis Oliphant created NumPy by incorporating features of the competing Numarray into Numeric, with extensive modifications. NumPy is open-source software and has many contributors. NumPy is a NumFOCUS fiscally sponsored project.

<span class="mw-page-title-main">LLVM</span> Compiler backend for multiple programming languages

LLVM is a set of compiler and toolchain technologies that can be used to develop a frontend for any programming language and a backend for any instruction set architecture. LLVM is designed around a language-independent intermediate representation (IR) that serves as a portable, high-level assembly language that can be optimized with a variety of transformations over multiple passes. The name LLVM originally stood for Low Level Virtual Machine, though the project has expanded and the name is no longer officially an initialism.

<span class="mw-page-title-main">PyPy</span> Alternative implementation of the Python programming language

PyPy is an implementation of the Python programming language. PyPy often runs faster than the standard implementation CPython because PyPy uses a just-in-time compiler. Most Python code runs well on PyPy except for code that depends on CPython extensions, which either does not work or incurs some overhead when run in PyPy. PyPy itself is built using a technique known as meta-tracing, which is a mostly automatic transformation that takes an interpreter as input and produces a tracing just-in-time compiler as output. Since interpreters are usually easier to write than compilers, but run slower, this technique can make it easier to produce efficient implementations of programming languages. PyPy's meta-tracing toolchain is called RPython.

<span class="mw-page-title-main">CUDA</span> Parallel computing platform and programming model

CUDA is a proprietary and closed-source parallel computing platform and application programming interface (API) that allows software to use certain types of graphics processing units (GPUs) for general-purpose processing, an approach called general-purpose computing on GPUs (GPGPU). CUDA is a software layer that gives direct access to the GPU's virtual instruction set and parallel computational elements for the execution of compute kernels.

<span class="mw-page-title-main">OpenCL</span> Open standard for programming heterogenous computing systems, such as CPUs or GPUs

OpenCL is a framework for writing programs that execute across heterogeneous platforms consisting of central processing units (CPUs), graphics processing units (GPUs), digital signal processors (DSPs), field-programmable gate arrays (FPGAs) and other processors or hardware accelerators. OpenCL specifies programming languages for programming these devices and application programming interfaces (APIs) to control the platform and execute programs on the compute devices. OpenCL provides a standard interface for parallel computing using task- and data-based parallelism.

<span class="mw-page-title-main">Cython</span> Programming language

Cython is a superset of the programming language Python, which allows developers to write Python code that yields performance comparable to that of C.

Theano is a Python library and optimizing compiler for manipulating and evaluating mathematical expressions, especially matrix-valued ones. In Theano, computations are expressed using a NumPy-esque syntax and compiled to run efficiently on either CPU or GPU architectures.

C++ Accelerated Massive Parallelism is a native programming model that contains elements that span the C++ programming language and its runtime library. It provides an easy way to write programs that compile and execute on data-parallel hardware, such as graphics cards (GPUs).

<span class="mw-page-title-main">Julia (programming language)</span> Dynamic programming language

Julia is a high-level, general-purpose dynamic programming language, most commonly used for numerical analysis and computational science. Distinctive aspects of Julia's design include a type system with parametric polymorphism and the use of multiple dispatch as a core programming paradigm, efficient garbage collection, and a just-in-time (JIT) compiler.

Heterogeneous System Architecture (HSA) is a cross-vendor set of specifications that allow for the integration of central processing units and graphics processors on the same bus, with shared memory and tasks. The HSA is being developed by the HSA Foundation, which includes AMD and ARM. The platform's stated aim is to reduce communication latency between CPUs, GPUs and other compute devices, and make these various devices more compatible from a programmer's perspective, relieving the programmer of the task of planning the moving of data between devices' disjoint memories.

<span class="mw-page-title-main">GraalVM</span> Virtual machine software

GraalVM is a Java Development Kit (JDK) written in Java. The open-source distribution of GraalVM is based on OpenJDK, and the enterprise distribution is based on Oracle JDK. As well as just-in-time (JIT) compilation, GraalVM can compile a Java application ahead of time. This allows for faster initialization, greater runtime performance, and decreased resource consumption, but the resulting executable can only run on the platform it was compiled for. It provides additional programming languages and execution modes. The first production-ready release, GraalVM 19.0, was distributed in May 2019. The most recent release is GraalVM for JDK 21, made available in September 2023.

<span class="mw-page-title-main">GPUOpen</span> Middleware software suite

GPUOpen is a middleware software suite originally developed by AMD's Radeon Technologies Group that offers advanced visual effects for computer games. It was released in 2016. GPUOpen serves as an alternative to, and a direct competitor of Nvidia GameWorks. GPUOpen is similar to GameWorks in that it encompasses several different graphics technologies as its main components that were previously independent and separate from one another. However, GPUOpen is partially open source software, unlike GameWorks which is proprietary and closed.

The following table compares notable software frameworks, libraries and computer programs for deep learning.

<span class="mw-page-title-main">SYCL</span> Higher-level programming standard for heterogeneous computing

SYCL is a higher-level programming model to improve programming productivity on various hardware accelerators. It is a single-source embedded domain-specific language (eDSL) based on pure C++17. It is a standard developed by Khronos Group, announced in March 2014.

PyTorch is a machine learning framework based on the Torch library, used for applications such as computer vision and natural language processing, originally developed by Meta AI and now part of the Linux Foundation umbrella. It is recognized as one of the two most popular machine learning libraries alongside TensorFlow, offering free and open-source software released under the modified BSD license. Although the Python interface is more polished and the primary focus of development, PyTorch also has a C++ interface.

<span class="mw-page-title-main">ROCm</span> Parallel computing platform: GPGPU libraries and application programming interface

ROCm is an Advanced Micro Devices (AMD) software stack for graphics processing unit (GPU) programming. ROCm spans several domains: general-purpose computing on graphics processing units (GPGPU), high performance computing (HPC), heterogeneous computing. It offers several programming models: HIP, OpenMP/Message Passing Interface (MPI), OpenCL.

CuPy is an open source library for GPU-accelerated computing with Python programming language, providing support for multi-dimensional arrays, sparse matrices, and a variety of numerical algorithms implemented on top of them. CuPy shares the same API set as NumPy and SciPy, allowing it to be a drop-in replacement to run NumPy/SciPy code on GPU. CuPy supports NVIDIA CUDA GPU platform, and AMD ROCm GPU platform starting in v9.0.

<span class="mw-page-title-main">Google JAX</span> Machine Learning framework designed for parallelization and autograd.

Google JAX is a machine learning framework for transforming numerical functions. It is described as bringing together a modified version of autograd and TensorFlow's XLA. It is designed to follow the structure and workflow of NumPy as closely as possible and works with various existing frameworks such as TensorFlow and PyTorch. The primary functions of JAX are:

  1. grad: automatic differentiation
  2. jit: compilation
  3. vmap: auto-vectorization
  4. pmap: SPMD programming

References

  1. "Release 0.59.0". 7 February 2024. Retrieved 20 February 2024.
  2. "Release Notes — Numba 0.56.4+0.g288a38bbd.dirty-py3.7-linux-x86_64.egg documentation".