Original author(s) | Seiya Tokui |
---|---|
Developer(s) | Community, Preferred Networks, Inc. |
Initial release | June 9, 2015 [1] [2] |
Stable release | 7.7.0 [3] / 30 July 2020 |
Repository | |
Written in | Python |
Platform | cross-platform |
Available in | Python |
Type | Deep learning library |
License | MIT |
Website | chainer |
Chainer is an open-source deep learning framework written purely in Python on top of the NumPy and CuPy libraries. Development is led by the Japanese venture company Preferred Networks in partnership with IBM, Intel, Microsoft, and Nvidia. [4] [5] [6] [7]
Chainer is notable for its early adoption of the "define-by-run" scheme, as well as its performance on large-scale systems. [1] The first version was released in June 2015, and the framework has since gained wide popularity in Japan. [1] [2] In 2017, KDnuggets listed it among the top 10 open-source machine learning Python projects. [8]
In December 2019, Preferred Networks announced that it would move its development effort from Chainer to PyTorch and would provide only maintenance patches after releasing v7. [9]
Chainer was the first deep learning framework to introduce the define-by-run approach. [10] [11] The traditional procedure for training a network had two phases: first define the fixed connections between the mathematical operations in the network (such as matrix multiplication and nonlinear activations), and then run the actual training calculation. This is called the define-and-run or static-graph approach. Theano and TensorFlow are among the notable frameworks that took this approach. In contrast, in the define-by-run or dynamic-graph approach, the connections in a network are not determined when training starts. The network is determined during training, as the actual calculation is performed.
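The difference between the two approaches can be sketched in plain Python. This is only an illustration under stated assumptions: the names `Node`, `placeholder`, `run`, and `Var` are hypothetical and are not part of Chainer, Theano, or TensorFlow. The first half builds a graph description and evaluates it in a separate step; the second half computes each value immediately and records the graph as a side effect.

```python
# Define-and-run: describe the whole graph first, evaluate it later.
class Node:
    def __init__(self, op, inputs=(), name=None):
        self.op = op            # "input", "mul", or "add"
        self.inputs = inputs    # upstream Node objects
        self.name = name        # only used by input placeholders


def placeholder(name):
    return Node("input", name=name)


def mul(a, b):
    return Node("mul", (a, b))


def add(a, b):
    return Node("add", (a, b))


def run(node, feed):
    """Evaluate a previously defined graph with concrete input values."""
    if node.op == "input":
        return feed[node.name]
    left, right = (run(i, feed) for i in node.inputs)
    return left * right if node.op == "mul" else left + right


x, w, b = placeholder("x"), placeholder("w"), placeholder("b")
static_graph = add(mul(x, w), b)                                  # phase 1: define
static_out = run(static_graph, {"x": 2.0, "w": 3.0, "b": 1.0})    # phase 2: run


# Define-by-run: every operation computes its value right away and
# remembers its parents, so the graph exists only after the calculation.
class Var:
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents

    def __mul__(self, other):
        return Var(self.value * other.value, (self, other))

    def __add__(self, other):
        return Var(self.value + other.value, (self, other))


dynamic_out = Var(2.0) * Var(3.0) + Var(1.0)
```

Both halves compute `x * w + b`, but only the first needs a separate `run` step; in the second, `dynamic_out.value` is available as soon as the expression has been evaluated.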
One of the advantages of this approach is that it is intuitive and flexible. [12] If the network has complicated control flow such as conditionals and loops, the define-and-run approach needs specially designed operations for such constructs. In the define-by-run approach, by contrast, the programming language's native constructs, such as if statements and for loops, can be used to describe such flow. This flexibility is especially useful for implementing recurrent neural networks. [13] [14]
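As a rough illustration of this point, the following sketch implements a very small define-by-run engine for scalars in plain Python. It is not Chainer's API: the `Var` class and the `forward` function are hypothetical names chosen for this example. The forward pass uses an ordinary `for` loop and an `if` that depends on a value computed at run time, and the recorded graph is then traversed backward to obtain gradients.

```python
class Var:
    """Scalar value that records how it was computed (hypothetical helper)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents   # tuple of (parent Var, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    def backward(self):
        # Collect the recorded graph in topological order (inputs first).
        order, seen = [], set()

        def visit(v):
            if id(v) not in seen:
                seen.add(id(v))
                for parent, _ in v.parents:
                    visit(parent)
                order.append(v)

        visit(self)
        self.grad = 1.0
        # Propagate gradients from the output back to the inputs.
        for v in reversed(order):
            for parent, local in v.parents:
                parent.grad += local * v.grad


def forward(x, w, steps):
    h = x
    for _ in range(steps):      # native loop: graph depth depends on `steps`
        h = h * w
        if h.value > 10.0:      # native conditional on a run-time value
            h = h + x
    return h


x, w = Var(2.0), Var(3.0)
y = forward(x, w, steps=2)      # records y = x * w * w + x for these inputs
y.backward()
```

With these inputs the conditional fires on the second iteration only, so the recorded graph is `x * w * w + x`; a different `steps` value or different inputs would record a different graph without any change to the engine.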
Another advantage is ease of debugging. [12] In the define-and-run approach, if an error (such as a numeric error) occurs during the training calculation, it is often difficult to locate the fault, because the code that defines the network and the place where the error actually occurs are separated. In the define-by-run approach, the calculation can simply be suspended with the language's built-in debugger, and the data flowing through the network code can be inspected.
Define-by-run has gained popularity since its introduction by Chainer and is now implemented in many other frameworks, including PyTorch [15] and TensorFlow. [12]
Chainer has four extension libraries: ChainerMN, ChainerRL, ChainerCV, and ChainerUI. ChainerMN enables Chainer to be used on multiple GPUs with performance significantly faster than other deep learning frameworks. [1] A supercomputer running Chainer on 1024 GPUs processed 90 epochs of the ImageNet dataset on a ResNet-50 network in 15 minutes, four times faster than the previous record held by Facebook. [16] [17] ChainerRL adds state-of-the-art deep reinforcement learning algorithms, and ChainerUI is a management and visualization tool.
Chainer is used as the framework for PaintsChainer, a service that automatically colorizes black-and-white, line-only draft drawings with minimal user input. [18] [19]
Theano is a Python library and optimizing compiler for manipulating and evaluating mathematical expressions, especially matrix-valued ones. In Theano, computations are expressed using a NumPy-esque syntax and compiled to run efficiently on either CPU or GPU architectures.
Torch is an open-source machine learning library, a scientific computing framework, and a script language based on the Lua programming language. It provides a wide range of algorithms for deep learning and uses the scripting language LuaJIT with an underlying C implementation. It was created at IDIAP at EPFL. As of 2018, Torch is no longer in active development. However, PyTorch, which is based on the Torch library, is actively developed as of August 2022.
Eclipse Deeplearning4j is a programming library written in Java for the Java virtual machine (JVM). It is a framework with wide support for deep learning algorithms. Deeplearning4j includes implementations of the restricted Boltzmann machine, deep belief net, deep autoencoder, stacked denoising autoencoder and recursive neural tensor network, word2vec, doc2vec, and GloVe. These algorithms all include distributed parallel versions that integrate with Apache Hadoop and Spark.
TensorFlow is a free and open-source software library for machine learning and artificial intelligence. It can be used across a range of tasks but has a particular focus on training and inference of deep neural networks.
The following table compares notable software frameworks, libraries and computer programs for deep learning.
An AI accelerator is a class of specialized hardware accelerator or computer system designed to accelerate artificial intelligence and machine learning applications, including artificial neural networks and machine vision. Typical applications include algorithms for robotics, internet of things, and other data-intensive or sensor-driven tasks. They are often manycore designs and generally focus on low-precision arithmetic, novel dataflow architectures or in-memory computing capability. As of 2018, a typical AI integrated circuit chip contains billions of MOSFET transistors. A number of vendor-specific terms exist for devices in this category, and it is an emerging technology without a dominant design.
Apache MXNet is an open-source deep learning software framework used to train and deploy deep neural networks. It is scalable, allows fast model training, and supports a flexible programming model and multiple programming languages. The MXNet library is portable and can scale to multiple GPUs as well as multiple machines. It was co-developed by Carlos Guestrin at the University of Washington.
spaCy is an open-source software library for advanced natural language processing, written in the programming languages Python and Cython. The library is published under the MIT license and its main developers are Matthew Honnibal and Ines Montani, the founders of the software company Explosion.
Caffe is a deep learning framework, originally developed at University of California, Berkeley. It is open source, under a BSD license. It is written in C++, with a Python interface.
PyTorch is an open source machine learning framework based on the Torch library, used for applications such as computer vision and natural language processing, primarily developed by Meta AI. It is free and open-source software released under the Modified BSD license. Although the Python interface is more polished and the primary focus of development, PyTorch also has a C++ interface.
In machine learning, hyperparameter optimization or tuning is the problem of choosing a set of optimal hyperparameters for a learning algorithm. A hyperparameter is a parameter whose value is used to control the learning process. By contrast, the values of other parameters are learned.
The Open Neural Network Exchange (ONNX) is an open-source artificial intelligence ecosystem of technology companies and research organizations that establish open standards for representing machine learning algorithms and software tools to promote innovation and collaboration in the AI sector. ONNX is available on GitHub.
SqueezeNet is the name of a deep neural network for computer vision that was released in 2016. SqueezeNet was developed by researchers at DeepScale, University of California, Berkeley, and Stanford University. In designing SqueezeNet, the authors' goal was to create a smaller neural network with fewer parameters that can more easily fit into computer memory and can more easily be transmitted over a computer network.
Differentiable programming is a programming paradigm in which a numeric computer program can be differentiated throughout via automatic differentiation. This allows for gradient-based optimization of parameters in the program, often via gradient descent. Differentiable programming has found use in a wide variety of areas, particularly scientific computing and artificial intelligence.
OpenVINO toolkit is a free toolkit that facilitates optimizing a deep learning model from a framework and deploying it, using an inference engine, onto Intel hardware. The toolkit has two versions: OpenVINO toolkit, which is supported by the open-source community, and Intel Distribution of OpenVINO toolkit, which is supported by Intel. OpenVINO was developed by Intel. The toolkit is cross-platform and free for use under Apache License version 2.0. The toolkit enables a write-once, deploy-anywhere approach to deep learning deployments on Intel platforms, including CPU, integrated GPU, Intel Movidius VPU, and FPGAs.
fast.ai is a non-profit research group focused on deep learning and artificial intelligence. It was founded in 2016 by Jeremy Howard and Rachel Thomas with the goal of democratising deep learning. They do this by providing a massive open online course (MOOC) named "Practical Deep Learning for Coders," which has no other prerequisites except for knowledge of the programming language Python.
NNI is a free and open source AutoML toolkit developed by Microsoft. It is used to automate feature engineering, model compression, neural architecture search, and hyper-parameter tuning.
Owl Scientific Computing is a software system for scientific and engineering computing developed in the Department of Computer Science and Technology, University of Cambridge. The System Research Group (SRG) in the department recognises Owl as one of the representative systems developed in SRG in the 2010s. The source code is licensed under the MIT License and can be accessed from the GitHub repository.
CuPy is an open-source library for GPU-accelerated computing with the Python programming language, providing support for multi-dimensional arrays, sparse matrices, and a variety of numerical algorithms implemented on top of them. CuPy shares the same API set as NumPy and SciPy, allowing it to be a drop-in replacement for running NumPy/SciPy code on a GPU. CuPy supports the NVIDIA CUDA GPU platform and, starting in v9.0, the AMD ROCm GPU platform.
Google JAX is a machine learning framework for transforming numerical functions. It is described as bringing together a modified version of autograd and TensorFlow's XLA. It is designed to follow the structure and workflow of NumPy as closely as possible and works with various existing frameworks such as TensorFlow and PyTorch. The primary functions of JAX are: