Differentiable programming

Last updated December 05, 2024

Differentiable programming is a programming paradigm in which a numeric computer program can be differentiated throughout via automatic differentiation.^[1]^[2]^[3]^[4]^[5] This allows for gradient-based optimization of parameters in the program, often via gradient descent, as well as other learning approaches that are based on higher order derivative information. Differentiable programming has found use in a wide variety of areas, particularly scientific computing and machine learning.^[5] One of the early proposals to adopt such a framework in a systematic fashion to improve upon learning algorithms was made by the Advanced Concepts Team at the European Space Agency in early 2016.^[6]

Approaches

Most differentiable programming frameworks work by constructing a graph containing the control flow and data structures in the program.^[7] Attempts generally fall into two groups:

Static, compiled graph-based approaches such as TensorFlow,^{[note 1]} Theano, and MXNet. They tend to allow for good compiler optimization and easier scaling to large systems, but their static nature limits interactivity and the types of programs that can be created easily (e.g. those involving loops or recursion), as well as making it harder for users to reason effectively about their programs.^[7] A proof of concept compiler toolchain called Myia uses a subset of Python as a front end and supports higher-order functions, recursion, and higher-order derivatives.^[8]^[9]^[10]

Operator overloading, dynamic graph based approaches such as PyTorch, NumPy's autograd package as well as Pyaudi. Their dynamic and interactive nature lets most programs be written and reasoned about more easily. However, they lead to interpreter overhead (particularly when composing many small operations), poorer scalability, and reduced benefit from compiler optimization.^[9]^[10]

The use of Just-in-Time compilation has emerged recently as a possible solution to overcome some of the bottlenecks of interpreted languages. The C++ heyoka and python package heyoka.py make large use of this technique to offer advanced differentiable programming capabilities (also at high orders). A package for the Julia programming language – Zygote – works directly on Julia's intermediate representation. ^[7]^[11]^[5]

A limitation of earlier approaches is that they are only able to differentiate code written in a suitable manner for the framework, limiting their interoperability with other programs. Newer approaches resolve this issue by constructing the graph from the language's syntax or IR, allowing arbitrary code to be differentiated. ^[7]^[9]

Applications

Differentiable programming has been applied in areas such as combining deep learning with physics engines in robotics,^[12] solving electronic structure problems with differentiable density functional theory,^[13] differentiable ray tracing,^[14] image processing,^[15] and probabilistic programming.^[5]

Multidisciplinary application

Differentiable programming is making significant strides in various fields beyond its traditional applications. In healthcare and life sciences, for example, it is being used for deep learning in biophysics-based modelling of molecular mechanisms. This involves leveraging differentiable programming in areas such as protein structure prediction and drug discovery. These applications demonstrate the potential of differentiable programming in contributing to significant advancements in understanding complex biological systems and improving healthcare solutions.^[16]

Notes

↑ TensorFlow 1 uses the static graph approach, whereas TensorFlow 2 uses the dynamic graph approach by default.

Related Research Articles

In computing, reactive programming is a declarative programming paradigm concerned with data streams and the propagation of change. With this paradigm, it is possible to express static or dynamic data streams with ease, and also communicate that an inferred dependency within the associated execution model exists, which facilitates the automatic propagation of the changed data flow.

Learning to rank or machine-learned ranking (MLR) is the application of machine learning, typically supervised, semi-supervised or reinforcement learning, in the construction of ranking models for information retrieval systems. Training data may, for example, consist of lists of items with some partial order specified between items in each list. This order is typically induced by giving a numerical or ordinal score or a binary judgment for each item. The goal of constructing the ranking model is to rank new, unseen lists in a similar way to rankings in the training data.

Probabilistic programming (PP) is a programming paradigm in which probabilistic models are specified and inference for these models is performed automatically. It represents an attempt to unify probabilistic modeling and traditional general purpose programming in order to make the former easier and more widely applicable. It can be used to create systems that help make decisions in the face of uncertainty.

Theano is a Python library and optimizing compiler for manipulating and evaluating mathematical expressions, especially matrix-valued ones. In Theano, computations are expressed using a NumPy-esque syntax and compiled to run efficiently on either CPU or GPU architectures.

TensorFlow is a software library for machine learning and artificial intelligence. It can be used across a range of tasks, but is used mainly for training and inference of neural networks. It is one of the most popular deep learning frameworks, alongside others such as PyTorch and PaddlePaddle. It is free and open-source software released under the Apache License 2.0.

The following tables compare notable software frameworks, libraries, and computer programs for deep learning applications.

Keras is an open-source library that provides a Python interface for artificial neural networks. Keras was first independent software, then integrated into the TensorFlow library, and later supporting more. "Keras 3 is a full rewrite of Keras [and can be used] as a low-level cross-framework language to develop custom components such as layers, models, or metrics that can be used in native workflows in JAX, TensorFlow, or PyTorch — with one codebase." Keras 3 will be the default Keras version for TensorFlow 2.16 onwards, but Keras 2 can still be used.

Chainer is an open source deep learning framework written purely in Python on top of NumPy and CuPy Python libraries. The development is led by Japanese venture company Preferred Networks in partnership with IBM, Intel, Microsoft, and Nvidia.

PyTorch is a machine learning library based on the Torch library, used for applications such as computer vision and natural language processing, originally developed by Meta AI and now part of the Linux Foundation umbrella. It is one of the most popular deep learning frameworks, alongside others such as TensorFlow and PaddlePaddle, offering free and open-source software released under the modified BSD license. Although the Python interface is more polished and the primary focus of development, PyTorch also has a C++ interface.

In machine learning, hyperparameter optimization or tuning is the problem of choosing a set of optimal hyperparameters for a learning algorithm. A hyperparameter is a parameter whose value is used to control the learning process, which must be configured before the process starts.

SqueezeNet is a deep neural network for image classification released in 2016. SqueezeNet was developed by researchers at DeepScale, University of California, Berkeley, and Stanford University. In designing SqueezeNet, the authors' goal was to create a smaller neural network with fewer parameters while achieving competitive accuracy. Their best-performing model achieved the same accuracy as AlexNet on ImageNet classification, but has a size 510x less than it.

<span class="mw-page-title-main">Flux (machine-learning framework)</span> Open-source machine-learning software library

Flux is an open-source machine-learning software library and ecosystem written in Julia. Its current stable release is v0.14.5 . It has a layer-stacking-based interface for simpler models, and has a strong support on interoperability with other Julia packages instead of a monolithic design. For example, GPU support is implemented transparently by CuArrays.jl. This is in contrast to some other machine learning frameworks which are implemented in other languages with Julia bindings, such as TensorFlow.jl, and thus are more limited by the functionality present in the underlying implementation, which is often in C or C++. Flux joined NumFOCUS as an affiliated project in December of 2021.

DeepSpeed is an open source deep learning optimization library for PyTorch.

Owl Scientific Computing is a software system for scientific and engineering computing developed in the Department of Computer Science and Technology, University of Cambridge. The System Research Group (SRG) in the department recognises Owl as one of the representative systems developed in SRG in the 2010s. The source code is licensed under the MIT License and can be accessed from the GitHub repository.

Neuro-symbolic AI is a type of artificial intelligence that integrates neural and symbolic AI architectures to address the weaknesses of each, providing a robust AI capable of reasoning, learning, and cognitive modeling. As argued by Leslie Valiant and others, the effective construction of rich computational cognitive models demands the combination of symbolic reasoning and efficient machine learning. Gary Marcus argued, "We cannot construct rich cognitive models in an adequate, automated way without the triumvirate of hybrid architecture, rich prior knowledge, and sophisticated techniques for reasoning." Further, "To build a robust, knowledge-driven approach to AI we must have the machinery of symbol manipulation in our toolkit. Too much useful knowledge is abstract to proceed without tools that represent and manipulate abstraction, and to date, the only known machinery that can manipulate such abstract knowledge reliably is the apparatus of symbol manipulation."

Mixed-precision arithmetic is a form of floating-point arithmetic that uses numbers with varying widths in a single operation.

<span class="mw-page-title-main">Knowledge graph embedding</span> Dimensionality reduction of graph-based semantic data objects [machine learning task]

In representation learning, knowledge graph embedding (KGE), also referred to as knowledge representation learning (KRL), or multi-relation learning, is a machine learning task of learning a low-dimensional representation of a knowledge graph's entities and relations while preserving their semantic meaning. Leveraging their embedded representation, knowledge graphs (KGs) can be used for various applications such as link prediction, triple classification, entity recognition, clustering, and relation extraction.

A graph neural network (GNN) belongs to a class of artificial neural networks for processing data that can be represented as graphs.

Google JAX is a machine learning framework for transforming numerical functions. It is described as bringing together a modified version of autograd and TensorFlow's XLA. It is designed to follow the structure and workflow of NumPy as closely as possible and works with various existing frameworks such as TensorFlow and PyTorch. The primary functions of JAX are:

grad: automatic differentiation
jit: compilation
vmap: auto-vectorization
pmap: Single program, multiple data (SPMD) programming

References

↑ Izzo, Dario; Biscani, Francesco; Mereta, Alessio (2017). "Differentiable Genetic Programming". Genetic Programming. Lecture Notes in Computer Science. Vol. 10196. pp. 35–51. arXiv: 1611.04766 . doi:10.1007/978-3-319-55696-3_3. ISBN 978-3-319-55695-6. S2CID 17786263.
↑ Baydin, Atilim Gunes; Pearlmutter, Barak A.; Radul, Alexey Andreyevich; Siskind, Jeffrey Mark (2018). "Automatic Differentiation in Machine Learning: a Survey". Journal of Marchine Learning Research. 18 (153): 1–43.
↑ Wang, Fei; Decker, James; Wu, Xilun; Essertel, Gregory; Rompf, Tiark (2018). "Backpropagation with Callbacks: Foundations for Efficient and Expressive Differentiable Programming" (PDF). In Bengio, S.; Wallach, H.; Larochelle, H.; Grauman, K (eds.). NIPS'18: Proceedings of the 32nd International Conference on Neural Information Processing Systems. Curran Associates. pp. 10201–10212.
↑ Innes, Mike (2018). "On Machine Learning and Programming Languages" (PDF). SysML Conference 2018. Archived from the original (PDF) on 2019-07-17. Retrieved 2019-07-04.
1 2 3 4 Innes, Mike; Edelman, Alan; Fischer, Keno; Rackauckas, Chris; Saba, Elliot; Viral B Shah; Tebbutt, Will (2019). "A Differentiable Programming System to Bridge Machine Learning and Scientific Computing". arXiv: 1907.07587 [cs.PL].
↑ "Differential Intelligence". October 2016. Retrieved 2022-10-19.
1 2 3 4 Innes, Michael; Saba, Elliot; Fischer, Keno; Gandhi, Dhairya; Marco Concetto Rudilosso; Neethu Mariya Joy; Karmali, Tejan; Pal, Avik; Shah, Viral (2018). "Fashionable Modelling with Flux". arXiv: 1811.01457 [cs.PL].
↑ Merriënboer, Bart van; Breuleux, Olivier; Bergeron, Arnaud; Lamblin, Pascal (3 December 2018). "Automatic differentiation in ML: where we are and where we should be going". NIPS'18 . Vol. 31. pp. 8771–81.
1 2 3 Breuleux, O.; van Merriënboer, B. (2017). "Automatic Differentiation in Myia" (PDF). Archived from the original (PDF) on 2019-06-24. Retrieved 2019-06-24.
1 2 "TensorFlow: Static Graphs". Tutorials: Learning PyTorch. PyTorch.org. Retrieved 2019-03-04.
↑ Innes, Michael (2018). "Don't Unroll Adjoint: Differentiating SSA-Form Programs". arXiv: 1810.07951 [cs.PL].
↑ Degrave, Jonas; Hermans, Michiel; Dambre, Joni; wyffels, Francis (2016). "A Differentiable Physics Engine for Deep Learning in Robotics". arXiv: 1611.01652 [cs.NE].
↑ Li, Li; Hoyer, Stephan; Pederson, Ryan; Sun, Ruoxi; Cubuk, Ekin D.; Riley, Patrick; Burke, Kieron (2021). "Kohn-Sham Equations as Regularizer: Building Prior Knowledge into Machine-Learned Physics". Physical Review Letters. 126 (3): 036401. arXiv: 2009.08551 . Bibcode:2021PhRvL.126c6401L. doi: 10.1103/PhysRevLett.126.036401 . PMID 33543980.
↑ Li, Tzu-Mao; Aittala, Miika; Durand, Frédo; Lehtinen, Jaakko (2018). "Differentiable Monte Carlo Ray Tracing through Edge Sampling". ACM Transactions on Graphics. 37 (6): 222:1–11. doi: 10.1145/3272127.3275109 . S2CID 52839714.
↑ Li, Tzu-Mao; Gharbi, Michaël; Adams, Andrew; Durand, Frédo; Ragan-Kelley, Jonathan (August 2018). "Differentiable Programming for Image Processing and Deep Learning in Halide". ACM Transactions on Graphics. 37 (4): 139:1–13. doi: 10.1145/3197517.3201383 . S2CID 46927588.
↑ AlQuraishi, Mohammed; Sorger, Peter K. (October 2021). "Differentiable biology: using deep learning for biophysics-based and data-driven modeling of molecular mechanisms". Nature Methods. 18 (10): 1169–1180. doi:10.1038/s41592-021-01283-4. PMC 8793939 . PMID 34608321.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.

[8] TensorFlow 1 uses the static graph approach, whereas TensorFlow 2 uses the dynamic graph approach by default.

[izzo2016_dCGP-1] Izzo, Dario; Biscani, Francesco; Mereta, Alessio (2017). "Differentiable Genetic Programming". Genetic Programming. Lecture Notes in Computer Science. Vol. 10196. pp. 35–51. arXiv: 1611.04766 . doi:10.1007/978-3-319-55696-3_3. ISBN 978-3-319-55695-6. S2CID 17786263.

[baydin2018automatic-2] Baydin, Atilim Gunes; Pearlmutter, Barak A.; Radul, Alexey Andreyevich; Siskind, Jeffrey Mark (2018). "Automatic Differentiation in Machine Learning: a Survey". Journal of Marchine Learning Research. 18 (153): 1–43.

[3] Wang, Fei; Decker, James; Wu, Xilun; Essertel, Gregory; Rompf, Tiark (2018). "Backpropagation with Callbacks: Foundations for Efficient and Expressive Differentiable Programming" (PDF). In Bengio, S.; Wallach, H.; Larochelle, H.; Grauman, K (eds.). NIPS'18: Proceedings of the 32nd International Conference on Neural Information Processing Systems. Curran Associates. pp. 10201–10212.

[innes-4] Innes, Mike (2018). "On Machine Learning and Programming Languages" (PDF). SysML Conference 2018. Archived from the original (PDF) on 2019-07-17. Retrieved 2019-07-04.

[diffprog-zygote-5] 1 2 3 4 Innes, Mike; Edelman, Alan; Fischer, Keno; Rackauckas, Chris; Saba, Elliot; Viral B Shah; Tebbutt, Will (2019). "A Differentiable Programming System to Bridge Machine Learning and Scientific Computing". arXiv: 1907.07587 [cs.PL].

[differential_intelligence-6] "Differential Intelligence". October 2016. Retrieved 2022-10-19.

[flux-7] 1 2 3 4 Innes, Michael; Saba, Elliot; Fischer, Keno; Gandhi, Dhairya; Marco Concetto Rudilosso; Neethu Mariya Joy; Karmali, Tejan; Pal, Avik; Shah, Viral (2018). "Fashionable Modelling with Flux". arXiv: 1811.01457 [cs.PL].

[9] Merriënboer, Bart van; Breuleux, Olivier; Bergeron, Arnaud; Lamblin, Pascal (3 December 2018). "Automatic differentiation in ML: where we are and where we should be going". NIPS'18 . Vol. 31. pp. 8771–81.

[myia1-10] 1 2 3 Breuleux, O.; van Merriënboer, B. (2017). "Automatic Differentiation in Myia" (PDF). Archived from the original (PDF) on 2019-06-24. Retrieved 2019-06-24.

[pytorchtut-11] 1 2 "TensorFlow: Static Graphs". Tutorials: Learning PyTorch. PyTorch.org. Retrieved 2019-03-04.

[12] Innes, Michael (2018). "Don't Unroll Adjoint: Differentiating SSA-Form Programs". arXiv: 1810.07951 [cs.PL].

[13] Degrave, Jonas; Hermans, Michiel; Dambre, Joni; wyffels, Francis (2016). "A Differentiable Physics Engine for Deep Learning in Robotics". arXiv: 1611.01652 [cs.NE].

[Li2021-14] Li, Li; Hoyer, Stephan; Pederson, Ryan; Sun, Ruoxi; Cubuk, Ekin D.; Riley, Patrick; Burke, Kieron (2021). "Kohn-Sham Equations as Regularizer: Building Prior Knowledge into Machine-Learned Physics". Physical Review Letters. 126 (3): 036401. arXiv: 2009.08551 . Bibcode:2021PhRvL.126c6401L. doi: 10.1103/PhysRevLett.126.036401 . PMID 33543980.

[15] Li, Tzu-Mao; Aittala, Miika; Durand, Frédo; Lehtinen, Jaakko (2018). "Differentiable Monte Carlo Ray Tracing through Edge Sampling". ACM Transactions on Graphics. 37 (6): 222:1–11. doi: 10.1145/3272127.3275109 . S2CID 52839714.

[16] Li, Tzu-Mao; Gharbi, Michaël; Adams, Andrew; Durand, Frédo; Ragan-Kelley, Jonathan (August 2018). "Differentiable Programming for Image Processing and Deep Learning in Halide". ACM Transactions on Graphics. 37 (4): 139:1–13. doi: 10.1145/3197517.3201383 . S2CID 46927588.

[17] AlQuraishi, Mohammed; Sorger, Peter K. (October 2021). "Differentiable biology: using deep learning for biophysics-based and data-driven modeling of molecular mechanisms". Nature Methods. 18 (10): 1169–1180. doi:10.1038/s41592-021-01283-4. PMC 8793939 . PMID 34608321.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[note 1]

[8]

[9]

[10]

[11]

[12]

[13]

[14]

[15]

[16]

v t e Differentiable computing
General	Differentiable programming Information geometry Statistical manifold Automatic differentiation Neuromorphic computing Pattern recognition Ricci calculus Computational learning theory Inductive bias
Hardware	IPU TPU VPU Memristor SpiNNaker
Software libraries	TensorFlow PyTorch Keras scikit-learn Theano JAX Flux.jl MindSpore
Portals Computer programming Technology