Owl Scientific Computing

Original author(s): Liang Wang
Developer(s): Community project
Initial release: 2016
Stable release: 1.0.2 / 14 February 2022 [1]
Repository: https://github.com/owlbarn/owl
Written in: OCaml, C
Operating system: Cross-platform
Type: Numerical analysis
License: MIT
Website: https://ocaml.xyz

Owl Scientific Computing is a software system for scientific and engineering computing developed in the Department of Computer Science and Technology, University of Cambridge. [2] The Systems Research Group (SRG) in the department recognises Owl as one of the representative systems it developed in the 2010s. [3] The source code is licensed under the MIT License and is available from the GitHub repository. [4]

The library is mostly designed and developed in the functional programming language OCaml. OCaml offers runtime efficiency, a flexible module system, static type checking, an intelligent garbage collector, and powerful type inference, and Owl inherits these features directly. With Owl, users can write succinct, type-safe numerical applications in a concise functional language without sacrificing performance. This shortens the development life-cycle and reduces the cost of moving from prototype to production. The system serves as the de facto tool for computation-intensive tasks in OCaml. [5]
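
The short program below is a minimal sketch of what such code can look like. It is based on the module and function names in Owl's published documentation, which may differ slightly between versions:

    (* a minimal sketch of basic Owl usage *)
    open Owl

    let () =
      (* a 3x3 matrix of uniformly distributed random numbers *)
      let x = Mat.uniform 3 3 in
      (* a 3x3 identity matrix *)
      let i = Mat.eye 3 in
      (* matrix product followed by element-wise addition *)
      let y = Mat.(add (dot x i) x) in
      (* print the result to standard output *)
      Mat.print y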

History

Owl was developed while Dr. Liang Wang was working as a post-doctoral researcher at OCaml Labs. [6] It originated from a research project, begun in July 2016, that studied the design of synchronous parallel machines for large-scale distributed computing. At the time, the libraries for numerical computing in the OCaml ecosystem were very limited and the tooling was fragmented. In order to test various analytical applications, many numerical functions had to be implemented, from low-level algebra and random number generators to high-level functionality such as algorithmic differentiation and deep neural networks. These functions gradually accumulated and were later taken out and wrapped into a standalone library named Owl.

Owl's architecture went through at least a dozen iterations in the beginning, and some of the changes were quite drastic. After a year of intensive development, Owl was capable of performing many complicated numerical tasks, such as image classification. Dr. Liang Wang held a tutorial at CUFP 2017 to demonstrate data science in OCaml. [7] In 2018, Prof. Richard Mortier gave a talk about Owl at the Alan Turing Institute. [8] To further promote OCaml and functional programming in data science, Owl provides abundant learning material in the form of a detailed manual. [9]

Design and features

Owl implements many advanced numerical functions atop its implementation of n-dimensional arrays. Compared to other numerical libraries, Owl is unique in several respects; for example, algorithmic differentiation and distributed computing are included as integral components of the core system to maximise developers' productivity. The figure below gives a bird's-eye view of Owl's system architecture. The subsystem on the left is Owl's numerical subsystem. The modules it contains fall into three categories.

The architecture of the Owl numerical library.

The first category consists of the core modules, which contain basic data structures, i.e. the N-dimensional array (Ndarray) in both dense and sparse forms. The Ndarray module supports various number types: float32, float64, complex32, complex64, int16, int32, etc. The core modules also provide foreign function interfaces to low-level numerical libraries such as CBLAS and LAPACK; these libraries are fully interfaced to the Linear Algebra module.
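
As an illustration, the sketch below creates dense Ndarrays in two precisions and calls a routine from the Linalg module, which is backed by the BLAS/LAPACK bindings. Module names follow Owl's documentation; exact signatures may vary between versions:

    open Owl

    let () =
      (* a single-precision (float32) 2x3 ndarray filled with ones *)
      let a = Dense.Ndarray.S.ones [|2; 3|] in
      (* a double-precision (float64) 2x3 ndarray of uniform random numbers *)
      let b = Dense.Ndarray.D.uniform [|2; 3|] in
      Dense.Ndarray.S.print a;
      Dense.Ndarray.D.print b;

      (* the double-precision linear algebra module delegates to LAPACK *)
      let m = Mat.uniform 4 4 in
      let m_inv = Linalg.D.inv m in
      (* m * m_inv should be close to the identity matrix *)
      Mat.print (Mat.dot m m_inv)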

The second category consists of the classic analytics modules. This part contains basic mathematical and statistical functions, linear algebra, regression, optimisation, plotting, etc. Advanced mathematical and statistical functions, such as statistical hypothesis testing and Markov chain Monte Carlo, are also included. As core functionality, Owl provides the algorithmic differentiation (or automatic differentiation) and dynamic computation graph modules.
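
For example, the following sketch uses the double-precision Algodiff module to differentiate a scalar function. It follows the examples in Owl's documentation, so details may vary between versions:

    open Owl
    module AD = Algodiff.D

    (* f(x) = x * sin x, written with Algodiff's lifted maths operators *)
    let f x = AD.Maths.(x * sin x)

    let () =
      (* diff f builds the derivative f'(x) = sin x + x * cos x *)
      let f' = AD.diff f in
      let y  = f' (AD.F 2.) |> AD.unpack_flt in
      Printf.printf "f'(2.0) = %g\n" y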

The highest level in the Owl architecture includes modules for more advanced numerical applications, such as neural networks, natural language processing, and data processing. The Zoo system is used for efficient scripting and code sharing. The modules in the second category, especially the algorithmic differentiation, make the code at this level quite concise.
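
For instance, the sketch below, adapted from the examples in Owl's documentation (layer names and optional arguments may differ between versions), declares a small feed-forward network in a few lines; training it would additionally require a dataset:

    open Owl
    open Neural.S
    open Neural.S.Graph

    (* a small multilayer perceptron over 28x28x1 inputs *)
    let make_network () =
      input [|28; 28; 1|]
      |> fully_connected 64 ~act_typ:Activation.Relu
      |> linear 10 ~act_typ:Activation.(Softmax 1)
      |> get_network

    (* constructing the graph only declares the network structure *)
    let network = make_network ()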

The subsystem on the right is the Actor subsystem, which extends Owl's capability to parallel and distributed computing. The core idea is to transform a user application from sequential execution into parallel execution (using various computation engines) with minimal effort. The method is to compose the two subsystems with functors, generating a parallel version of a module defined in the numerical subsystem.
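
The sketch below illustrates this functor-based composition in a simplified form. The module names and signatures here are hypothetical stand-ins, not Actor's real interface; the point is only to show how a sequential numerical module can be wrapped to present the same interface to user code:

    (* hypothetical sketch of functor-based composition, not Actor's real API *)
    module type Ndarray_like = sig
      type arr
      val ones : int array -> arr
      val add  : arr -> arr -> arr
    end

    (* wraps a sequential numerical module; a real engine would distribute
       the work, this sketch simply delegates to the sequential code *)
    module Make_parallel (M : Ndarray_like) = struct
      type arr = M.arr
      let ones shape = M.ones shape
      let add a b    = M.add a b
    end

    (* the composed module exposes the same interface as the numerical
       subsystem, so user code needs little or no change *)
    module Parallel_ndarray = Make_parallel (struct
      type arr = Owl.Dense.Ndarray.D.arr
      let ones = Owl.Dense.Ndarray.D.ones
      let add  = Owl.Dense.Ndarray.D.add
    end)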

Besides what is shown in the figure, Owl has several other features, for example JavaScript and unikernel backends, integration with other frameworks such as TensorFlow and PyTorch, and the use of GPUs and other accelerators via its symbolic graph.

Research

The Owl project is research-oriented and supports research on numerical computing in multiple related topics. Some of its research topics are listed below.

As a result of research in these directions, Owl has produced several publications. In 2018, a paper titled Data Analytics Service Composition and Deployment on Edge Devices was accepted at the ACM SIGCOMM 2018 Workshop on Big Data Analytics and Machine Learning for Data Communication Networks. [14] Two talks were also accepted at the OCaml Workshop of the International Conference on Functional Programming 2019, on the topics of numerical ordinary differential equation solving [15] and executing Owl computation on GPUs. [16] An internship at OCaml Labs investigated the topic of image segmentation and related memory optimisation in Owl. [17] In 2022, the book OCaml Scientific Computing was published by Springer. [18]

See also

  * OCaml – general-purpose, high-level, multi-paradigm programming language
  * Wolfram Mathematica – computational software program
  * Computer science
  * Theoretical computer science
  * LAPACK – software library for numerical linear algebra
  * General-purpose computing on graphics processing units
  * Neural network
  * David Bader (computer scientist)
  * Hardware acceleration
  * CUDA – parallel computing platform and programming model
  * Algorithmic skeleton
  * MADNESS
  * Deeplearning4j
  * TensorFlow – machine learning software library
  * Comparison of deep learning software
  * Glossary of artificial intelligence
  * Google JAX – machine learning framework

References

  1. "Releases – owlbarn/owl" . Retrieved 2020-11-11 via GitHub.
  2. "Owl, Scientific Computing for OCaml" . Retrieved 2020-11-11.
  3. "System Research Group" . Retrieved 2020-11-11.
  4. "owlbarn/owl". GitHub. Retrieved 2020-11-01.
  5. "OCamlverse: Machine Learning, Scientific Computing and Data Science" . Retrieved 2020-11-05.
  6. "OCaml Labs" . Retrieved 2020-11-01.
  7. "Owl: Data Science in OCaml". CUFP Tutorials. 2017. Retrieved 2020-11-01.
  8. "The design of functional numerical software". Alan Turing Institute. 2018. Retrieved 2020-11-05.
  9. "OCaml Scientific Computing: Functional Programming Meets Data Science". Owl Team. 2020. Retrieved 2020-11-18.
  10. Wang, Liang; Catterall, Ben; Mortier, Richard (2017). "Probabilistic Synchronous Parallel". arXiv: 1709.07772 [cs.DC].
  11. "Distributed Learning over Unreliable Networks". Proceedings of Machine Learning Research. 2019. Retrieved 2020-11-18.
  12. "JuliaDiff". Julia. 2019. Retrieved 2020-11-18.
  13. "Serving DNNs like Clockwork: Performance Predictability from the Bottom Up". Usenix. 2020. Retrieved 2020-11-18.
  14. "BIG-DAMA Workshop 2018 Programme". ACM. 2018. Retrieved 2020-11-05.
  15. "OwlDE: making ODEs first-class Owl citizens". OCaml Workshop, ICFP 2019. 2019. Retrieved 2020-11-11.
  16. "Executing Owl Computation on GPU and TPU". OCaml Workshop, ICFP 2019. 2019. Retrieved 2020-11-11.
  17. "Personal Webpage, Pierre Vandenhove". 2018. Retrieved 2020-11-05.
  18. "OCaml Scientific Computing - Functional Programming in Data Science and Artificial Intelligence". Springer. 2022. Retrieved 2022-02-15.