Comparison of Gaussian process software

This is a comparison of statistical analysis software that allows performing inference with Gaussian processes, often using approximations.

This article is written from the point of view of Bayesian statistics, which may use terminology different from that commonly used in kriging. The next section clarifies the mathematical/computational meaning of the information provided in the table independently of contextual terminology.

Description of columns

This section details the meaning of the columns in the table below.

Solvers

These columns are about the algorithms used to solve the linear system defined by the prior covariance matrix, i.e., the matrix built by evaluating the kernel.
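
As a concrete illustration of what the "Exact" solver column refers to, the sketch below builds the prior covariance matrix from a kernel and solves the associated linear system by Cholesky factorization. It is a generic NumPy example; `rbf_kernel` and `exact_gp_posterior` are hypothetical helpers written for this illustration, not the API of any package in the table.

```python
# Minimal sketch of an "exact" solver: build the prior covariance matrix from the
# kernel and solve the associated linear system by Cholesky factorization.
# Generic illustration only, not code from any of the packages listed below.
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel evaluated on two 1-D arrays of inputs."""
    sqdist = (x1[:, None] - x2[None, :]) ** 2
    return variance * np.exp(-0.5 * sqdist / lengthscale**2)

def exact_gp_posterior(x_train, y_train, x_test, noise=1e-2):
    """Posterior mean and variance via an O(n^3) Cholesky factorization."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    L = np.linalg.cholesky(K)                                   # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))   # K^{-1} y
    K_s = rbf_kernel(x_train, x_test)
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(rbf_kernel(x_test, x_test)) - np.sum(v**2, axis=0)
    return mean, var

rng = np.random.default_rng(0)
x = np.linspace(0.0, 5.0, 30)
y = np.sin(x) + 0.1 * rng.standard_normal(x.size)
mu, var = exact_gp_posterior(x, y, np.linspace(0.0, 5.0, 100))
```

Specialized and approximate solvers replace this cubic-cost factorization with structure-exploiting or approximate alternatives.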

Input

These columns are about the points on which the Gaussian process is evaluated, i.e., the points $x$ if the process is $f(x)$.

Output

These columns are about the values yielded by the process, and how they are connected to the data used in the fit.

Hyperparameters

These columns are about finding the values of variables that enter the definition of the specific problem but cannot be inferred by the Gaussian process fit, for example parameters in the formula of the kernel.

If both the "Prior" and "Posterior" cells contain "Manually", the software provides an interface for computing the marginal likelihood and its gradient with respect to the hyperparameters, which can be fed into an optimization/sampling algorithm, e.g., gradient descent or Markov chain Monte Carlo.
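
A minimal sketch of that manual workflow is shown below, using generic NumPy/SciPy code. The kernel, noise level, optimizer and function names are arbitrary choices made for this illustration and do not reflect the interface of any particular package.

```python
# Sketch of the "Manually" workflow: compute the negative log marginal likelihood
# as a function of the kernel hyperparameters and hand it to a generic optimizer.
# Self-contained illustration for an RBF kernel with fixed noise.
import numpy as np
from scipy.optimize import minimize

def neg_log_marginal_likelihood(log_params, x, y, noise=1e-2):
    lengthscale, variance = np.exp(log_params)
    sqdist = (x[:, None] - x[None, :]) ** 2
    K = variance * np.exp(-0.5 * sqdist / lengthscale**2) + noise * np.eye(len(x))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    # -log p(y | x, params) = 1/2 y^T K^{-1} y + 1/2 log|K| + n/2 log(2 pi)
    return (0.5 * y @ alpha
            + np.log(np.diag(L)).sum()
            + 0.5 * len(x) * np.log(2 * np.pi))

x = np.linspace(0.0, 5.0, 30)
y = np.sin(x) + 0.1 * np.random.default_rng(0).standard_normal(30)
result = minimize(neg_log_marginal_likelihood, x0=np.zeros(2), args=(x, y),
                  method="Nelder-Mead")       # gradient-free here; a real package
lengthscale, variance = np.exp(result.x)      # would typically supply gradients
```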

Linear transformations

These columns are about the possibility of fitting datapoints simultaneously to a process and to linear transformations of it.
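
Such joint fits are possible because any linear transformation of a Gaussian process is again jointly Gaussian with the process itself. As a hedged illustration (assuming a kernel $k(x, x')$ that is differentiable in both arguments), a process and its derivative can be modelled together, with cross-covariances obtained by differentiating the kernel:

```latex
% Joint model of a process and its derivative, assuming a differentiable kernel k.
f \sim \mathcal{GP}\bigl(m(x),\, k(x, x')\bigr), \qquad
\operatorname{Cov}\bigl(f(x),\, f'(x')\bigr) = \partial_{x'} k(x, x'), \qquad
\operatorname{Cov}\bigl(f'(x),\, f'(x')\bigr) = \partial_{x} \partial_{x'} k(x, x').
```

The "Deriv.", "Finite" and "Sum" columns then record which kinds of linear transformations each package supports.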

Comparison table

In the table below, the columns are grouped as Solvers (Exact, Specialized, Approximate), Input (ND, Non-real), Output (Likelihood, Errors), Hyperparameters (Prior, Posterior), and Linear transformations (Deriv., Finite, Sum).

| Name | License | Language | Exact | Specialized | Approximate | ND | Non-real | Likelihood | Errors | Prior | Posterior | Deriv. | Finite | Sum |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PyMC | Apache | Python | Yes | Kronecker | Sparse | ND | No | Any | Correlated | Yes | Yes | No | Yes | Yes |
| Stan | BSD, GPL | custom | Yes | No | No | ND | No | Any | Correlated | Yes | Yes | No | Yes | Yes |
| scikit-learn | BSD | Python | Yes | No | No | ND | Yes | Bernoulli | Uncorrelated | Manually | Manually | No | No | No |
| fbm [7] | Free | C | Yes | No | No | ND | No | Bernoulli, Poisson | Uncorrelated, Stationary | Many | Yes | No | No | Yes |
| GPML [8] [7] | BSD | MATLAB | Yes | No | Sparse | ND | No | Many | i.i.d. | Manually | Manually | No | No | No |
| GPstuff [7] | GNU GPL | MATLAB, R | Yes | Sparse, Markov | Sparse | ND | No | Many | Correlated | Many | Yes | First RBF | No | Yes |
| GPy [9] | BSD | Python | Yes | No | Sparse | ND | No | Many | Uncorrelated | Yes | Yes | No | No | No |
| GPflow [9] | Apache | Python | Yes | No | Sparse | ND | No | Many | Uncorrelated | Yes | Yes | No | No | No |
| GPyTorch [10] | MIT | Python | Yes | Toeplitz, Kronecker | Sparse | ND | No | Many | Uncorrelated | Yes | Yes | First RBF | Manually | Manually |
| GPvecchia [11] | GNU GPL | R | Yes | No | Sparse, Hierarchical | ND | No | Exponential family | Uncorrelated | No | No | No | No | No |
| pyGPs [12] | BSD | Python | Yes | No | Sparse | ND | Graphs, Manually | Bernoulli | i.i.d. | Manually | Manually | No | No | No |
| gptk [13] | BSD | R | Yes | Block? | Sparse | ND | No | Gaussian | No | Manually | Manually | No | No | No |
| celerite [3] | MIT | Python, Julia, C++ | No | Semisep. [lower-alpha 1] | No | 1D | No | Gaussian | Uncorrelated | Manually | Manually | No | No | No |
| george [6] | MIT | Python, C++ | Yes | No | Hierarchical | ND | No | Gaussian | Uncorrelated | Manually | Manually | No | No | Manually |
| neural-tangents [14] [lower-alpha 2] | Apache | Python | Yes | Block, Kronecker | No | ND | No | Gaussian | No | No | No | No | No | No |
| DiceKriging [15] | GNU GPL | R | Yes | No | No | ND | No? | Gaussian | Uncorrelated | SCAD RBF | MAP | No | No | No |
| OpenTURNS [16] | GNU LGPL | Python, C++ | Yes | No | No | ND | No | Gaussian | Uncorrelated | Manually (no grad.) | MAP | No | No | No |
| UQLab [17] | Proprietary | MATLAB | Yes | No | No | ND | No | Gaussian | Correlated | No | MAP | No | No | No |
| ooDACE [18] | Proprietary | MATLAB | Yes | No | No | ND | No | Gaussian | Correlated | No | MAP | No | No | No |
| DACE | Proprietary | MATLAB | Yes | No | No | ND | No | Gaussian | No | No | MAP | No | No | No |
| GpGp | MIT | R | No | No | Sparse | ND | No | Gaussian | i.i.d. | Manually | Manually | No | No | No |
| SuperGauss | GNU GPL | R, C++ | No | Toeplitz [lower-alpha 3] | No | 1D | No | Gaussian | No | Manually | Manually | No | No | No |
| STK | GNU GPL | MATLAB | Yes | No | No | ND | No | Gaussian | Uncorrelated | Manually | Manually | No | No | Manually |
| GSTools | GNU LGPL | Python | Yes | No | No | ND | No | Gaussian | No | No | No | No | No | No |
| PyKrige | BSD | Python | Yes | No | No | 2D, 3D | No | Gaussian | i.i.d. | No | No | No | No | No |
| GPR | Apache | C++ | Yes | No | Sparse | ND | No | Gaussian | i.i.d. | Some, Manually | Manually | First | No | No |
| celerite2 | MIT | Python | No | Semisep. [lower-alpha 1] | No | 1D | No | Gaussian | Uncorrelated | Manually [lower-alpha 4] | Manually | No | No | Yes |
| GPJax | Apache | Python | Yes | No | Sparse | ND | Graphs | Bernoulli | No | Yes | Yes | No | No | No |
| Stheno | MIT | Python | Yes | Low rank | Sparse | ND | No | Gaussian | i.i.d. | Manually | Manually | Approximate | No | Yes |

Notes

  1. celerite implements only a specific subalgebra of kernels which can be solved in $O(n)$. [3]
  2. neural-tangents is a specialized package for infinitely wide neural networks.
  3. SuperGauss implements a superfast Toeplitz solver with computational complexity $O(n \log^2 n)$.
  4. celerite2 has a PyMC3 interface.

Related Research Articles

Principal component analysis

Principal component analysis (PCA) is a popular technique for analyzing large datasets containing a high number of dimensions/features per observation, increasing the interpretability of data while preserving the maximum amount of information, and enabling the visualization of multidimensional data. Formally, PCA is a statistical technique for reducing the dimensionality of a dataset. This is accomplished by linearly transforming the data into a new coordinate system where the variation in the data can be described with fewer dimensions than the initial data. Many studies use the first two principal components in order to plot the data in two dimensions and to visually identify clusters of closely related data points. Principal component analysis has applications in many fields such as population genetics, microbiome studies, and atmospheric science.

Pattern recognition

Pattern recognition is the task of assigning a class to an observation based on patterns extracted from data. While similar, pattern recognition (PR) is not to be confused with pattern machines (PM), which may possess PR capabilities but whose primary function is to distinguish and create emergent patterns. PR has applications in statistical data analysis, signal processing, image analysis, information retrieval, bioinformatics, data compression, computer graphics and machine learning. Pattern recognition has its origins in statistics and engineering; some modern approaches to pattern recognition include the use of machine learning, due to the increased availability of big data and a new abundance of processing power.

In probability theory and statistics, a Gaussian process is a stochastic process (a collection of random variables indexed by time or space) such that every finite collection of those random variables has a multivariate normal distribution. The distribution of a Gaussian process is the joint distribution of all those random variables, and as such, it is a distribution over functions with a continuous domain, e.g. time or space.
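
Stated compactly (a standard restatement, not text from the article above): if $f \sim \mathcal{GP}(m, k)$ with mean function $m$ and kernel $k$, then for any finite set of input points $x_1, \dots, x_n$,

```latex
% Defining property of a Gaussian process: finite-dimensional marginals are normal.
\bigl(f(x_1), \dots, f(x_n)\bigr) \sim \mathcal{N}(\mu, \Sigma),
\qquad \mu_i = m(x_i), \qquad \Sigma_{ij} = k(x_i, x_j).
```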

Nonlinear dimensionality reduction

Nonlinear dimensionality reduction, also known as manifold learning, refers to various related techniques that aim to project high-dimensional data onto lower-dimensional latent manifolds, with the goal of either visualizing the data in the low-dimensional space, or learning the mapping itself. The techniques described below can be understood as generalizations of linear decomposition methods used for dimensionality reduction, such as singular value decomposition and principal component analysis.

Kriging

In statistics, originally in geostatistics, kriging or Kriging, also known as Gaussian process regression, is a method of interpolation based on a Gaussian process governed by prior covariances. Under suitable assumptions of the prior, kriging gives the best linear unbiased prediction (BLUP) at unsampled locations. Interpolating methods based on other criteria such as smoothness may not yield the BLUP. The method is widely used in the domain of spatial analysis and computer experiments. The technique is also known as Wiener–Kolmogorov prediction, after Norbert Wiener and Andrey Kolmogorov.

Kernel method

In machine learning, kernel machines are a class of algorithms for pattern analysis, whose best known member is the support-vector machine (SVM). These methods involve using linear classifiers to solve nonlinear problems. The general task of pattern analysis is to find and study general types of relations in datasets. For many algorithms that solve these tasks, the data in raw representation have to be explicitly transformed into feature vector representations via a user-specified feature map; in contrast, kernel methods require only a user-specified kernel, i.e., a similarity function over all pairs of data points computed using inner products. The feature map in kernel machines may be infinite-dimensional but only requires a finite-dimensional matrix from user input according to the representer theorem. Kernel machines are slow to compute for datasets larger than a couple of thousand examples without parallel processing.

Nonparametric regression is a category of regression analysis in which the predictor does not take a predetermined form but is constructed according to information derived from the data. That is, no parametric form is assumed for the relationship between predictors and dependent variable. Nonparametric regression requires larger sample sizes than regression based on parametric models because the data must supply the model structure as well as the model estimates.

Echo state network

An echo state network (ESN) is a type of reservoir computer that uses a recurrent neural network with a sparsely connected hidden layer. The connectivity and weights of hidden neurons are fixed and randomly assigned. The weights of output neurons can be learned so that the network can produce or reproduce specific temporal patterns. The main interest of this network is that although its behavior is non-linear, the only weights that are modified during training are for the synapses that connect the hidden neurons to output neurons. Thus, the error function is quadratic with respect to the parameter vector and can be differentiated easily to a linear system.

Shogun (toolbox)

Shogun is a free, open-source machine learning software library written in C++. It offers numerous algorithms and data structures for machine learning problems. It offers interfaces for Octave, Python, R, Java, Lua, Ruby and C# using SWIG.

mlpack

mlpack is a machine learning software library for C++, built on top of the Armadillo library and the ensmallen numerical optimization library. mlpack has an emphasis on scalability, speed, and ease-of-use. Its aim is to make machine learning possible for novice users by means of a simple, consistent API, while simultaneously exploiting C++ language features to provide maximum performance and maximum flexibility for expert users. Its intended target users are scientists and engineers.

Quantum machine learning

Quantum machine learning is the integration of quantum algorithms within machine learning programs.

In the field of statistical learning theory, matrix regularization generalizes notions of vector regularization to cases where the object to be learned is a matrix. The purpose of regularization is to enforce conditions, for example sparsity or smoothness, that can produce stable predictive functions. For example, in the more common vector framework, Tikhonov regularization optimizes over $\min_{x} \|Ax - b\|_2^2 + \lambda \|x\|_2^2$ to find a vector $x$ that is a stable solution to the regression problem.

Extreme learning machine

Extreme learning machines are feedforward neural networks for classification, regression, clustering, sparse approximation, compression and feature learning with a single layer or multiple layers of hidden nodes, where the parameters of hidden nodes need not be tuned. These hidden nodes can be randomly assigned and never updated, or can be inherited from their ancestors without being changed. In most cases, the output weights of hidden nodes are learned in a single step, which essentially amounts to learning a linear model.

Outline of machine learning

The following outline is provided as an overview of and topical guide to machine learning:

In the study of artificial neural networks (ANNs), the neural tangent kernel (NTK) is a kernel that describes the evolution of deep artificial neural networks during their training by gradient descent. It allows ANNs to be studied using theoretical tools from kernel methods.

A Neural Network Gaussian Process (NNGP) is a Gaussian process (GP) obtained as the limit of certain sequences of neural networks. Specifically, a wide variety of network architectures converges to a GP in the infinitely wide limit, in the sense of distribution. The NNGP concept constitutes an intensional definition: Mathematically it is just a GP, one distinguished by how it is obtained.

In statistics and machine learning, Gaussian process approximation is a computational method that accelerates inference tasks in the context of a Gaussian process model, most commonly likelihood evaluation and prediction. Like approximations of other models, they can often be expressed as additional assumptions imposed on the model, which do not correspond to any actual feature, but which retain its key properties while simplifying calculations. Many of these approximation methods can be expressed in purely linear algebraic or functional analytic terms as matrix or function approximations. Others are purely algorithmic and cannot easily be rephrased as a modification of a statistical model.

Probabilistic numerics is an active field of study at the intersection of applied mathematics, statistics, and machine learning centering on the concept of uncertainty in computation. In probabilistic numerics, tasks in numerical analysis such as finding numerical solutions for integration, linear algebra, optimization and simulation and differential equations are seen as problems of statistical, probabilistic, or Bayesian inference.

Bayesian quadrature is a method for approximating intractable integration problems. It falls within the class of probabilistic numerical methods. Bayesian quadrature views numerical integration as a Bayesian inference task, where function evaluations are used to estimate the integral of that function. For this reason, it is sometimes also referred to as "Bayesian probabilistic numerical integration" or "Bayesian numerical integration". The name "Bayesian cubature" is also sometimes used when the integrand is multi-dimensional. A potential advantage of this approach is that it provides probabilistic uncertainty quantification for the value of the integral.

References

  1. P. Cunningham, John; Gilboa, Elad; Saatçi, Yunus (Feb 2015). "Scaling Multidimensional Inference for Structured Gaussian Processes". IEEE Transactions on Pattern Analysis and Machine Intelligence. 37 (2): 424–436. doi:10.1109/TPAMI.2013.192. PMID   26353252. S2CID   6878550.
  2. Leith, D. J.; Zhang, Yunong; Leithead, W. E. (2005). "Time-series Gaussian Process Regression Based on Toeplitz Computation of O(N²) Operations and O(N)-level Storage". Proceedings of the 44th IEEE Conference on Decision and Control. pp. 3711–3716. doi:10.1109/CDC.2005.1582739. ISBN   0-7803-9567-0. S2CID   13627455.
  3. Foreman-Mackey, Daniel; Angus, Ruth; Agol, Eric; Ambikasaran, Sivaram (9 November 2017). "Fast and Scalable Gaussian Process Modeling with Applications to Astronomical Time Series". The Astronomical Journal. 154 (6): 220. arXiv: 1703.09710 . Bibcode:2017AJ....154..220F. doi: 10.3847/1538-3881/aa9332 . S2CID   88521913.
  4. Sarkka, Simo; Solin, Arno; Hartikainen, Jouni (2013). "Spatiotemporal Learning via Infinite-Dimensional Bayesian Filtering and Smoothing: A Look at Gaussian Process Regression Through Kalman Filtering". IEEE Signal Processing Magazine. 30 (4): 51–61. doi:10.1109/MSP.2013.2246292. S2CID   7485363 . Retrieved 2 September 2021.
  5. Quiñonero-Candela, Joaquin; Rasmussen, Carl Edward (5 December 2005). "A Unifying View of Sparse Approximate Gaussian Process Regression". Journal of Machine Learning Research. 6: 1939–1959. Retrieved 23 May 2020.
  6. Ambikasaran, S.; Foreman-Mackey, D.; Greengard, L.; Hogg, D. W.; O'Neil, M. (1 Feb 2016). "Fast Direct Methods for Gaussian Processes". IEEE Transactions on Pattern Analysis and Machine Intelligence. 38 (2): 252–265. arXiv: 1403.6015 . doi:10.1109/TPAMI.2015.2448083. PMID   26761732. S2CID   15206293.
  7. Vanhatalo, Jarno; Riihimäki, Jaakko; Hartikainen, Jouni; Jylänki, Pasi; Tolvanen, Ville; Vehtari, Aki (Apr 2013). "GPstuff: Bayesian Modeling with Gaussian Processes". Journal of Machine Learning Research. 14: 1175−1179. Retrieved 23 May 2020.
  8. Rasmussen, Carl Edward; Nickisch, Hannes (Nov 2010). "Gaussian processes for machine learning (GPML) toolbox". Journal of Machine Learning Research. 11: 3011–3015.
  9. Matthews, Alexander G. de G.; van der Wilk, Mark; Nickson, Tom; Fujii, Keisuke; Boukouvalas, Alexis; León-Villagrá, Pablo; Ghahramani, Zoubin; Hensman, James (April 2017). "GPflow: A Gaussian process library using TensorFlow". Journal of Machine Learning Research. 18 (40): 1–6. arXiv: 1610.08733 . Retrieved 6 July 2020.
  10. Gardner, Jacob R; Pleiss, Geoff; Bindel, David; Weinberger, Kilian Q; Wilson, Andrew Gordon (2018). "GPyTorch: Blackbox Matrix-Matrix Gaussian Process Inference with GPU Acceleration" (PDF). Advances in Neural Information Processing Systems. 31: 7576–7586. arXiv: 1809.11165 . Retrieved 23 May 2020.
  11. Zilber, Daniel; Katzfuss, Matthias (January 2021). "Vecchia–Laplace approximations of generalized Gaussian processes for big non-Gaussian spatial data". Computational Statistics & Data Analysis. 153: 107081. arXiv: 1906.07828 . doi:10.1016/j.csda.2020.107081. ISSN   0167-9473. S2CID   195068888 . Retrieved 1 September 2021.
  12. Neumann, Marion; Huang, Shan; E. Marthaler, Daniel; Kersting, Kristian (2015). "pyGPs — A Python Library for Gaussian Process Regression and Classification". Journal of Machine Learning Research. 16: 2611–2616.
  13. Kalaitzis, Alfredo; Lawrence, Neil D. (May 20, 2011). "A Simple Approach to Ranking Differentially Expressed Gene Expression Time Courses through Gaussian Process Regression". BMC Bioinformatics. 12 (1): 180. doi: 10.1186/1471-2105-12-180 . ISSN   1471-2105. PMC   3116489 . PMID   21599902.
  14. Novak, Roman; Xiao, Lechao; Hron, Jiri; Lee, Jaehoon; Alemi, Alexander A.; Sohl-Dickstein, Jascha; Schoenholz, Samuel S. (2020). "Neural Tangents: Fast and Easy Infinite Neural Networks in Python". International Conference on Learning Representations. arXiv: 1912.02803 .
  15. Roustant, Olivier; Ginsbourger, David; Deville, Yves (2012). "DiceKriging, DiceOptim: Two R Packages for the Analysis of Computer Experiments by Kriging-Based Metamodeling and Optimization". Journal of Statistical Software. 51 (1): 1–55. doi: 10.18637/jss.v051.i01 . S2CID   60672249.
  16. Baudin, Michaël; Dutfoy, Anne; Iooss, Bertrand; Popelin, Anne-Laure (2015). "OpenTURNS: An Industrial Software for Uncertainty Quantification in Simulation". In Roger Ghanem; David Higdon; Houman Owhadi (eds.). Handbook of Uncertainty Quantification. pp. 1–38. arXiv: 1501.05242 . doi:10.1007/978-3-319-11259-6_64-1. ISBN   978-3-319-11259-6. S2CID   88513894.
  17. Marelli, Stefano; Sudret, Bruno (2014). "UQLab: a framework for uncertainty quantification in MATLAB" (PDF). Vulnerability, Uncertainty, and Risk. Quantification, Mitigation, and Management: 2554–2563. doi:10.3929/ethz-a-010238238 . Retrieved 28 May 2020.
  18. Couckuyt, Ivo; Dhaene, Tom; Demeester, Piet (2014). "ooDACE toolbox: a flexible object-oriented Kriging implementation" (PDF). Journal of Machine Learning Research. 15: 3183–3186. Retrieved 8 July 2020.