Comparison of Gaussian process software

This is a comparison of statistical-analysis software packages that support inference with Gaussian processes, often by means of approximations.

This article is written from the point of view of Bayesian statistics, which may use terminology different from that commonly used in kriging. The next section clarifies the mathematical and computational meaning of the information provided in the table, independently of contextual terminology.

Description of columns

This section details the meaning of the columns in the table below.

Solvers

These columns are about the algorithms used to solve the linear system defined by the prior covariance matrix, i.e., the matrix built by evaluating the kernel.
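
For concreteness, below is a minimal NumPy sketch, not taken from any of the packages listed, of what an "Exact" solver does: the kernel is evaluated on all pairs of training inputs, the resulting covariance matrix is factorized with a dense Cholesky decomposition, and the linear system it defines is solved at O(n³) cost. The squared-exponential kernel, the data and the noise level are arbitrary illustrations.

```python
# Minimal sketch of an "Exact" solver: build the prior covariance matrix from a
# kernel and solve K alpha = y with a dense Cholesky factorization, which costs
# O(n^3) time and O(n^2) memory.
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel evaluated on two 1-D arrays of inputs."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)

noise = 0.1 ** 2
K = rbf_kernel(x, x) + noise * np.eye(x.size)        # prior covariance + i.i.d. error
L = np.linalg.cholesky(K)                            # exact dense factorization
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # alpha = K^{-1} y

# posterior mean at new points: k(x*, x) @ alpha
x_new = np.linspace(0.0, 1.0, 200)
mean = rbf_kernel(x_new, x) @ alpha
```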

Input

These columns are about the points on which the Gaussian process is evaluated, i.e., the values of x if the process is f(x).

Output

These columns are about the values yielded by the process, and how they are connected to the data used in the fit.

Hyperparameters

These columns are about finding values of variables that enter the definition of the specific problem but cannot be inferred by the Gaussian process fit itself, for example parameters in the formula of the kernel.

If both the "Prior" and "Posterior" cells contain "Manually", the software provides an interface for computing the marginal likelihood and its gradient w.r.t. hyperparameters, which can be fed into an optimization/sampling algorithm, e.g., gradient descent or Markov chain Monte Carlo.
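
As an illustration of that "Manually" workflow, the sketch below computes the log marginal likelihood of a simple RBF-kernel model and its gradient with respect to the hyperparameters by hand and feeds both to SciPy's generic L-BFGS-B optimizer. The kernel, parameterization and data are illustrative assumptions, not the interface of any listed package.

```python
# Hedged sketch of the "Manually" workflow: compute the log marginal likelihood
# and its gradient yourself and pass them to a generic optimizer.
import numpy as np
from scipy.optimize import minimize

def neg_log_marginal_likelihood(log_params, x, y):
    """Negative log marginal likelihood and its gradient for an RBF kernel
    with hyperparameters (lengthscale, noise variance), both on a log scale."""
    lengthscale, noise = np.exp(log_params)
    d2 = (x[:, None] - x[None, :]) ** 2
    K_f = np.exp(-0.5 * d2 / lengthscale ** 2)
    K = K_f + noise * np.eye(x.size)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    nll = 0.5 * y @ alpha + np.log(np.diag(L)).sum() + 0.5 * x.size * np.log(2 * np.pi)

    # dK/d(log lengthscale) and dK/d(log noise), via the chain rule
    dK_dlogl = K_f * d2 / lengthscale ** 2
    dK_dlogn = noise * np.eye(x.size)
    K_inv = np.linalg.solve(L.T, np.linalg.solve(L, np.eye(x.size)))
    inner = np.outer(alpha, alpha) - K_inv
    grad = np.array([-0.5 * np.trace(inner @ dK_dlogl),
                     -0.5 * np.trace(inner @ dK_dlogn)])
    return nll, grad

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 40)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)
result = minimize(neg_log_marginal_likelihood, x0=np.log([0.5, 0.1]),
                  args=(x, y), jac=True, method="L-BFGS-B")
lengthscale, noise = np.exp(result.x)   # optimized hyperparameters
```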

Linear transformations

These columns are about the possibility of fitting datapoints simultaneously to a process and to linear transformations of it.
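
A minimal sketch of the idea, assuming a zero-mean prior and an arbitrarily chosen RBF kernel: because a linear transformation A f of a Gaussian process f is jointly Gaussian with f, conditioning on an observation of A f only requires writing the corresponding blocks of the joint covariance. Here the observed quantity is a finite sum of process values.

```python
# Hedged sketch of conditioning a Gaussian process f on an observation of a
# linear transformation of it (here, a finite sum of process values).
import numpy as np

def rbf_kernel(x1, x2, lengthscale=0.3):
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

x_sum = np.linspace(0.0, 1.0, 5)      # points whose process values are summed
A = np.ones((1, x_sum.size))          # linear map: f(x_sum) -> sum of values
y_obs = np.array([2.0])               # one observation of that sum
noise = 1e-6                          # small observation noise on the sum

x_star = np.linspace(0.0, 1.0, 100)   # prediction grid for f itself
K_ss = rbf_kernel(x_sum, x_sum)
cov_obs = A @ K_ss @ A.T + noise * np.eye(1)     # covariance of the observed sum
cov_cross = rbf_kernel(x_star, x_sum) @ A.T      # cov(f(x*), sum)

# posterior mean of f given the observed linear functional (zero prior mean)
mean_f = cov_cross @ np.linalg.solve(cov_obs, y_obs)
```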

Comparison table

Columns are grouped as described above: Solvers (Exact, Specialized, Approximate), Input (ND, Non-real), Output (Likelihood, Errors), Hyperparameters (Prior, Posterior), and Linear transformations (Deriv., Finite, Sum).

| Name | License | Language | Exact | Specialized | Approximate | ND | Non-real | Likelihood | Errors | Prior | Posterior | Deriv. | Finite | Sum |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PyMC | Apache | Python | Yes | Kronecker | Sparse | ND | No | Any | Correlated | Yes | Yes | No | Yes | Yes |
| Stan | BSD, GPL | custom | Yes | No | No | ND | No | Any | Correlated | Yes | Yes | No | Yes | Yes |
| scikit-learn | BSD | Python | Yes | No | No | ND | Yes | Bernoulli | Uncorrelated | Manually | Manually | No | No | No |
| fbm [7] | Free | C | Yes | No | No | ND | No | Bernoulli, Poisson | Uncorrelated, Stationary | Many | Yes | No | No | Yes |
| GPML [8] [7] | BSD | MATLAB | Yes | No | Sparse | ND | No | Many | i.i.d. | Manually | Manually | No | No | No |
| GPstuff [7] | GNU GPL | MATLAB, R | Yes | Markov | Sparse | ND | No | Many | Correlated | Many | Yes | First RBF | No | Yes |
| GPy [9] | BSD | Python | Yes | No | Sparse | ND | No | Many | Uncorrelated | Yes | Yes | No | No | No |
| GPflow [9] | Apache | Python | Yes | No | Sparse | ND | No | Many | Uncorrelated | Yes | Yes | No | No | No |
| GPyTorch [10] | MIT | Python | Yes | Toeplitz, Kronecker | Sparse | ND | No | Many | Uncorrelated | Yes | Yes | First RBF | Manually | Manually |
| GPvecchia [11] | GNU GPL | R | Yes | No | Sparse, Hierarchical | ND | No | Exponential family | Uncorrelated | No | No | No | No | No |
| pyGPs [12] | BSD | Python | Yes | No | Sparse | ND | Graphs, Manually | Bernoulli | i.i.d. | Manually | Manually | No | No | No |
| gptk [13] | BSD | R | Yes | Block? | Sparse | ND | No | Gaussian | No | Manually | Manually | No | No | No |
| celerite [3] | MIT | Python, Julia, C++ | No | Semisep. [note 1] | No | 1D | No | Gaussian | Uncorrelated | Manually | Manually | No | No | No |
| george [6] | MIT | Python, C++ | Yes | No | Hierarchical | ND | No | Gaussian | Uncorrelated | Manually | Manually | No | No | Manually |
| neural-tangents [14] [note 2] | Apache | Python | Yes | Block, Kronecker | No | ND | No | Gaussian | No | No | No | No | No | No |
| DiceKriging [15] | GNU GPL | R | Yes | No | No | ND | No? | Gaussian | Uncorrelated | SCAD RBF | MAP | No | No | No |
| OpenTURNS [16] | GNU LGPL | Python, C++ | Yes | No | No | ND | No | Gaussian | Uncorrelated | Manually (no grad.) | MAP | No | No | No |
| UQLab [17] | Proprietary | MATLAB | Yes | No | No | ND | No | Gaussian | Correlated | No | MAP | No | No | No |
| ooDACE [18] | Proprietary | MATLAB | Yes | No | No | ND | No | Gaussian | Correlated | No | MAP | No | No | No |
| DACE | Proprietary | MATLAB | Yes | No | No | ND | No | Gaussian | No | No | MAP | No | No | No |
| GpGp | MIT | R | No | No | Sparse | ND | No | Gaussian | i.i.d. | Manually | Manually | No | No | No |
| SuperGauss | GNU GPL | R, C++ | No | Toeplitz [note 3] | No | 1D | No | Gaussian | No | Manually | Manually | No | No | No |
| STK | GNU GPL | MATLAB | Yes | No | No | ND | No | Gaussian | Uncorrelated | Manually | Manually | No | No | Manually |
| GSTools | GNU LGPL | Python | Yes | No | No | ND | No | Gaussian | Yes | Yes | Yes | Yes | No | No |
| PyKrige | BSD | Python | Yes | No | No | 2D, 3D | No | Gaussian | i.i.d. | No | No | No | No | No |
| GPR | Apache | C++ | Yes | No | Sparse | ND | No | Gaussian | i.i.d. | Some, Manually | Manually | First | No | No |
| celerite2 | MIT | Python | No | Semisep. [note 1] | No | 1D | No | Gaussian | Uncorrelated | Manually [note 4] | Manually | No | No | Yes |
| SMT [19] [20] | BSD | Python | Yes | POD [note 5] | Sparse | ND | Yes | Gaussian | i.i.d. | Yes | Yes | Yes | No | No |
| GPJax | Apache | Python | Yes | No | Sparse | ND | Graphs | Bernoulli | No | Yes | Yes | No | No | No |
| Stheno | MIT | Python | Yes | Low rank | Sparse | ND | No | Gaussian | i.i.d. | Manually | Manually | Approximate | No | Yes |
| CODES | | MATLAB | Yes | Heteroskedastic, VAE, POD [note 5] | Sparse | ND | No | Gaussian | i.i.d. | Some, Automatic | Mean a posteriori | No | No | No |
| Egobox-gp [22] | Apache | Rust | Yes | No | Sparse | ND | Yes | Any | i.i.d. | Yes | Yes | Yes | No | No |

Notes

  1. celerite implements only a specific subalgebra of kernels which can be solved in O(N). [3]
  2. neural-tangents is a specialized package for infinitely wide neural networks.
  3. SuperGauss implements a superfast Toeplitz solver with computational complexity O(N log² N).
  4. celerite2 has a PyMC3 interface.
  5. POD (Proper Orthogonal Decomposition) is a dimensionality reduction technique used in Gaussian process regression to approximate complex systems by projecting data onto a lower-dimensional subspace, making computations more efficient. It assumes the system is governed by a few dominant modes, making it ideal for problems with clear separability of scales, but less effective when all dimensions contribute equally to the system's behavior. [21] A minimal sketch of the projection step follows these notes.
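
The following is a generic sketch of the projection step described in note 5. It assumes nothing about the specific implementations in SMT or CODES: snapshots of a high-dimensional output are decomposed with an SVD, the leading modes are retained, and a surrogate (such as a Gaussian process) would then be fitted to the reduced coefficients.

```python
# Minimal, generic sketch of POD: collect output snapshots, take an SVD,
# keep the leading modes, and work with the reduced coefficients.
import numpy as np

rng = np.random.default_rng(2)
snapshots = rng.standard_normal((1000, 50))   # 50 snapshots of a 1000-dimensional output
mean = snapshots.mean(axis=1, keepdims=True)
U, s, _ = np.linalg.svd(snapshots - mean, full_matrices=False)

r = 5                                         # number of dominant modes kept
modes = U[:, :r]
coeffs = modes.T @ (snapshots - mean)         # reduced-order representation

# a Gaussian process surrogate would then be fitted to each row of `coeffs`
reconstruction = mean + modes @ coeffs        # map reduced coefficients back
```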

Related Research Articles

Principal component analysis

Principal component analysis (PCA) is a linear dimensionality reduction technique with applications in exploratory data analysis, visualization and data preprocessing.

Pattern recognition is the task of assigning a class to an observation based on patterns extracted from data. While similar, pattern recognition (PR) is not to be confused with pattern machines (PM), which may possess PR capabilities but whose primary function is to distinguish and create emergent patterns. PR has applications in statistical data analysis, signal processing, image analysis, information retrieval, bioinformatics, data compression, computer graphics and machine learning. Pattern recognition has its origins in statistics and engineering; some modern approaches to pattern recognition include the use of machine learning, owing to the increased availability of big data and a new abundance of processing power.

In probability theory and statistics, a Gaussian process is a stochastic process (a collection of random variables indexed by time or space) such that every finite collection of those random variables has a multivariate normal distribution. The distribution of a Gaussian process is the joint distribution of all those random variables, and as such, it is a distribution over functions with a continuous domain, e.g. time or space.
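
The definition can be illustrated directly: any finite collection of values of a Gaussian process is a draw from a multivariate normal distribution whose covariance matrix is obtained by evaluating the kernel. The sketch below samples such draws for an arbitrarily chosen squared-exponential kernel.

```python
# Illustrative sketch of the definition above: a finite collection of values of
# a Gaussian process is a multivariate normal draw with kernel-built covariance.
import numpy as np

x = np.linspace(0.0, 10.0, 200)
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)   # squared-exponential kernel matrix
rng = np.random.default_rng(3)
samples = rng.multivariate_normal(np.zeros(x.size), K + 1e-10 * np.eye(x.size), size=3)
# each row of `samples` is one function drawn from the Gaussian process prior
```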

Nonlinear dimensionality reduction

Nonlinear dimensionality reduction, also known as manifold learning, is any of various related techniques that aim to project high-dimensional data, potentially existing across non-linear manifolds which cannot be adequately captured by linear decomposition methods, onto lower-dimensional latent manifolds, with the goal of either visualizing the data in the low-dimensional space, or learning the mapping itself. The techniques described below can be understood as generalizations of linear decomposition methods used for dimensionality reduction, such as singular value decomposition and principal component analysis.

Kriging

In statistics, originally in geostatistics, kriging or Kriging, also known as Gaussian process regression, is a method of interpolation based on a Gaussian process governed by prior covariances. Under suitable assumptions of the prior, kriging gives the best linear unbiased prediction (BLUP) at unsampled locations. Interpolating methods based on other criteria such as smoothness may not yield the BLUP. The method is widely used in the domain of spatial analysis and computer experiments. The technique is also known as Wiener–Kolmogorov prediction, after Norbert Wiener and Andrey Kolmogorov.
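
For reference, a standard way to write this predictor, assuming a zero prior mean, a kernel k with matrix K = k(X, X) on the observed inputs X, and i.i.d. Gaussian noise of variance σ², is

$$
\mu(x_*) = k(x_*, X)\,[K + \sigma^2 I]^{-1} y, \qquad
\sigma^2(x_*) = k(x_*, x_*) - k(x_*, X)\,[K + \sigma^2 I]^{-1} k(X, x_*),
$$

i.e., the conditional mean and variance of the joint Gaussian, which coincides with the BLUP under these assumptions.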

In machine learning, kernel machines are a class of algorithms for pattern analysis, whose best known member is the support-vector machine (SVM). These methods involve using linear classifiers to solve nonlinear problems. The general task of pattern analysis is to find and study general types of relations in datasets. For many algorithms that solve these tasks, the data in raw representation have to be explicitly transformed into feature vector representations via a user-specified feature map; in contrast, kernel methods require only a user-specified kernel, i.e., a similarity function over all pairs of data points computed using inner products. The feature map in kernel machines may be infinite-dimensional, but according to the representer theorem only a finite-dimensional matrix built from user input is required. Kernel machines are slow to compute for datasets larger than a couple of thousand examples without parallel processing.

Nonparametric regression is a category of regression analysis in which the predictor does not take a predetermined form but is constructed according to information derived from the data. That is, no parametric form is assumed for the relationship between predictors and dependent variable. Nonparametric regression requires larger sample sizes than regression based on parametric models because the data must supply the model structure as well as the model estimates.

A surrogate model is an engineering method used when an outcome of interest cannot be easily measured or computed, so an approximate mathematical model of the outcome is used instead. Most engineering design problems require experiments and/or simulations to evaluate design objective and constraint functions as a function of design variables. For example, in order to find the optimal airfoil shape for an aircraft wing, an engineer simulates the airflow around the wing for different shape variables. For many real-world problems, however, a single simulation can take many minutes, hours, or even days to complete. As a result, routine tasks such as design optimization, design space exploration, sensitivity analysis and "what-if" analysis become impossible since they require thousands or even millions of simulation evaluations.

There are many types of artificial neural networks (ANN).

Bayesian optimization is a sequential design strategy for global optimization of black-box functions that does not assume any functional form. It is usually employed to optimize expensive-to-evaluate functions. With the rise of artificial intelligence innovation in the 21st century, Bayesian optimization has found prominent use in machine learning problems for optimizing hyperparameter values.

mlpack

mlpack is a free, open-source and header-only software library for machine learning and artificial intelligence written in C++, built on top of the Armadillo library and the ensmallen numerical optimization library. mlpack has an emphasis on scalability, speed, and ease-of-use. Its aim is to make machine learning possible for novice users by means of a simple, consistent API, while simultaneously exploiting C++ language features to provide maximum performance and maximum flexibility for expert users. mlpack also has a light deployment infrastructure with minimal dependencies, making it well suited for embedded systems and low-resource devices. Its intended target users are scientists and engineers.

Gradient-enhanced kriging (GEK) is a surrogate modeling technique used in engineering. A surrogate model is a prediction of the output of an expensive computer code. This prediction is based on a small number of evaluations of the expensive computer code.

The following outline is provided as an overview of and topical guide to machine learning:

In machine learning, hyperparameter optimization or tuning is the problem of choosing a set of optimal hyperparameters for a learning algorithm. A hyperparameter is a parameter whose value is used to control the learning process, which must be configured before the process starts.

In the study of artificial neural networks (ANNs), the neural tangent kernel (NTK) is a kernel that describes the evolution of deep artificial neural networks during their training by gradient descent. It allows ANNs to be studied using theoretical tools from kernel methods.

A Neural Network Gaussian Process (NNGP) is a Gaussian process (GP) obtained as the limit of a certain type of sequence of neural networks. Specifically, a wide variety of network architectures converges to a GP in the infinitely wide limit, in the sense of distribution. The concept constitutes an intensional definition, i.e., a NNGP is just a GP, but distinguished by how it is obtained.

In statistics and machine learning, Gaussian process approximation is a computational method that accelerates inference tasks in the context of a Gaussian process model, most commonly likelihood evaluation and prediction. Like approximations of other models, they can often be expressed as additional assumptions imposed on the model, which do not correspond to any actual feature, but which retain its key properties while simplifying calculations. Many of these approximation methods can be expressed in purely linear algebraic or functional analytic terms as matrix or function approximations. Others are purely algorithmic and cannot easily be rephrased as a modification of a statistical model.
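
One widely used family of such approximations replaces the full covariance matrix with a low-rank approximation built from a small set of inducing inputs. The sketch below implements the subset-of-regressors predictor from the unifying view of Quiñonero-Candela and Rasmussen [5]; the kernel, data and number of inducing points are arbitrary, and the packages in the table use numerically more careful formulations.

```python
# Hedged sketch of an inducing-point ("sparse") approximation: the
# subset-of-regressors predictor, which reduces the cost from O(n^3)
# to O(n m^2) for m inducing points.
import numpy as np

def rbf_kernel(x1, x2, lengthscale=1.0):
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

rng = np.random.default_rng(4)
x = np.sort(rng.uniform(0.0, 10.0, 2000))
y = np.sin(x) + 0.1 * rng.standard_normal(x.size)
sigma2 = 0.1 ** 2

u = np.linspace(0.0, 10.0, 20)                 # m = 20 inducing inputs
K_uu = rbf_kernel(u, u) + 1e-8 * np.eye(u.size)
K_uf = rbf_kernel(u, x)

Sigma = np.linalg.inv(K_uu + K_uf @ K_uf.T / sigma2)   # only an m x m system
x_star = np.linspace(0.0, 10.0, 200)
mean = rbf_kernel(x_star, u) @ Sigma @ K_uf @ y / sigma2   # approximate posterior mean
```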

Probabilistic numerics is an active field of study at the intersection of applied mathematics, statistics, and machine learning centering on the concept of uncertainty in computation. In probabilistic numerics, tasks in numerical analysis such as finding numerical solutions for integration, linear algebra, optimization, and the simulation of differential equations are seen as problems of statistical, probabilistic, or Bayesian inference.

Bayesian quadrature is a method for approximating intractable integration problems. It falls within the class of probabilistic numerical methods. Bayesian quadrature views numerical integration as a Bayesian inference task, where function evaluations are used to estimate the integral of that function. For this reason, it is sometimes also referred to as "Bayesian probabilistic numerical integration" or "Bayesian numerical integration". The name "Bayesian cubature" is also sometimes used when the integrand is multi-dimensional. A potential advantage of this approach is that it provides probabilistic uncertainty quantification for the value of the integral.

References

  1. Cunningham, John P.; Gilboa, Elad; Saatçi, Yunus (Feb 2015). "Scaling Multidimensional Inference for Structured Gaussian Processes". IEEE Transactions on Pattern Analysis and Machine Intelligence. 37 (2): 424–436. arXiv: 1209.4120. doi:10.1109/TPAMI.2013.192. PMID 26353252. S2CID 6878550.
  2. Leith, D. J.; Zhang, Yunong; Leithead, W. E. (2005). "Time-series Gaussian Process Regression Based on Toeplitz Computation of O(N²) Operations and O(N)-level Storage". Proceedings of the 44th IEEE Conference on Decision and Control. pp. 3711–3716. doi:10.1109/CDC.2005.1582739. ISBN   0-7803-9567-0. S2CID   13627455.
  3. Foreman-Mackey, Daniel; Angus, Ruth; Agol, Eric; Ambikasaran, Sivaram (9 November 2017). "Fast and Scalable Gaussian Process Modeling with Applications to Astronomical Time Series". The Astronomical Journal. 154 (6): 220. arXiv: 1703.09710. Bibcode:2017AJ....154..220F. doi:10.3847/1538-3881/aa9332. S2CID 88521913.
  4. Sarkka, Simo; Solin, Arno; Hartikainen, Jouni (2013). "Spatiotemporal Learning via Infinite-Dimensional Bayesian Filtering and Smoothing: A Look at Gaussian Process Regression Through Kalman Filtering". IEEE Signal Processing Magazine. 30 (4): 51–61. doi:10.1109/MSP.2013.2246292. S2CID   7485363 . Retrieved 2 September 2021.
  5. Quiñonero-Candela, Joaquin; Rasmussen, Carl Edward (5 December 2005). "A Unifying View of Sparse Approximate Gaussian Process Regression". Journal of Machine Learning Research. 6: 1939–1959. Retrieved 23 May 2020.
  6. Ambikasaran, S.; Foreman-Mackey, D.; Greengard, L.; Hogg, D. W.; O’Neil, M. (1 Feb 2016). "Fast Direct Methods for Gaussian Processes". IEEE Transactions on Pattern Analysis and Machine Intelligence. 38 (2): 252–265. arXiv: 1403.6015. doi:10.1109/TPAMI.2015.2448083. PMID 26761732. S2CID 15206293.
  7. Vanhatalo, Jarno; Riihimäki, Jaakko; Hartikainen, Jouni; Jylänki, Pasi; Tolvanen, Ville; Vehtari, Aki (Apr 2013). "GPstuff: Bayesian Modeling with Gaussian Processes". Journal of Machine Learning Research. 14: 1175–1179. Retrieved 23 May 2020.
  8. Rasmussen, Carl Edward; Nickisch, Hannes (Nov 2010). "Gaussian processes for machine learning (GPML) toolbox". Journal of Machine Learning Research. 11 (2): 3011–3015.
  9. Matthews, Alexander G. de G.; van der Wilk, Mark; Nickson, Tom; Fujii, Keisuke; Boukouvalas, Alexis; León-Villagrá, Pablo; Ghahramani, Zoubin; Hensman, James (April 2017). "GPflow: A Gaussian process library using TensorFlow". Journal of Machine Learning Research. 18 (40): 1–6. arXiv: 1610.08733. Retrieved 6 July 2020.
  10. Gardner, Jacob R; Pleiss, Geoff; Bindel, David; Weinberger, Kilian Q; Wilson, Andrew Gordon (2018). "GPyTorch: Blackbox Matrix-Matrix Gaussian Process Inference with GPU Acceleration" (PDF). Advances in Neural Information Processing Systems. 31: 7576–7586. arXiv: 1809.11165 . Retrieved 23 May 2020.
  11. Zilber, Daniel; Katzfuss, Matthias (January 2021). "Vecchia–Laplace approximations of generalized Gaussian processes for big non-Gaussian spatial data". Computational Statistics & Data Analysis. 153: 107081. arXiv: 1906.07828 . doi:10.1016/j.csda.2020.107081. ISSN   0167-9473. S2CID   195068888 . Retrieved 1 September 2021.
  12. Neumann, Marion; Huang, Shan; E. Marthaler, Daniel; Kersting, Kristian (2015). "pyGPs — A Python Library for Gaussian Process Regression and Classification". Journal of Machine Learning Research. 16: 2611–2616.
  13. Kalaitzis, Alfredo; Lawrence, Neil D. (May 20, 2011). "A Simple Approach to Ranking Differentially Expressed Gene Expression Time Courses through Gaussian Process Regression". BMC Bioinformatics. 12 (1): 180. doi: 10.1186/1471-2105-12-180 . ISSN   1471-2105. PMC   3116489 . PMID   21599902.
  14. Novak, Roman; Xiao, Lechao; Hron, Jiri; Lee, Jaehoon; Alemi, Alexander A.; Sohl-Dickstein, Jascha; Schoenholz, Samuel S. (2020). "Neural Tangents: Fast and Easy Infinite Neural Networks in Python". International Conference on Learning Representations. arXiv: 1912.02803 .
  15. Roustant, Olivier; Ginsbourger, David; Deville, Yves (2012). "DiceKriging, DiceOptim: Two R Packages for the Analysis of Computer Experiments by Kriging-Based Metamodeling and Optimization". Journal of Statistical Software. 51 (1): 1–55. doi: 10.18637/jss.v051.i01 . S2CID   60672249.
  16. Baudin, Michaël; Dutfoy, Anne; Iooss, Bertrand; Popelin, Anne-Laure (2015). "OpenTURNS: An Industrial Software for Uncertainty Quantification in Simulation". In Roger Ghanem; David Higdon; Houman Owhadi (eds.). Handbook of Uncertainty Quantification. pp. 1–38. arXiv: 1501.05242 . doi:10.1007/978-3-319-11259-6_64-1. ISBN   978-3-319-11259-6. S2CID   88513894.
  17. Marelli, Stefano; Sudret, Bruno (2014). "UQLab: a framework for uncertainty quantification in MATLAB" (PDF). Vulnerability, Uncertainty, and Risk. Quantification, Mitigation, and Management: 2554–2563. doi:10.3929/ethz-a-010238238 . Retrieved 28 May 2020.
  18. Couckuyt, Ivo; Dhaene, Tom; Demeester, Piet (2014). "ooDACE toolbox: a flexible object-oriented Kriging implementation" (PDF). Journal of Machine Learning Research. 15: 3183–3186. Retrieved 8 July 2020.
  19. Bouhlel, Mohamed A.; Hwang, John T.; Bartoli, Nathalie; Lafage, Rémi; Morlier, Joseph; Martins, Joaquim R.R.A. (2019). "A Python surrogate modeling framework with derivatives". Advances in Engineering Software. 135 (1): 102662. doi:10.1016/j.advengsoft.2019.03.005.
  20. Saves, Paul; Lafage, Rémi; Bartoli, Nathalie; Diouane, Youssef; Bussemaker, Jasper; Lefebvre, Thierry; Hwang, John T.; Morlier, Joseph; Martins, Joaquim R.R.A. (2024). "SMT 2.0: A Surrogate Modeling Toolbox with a focus on hierarchical and mixed variables Gaussian processes". Advances in Engineering Software. 188 (1): 103571. arXiv: 2305.13998 . doi:10.1016/j.advengsoft.2023.103571.
  21. Porrello, Christian; Dubreuil, Sylvain; Farhat, Charbel (2024). "Bayesian Framework With Projection-Based Model Order Reduction for Efficient Global Optimization". AIAA AVIATION FORUM AND ASCEND 2024: 4580. doi:10.2514/6.2024-4580.
  22. Lafage, Rémi (2022). "egobox, a Rust toolbox for efficient global optimization" (PDF). Journal of Open Source Software. 7 (78): 4737. doi:10.21105/joss.04737.