This is a comparison of statistical analysis software for performing inference with Gaussian processes, often by means of approximations.
This article is written from the point of view of Bayesian statistics, which may use terminology different from that commonly used in kriging. The next section clarifies the mathematical/computational meaning of the information provided in the table, independently of contextual terminology.
This section details the meaning of the columns in the table below.
These columns are about the algorithms used to solve the linear system defined by the prior covariance matrix, i.e., the matrix built by evaluating the kernel.
These columns are about the points on which the Gaussian process is evaluated, i.e., the input domain of the process: whether multidimensional ("ND") and non-real inputs are supported.
These columns are about the values yielded by the process, and how they are connected to the data used in the fit.
These columns are about finding values of variables that enter the definition of the specific problem but cannot be inferred from the Gaussian process fit, for example, parameters in the formula of the kernel.
If both the "Prior" and "Posterior" cells contain "Manually", the software provides an interface for computing the marginal likelihood and its gradient w.r.t. hyperparameters, which can be fed into an optimization/sampling algorithm, e.g., gradient descent or Markov chain Monte Carlo.
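As a minimal sketch of the "Manually" workflow, the following NumPy code computes the log marginal likelihood of a zero-mean GP with an illustrative RBF kernel and fixed noise level, and selects the length scale that maximizes it (a grid search stands in here for a real optimizer or MCMC sampler; all names and settings are for illustration only):

```python
import numpy as np

def rbf(x1, x2, scale):
    # Squared-exponential (RBF) kernel matrix between two 1-D point sets.
    return np.exp(-0.5 * (x1[:, None] - x2[None, :]) ** 2 / scale ** 2)

def log_marginal_likelihood(x, y, scale, noise=0.1):
    # log p(y | x, scale) for a zero-mean GP with i.i.d. Gaussian noise,
    # computed via the Cholesky factorization of the covariance matrix.
    K = rbf(x, x, scale) + noise ** 2 * np.eye(len(x))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha
            - np.sum(np.log(np.diag(L)))       # -0.5 * log|K|
            - 0.5 * len(x) * np.log(2 * np.pi))

rng = np.random.default_rng(0)
x = np.linspace(0, 5, 30)
y = np.sin(x) + 0.1 * rng.standard_normal(30)

# A grid search stands in for gradient descent or MCMC over hyperparameters.
scales = np.linspace(0.1, 3.0, 30)
best_scale = max(scales, key=lambda s: log_marginal_likelihood(x, y, s))
```

In practice the listed packages expose the gradient of this quantity as well, so that a gradient-based optimizer or sampler can be used instead of a grid.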
These columns are about the possibility of fitting data points simultaneously to a process and to linear transformations of it.
Name | License | Language | Solvers: Exact | Solvers: Specialized | Solvers: Approximate | Input: ND | Input: Non-real | Output: Likelihood | Output: Errors | Hyperparameters: Prior | Hyperparameters: Posterior | Lin. transf.: Deriv. | Lin. transf.: Finite | Lin. transf.: Sum | Name
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
PyMC | Apache | Python | Yes | Kronecker | Sparse | ND | No | Any | Correlated | Yes | Yes | No | Yes | Yes | PyMC |
Stan | BSD, GPL | custom | Yes | No | No | ND | No | Any | Correlated | Yes | Yes | No | Yes | Yes | Stan |
scikit-learn | BSD | Python | Yes | No | No | ND | Yes | Bernoulli | Uncorrelated | Manually | Manually | No | No | No | scikit-learn |
fbm [7] | Free | C | Yes | No | No | ND | No | Bernoulli, Poisson | Uncorrelated, Stationary | Many | Yes | No | No | Yes | fbm |
GPML [8] [7] | BSD | MATLAB | Yes | No | Sparse | ND | No | Many | i.i.d. | Manually | Manually | No | No | No | GPML |
GPstuff [7] | GNU GPL | MATLAB, R | Yes | Markov | Sparse | ND | No | Many | Correlated | Many | Yes | First RBF | No | Yes | GPstuff |
GPy [9] | BSD | Python | Yes | No | Sparse | ND | No | Many | Uncorrelated | Yes | Yes | No | No | No | GPy |
GPflow [9] | Apache | Python | Yes | No | Sparse | ND | No | Many | Uncorrelated | Yes | Yes | No | No | No | GPflow |
GPyTorch [10] | MIT | Python | Yes | Toeplitz, Kronecker | Sparse | ND | No | Many | Uncorrelated | Yes | Yes | First RBF | Manually | Manually | GPyTorch |
GPvecchia [11] | GNU GPL | R | Yes | No | Sparse, Hierarchical | ND | No | Exponential family | Uncorrelated | No | No | No | No | No | GPvecchia |
pyGPs [12] | BSD | Python | Yes | No | Sparse | ND | Graphs, Manually | Bernoulli | i.i.d. | Manually | Manually | No | No | No | pyGPs |
gptk [13] | BSD | R | Yes | Block? | Sparse | ND | No | Gaussian | No | Manually | Manually | No | No | No | gptk |
celerite [3] | MIT | Python, Julia, C++ | No | Semisep. [lower-alpha 1] | No | 1D | No | Gaussian | Uncorrelated | Manually | Manually | No | No | No | celerite |
george [6] | MIT | Python, C++ | Yes | No | Hierarchical | ND | No | Gaussian | Uncorrelated | Manually | Manually | No | No | Manually | george |
neural-tangents [14] [lower-alpha 2] | Apache | Python | Yes | Block, Kronecker | No | ND | No | Gaussian | No | No | No | No | No | No | neural-tangents |
DiceKriging [15] | GNU GPL | R | Yes | No | No | ND | No? | Gaussian | Uncorrelated | SCAD RBF | MAP | No | No | No | DiceKriging |
OpenTURNS [16] | GNU LGPL | Python, C++ | Yes | No | No | ND | No | Gaussian | Uncorrelated | Manually (no grad.) | MAP | No | No | No | OpenTURNS |
UQLab [17] | Proprietary | MATLAB | Yes | No | No | ND | No | Gaussian | Correlated | No | MAP | No | No | No | UQLab |
ooDACE [18] | Proprietary | MATLAB | Yes | No | No | ND | No | Gaussian | Correlated | No | MAP | No | No | No | ooDACE |
DACE | Proprietary | MATLAB | Yes | No | No | ND | No | Gaussian | No | No | MAP | No | No | No | DACE |
GpGp | MIT | R | No | No | Sparse | ND | No | Gaussian | i.i.d. | Manually | Manually | No | No | No | GpGp |
SuperGauss | GNU GPL | R, C++ | No | Toeplitz [lower-alpha 3] | No | 1D | No | Gaussian | No | Manually | Manually | No | No | No | SuperGauss |
STK | GNU GPL | MATLAB | Yes | No | No | ND | No | Gaussian | Uncorrelated | Manually | Manually | No | No | Manually | STK |
GSTools | GNU LGPL | Python | Yes | No | No | ND | No | Gaussian | Yes | Yes | Yes | Yes | No | No | GSTools |
PyKrige | BSD | Python | Yes | No | No | 2D,3D | No | Gaussian | i.i.d. | No | No | No | No | No | PyKrige |
GPR | Apache | C++ | Yes | No | Sparse | ND | No | Gaussian | i.i.d. | Some, Manually | Manually | First | No | No | GPR |
celerite2 | MIT | Python | No | Semisep. [lower-alpha 1] | No | 1D | No | Gaussian | Uncorrelated | Manually [lower-alpha 4] | Manually | No | No | Yes | celerite2 |
SMT [19] [20] | BSD | Python | Yes | POD [lower-alpha 5] | Sparse | ND | Yes | Gaussian | i.i.d. | Yes | Yes | Yes | No | No | SMT
GPJax | Apache | Python | Yes | No | Sparse | ND | Graphs | Bernoulli | No | Yes | Yes | No | No | No | GPJax |
Stheno | MIT | Python | Yes | Low rank | Sparse | ND | No | Gaussian | i.i.d. | Manually | Manually | Approximate | No | Yes | Stheno |
CODES | | MATLAB | Yes | Heteroskedastic, VAE, POD [lower-alpha 5] | Sparse | ND | No | Gaussian | i.i.d. | Some, Automatic | Mean a posteriori | No | No | No | CODES
Egobox-gp [22] | Apache | Rust | Yes | No | Sparse | ND | Yes | Any | i.i.d. | Yes | Yes | Yes | No | No | Egobox-gp
Principal component analysis (PCA) is a linear dimensionality reduction technique with applications in exploratory data analysis, visualization and data preprocessing.
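As an illustrative sketch (plain NumPy, synthetic data), PCA can be computed from the singular value decomposition of the centered data matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
# 200 samples in 3-D, with most variance along the first axis.
X = rng.standard_normal((200, 3)) @ np.diag([3.0, 1.0, 0.1])

Xc = X - X.mean(axis=0)              # center the data
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt                      # principal axes, one per row
Z = Xc @ Vt[:2].T                    # project onto the top-2 components
```

The singular values `s` come out in decreasing order, so the first rows of `Vt` are the directions of largest variance.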
Pattern recognition is the task of assigning a class to an observation based on patterns extracted from data. While similar, pattern recognition (PR) is not to be confused with pattern machines (PM), which may possess PR capabilities but whose primary function is to distinguish and create emergent patterns. PR has applications in statistical data analysis, signal processing, image analysis, information retrieval, bioinformatics, data compression, computer graphics and machine learning. Pattern recognition has its origins in statistics and engineering; some modern approaches to pattern recognition include the use of machine learning, due to the increased availability of big data and a new abundance of processing power.
In probability theory and statistics, a Gaussian process is a stochastic process (a collection of random variables indexed by time or space) such that every finite collection of those random variables has a multivariate normal distribution. The distribution of a Gaussian process is the joint distribution of all those random variables, and as such, it is a distribution over functions with a continuous domain, e.g. time or space.
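The defining property can be exercised directly: restricting the process to a finite grid of inputs yields a multivariate normal, from which sample paths can be drawn. A minimal sketch in NumPy, assuming an RBF kernel as the prior covariance:

```python
import numpy as np

def rbf(x1, x2, scale=1.0):
    # Squared-exponential kernel: prior covariance between function values.
    return np.exp(-0.5 * (x1[:, None] - x2[None, :]) ** 2 / scale ** 2)

rng = np.random.default_rng(0)
x = np.linspace(0, 5, 50)                    # a finite collection of inputs
K = rbf(x, x) + 1e-8 * np.eye(len(x))        # small jitter for stability
# Every finite restriction of the GP is multivariate normal:
samples = rng.multivariate_normal(np.zeros(len(x)), K, size=3)
```

Each row of `samples` is one draw of the function's values on the grid, i.e. a discretized sample path of the process.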
Nonlinear dimensionality reduction, also known as manifold learning, is any of various related techniques that aim to project high-dimensional data, potentially existing across non-linear manifolds which cannot be adequately captured by linear decomposition methods, onto lower-dimensional latent manifolds, with the goal of either visualizing the data in the low-dimensional space, or learning the mapping itself. The techniques described below can be understood as generalizations of linear decomposition methods used for dimensionality reduction, such as singular value decomposition and principal component analysis.
In statistics, originally in geostatistics, kriging or Kriging, also known as Gaussian process regression, is a method of interpolation based on Gaussian process governed by prior covariances. Under suitable assumptions of the prior, kriging gives the best linear unbiased prediction (BLUP) at unsampled locations. Interpolating methods based on other criteria such as smoothness may not yield the BLUP. The method is widely used in the domain of spatial analysis and computer experiments. The technique is also known as Wiener–Kolmogorov prediction, after Norbert Wiener and Andrey Kolmogorov.
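The kriging predictor at unsampled locations is the GP posterior mean, with the posterior variance quantifying uncertainty. A minimal simple-kriging sketch in NumPy, assuming a unit-variance RBF prior covariance and noise-free observations (all choices illustrative):

```python
import numpy as np

def rbf(x1, x2, scale=1.0):
    # Prior covariance (RBF kernel) between two 1-D point sets.
    return np.exp(-0.5 * (x1[:, None] - x2[None, :]) ** 2 / scale ** 2)

x_obs = np.array([0.0, 1.0, 2.0, 3.0])       # sampled locations
y_obs = np.sin(x_obs)                        # observed values
x_new = np.array([1.5, 2.5])                 # unsampled locations

K = rbf(x_obs, x_obs) + 1e-9 * np.eye(len(x_obs))
k_star = rbf(x_new, x_obs)
# Posterior (kriging) mean and variance at the unsampled locations.
mean = k_star @ np.linalg.solve(K, y_obs)
var = 1.0 - np.sum(k_star * np.linalg.solve(K, k_star.T).T, axis=1)
```

The mean interpolates the data exactly (noise-free case), and the variance shrinks to zero at the sampled locations.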
In machine learning, kernel machines are a class of algorithms for pattern analysis, whose best known member is the support-vector machine (SVM). These methods involve using linear classifiers to solve nonlinear problems. The general task of pattern analysis is to find and study general types of relations in datasets. For many algorithms that solve these tasks, the data in raw representation have to be explicitly transformed into feature vector representations via a user-specified feature map: in contrast, kernel methods require only a user-specified kernel, i.e., a similarity function over all pairs of data points computed using inner products. The feature map in kernel machines may be infinite-dimensional, but by the representer theorem only a finite-dimensional matrix of kernel evaluations on the data is required. Kernel machines are slow to compute for datasets larger than a couple of thousand examples without parallel processing.
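A minimal sketch of this idea, using kernel ridge regression (chosen here for brevity instead of an SVM) with an RBF kernel in NumPy: no feature map is ever constructed, only the finite Gram matrix of pairwise kernel evaluations.

```python
import numpy as np

def rbf(x1, x2, scale=1.0):
    # Kernel = similarity function over pairs of points; the (implicit)
    # feature map is infinite-dimensional and never materialized.
    return np.exp(-0.5 * (x1[:, None] - x2[None, :]) ** 2 / scale ** 2)

x = np.linspace(-3, 3, 40)
y = np.tanh(x)

K = rbf(x, x)                                 # finite Gram matrix
lam = 1e-2                                    # ridge regularization
# Representer theorem: the solution is a weighted sum of kernels at the data.
alpha = np.linalg.solve(K + lam * np.eye(len(x)), y)
predict = lambda x_new: rbf(x_new, x) @ alpha
```

The learned function is evaluated anywhere by combining kernel values against the training points with the weights `alpha`.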
Nonparametric regression is a category of regression analysis in which the predictor does not take a predetermined form but is constructed according to information derived from the data. That is, no parametric form is assumed for the relationship between predictors and dependent variable. Nonparametric regression requires larger sample sizes than regression based on parametric models because the data must supply the model structure as well as the model estimates.
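A classic example is the Nadaraya–Watson estimator, sketched below in NumPy (Gaussian weights and the bandwidth value are illustrative choices): the prediction is a locally weighted average of the observed responses, with no parametric form assumed.

```python
import numpy as np

def nadaraya_watson(x_new, x, y, bandwidth=0.3):
    # Locally weighted average of the responses: no parametric form is
    # assumed for the relationship between predictor and response.
    w = np.exp(-0.5 * ((x_new[:, None] - x[None, :]) / bandwidth) ** 2)
    return (w @ y) / w.sum(axis=1)

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 2 * np.pi, 200))
y = np.sin(x) + 0.1 * rng.standard_normal(200)
y_hat = nadaraya_watson(np.array([np.pi / 2]), x, y)
```

The need for larger samples is visible here: the estimate is only as good as the amount of data falling within a bandwidth of the query point.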
A surrogate model is an engineering method used when an outcome of interest cannot be easily measured or computed, so an approximate mathematical model of the outcome is used instead. Most engineering design problems require experiments and/or simulations to evaluate design objective and constraint functions as a function of design variables. For example, in order to find the optimal airfoil shape for an aircraft wing, an engineer simulates the airflow around the wing for different shape variables. For many real-world problems, however, a single simulation can take many minutes, hours, or even days to complete. As a result, routine tasks such as design optimization, design space exploration, sensitivity analysis and "what-if" analysis become impossible since they require thousands or even millions of simulation evaluations.
There are many types of artificial neural networks (ANN).
Bayesian optimization is a sequential design strategy for global optimization of black-box functions that does not assume any functional form. It is usually employed to optimize expensive-to-evaluate functions. With the rise of artificial intelligence innovation in the 21st century, Bayesian optimization has found prominent use in machine learning problems for optimizing hyperparameter values.
mlpack is a free, open-source and header-only software library for machine learning and artificial intelligence written in C++, built on top of the Armadillo library and the ensmallen numerical optimization library. mlpack has an emphasis on scalability, speed, and ease of use. Its aim is to make machine learning possible for novice users by means of a simple, consistent API, while simultaneously exploiting C++ language features to provide maximum performance and maximum flexibility for expert users. mlpack also has a light deployment infrastructure with minimal dependencies, making it well suited for embedded systems and low-resource devices. Its intended target users are scientists and engineers.
Gradient-enhanced kriging (GEK) is a surrogate modeling technique used in engineering. A surrogate model is a prediction of the output of an expensive computer code. This prediction is based on a small number of evaluations of the expensive computer code.
The following outline is provided as an overview of and topical guide to machine learning:
In machine learning, hyperparameter optimization or tuning is the problem of choosing a set of optimal hyperparameters for a learning algorithm. A hyperparameter is a parameter whose value is used to control the learning process, which must be configured before the process starts.
In the study of artificial neural networks (ANNs), the neural tangent kernel (NTK) is a kernel that describes the evolution of deep artificial neural networks during their training by gradient descent. It allows ANNs to be studied using theoretical tools from kernel methods.
A Neural Network Gaussian Process (NNGP) is a Gaussian process (GP) obtained as the limit of a certain type of sequence of neural networks. Specifically, a wide variety of network architectures converges to a GP in the infinitely wide limit, in the sense of distribution. The concept constitutes an intensional definition, i.e., a NNGP is just a GP, but distinguished by how it is obtained.
In statistics and machine learning, Gaussian process approximation is a computational method that accelerates inference tasks in the context of a Gaussian process model, most commonly likelihood evaluation and prediction. Like approximations of other models, they can often be expressed as additional assumptions imposed on the model, which do not correspond to any actual feature, but which retain its key properties while simplifying calculations. Many of these approximation methods can be expressed in purely linear algebraic or functional analytic terms as matrix or function approximations. Others are purely algorithmic and cannot easily be rephrased as a modification of a statistical model.
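One widely used linear-algebraic approximation of this kind is the Nyström (inducing-point) low-rank approximation of the kernel matrix, sketched below in NumPy (the RBF kernel, inducing-point selection, and sizes are illustrative):

```python
import numpy as np

def rbf(x1, x2, scale=1.0):
    return np.exp(-0.5 * (x1[:, None] - x2[None, :]) ** 2 / scale ** 2)

rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0, 10, 500))
m = 50                                    # number of inducing points
z = x[:: len(x) // m]                     # simple subset as inducing points

K_nm = rbf(x, z)
K_mm = rbf(z, z) + 1e-8 * np.eye(len(z))
# Nystrom low-rank approximation: K ~ K_nm K_mm^{-1} K_nm^T
K_approx = K_nm @ np.linalg.solve(K_mm, K_nm.T)
K_exact = rbf(x, x)
err = np.max(np.abs(K_exact - K_approx))
```

Working with the rank-m factorization instead of the full matrix reduces the cost of likelihood evaluation and prediction from O(n^3) to O(n m^2), at the price of the approximation error `err`.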
Probabilistic numerics is an active field of study at the intersection of applied mathematics, statistics, and machine learning centering on the concept of uncertainty in computation. In probabilistic numerics, tasks in numerical analysis such as finding numerical solutions for integration, linear algebra, optimization and simulation and differential equations are seen as problems of statistical, probabilistic, or Bayesian inference.
Bayesian quadrature is a method for approximating intractable integration problems. It falls within the class of probabilistic numerical methods. Bayesian quadrature views numerical integration as a Bayesian inference task, where function evaluations are used to estimate the integral of that function. For this reason, it is sometimes also referred to as "Bayesian probabilistic numerical integration" or "Bayesian numerical integration". The name "Bayesian cubature" is also sometimes used when the integrand is multi-dimensional. A potential advantage of this approach is that it provides probabilistic uncertainty quantification for the value of the integral.
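A minimal sketch of the idea in NumPy, assuming an RBF prior on the integrand over [0, 1] so that the kernel mean embedding has a closed form in terms of the error function (the integrand, kernel scale, and grid are illustrative):

```python
import numpy as np
from math import erf, sqrt, pi

def rbf(x1, x2, s=0.2):
    return np.exp(-0.5 * (x1[:, None] - x2[None, :]) ** 2 / s ** 2)

def kernel_mean(xi, s=0.2):
    # Closed form of z_i = integral over [0, 1] of k(x, x_i) dx
    # for the RBF kernel, via the error function.
    c = sqrt(2.0) * s
    return s * sqrt(pi / 2.0) * np.array(
        [erf((1.0 - a) / c) + erf(a / c) for a in xi])

x = np.linspace(0, 1, 12)            # function evaluations
y = x ** 2                           # integrand sampled at those points

K = rbf(x, x) + 1e-8 * np.eye(len(x))
z = kernel_mean(x)
# Posterior mean of the integral under the GP model of the integrand.
estimate = z @ np.linalg.solve(K, y)
```

The true value of the integral is 1/3; the posterior mean recovers it from only 12 evaluations, and the same linear algebra yields a posterior variance quantifying the remaining uncertainty.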