Probabilistic numerics

Probabilistic numerics is an active field of study at the intersection of applied mathematics, statistics, and machine learning centering on the concept of uncertainty in computation. In probabilistic numerics, tasks in numerical analysis, such as integration, linear algebra, optimization, simulation, and the solution of differential equations, are seen as problems of statistical, probabilistic, or Bayesian inference. [1] [2] [3] [4] [5]

Introduction

A numerical method is an algorithm that approximates the solution to a mathematical problem (examples below include the solution to a linear system of equations, the value of an integral, the solution of a differential equation, the minimum of a multivariate function). In a probabilistic numerical algorithm, this process of approximation is thought of as a problem of estimation, inference or learning and realised in the framework of probabilistic inference (often, but not always, Bayesian inference). [6]

Formally, this means casting the setup of the computational problem in terms of a prior distribution, formulating the relationship between numbers computed by the computer (e.g. matrix-vector multiplications in linear algebra, gradients in optimization, values of the integrand or the vector field defining a differential equation) and the quantity in question (the solution of the linear problem, the minimum, the integral, the solution curve) in a likelihood function, and returning a posterior distribution as the output. In most cases, numerical algorithms also take internal adaptive decisions about which numbers to compute, which form an active learning problem.
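
To make this recipe concrete, the following minimal sketch (illustrative, not any specific published method) conditions a Gaussian prior over an unknown quantity on linear "computed numbers", yielding a posterior whose mean is the classic point estimate and whose covariance quantifies the remaining uncertainty.

```python
import numpy as np

def gaussian_condition(mu, Sigma, M, y, noise_cov):
    """Posterior over x ~ N(mu, Sigma) given linear observations
    y = M x + eps with eps ~ N(0, noise_cov)."""
    G = M @ Sigma @ M.T + noise_cov         # covariance of the observations
    K = Sigma @ M.T @ np.linalg.inv(G)      # gain mapping residuals to updates
    mu_post = mu + K @ (y - M @ mu)
    Sigma_post = Sigma - K @ M @ Sigma
    return mu_post, Sigma_post

# Toy usage: partially infer the solution of a 2x2 linear system A x = b
# from a single projected matrix-vector product (one "computed number").
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
s = np.array([[1.0, 0.0]])                  # action: observe the first row of A x
mu, Sigma = gaussian_condition(np.zeros(2), np.eye(2), s @ A, s @ b,
                               1e-12 * np.eye(1))
print(mu, np.diag(Sigma))                   # reduced, but nonzero, uncertainty
```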

Many of the most popular classic numerical algorithms can be re-interpreted in the probabilistic framework. This includes the method of conjugate gradients, [7] [8] [9] Nordsieck methods, Gaussian quadrature rules, [10] and quasi-Newton methods. [11] In all these cases, the classic method is based on a regularized least-squares estimate that can be associated with the posterior mean arising from a Gaussian prior and likelihood. In such cases, the variance of the Gaussian posterior is then associated with a worst-case estimate for the squared error.

Probabilistic numerical methods promise several conceptual advantages over classic, point-estimate based approximation techniques: they return structured uncertainty estimates alongside the point estimate, they allow approximation error to be propagated through subsequent computations, and they give adaptive algorithms a calibrated notion of which computations are most informative to perform next.

These advantages are essentially the equivalent of similar functional advantages that Bayesian methods enjoy over point-estimates in machine learning, applied or transferred to the computational domain.

Numerical tasks

Integration

Figure: Bayesian quadrature with a Gaussian process conditioned on n = 0, 3, and 8 evaluations of the integrand (shown in black). Shaded areas in the left column illustrate the marginal standard deviations. The right figure shows the prior (n = 0) and posterior (n = 3, 8) Gaussian distributions over the value of the integral, as well as the true solution.

Probabilistic numerical methods have been developed for the problem of numerical integration, with the most popular method called Bayesian quadrature. [15] [16] [17] [18]

In numerical integration, evaluations f(x_1), …, f(x_n) of a function f at a set of points x_1, …, x_n are used to estimate its integral ∫ f dν against some measure ν. Bayesian quadrature consists of specifying a prior distribution over f, conditioning this prior on the evaluations to obtain a posterior distribution over f, and then computing the implied posterior distribution on the integral ∫ f dν. The most common choice of prior is a Gaussian process, as this yields a closed-form posterior distribution on the integral, namely a univariate Gaussian distribution. Bayesian quadrature is particularly useful when the function f is expensive to evaluate and the dimension of the integration domain is small to moderate.
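
This computation admits a compact implementation. The following is a minimal sketch, assuming a zero-mean Gaussian process prior with a squared-exponential kernel and integration against the Lebesgue measure on [0, 1], for which the kernel-embedding integrals have closed forms in terms of the error function; all parameter values are illustrative.

```python
import numpy as np
from scipy.special import erf

def bayesian_quadrature(f, x, ell=0.2, jitter=1e-10):
    """Posterior N(mean, var) over the integral of f on [0, 1] under a
    zero-mean GP prior with kernel k(x, y) = exp(-(x - y)^2 / (2 ell^2))."""
    X = np.asarray(x)
    K = np.exp(-0.5 * (X[:, None] - X[None, :]) ** 2 / ell**2)
    # Kernel mean embedding z_i = int_0^1 k(x, x_i) dx (closed form via erf).
    z = ell * np.sqrt(np.pi / 2) * (erf((1 - X) / (ell * np.sqrt(2)))
                                    + erf(X / (ell * np.sqrt(2))))
    # Prior variance of the integral: int int k(x, y) dx dy over the unit square.
    c = 2 * (ell * np.sqrt(np.pi / 2) * erf(1 / (ell * np.sqrt(2)))
             - ell**2 * (1 - np.exp(-0.5 / ell**2)))
    w = np.linalg.solve(K + jitter * np.eye(len(X)), z)  # BQ weights K^{-1} z
    return w @ f(X), c - z @ w                           # posterior mean, variance

f = lambda x: np.sin(3 * x) + x**2
mean, var = bayesian_quadrature(f, np.linspace(0.05, 0.95, 8))
print(mean, var)  # compare with the true integral (1 - np.cos(3)) / 3 + 1 / 3
```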

Optimization

Figure: Bayesian optimization of a function (black) with Gaussian processes (purple). Three acquisition functions (blue) are shown at the bottom.

Probabilistic numerical methods have also been studied for mathematical optimization, which consists of finding the minimum or maximum of some objective function given (possibly noisy or indirect) evaluations of that function at a set of points.

Perhaps the most notable effort in this direction is Bayesian optimization, [20] a general approach to optimization grounded in Bayesian inference. Bayesian optimization algorithms operate by maintaining a probabilistic belief about the objective function f throughout the optimization procedure; this often takes the form of a Gaussian process prior conditioned on observations. This belief then guides the algorithm in obtaining observations that are likely to advance the optimization process. Bayesian optimization policies are usually realized by transforming the objective function posterior into an inexpensive, differentiable acquisition function that is maximized to select each successive observation location. One prominent approach is to model optimization via Bayesian sequential experimental design, seeking to obtain a sequence of observations yielding the most optimization progress as evaluated by an appropriate utility function. A welcome side effect of this approach is that uncertainty in the objective function, as measured by the underlying probabilistic belief, can guide an optimization policy in addressing the classic exploration vs. exploitation tradeoff.
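
The following minimal sketch illustrates this loop under assumed choices (a squared-exponential Gaussian process prior, the expected-improvement acquisition, and a fixed one-dimensional candidate grid); practical Bayesian optimization libraries differ in many details.

```python
import numpy as np
from scipy.stats import norm

def gp_posterior(Xt, yt, Xs, ell=0.15, noise=1e-6):
    """GP posterior mean/std at test points Xs given data (Xt, yt), RBF kernel."""
    k = lambda a, b: np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)
    K = k(Xt, Xt) + noise * np.eye(len(Xt))
    Ks = k(Xt, Xs)
    mu = Ks.T @ np.linalg.solve(K, yt)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, best):
    """EI acquisition for minimization: E[max(best - f, 0)]."""
    gamma = (best - mu) / sigma
    return sigma * (gamma * norm.cdf(gamma) + norm.pdf(gamma))

objective = lambda x: np.sin(10 * x) + x      # black-box function to minimize
grid = np.linspace(0, 1, 500)                 # candidate observation locations
X = np.array([0.1, 0.9])                      # initial design
y = objective(X)
for _ in range(10):                           # Bayesian optimization loop
    mu, sigma = gp_posterior(X, y, grid)
    x_next = grid[np.argmax(expected_improvement(mu, sigma, y.min()))]
    X, y = np.append(X, x_next), np.append(y, objective(x_next))
print(X[np.argmin(y)], y.min())               # incumbent after 10 evaluations
```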

Local optimization

Probabilistic numerical methods have been developed in the context of stochastic optimization for deep learning, in particular to address main issues such as learning rate tuning and line searches, [21] batch-size selection, [22] early stopping, [23] pruning, [24] and first- and second-order search directions. [25] [26]

In this setting, the optimization objective is often an empirical risk of the form L(θ) = (1/N) Σ_{n=1}^{N} ℓ(y_n, f_θ(x_n)), defined by a dataset D = {(x_n, y_n)}_{n=1}^{N} and a loss ℓ(y_n, f_θ(x_n)) that quantifies how well a predictive model f_θ, parameterized by θ, performs on predicting the target y_n from its corresponding input x_n. Epistemic uncertainty arises when the dataset size N is large and the data cannot be processed at once, meaning that local quantities (given some θ) such as the loss function L(θ) itself or its gradient ∇L(θ) cannot be computed in reasonable time. Hence, generally mini-batching is used to construct estimators of these quantities on a random subset of the data. Probabilistic numerical methods model this uncertainty explicitly and allow for automated decisions and parameter tuning.
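
As an illustration of the kind of uncertainty such methods model, the following sketch estimates a mini-batch gradient of a squared loss together with an estimate of its own variance, from which a per-coordinate signal-to-noise ratio can be derived. This is a schematic example in the spirit of the cited methods, not a reimplementation of any of them.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, B = 100_000, 5, 128                    # dataset size, parameter dim, batch size
X = rng.normal(size=(N, D))
y = X @ np.ones(D) + 0.1 * rng.normal(size=N)
theta = np.zeros(D)                          # current iterate

batch = rng.choice(N, size=B, replace=False)
residual = X[batch] @ theta - y[batch]
per_example_grads = 2 * residual[:, None] * X[batch]  # squared-loss gradients
g_hat = per_example_grads.mean(axis=0)                # mini-batch gradient estimate
# Estimate of the estimator's variance: Var[g_hat] ~ S^2 / B.
g_var = per_example_grads.var(axis=0, ddof=1) / B
snr = g_hat**2 / g_var                                # per-coordinate signal-to-noise
print(g_hat, np.sqrt(g_var), snr)
# A probabilistic optimizer can treat g_hat ~ N(grad L(theta), diag(g_var)) and
# use this belief, e.g., to adapt step sizes, batch sizes, or stopping decisions.
```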

Linear algebra

Probabilistic numerical methods for linear algebra [7] [8] [27] [9] [28] [29] have primarily focused on solving systems of linear equations of the form A x = b and the computation of determinants det(A). [30] [31]

Figure: Illustration of a matrix-based probabilistic linear solver.

A large class of methods are iterative in nature and collect information about the linear system to be solved via repeated matrix-vector multiplication v ↦ A v with the system matrix A applied to different vectors v_i. Such methods can be roughly split into a solution-based [8] [28] and a matrix-based perspective, [7] [9] depending on whether belief is expressed over the solution x of the linear system or the (pseudo-)inverse of the matrix H = A⁻¹. The belief update uses the fact that the inferred object is linked to the observed matrix multiplications y_i = A v_i via b = A x and A⁻¹ y_i = v_i. Methods typically assume a Gaussian distribution, due to its closure under linear observations of the problem. While conceptually different, these two views are computationally equivalent and inherently connected via the right-hand side through x = A⁻¹ b. [27]
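
The following sketch illustrates the solution-based perspective in its simplest form: a Gaussian belief over x is conditioned on exact scalar projections of the linear system, at the cost of one matrix-vector multiplication per action. The unit-vector actions and the N(0, I) prior are illustrative choices; published solvers select actions and priors far more carefully.

```python
import numpy as np

def probabilistic_linear_solve(A, b, actions):
    """Solution-based view: Gaussian belief x ~ N(mu, Sigma) over the solution
    of A x = b, conditioned on noiseless projections s_i^T A x = s_i^T b."""
    n = len(b)
    mu, Sigma = np.zeros(n), np.eye(n)     # prior belief over the solution
    for s in actions:                      # one matrix-vector multiply per action
        m = A.T @ s                        # observation functional: m^T x = s^T b
        gain = Sigma @ m / (m @ Sigma @ m)
        mu = mu + gain * (s @ b - m @ mu)  # Gaussian conditioning on a scalar
        Sigma = Sigma - np.outer(gain, m @ Sigma)
    return mu, Sigma

A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
mu, Sigma = probabilistic_linear_solve(A, b, actions=np.eye(3)[:2])  # 2 of 3 actions
print(mu, np.diag(Sigma))      # residual variance marks the unexplored direction
print(np.linalg.solve(A, b))   # exact solution for comparison
```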

Probabilistic numerical linear algebra routines have been successfully applied to scale Gaussian processes to large datasets. [31] [32] In particular, they enable exact propagation of the approximation error to a combined Gaussian process posterior, which quantifies the uncertainty arising from both the finite number of data observed and the finite amount of computation expended. [32]

Ordinary differential equations

Figure: Samples from the first component of the numerical solution of the Lorenz system obtained with a probabilistic numerical integrator.

Probabilistic numerical methods for ordinary differential equations ẏ(t) = f(t, y(t)) have been developed for initial and boundary value problems. Many different probabilistic numerical methods designed for ordinary differential equations have been proposed, and these can broadly be grouped into the two following categories:

- Randomisation-based methods, which stochastically perturb standard deterministic solvers, for example by adding Gaussian noise to the steps of a one-step integrator or by randomising its time steps, so that the output is a probability measure over solutions that can be sampled. [33] [34]
- Gaussian process regression and filtering-based methods, which pose the solution of the differential equation as a Gaussian process inference problem, treating evaluations of the vector field f as data on the derivative of the solution. [35] [36] [37] [38] [39]

The boundary between these two categories is not sharp; indeed, a Gaussian process regression approach based on randomised data was developed as well. [40] These methods have been applied to problems in computational Riemannian geometry, [41] inverse problems, latent force models, and to differential equations with a geometric structure such as symplecticity.
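
The filtering viewpoint can be made concrete with a small amount of code. The following is a minimal sketch of a probabilistic solver for a scalar ODE, assuming a once-integrated Wiener process prior on the solution and an extended Kalman filter update that conditions on the residual ẏ − f(t, y) = 0 at each step; production solvers (and the cited publications) use higher-order priors, calibration, and adaptive step sizes.

```python
import numpy as np

def ode_filter(f, df_dy, y0, t_span, h, sigma=1.0):
    """EKF-based probabilistic ODE solver for a scalar ODE y' = f(t, y) with a
    once-integrated Wiener process prior; state z = (y, y')."""
    Phi = np.array([[1.0, h], [0.0, 1.0]])                 # IWP(1) transition
    Q = sigma**2 * np.array([[h**3 / 3, h**2 / 2],
                             [h**2 / 2, h]])               # process noise
    m = np.array([y0, f(t_span[0], y0)])                   # initialize on the ODE
    P = np.zeros((2, 2))
    t, ts, means, stds = t_span[0], [t_span[0]], [y0], [0.0]
    while t < t_span[1] - 1e-12:
        m, P = Phi @ m, Phi @ P @ Phi.T + Q                # predict one step ahead
        t += h
        r = m[1] - f(t, m[0])                              # residual: y' - f(t, y)
        H = np.array([-df_dy(t, m[0]), 1.0])               # linearized observation
        K = P @ H / (H @ P @ H)
        m, P = m - K * r, P - np.outer(K, H @ P)           # EKF update to r = 0
        ts.append(t); means.append(m[0]); stds.append(np.sqrt(max(P[0, 0], 0)))
    return np.array(ts), np.array(means), np.array(stds)

# Usage: y' = -y, y(0) = 1; posterior mean tracks exp(-t) with a credible band.
ts, mean, std = ode_filter(lambda t, y: -y, lambda t, y: -1.0, 1.0, (0.0, 2.0), 0.05)
print(mean[-1], std[-1], np.exp(-2.0))
```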

Partial differential equations

A number of probabilistic numerical methods have also been proposed for partial differential equations. As with ordinary differential equations, the approaches can broadly be divided into those based on randomisation, generally of some underlying finite-element mesh, [33] [42] and those based on Gaussian process regression. [4] [3] [43] [44]

Figure: Learning to solve a partial differential equation. A problem-specific Gaussian process prior over the solution u is conditioned on partially-known physics, given by uncertain boundary conditions (BC) and a linear PDE, as well as on noisy physical measurements from an experiment. The boundary conditions and the right-hand side of the PDE are not known but inferred from a small set of noise-corrupted measurements. The plots juxtapose the posterior belief over u with the true solution u* of the latent boundary value problem.

Probabilistic numerical PDE solvers based on Gaussian process regression recover classical methods on linear PDEs for certain priors, in particular methods of mean weighted residuals, which include Galerkin methods, finite element methods, as well as spectral methods. [44]
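
The following sketch illustrates this family of methods on a toy problem, assuming a squared-exponential prior and a symmetric-collocation discretization of the 1-D Poisson equation; the closed-form kernel derivatives below follow from differentiating the kernel, and the lengthscale and grid sizes are illustrative.

```python
import numpy as np

# Symmetric collocation for -u'' = f on [0, 1] with u(0) = u(1) = 0 under a
# GP prior u ~ GP(0, k) with RBF kernel k(x, y) = exp(-(x - y)^2 / (2 ell^2)).
ell = 0.3  # illustrative lengthscale

def k(x, y):
    r = x[:, None] - y[None, :]
    return np.exp(-0.5 * r**2 / ell**2)

def Lk(x, y):
    """Cov(u(x), -u''(y)): apply L = -d^2/dy^2 to the kernel's second argument."""
    r = x[:, None] - y[None, :]
    return (1 / ell**2 - r**2 / ell**4) * np.exp(-0.5 * r**2 / ell**2)

def LLk(x, y):
    """Cov(-u''(x), -u''(y)): apply L to both kernel arguments."""
    r = x[:, None] - y[None, :]
    return (3 / ell**4 - 6 * r**2 / ell**6 + r**4 / ell**8) * np.exp(-0.5 * r**2 / ell**2)

f = lambda x: np.pi**2 * np.sin(np.pi * x)  # right-hand side; exact u = sin(pi x)
xb = np.array([0.0, 1.0])                   # boundary points
xc = np.linspace(0.0, 1.0, 15)              # collocation points for the PDE
xs = np.linspace(0.0, 1.0, 101)             # evaluation grid

# Gram matrix of the observations [u(xb); -u''(xc)], and their covariance with u(xs).
G = np.block([[k(xb, xb), Lk(xb, xc)],
              [Lk(xb, xc).T, LLk(xc, xc)]])
obs = np.concatenate([np.zeros(2), f(xc)])
cross = np.hstack([k(xs, xb), Lk(xs, xc)])
u_mean = cross @ np.linalg.solve(G + 1e-8 * np.eye(len(obs)), obs)
print(np.max(np.abs(u_mean - np.sin(np.pi * xs))))  # small collocation error
```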

History and related fields

The interplay between numerical analysis and probability is touched upon by a number of other areas of mathematics, including average-case analysis of numerical methods, information-based complexity, game theory, and statistical decision theory. Precursors to what is now being called "probabilistic numerics" can be found as early as the late 19th and early 20th century.

The origins of probabilistic numerics can be traced to a discussion of probabilistic approaches to polynomial interpolation by Henri Poincaré in his Calcul des Probabilités. [45] In modern terminology, Poincaré considered a Gaussian prior distribution on a function f, expressed as a formal power series f(x) = Σ_k A_k x^k with random coefficients A_k, and asked for "probable values" of f(x) given this prior and observations f(a_i) = B_i for i = 1, …, n.

A later seminal contribution to the interplay of numerical analysis and probability was provided by Albert Suldin in the context of univariate quadrature. [46] The statistical problem considered by Suldin was the approximation of the definite integral ∫ u(t) dt of a function u, under a Brownian motion prior on u, given access to pointwise evaluations of u at nodes t_1, …, t_n. Suldin showed that, for given quadrature nodes, the quadrature rule with minimal mean squared error is the trapezoidal rule; furthermore, this minimal error is proportional to the sum of cubes of the inter-node spacings. As a result, one can see the trapezoidal rule with equally-spaced nodes as statistically optimal in some sense, an early example of the average-case analysis of a numerical method. Suldin's point of view was later extended by Mike Larkin. [47] Note that Suldin's Brownian motion prior on the integrand u is a Gaussian measure and that the operations of integration and of pointwise evaluation of u are both linear maps. Thus, the definite integral of u is a real-valued Gaussian random variable. In particular, after conditioning on the observed pointwise values of u, it follows a normal distribution whose mean is given by the trapezoidal rule and whose variance is proportional to the sum of cubes of the inter-node spacings. This viewpoint is very close to that of Bayesian quadrature, seeing the output of a quadrature method not just as a point estimate but as a probability distribution in its own right.
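
Suldin's result is easy to verify numerically. The sketch below discretizes a Brownian motion prior, conditions on function values at a handful of nodes, and integrates the posterior mean; the resulting quadrature rule agrees with the trapezoidal rule up to discretization error. The nodes and integrand are arbitrary illustrative choices.

```python
import numpy as np

# Under a Brownian motion prior on u (covariance k(s, t) = min(s, t), u(0) = 0),
# the posterior mean of the integral of u given u(t_1), ..., u(t_n) reproduces
# the trapezoidal rule on those nodes.
grid = np.linspace(1e-3, 1.0, 1000)           # fine discretization of [0, 1]
nodes = np.array([0.2, 0.5, 0.7, 1.0])        # quadrature nodes
K_gn = np.minimum.outer(grid, nodes)          # Cov(u(grid), u(nodes))
K_nn = np.minimum.outer(nodes, nodes)         # Cov(u(nodes), u(nodes))

# E[u(grid) | u(nodes)] = K_gn K_nn^{-1} u(nodes); integrate the conditional mean.
dx = grid[1] - grid[0]
weights = dx * np.sum(np.linalg.solve(K_nn, K_gn.T), axis=1)  # quadrature weights

u_nodes = np.sin(3 * nodes)                   # any observed function values
print(weights @ u_nodes)                      # posterior-mean quadrature rule
# Trapezoidal rule on the same nodes (with u(0) = 0 pinned by the prior):
print(np.trapz(np.concatenate([[0], u_nodes]), np.concatenate([[0], nodes])))
```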

As noted by Houman Owhadi and collaborators, [3] [48] interplays between numerical approximation and statistical inference can also be traced back to Palasti and Renyi, [49] Sard, [50] Kimeldorf and Wahba [51] (on the correspondence between Bayesian estimation and spline smoothing/interpolation) and Larkin [47] (on the correspondence between Gaussian process regression and numerical approximation).

Although the approach of modelling a perfectly known function as a sample from a random process may seem counterintuitive, a natural framework for understanding it can be found in information-based complexity (IBC), [52] the branch of computational complexity founded on the observation that numerical implementation requires computation with partial information and limited resources. In IBC, the performance of an algorithm operating on incomplete information can be analyzed in the worst-case or the average-case (randomized) setting with respect to the missing information. Moreover, as Packel [53] observed, the average-case setting can be interpreted as a mixed strategy in an adversarial game obtained by lifting a (worst-case) minmax problem to a minmax problem over mixed (randomized) strategies. This observation leads to a natural connection [54] [3] between numerical approximation and Wald's decision theory, evidently influenced by von Neumann's theory of games.

To describe this connection, consider the optimal recovery setting of Micchelli and Rivlin, [55] in which one tries to approximate an unknown function from a finite number of linear measurements on that function. Interpreting this optimal recovery problem as a zero-sum game where Player I selects the unknown function and Player II selects its approximation, and using relative errors in a quadratic norm to define losses, Gaussian priors emerge [3] as optimal mixed strategies for such games, and the covariance operator of the optimal Gaussian prior is determined by the quadratic norm used to define the relative error of the recovery.

Software

See also


References

1. Hennig, P.; Osborne, M. A.; Kersting, H. P. (2022). Probabilistic Numerics (PDF). Cambridge University Press. ISBN 978-1107163447.
2. Oates, C. J.; Sullivan, T. J. (2019). "A modern retrospective on probabilistic numerics". Stat. Comput. 29 (6): 1335–1351. arXiv:1901.04457. doi:10.1007/s11222-019-09902-z. S2CID 67885786.
3. Owhadi, Houman; Scovel, Clint (2019). Operator-Adapted Wavelets, Fast Solvers, and Numerical Homogenization: From a Game Theoretic Approach to Numerical Approximation and Algorithm Design. Cambridge Monographs on Applied and Computational Mathematics. Cambridge: Cambridge University Press. ISBN 978-1-108-48436-7.
4. Owhadi, Houman (2015). "Bayesian Numerical Homogenization". Multiscale Modeling & Simulation. 13 (3): 812–828. arXiv:1406.6668. doi:10.1137/140974596. ISSN 1540-3459. S2CID 7245255.
5. Hennig, P.; Osborne, M. A.; Girolami, M. (2015). "Probabilistic numerics and uncertainty in computations". Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences. 471 (2179): 20150142. arXiv:1506.01326. Bibcode:2015RSPSA.47150142H. doi:10.1098/rspa.2015.0142. PMC 4528661. PMID 26346321.
6. Cockayne, J.; Oates, C. J.; Sullivan, T. J.; Girolami, M. (2019). "Bayesian probabilistic numerical methods" (PDF). SIAM Review. 61 (4): 756–789. doi:10.1137/17M1139357. S2CID 14696405.
7. Hennig, P. (2015). "Probabilistic interpretation of linear solvers". SIAM Journal on Optimization. 25 (1): 234–260. arXiv:1402.2058. doi:10.1137/140955501. S2CID 16121233.
8. Cockayne, J.; Oates, C.; Ipsen, I.; Girolami, M. (2019). "A Bayesian conjugate gradient method". Bayesian Analysis. 14 (3). International Society for Bayesian Analysis: 937–1012. doi:10.1214/19-BA1145. S2CID 12460125.
9. Wenger, J.; Hennig, P. (2020). Probabilistic Linear Solvers for Machine Learning. Advances in Neural Information Processing Systems (NeurIPS). Vol. 33. pp. 6731–6742. arXiv:2010.09691.
10. Karvonen, Toni; Särkkä, Simo (2017). Classical quadrature rules via Gaussian processes. 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP).
11. Hennig, Philipp; Kiefel, Martin (2013). "Quasi-Newton methods: A new direction". Journal of Machine Learning Research (JMLR). 14 (1): 843–865. arXiv:1206.4602.
12. Mahsereci, Maren; Hennig, Philipp (2015). Probabilistic line searches for stochastic optimization. Advances in Neural Information Processing Systems (NeurIPS).
13. Kersting, Hans; Krämer, Nicholas; Schiegg, Martin; Daniel, Christian; Tiemann, Michael; Hennig, Philipp (2020). Differentiable Likelihoods for Fast Inversion of 'Likelihood-Free' Dynamical Systems. International Conference on Machine Learning (ICML).
14. Schmidt, Jonathan; Krämer, Peter Nicholas; Hennig, Philipp (2021). A Probabilistic State Space Model for Joint Inference from Differential Equations and Data. Advances in Neural Information Processing Systems (NeurIPS).
15. Diaconis, P. (1988). "Bayesian Numerical Analysis". Statistical Decision Theory and Related Topics IV: 163–175. doi:10.1007/978-1-4613-8768-8_20. ISBN 978-1-4613-8770-1.
16. O'Hagan, A. (1991). "Bayes–Hermite quadrature". Journal of Statistical Planning and Inference. 29 (3): 245–260. doi:10.1016/0378-3758(91)90002-V.
17. Rasmussen, C.; Ghahramani, Z. (2002). "Bayesian Monte Carlo" (PDF). Neural Information Processing Systems: 489–496.
18. Briol, F.-X.; Oates, C. J.; Girolami, M.; Osborne, M. A.; Sejdinovic, D. (2019). "Probabilistic integration: A role in statistical computation? (with discussion and rejoinder)". Statistical Science. 34 (1): 1–22. arXiv:1512.00933. doi:10.1214/18-STS660. S2CID 13932715.
19. Wilson, Samuel (2019-11-22). ParBayesianOptimization R package. Retrieved 2019-12-12.
20. Garnett, Roman (2021). Bayesian Optimization. Cambridge: Cambridge University Press.
21. Mahsereci, M.; Hennig, P. (2017). "Probabilistic Line Searches for Stochastic Optimization". Journal of Machine Learning Research. 18 (119): 1–59.
22. Balles, L.; Romero, J.; Hennig, P. (2017). "Coupling Adaptive Batch Sizes with Learning Rates" (PDF). Proceedings of the 33rd Conference on Uncertainty in Artificial Intelligence (UAI). arXiv:1612.05086.
23. Mahsereci, M.; Balles, L.; Lassner, C.; Hennig, P. (2021). "Early Stopping without a Validation Set". arXiv:1703.09580 [cs.LG].
24. Siems, J. N.; Klein, A.; Archambeau, C.; Mahsereci, M. (2021). "Dynamic Pruning of a Neural Network via Gradient Signal-to-Noise Ratio". 8th ICML Workshop on Automated Machine Learning (AutoML).
25. Mahsereci, Maren (2018). "Chapter 8: First-Order Filter for Gradients; Chapter 9: Second-Order Filter for Hessian Elements". Probabilistic Approaches to Stochastic Optimization (Thesis). Universität Tübingen. doi:10.15496/publikation-26116.
26. Balles, L.; Hennig, P. (2018). "Dissecting Adam: The Sign, Magnitude and Variance of Stochastic Gradients". Proceedings of the 35th International Conference on Machine Learning: 404–413. arXiv:1705.07774.
27. Bartels, S.; Cockayne, J.; Ipsen, I.; Hennig, P. (2019). "Probabilistic linear solvers: a unifying view". Statistics and Computing. 29 (6): 1249–1263. arXiv:1810.03398. doi:10.1007/s11222-019-09897-7. S2CID 53571618.
28. Cockayne, J.; Ipsen, I.; Oates, C.; Reid, T. (2021). "Probabilistic iterative methods for linear systems" (PDF). Journal of Machine Learning Research. 22 (232): 1–34. arXiv:2012.12615.
29. Schäfer, Florian; Katzfuss, Matthias; Owhadi, Houman (2021). "Sparse Cholesky Factorization by Kullback–Leibler Minimization". SIAM Journal on Scientific Computing. 43 (3): A2019–A2046. arXiv:2004.14455. Bibcode:2021SJSC...43A2019S. doi:10.1137/20M1336254. ISSN 1064-8275. S2CID 216914317.
30. Bartels, Simon (2020). "Probabilistic Kernel-Matrix Determinant Estimation". Probabilistic Linear Algebra (Thesis). doi:10.15496/publikation-56119.
31. Wenger, J.; Pleiss, G.; Hennig, P.; Cunningham, J. P.; Gardner, J. R. (2022). Preconditioning for Scalable Gaussian Process Hyperparameter Optimization. International Conference on Machine Learning (ICML). arXiv:2107.00243.
32. Wenger, J.; Pförtner, M.; Hennig, P.; Cunningham, J. P. (2022). Posterior and Computational Uncertainty in Gaussian Processes. Advances in Neural Information Processing Systems (NeurIPS). arXiv:2205.15449.
33. Conrad, P. R.; Girolami, M.; Särkkä, S.; Stuart, A. M.; Zygalakis, K. (2017). "Statistical analysis of differential equations: introducing probability measures on numerical solutions". Stat. Comput. 27 (4): 1065–1082. doi:10.1007/s11222-016-9671-0. PMC 7089645. PMID 32226237.
34. Abdulle, A.; Garegnani, G. (2020). "Random time step probabilistic methods for uncertainty quantification in chaotic and geometric numerical integration". Stat. Comput. 30 (4): 907–932. arXiv:1801.01340. doi:10.1007/s11222-020-09926-w. S2CID 42880142.
35. Skilling, J. (1992). Bayesian solution of ordinary differential equations. Maximum Entropy and Bayesian Methods. pp. 23–37.
36. Tronarp, F.; Kersting, H.; Särkkä, S.; Hennig, P. (2019). "Probabilistic solutions to ordinary differential equations as nonlinear Bayesian filtering: a new perspective". Statistics and Computing. 29 (6): 1297–1315. arXiv:1810.03440. doi:10.1007/s11222-019-09900-1. S2CID 88517317.
37. Tronarp, F.; Särkkä, S.; Hennig, P. (2021). "Bayesian ODE solvers: The maximum a posteriori estimate". Statistics and Computing. 31 (3): 1–18. arXiv:2004.00623. doi:10.1007/s11222-021-09993-7. S2CID 214774980.
38. Kersting, H.; Hennig, P. (2016). Active Uncertainty Calibration in Bayesian ODE Solvers. Uncertainty in Artificial Intelligence. pp. 309–318.
39. Schober, M.; Särkkä, S.; Hennig, P. (2019). "A probabilistic model for the numerical solution of initial value problems". Statistics and Computing. 29 (1): 99–122. arXiv:1610.05261. doi:10.1007/s11222-017-9798-7. S2CID 14299420.
40. Chkrebtii, O.; Campbell, D. A.; Calderhead, B.; Girolami, M. A. (2016). "Bayesian solution uncertainty quantification for differential equations". Bayesian Analysis. 11 (4): 1239–1267. arXiv:1306.2365. doi:10.1214/16-BA1017. S2CID 14077995.
41. Hennig, P.; Hauberg, S. (2014). Probabilistic solutions to differential equations and their application to Riemannian statistics. Artificial Intelligence and Statistics. pp. 347–355.
42. Abdulle, A.; Garegnani, G. (2021). "A probabilistic finite element method based on random meshes: A posteriori error estimators and Bayesian inverse problems". Comput. Methods Appl. Mech. Engrg. 384: 113961. arXiv:2103.06204. Bibcode:2021CMAME.384k3961A. doi:10.1016/j.cma.2021.113961. S2CID 232170649.
43. Chkrebtii, Oksana A.; Campbell, David A.; Calderhead, Ben; Girolami, Mark A. (2016). "Bayesian Solution Uncertainty Quantification for Differential Equations". Bayesian Analysis. 11 (4): 1239–1267. arXiv:1306.2365. doi:10.1214/16-BA1017. ISSN 1936-0975. S2CID 14077995.
44. Pförtner, M.; Steinwart, I.; Hennig, P.; Wenger, J. (2022). "Physics-Informed Gaussian Process Regression Generalizes Linear PDE Solvers". arXiv:2212.12474 [cs.LG].
45. Poincaré, Henri (1912). Calcul des Probabilités (second ed.). Gauthier-Villars.
46. Suldin, A. V. (1959). "Wiener measure and its applications to approximation methods. I". Izv. Vysš. Učebn. Zaved. Matematika. 6 (13): 145–158.
47. Larkin, F. M. (1972). "Gaussian measure in Hilbert space and applications in numerical analysis". Rocky Mountain J. Math. 2 (3): 379–421. doi:10.1216/RMJ-1972-2-3-379.
48. Owhadi, Houman; Scovel, Clint; Schäfer, Florian (2019). "Statistical Numerical Approximation". Notices of the American Mathematical Society. 66 (10): 1608–1617. doi:10.1090/noti1963. S2CID 204830421.
49. Palasti, I.; Renyi, A. (1956). "On interpolation theory and the theory of games". MTA Mat. Kat. Int. Kozl. 1: 529–540.
50. Sard, A. (1963). Linear Approximation. Mathematical Surveys and Monographs. Vol. 9. American Mathematical Society. doi:10.1090/surv/009. ISBN 9780821815090.
51. Kimeldorf, George S.; Wahba, Grace (1970). "A correspondence between Bayesian estimation on stochastic processes and smoothing by splines". Ann. Math. Statist. 41 (2): 495–502. doi:10.1214/aoms/1177697089.
52. Traub, J. F.; Wasilkowski, G. W.; Woźniakowski, H. (1988). Information-Based Complexity. Computer Science and Scientific Computing. Boston, MA: Academic Press, Inc. ISBN 0-12-697545-0.
53. Packel, Edward W. (1987). "The algorithm designer versus nature: a game-theoretic approach to information-based complexity". J. Complexity. 3 (3): 244–257. doi:10.1016/0885-064X(87)90014-8.
54. Owhadi, H. (2017). "Multigrid with rough coefficients and multiresolution operator decomposition from hierarchical information games". SIAM Review. 59 (1): 99–149. arXiv:1503.03467. doi:10.1137/15M1013894. S2CID 5877877.
55. Micchelli, C. A.; Rivlin, T. J. (1977). "A survey of optimal recovery". Optimal Estimation in Approximation Theory (Proc. Internat. Sympos., Freudenstadt, 1976). pp. 1–54. doi:10.1007/978-1-4684-2388-4_1. ISBN 978-1-4684-2390-7.