Multifidelity simulation

Multifidelity Simulation Methods for Transportation [1]

Multifidelity (or multi-fidelity) methods leverage both low- and high-fidelity data to maximize the accuracy of model estimates while minimizing the cost associated with parametrization. They have been successfully used in impedance cardiography, [2] [3] [4] wing-design optimization, [5] robotic learning, [6] and computational biomechanics, [7] and have more recently been extended to human-in-the-loop systems, such as aerospace [8] and transportation. [9] They include both model-based methods, where a generative model is available or can be learned, and model-free methods, which include regression-based approaches such as stacked regression. [8] A more general class of regression-based multifidelity methods comprises Bayesian approaches, e.g. Bayesian linear regression, [3] Gaussian mixture models, [10] [11] Gaussian processes, [12] auto-regressive Gaussian processes, [2] and Bayesian polynomial chaos expansions. [4]


The approach used depends on the domain and properties of the data available, and is similar to the concept of metasynthesis, proposed by Judea Pearl. [13]
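The model-free, stacked-regression style of approach can be illustrated with a small sketch in which a cheap model fit to plentiful low-fidelity data supplies an extra input feature for a model fit to scarce high-fidelity data. The Python example below is a minimal illustration using assumed toy data and polynomial features; it is not the specific procedure of any of the cited works.

```python
# Minimal sketch of a stacked-regression style multifidelity approach (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

def poly_features(x, degree=5):
    # Polynomial design matrix [1, x, x^2, ...].
    return np.vander(x, degree + 1, increasing=True)

# Plentiful (noisy, biased) low-fidelity samples and a handful of high-fidelity samples.
x_lo = rng.uniform(0.0, 1.0, 200)
y_lo = 0.6 * np.sin(6 * x_lo) + 0.2 + 0.05 * rng.normal(size=x_lo.size)
x_hi = np.array([0.05, 0.3, 0.55, 0.8, 0.95])
y_hi = np.sin(6 * x_hi) + 0.1 * x_hi

# Level 1: least-squares polynomial fit to the low-fidelity data.
w_lo, *_ = np.linalg.lstsq(poly_features(x_lo), y_lo, rcond=None)

def lo_pred(x):
    return poly_features(x) @ w_lo

# Level 2 (stacking): regress the high-fidelity outputs on [1, x, lo_pred(x)].
X_hi = np.column_stack([np.ones_like(x_hi), x_hi, lo_pred(x_hi)])
w_hi, *_ = np.linalg.lstsq(X_hi, y_hi, rcond=None)

x_test = np.linspace(0.0, 1.0, 5)
X_test = np.column_stack([np.ones_like(x_test), x_test, lo_pred(x_test)])
print("stacked multifidelity prediction:", np.round(X_test @ w_hi, 3))
print("true high-fidelity values:       ", np.round(np.sin(6 * x_test) + 0.1 * x_test, 3))
```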

Data fidelity spectrum

Example of data-fidelity spectrum with benefits and limitations.

The fidelity of data can vary along a spectrum between low- and high-fidelity. The next sections provide examples of data across the fidelity spectrum, while defining the benefits and limitations of each type of data.

Low-fidelity data (LoFi)

Low-fidelity data (LoFi) includes any data that was produced by a person or stochastic process that deviates from the real-world system of interest. For example, LoFi data can be produced by models of a physical system that use approximations to simulate the system, rather than modeling the system exhaustively. [5]

Moreover, in human-in-the-loop (HITL) situations the goal may be to predict the impact of technology on expert behavior within the real-world operational context. Machine learning can be used to train statistical models that predict expert behavior, provided that an adequate amount of high-fidelity (i.e., real-world) data are available or can be produced. [8]

LoFi benefits and limitations

In situations when there is not an adequate amount of high-fidelity data available to train the model, low-fidelity data can sometimes be used. For example, low-fidelity data can be acquired by using a distributed simulation platform, such as X-Plane, and requiring novice participants to operate in scenarios that are approximations of the real-world context. The benefit of using low-fidelity data is that they are relatively inexpensive to acquire, so it is possible to elicit larger amounts of data. However, the limitation is that the low-fidelity data may not be useful for predicting real-world expert (i.e., high-fidelity) performance due to differences between the low-fidelity simulation platform and the real-world context, or between novice and expert performance (e.g., due to training). [8] [9]

High-fidelity data (HiFi)

High-fidelity data (HiFi) includes data that was produced by a person or stochastic process that closely matches the operational context of interest. For example, in wing-design optimization, high-fidelity data are produced by simulations whose physical models yield results that closely match the behavior of the wing in a comparable real-world setting. [5] In HITL situations, HiFi data would be produced by an operational expert acting in the technological and situational context of interest. [9]

HiFi benefits and limitations

An obvious benefit of utilizing high-fidelity data is that the estimates produced by the model should generalize well to the real-world context. However, these data are expensive in terms of both time and money, which limits the amount of data that can be obtained. The limited amount of data available can significantly impair the ability of the model to produce valid estimates. [8]

Multifidelity methods (MfM)

Multifidelity methods attempt to leverage the strengths of each data source while overcoming its limitations. Although small to medium differences between low- and high-fidelity data can sometimes be overcome by multifidelity models, large differences (e.g., in the KL divergence between novice and expert action distributions) can be problematic, leading to decreased predictive performance when compared to models that rely exclusively on high-fidelity data. [8]
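One way to make the notion of a "large difference" concrete is to compare the low- and high-fidelity behavior distributions directly. The sketch below computes the KL divergence between two discretized action distributions; the distributions, bins, and function names are illustrative assumptions, not data or code from the cited studies.

```python
# Quantifying the gap between novice (low-fidelity) and expert (high-fidelity)
# behavior as a KL divergence over discretized action choices (illustrative only).
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) for two discrete distributions given as probability vectors.
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

# Hypothetical distributions over five binned driving actions.
expert_actions = [0.05, 0.20, 0.50, 0.20, 0.05]   # high-fidelity (expert) behavior
novice_actions = [0.20, 0.30, 0.20, 0.20, 0.10]   # low-fidelity (novice) behavior

print("KL(expert || novice) =", round(kl_divergence(expert_actions, novice_actions), 3))
```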

Multifidelity models enable low-fidelity data to be collected on different technology concepts to evaluate the risk associated with each concept before actually deploying the system. [14]

Bayesian auto-regressive Gaussian processes

In an auto-regressive model of Gaussian processes (GP), the output at each fidelity level $t$, where a higher $t$ denotes a higher fidelity, is modeled as a GP $f_t(x)$, [15] [2] which can be expressed in terms of the previous level's GP $f_{t-1}(x)$, a proportionality constant $\rho_{t-1}$ and a "difference-GP" $\delta_t(x)$ as follows:

$$f_t(x) = \rho_{t-1}\, f_{t-1}(x) + \delta_t(x)$$

The scaling constant $\rho_{t-1}$ quantifies the correlation between levels $t$ and $t-1$, and can in general depend on $x$, i.e. $\rho_{t-1} = \rho_{t-1}(x)$. [16] [17]

Under the assumption that all information about level $t$ at an input $x$ is contained in the data at the same pivot point $x$ on level $t-1$, as well as in $\delta_t$, semi-analytical first and second moments are feasible. Formally, this assumption is

$$\operatorname{Cov}\!\left(f_t(x),\, f_{t-1}(x') \mid f_{t-1}(x)\right) = 0 \qquad \text{for all } x' \neq x,$$

i.e. given the data at $x$ on level $t-1$, there is no further information about level $t$ to extract from the data on level $t-1$ at other inputs $x' \neq x$.
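A minimal two-level version of this auto-regressive scheme can be sketched with plain NumPy: fit a GP to the low-fidelity data, estimate the scaling constant $\rho$ by least squares, and model the residual (the "difference-GP" $\delta$) with a second GP on the high-fidelity inputs. The toy functions, kernel settings, and least-squares estimate of $\rho$ below are illustrative assumptions; the sketch approximates a recursive formulation rather than reproducing the semi-analytical treatment in the cited references.

```python
# Two-level auto-regressive (co-kriging style) GP sketch: f_hi(x) ~ rho * f_lo(x) + delta(x).
import numpy as np

def rbf_kernel(xa, xb, lengthscale=0.2, variance=1.0):
    # Squared-exponential covariance between two sets of 1-D inputs.
    d = xa[:, None] - xb[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior_mean(x_train, y_train, x_test, noise=1e-4, **kern):
    # Posterior mean of a zero-mean GP with an RBF kernel.
    K = rbf_kernel(x_train, x_train, **kern) + noise * np.eye(len(x_train))
    Ks = rbf_kernel(x_test, x_train, **kern)
    return Ks @ np.linalg.solve(K, y_train)

# Toy low- and high-fidelity versions of the same underlying process (assumed).
def f_lo(x):  # cheap, biased approximation
    return 0.5 * np.sin(8 * x) + 0.3

def f_hi(x):  # expensive "ground truth"
    return np.sin(8 * x) + 0.1 * x

x_lo = np.linspace(0.0, 1.0, 25)              # many cheap low-fidelity samples
x_hi = np.array([0.1, 0.35, 0.6, 0.85])       # few expensive high-fidelity samples
y_lo, y_hi = f_lo(x_lo), f_hi(x_hi)

# Level t-1: GP on the low-fidelity data, evaluated at the high-fidelity inputs.
m_lo_at_hi = gp_posterior_mean(x_lo, y_lo, x_hi)

# Estimate the scaling constant rho by least squares, then model the
# residual ("difference-GP" delta) with a second GP on the high-fidelity inputs.
rho = float(m_lo_at_hi @ y_hi) / float(m_lo_at_hi @ m_lo_at_hi)
delta = y_hi - rho * m_lo_at_hi

# Multifidelity prediction: f_t(x) ~ rho * m_lo(x) + m_delta(x).
x_test = np.linspace(0.0, 1.0, 5)
prediction = (rho * gp_posterior_mean(x_lo, y_lo, x_test)
              + gp_posterior_mean(x_hi, delta, x_test))

print("estimated rho:", round(rho, 3))
print("multifidelity prediction:", np.round(prediction, 3))
print("true high-fidelity values:", np.round(f_hi(x_test), 3))
```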

Related Research Articles

Geostatistics is a branch of statistics focusing on spatial or spatiotemporal datasets. Developed originally to predict probability distributions of ore grades for mining operations, it is currently applied in diverse disciplines including petroleum geology, hydrogeology, hydrology, meteorology, oceanography, geochemistry, geometallurgy, geography, forestry, environmental control, landscape ecology, soil science, and agriculture. Geostatistics is applied in varied branches of geography, particularly those involving the spread of diseases (epidemiology), the practice of commerce and military planning (logistics), and the development of efficient spatial networks. Geostatistical algorithms are incorporated in many places, including geographic information systems (GIS).

In probability theory and statistics, a Gaussian process is a stochastic process (a collection of random variables indexed by time or space), such that every finite collection of those random variables has a multivariate normal distribution, i.e. every finite linear combination of them is normally distributed. The distribution of a Gaussian process is the joint distribution of all those random variables, and as such, it is a distribution over functions with a continuous domain, e.g. time or space.


In statistics, originally in geostatistics, kriging or Kriging, also known as Gaussian process regression, is a method of interpolation based on Gaussian process governed by prior covariances. Under suitable assumptions of the prior, kriging gives the best linear unbiased prediction (BLUP) at unsampled locations. Interpolating methods based on other criteria such as smoothness may not yield the BLUP. The method is widely used in the domain of spatial analysis and computer experiments. The technique is also known as Wiener–Kolmogorov prediction, after Norbert Wiener and Andrey Kolmogorov.

Sensitivity analysis is the study of how the uncertainty in the output of a mathematical model or system can be divided and allocated to different sources of uncertainty in its inputs. A related practice is uncertainty analysis, which has a greater focus on uncertainty quantification and propagation of uncertainty; ideally, uncertainty and sensitivity analysis should be run in tandem.

A computer experiment or simulation experiment is an experiment used to study a computer simulation, also referred to as an in silico system. This area includes computational physics, computational chemistry, computational biology and other similar disciplines.


Sensor fusion is the process of combining sensor data or data derived from disparate sources such that the resulting information has less uncertainty than would be possible when these sources were used individually. For instance, one could potentially obtain a more accurate location estimate of an indoor object by combining multiple data sources such as video cameras and WiFi localization signals. The term uncertainty reduction in this case can mean more accurate, more complete, or more dependable, or refer to the result of an emerging view, such as stereoscopic vision.

In statistics, a generalized additive model (GAM) is a generalized linear model in which the linear response variable depends linearly on unknown smooth functions of some predictor variables, and interest focuses on inference about these smooth functions.


Local regression or local polynomial regression, also known as moving regression, is a generalization of the moving average and polynomial regression. Its most common methods, initially developed for scatterplot smoothing, are LOESS and LOWESS, both pronounced "lo-ess". They are two strongly related non-parametric regression methods that combine multiple regression models in a k-nearest-neighbor-based meta-model. In some fields, LOESS is known and commonly referred to as the Savitzky–Golay filter.

Uncertainty quantification (UQ) is the science of quantitative characterization and estimation of uncertainties in both computational and real world applications. It tries to determine how likely certain outcomes are if some aspects of the system are not exactly known. An example would be to predict the acceleration of a human body in a head-on crash with another car: even if the speed was exactly known, small differences in the manufacturing of individual cars, how tightly every bolt has been tightened, etc., will lead to different results that can only be predicted in a statistical sense.

Bootstrapping is any test or metric that uses random sampling with replacement, and falls under the broader class of resampling methods. Bootstrapping assigns measures of accuracy to sample estimates. This technique allows estimation of the sampling distribution of almost any statistic using random sampling methods.


Data fusion is the process of integrating multiple data sources to produce more consistent, accurate, and useful information than that provided by any individual data source.

Approximate Bayesian computation (ABC) constitutes a class of computational methods rooted in Bayesian statistics that can be used to estimate the posterior distributions of model parameters.

Polynomial chaos (PC), also called polynomial chaos expansion (PCE) and Wiener chaos expansion, is a method for representing a random variable in terms of a polynomial function of other random variables. The polynomials are chosen to be orthogonal with respect to the joint probability distribution of these random variables. PCE can be used, e.g., to determine the evolution of uncertainty in a dynamical system when there is probabilistic uncertainty in the system parameters. Note that despite its name, PCE has no immediate connections to chaos theory.

In geophysics, seismic inversion is the process of transforming seismic reflection data into a quantitative rock-property description of a reservoir. Seismic inversion may be pre- or post-stack, deterministic, random or geostatistical; it typically includes other reservoir measurements such as well logs and cores.

Gradient-enhanced kriging (GEK) is a surrogate modeling technique used in engineering. A surrogate model is a prediction of the output of an expensive computer code. This prediction is based on a small number of evaluations of the expensive computer code.

In regression analysis, an interval predictor model (IPM) is an approach to regression where bounds on the function to be approximated are obtained. This differs from other techniques in machine learning, where usually one wishes to estimate point values or an entire probability distribution. Interval Predictor Models are sometimes referred to as a nonparametric regression technique, because a potentially infinite set of functions are contained by the IPM, and no specific distribution is implied for the regressed variables.

This is a comparison of statistical analysis software that allows doing inference with Gaussian processes often using approximations.

Probabilistic numerics is a scientific field at the intersection of statistics, machine learning and applied mathematics, where tasks in numerical analysis including finding numerical solutions for integration, linear algebra, optimisation and differential equations are seen as problems of statistical, probabilistic, or Bayesian inference.

Bayesian quadrature is a method for approximating intractable integration problems. It falls within the class of probabilistic numerical methods. Bayesian quadrature views numerical integration as a Bayesian inference task, where function evaluations are used to estimate the integral of that function. For this reason, it is sometimes also referred to as "Bayesian probabilistic numerical integration" or "Bayesian numerical integration". The name "Bayesian cubature" is also sometimes used when the integrand is multi-dimensional. A potential advantage of this approach is that it provides probabilistic uncertainty quantification for the value of the integral.

References

  1. Erik J. Schlicht (2017). "SAMSI Summer Program on Transportation Statistics: Erik Schlicht, Aug 15, 2017". Using Multifidelity Methods to Estimate the Risk Associated with Transportation Systems.
  2. Ranftl, Sascha; Melito, Gian Marco; Badeli, Vahid; Reinbacher-Köstinger, Alice; Ellermann, Katrin; von der Linden, Wolfgang (2019-12-31). "Bayesian Uncertainty Quantification with Multi-Fidelity Data and Gaussian Processes for Impedance Cardiography of Aortic Dissection". Entropy. 22 (1): 58. Bibcode:2019Entrp..22...58R. doi: 10.3390/e22010058 . ISSN   1099-4300. PMC   7516489 . PMID   33285833.
  3. Ranftl, Sascha; Melito, Gian Marco; Badeli, Vahid; Reinbacher-Köstinger, Alice; Ellermann, Katrin; Linden, Wolfgang von der (2019-12-09). "On the Diagnosis of Aortic Dissection with Impedance Cardiography: A Bayesian Feasibility Study Framework with Multi-Fidelity Simulation Data". Proceedings. 33 (1): 24. doi: 10.3390/proceedings2019033024 . ISSN   2504-3900.
  4. Badeli, Vahid; Ranftl, Sascha; Melito, Gian Marco; Reinbacher-Köstinger, Alice; Von Der Linden, Wolfgang; Ellermann, Katrin; Biro, Oszkar (2021-01-01). "Bayesian inference of multi-sensors impedance cardiography for detection of aortic dissection". COMPEL - the International Journal for Computation and Mathematics in Electrical and Electronic Engineering. 41 (3): 824–839. doi:10.1108/COMPEL-03-2021-0072. ISSN   0332-1649. S2CID   245299500.
  5. Robinson, T.D.; et al. (2006). "Multifidelity Optimization for Variable-Complexity Design". 11th AIAA/ISSMO Multidisciplinary Analysis and Optimization Conference: 1–18.
  6. Cutler, M.; et al. (2015). "Real-world reinforcement learning via multifidelity simulators". IEEE Transactions on Robotics. 31 (3): 655–671. doi:10.1109/TRO.2015.2419431. S2CID   15423476.
  7. Sajjadinia, Seyed Shayan; Carpentieri, Bruno; Shriram, Duraisamy; Holzapfel, Gerhard A. (2022-09-01). "Multi-fidelity surrogate modeling through hybrid machine learning for biomechanical and finite element analysis of soft tissues". Computers in Biology and Medicine. 148: 105699. doi:10.1016/j.compbiomed.2022.105699. ISSN   0010-4825.
  8. Schlicht, Erik (2014). "Predicting the behavior of interacting humans by fusing data from multiple sources". arXiv: 1408.2053 [cs.AI].
  9. Schlicht, Erik J.; Morris, Nichole L. (2017). "Estimating the risk associated with transportation technology using multifidelity simulation". arXiv: 1701.08588 [stat.AP].
  10. Koutsourelakis, Phaedon-Stelios (January 2009). "Accurate Uncertainty Quantification Using Inaccurate Computational Models". SIAM Journal on Scientific Computing. 31 (5): 3274–3300. doi:10.1137/080733565. ISSN   1064-8275.
  11. Biehler, Jonas; Gee, Michael W.; Wall, Wolfgang A. (2015-06-01). "Towards efficient uncertainty quantification in complex and large-scale biomechanical problems based on a Bayesian multi-fidelity scheme". Biomechanics and Modeling in Mechanobiology. 14 (3): 489–513. doi:10.1007/s10237-014-0618-0. ISSN   1617-7940. PMID   25245816. S2CID   42417006.
  12. Nitzler, Jonas; Biehler, Jonas; Fehn, Niklas; Koutsourelakis, Phaedon-Stelios; Wall, Wolfgang A. (2020-01-09). "A Generalized Probabilistic Learning Approach for Multi-Fidelity Uncertainty Propagation in Complex Physical Simulations". arXiv: 2001.02892 [cs.CE].
  13. Judea Pearl (2012). "The Do-Calculus Revisited". Proceedings of the Twenty-Eighth Conference on Uncertainty in Artificial Intelligence (PDF). Corvallis, OR: AUAI Press. pp. 4–11. S2CID   2768684. Archived from the original (PDF) on 2018-02-05.
  14. Reshama Shaikh and Erik J. Schlicht (2017). "The Machine Learning Conference Interview with Dr. Schlicht". Interview Regarding the use of Multifidelity Simulation Methods.
  15. Kennedy, M. (2000-03-01). "Predicting the output from a complex computer code when fast approximations are available". Biometrika. 87 (1): 1–13. doi:10.1093/biomet/87.1.1. ISSN   0006-3444.
  16. Parussini, L.; Venturi, D.; Perdikaris, P.; Karniadakis, G.E. (May 2017). "Multi-fidelity Gaussian process regression for prediction of random fields". Journal of Computational Physics. 336: 36–50. Bibcode:2017JCoPh.336...36P. doi:10.1016/j.jcp.2017.01.047. hdl: 11368/2903585 .
  17. Le Gratiet, Loic; Garnier, Josselin (2014). "Recursive Co-Kriging Model for Design of Computer Experiments with Multiple Levels of Fidelity". International Journal for Uncertainty Quantification. 4 (5): 365–386. doi:10.1615/Int.J.UncertaintyQuantification.2014006914. ISSN   2152-5080. S2CID   14157948.