Surrogate model

Last updated January 31, 2025

A surrogate model is an engineering method used when an outcome of interest cannot be easily measured or computed, so an approximate mathematical model of the outcome is used instead. Most engineering design problems require experiments and/or simulations to evaluate the design objective and constraint functions as functions of the design variables. For example, to find the optimal airfoil shape for an aircraft wing, an engineer simulates the airflow around the wing for different shape variables (e.g., length, curvature, material). For many real-world problems, however, a single simulation can take many minutes, hours, or even days to complete. As a result, routine tasks such as design optimization, design space exploration, sensitivity analysis and "what-if" analysis become impractical, since they require thousands or even millions of simulation evaluations.

One way of alleviating this burden is to construct approximation models, known as surrogate models, metamodels or emulators, that mimic the behavior of the simulation model as closely as possible while being computationally cheaper to evaluate. Surrogate models are constructed using a data-driven, bottom-up approach. The exact inner workings of the simulation code are not assumed to be known (or even understood); the approach relies solely on the input-output behavior. A model is constructed from the response of the simulator at a limited number of intelligently chosen data points. This approach is also known as behavioral modeling or black-box modeling, though the terminology is not always consistent. When only a single design variable is involved, the process is known as curve fitting.
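The single-variable (curve fitting) case can be sketched in a few lines. This is a minimal illustration, not a prescribed method: the function `expensive_simulation`, the interval [0, 2], the eight sample points and the polynomial degree are all assumptions chosen for the example.

```python
import numpy as np

def expensive_simulation(x):
    """Hypothetical stand-in for a slow simulator (an assumption of this sketch)."""
    return np.sin(2.0 * x) + 0.5 * x

# A limited number of sample points in the one-dimensional design space [0, 2].
x_train = np.linspace(0.0, 2.0, 8)
y_train = expensive_simulation(x_train)

# Fit a degree-5 polynomial using only the input-output pairs (black-box view).
coeffs = np.polyfit(x_train, y_train, deg=5)

def surrogate(x):
    """Cheap polynomial approximation of expensive_simulation."""
    return np.polyval(coeffs, x)

# Compare surrogate and simulator on a dense set of points, most of them unseen.
x_test = np.linspace(0.0, 2.0, 101)
max_error = float(np.max(np.abs(surrogate(x_test) - expensive_simulation(x_test))))
print(f"max abs error on [0, 2]: {max_error:.4f}")
```

Once fitted, `surrogate` can be evaluated thousands of times at negligible cost, which is what makes the downstream analyses feasible.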

Although surrogate models are most commonly used in place of experiments and simulations in engineering design, surrogate modeling can be used in many other areas of science that involve expensive experiments and/or function evaluations.

Goals

The scientific challenge of surrogate modeling is the generation of a surrogate that is as accurate as possible, using as few simulation evaluations as possible. The process comprises three major steps which may be interleaved iteratively:

Sample selection (also known as sequential design, optimal experimental design (OED) or active learning)
Construction of the surrogate model and optimization of the model parameters (i.e., managing the bias-variance tradeoff)
Appraisal of the accuracy of the surrogate
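The construction and appraisal steps can be illustrated with leave-one-out cross-validation, one common way (among several) to estimate surrogate accuracy without spending extra simulations. The sketch below is an assumption-laden example: the noisy quadratic data, the noise level and the candidate polynomial degrees are hypothetical.

```python
import numpy as np

def loo_rmse(x, y, degree):
    """Leave-one-out cross-validation RMSE of a polynomial surrogate."""
    errors = []
    for i in range(len(x)):
        keep = np.arange(len(x)) != i            # hold out sample i
        coeffs = np.polyfit(x[keep], y[keep], degree)
        errors.append(np.polyval(coeffs, x[i]) - y[i])
    return float(np.sqrt(np.mean(np.square(errors))))

# Hypothetical noisy samples of a quadratic response (illustrative assumption).
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 15)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(scale=0.05, size=x.size)

# Construction: try several model complexities.
# Appraisal: score each one by its held-out prediction error.
scores = {d: loo_rmse(x, y, d) for d in range(0, 7)}
best_degree = min(scores, key=scores.get)
print(scores, best_degree)
```

Degrees that are too low underfit (high bias), while needlessly high degrees start fitting the noise (high variance); the held-out error is a simple proxy for that tradeoff.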

The accuracy of the surrogate depends on the number and location of samples (expensive experiments or simulations) in the design space. Various design of experiments (DOE) techniques cater to different sources of errors, in particular, errors due to noise in the data or errors due to an improper surrogate model.
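As an illustration of how sample locations can be chosen, the sketch below implements Latin hypercube sampling, a widely used space-filling design. The article does not single out this technique; it is used here only as one example of a DOE method, and the function name `latin_hypercube` is hypothetical.

```python
import numpy as np

def latin_hypercube(n_samples, n_dims, rng):
    """Return n_samples points in [0, 1)^n_dims with one point per stratum in every dimension."""
    # Random offset of each point inside its equal-width stratum.
    jitter = rng.random((n_samples, n_dims))
    samples = np.empty((n_samples, n_dims))
    for d in range(n_dims):
        strata = rng.permutation(n_samples)      # shuffle strata independently per dimension
        samples[:, d] = (strata + jitter[:, d]) / n_samples
    return samples

rng = np.random.default_rng(42)
design = latin_hypercube(n_samples=10, n_dims=2, rng=rng)
print(design)
```

Each row of `design` is a point in the unit square; scaling each column to the actual variable bounds gives the designs at which the expensive experiments or simulations would be run.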

Types of surrogate models

Popular surrogate modeling approaches are: polynomial response surfaces; kriging; more generalized Bayesian approaches;^[1] gradient-enhanced kriging (GEK); radial basis functions; support vector machines; space mapping;^[2] artificial neural networks and Bayesian networks.^[3] Other methods recently explored include Fourier surrogate modeling^[4]^[5] and random forests.^[6]
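To make one of these approaches concrete, the following is a minimal sketch of a Gaussian radial basis function interpolant for two design variables. The kernel choice, the `length_scale` value, the 5 x 5 training grid and the test response are assumptions for illustration; production libraries add regularization and hyperparameter tuning.

```python
import numpy as np

def gaussian_kernel(a, b, length_scale):
    """Gaussian RBF kernel matrix between point sets a (n x d) and b (m x d)."""
    diff = a[:, None, :] - b[None, :, :]
    sq_dist = np.sum(diff**2, axis=-1)
    return np.exp(-sq_dist / (2.0 * length_scale**2))

def fit_rbf(x_train, y_train, length_scale=0.2):
    """Solve the linear system for the interpolation weights."""
    return np.linalg.solve(gaussian_kernel(x_train, x_train, length_scale), y_train)

def predict_rbf(x_new, x_train, weights, length_scale=0.2):
    """Evaluate the RBF surrogate at new points."""
    return gaussian_kernel(x_new, x_train, length_scale) @ weights

# Hypothetical two-variable response sampled on a 5 x 5 grid (illustrative assumption).
g = np.linspace(0.0, 1.0, 5)
x_train = np.array([[a, b] for a in g for b in g])
y_train = np.sin(2.0 * x_train[:, 0]) + x_train[:, 1] ** 2

weights = fit_rbf(x_train, y_train)
y_check = predict_rbf(x_train, x_train, weights)   # reproduces the training data
print(float(np.max(np.abs(y_check - y_train))))
```

Because the model interpolates, it matches the simulator exactly at the sampled designs; its accuracy elsewhere depends on the sample density and the kernel width.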

For some problems, the nature of the true function is not known a priori, and therefore it is not clear which surrogate model will be the most accurate one. In addition, there is no consensus on how to obtain the most reliable estimates of the accuracy of a given surrogate. Many other problems have known physical properties. In these cases, physics-based surrogates such as space-mapping-based models are commonly used.^[2]^[7]

Invariance properties

Recently proposed comparison-based surrogate models (e.g., ranking support vector machines) for evolutionary algorithms, such as CMA-ES, allow preservation of some invariance properties of surrogate-assisted optimizers:^[8]

Invariance with respect to monotonic transformations of the function (scaling)
Invariance with respect to orthogonal transformations of the search space (rotation)
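The first property can be checked numerically: a comparison-based method only sees the ranking of candidate designs, and a strictly increasing transformation of the objective leaves that ranking unchanged. The sketch below is a simple demonstration under assumed data (a sphere objective and random candidates); it does not implement a ranking SVM or CMA-ES, and it does not cover the rotation property.

```python
import numpy as np

def ranks(values):
    """Rank of each value (0 = smallest), the only information a comparison-based method uses."""
    order = np.argsort(values, kind="stable")
    r = np.empty(len(values), dtype=int)
    r[order] = np.arange(len(values))
    return r

rng = np.random.default_rng(1)
candidates = rng.normal(size=(6, 3))       # six hypothetical candidate designs in 3-D
f = np.sum(candidates**2, axis=1)          # assumed objective: sphere function
g = np.exp(f) + 5.0                        # strictly increasing transformation of f

same_ranking = bool(np.array_equal(ranks(f), ranks(g)))
print(same_ranking)
```

A surrogate trained on the ranks therefore behaves identically for `f` and `g`, whereas a surrogate trained on the raw values would see two very different regression problems.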

Applications

An important distinction can be made between two different applications of surrogate models: design optimization and design space approximation (also known as emulation).

In surrogate model-based optimization, an initial surrogate is constructed using part of the available budget of expensive experiments and/or simulations. The remaining experiments/simulations are run for designs that the surrogate model predicts may have promising performance. The process usually takes the form of the following search/update procedure.

1. Initial sample selection (the experiments and/or simulations to be run)
2. Construct surrogate model
3. Search surrogate model (the model can be searched extensively, e.g., using a genetic algorithm, as it is cheap to evaluate)
4. Run and update experiment/simulation at new location(s) found by search and add to sample
5. Iterate steps 2 to 4 until out of time or design is "good enough"
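The search/update procedure above can be sketched for a single design variable as follows. Several simplifications are assumptions of this example: the objective `expensive_simulation` is hypothetical, the surrogate is a cubic polynomial response surface, the surrogate is searched on a dense grid rather than with a genetic algorithm, and the stopping rule is a fixed number of iterations.

```python
import numpy as np

def expensive_simulation(x):
    """Hypothetical costly objective (assumption); its true minimum is at x = ln 2."""
    return np.exp(x) - 2.0 * x

lower, upper = 0.0, 2.0
grid = np.linspace(lower, upper, 2001)     # dense candidate set for searching the surrogate

# Step 1: initial sample selection.
x_samples = list(np.linspace(lower, upper, 4))
y_samples = [float(expensive_simulation(x)) for x in x_samples]

for _ in range(5):                         # step 5: iterate within a fixed budget
    # Step 2: construct the surrogate from all samples gathered so far.
    coeffs = np.polyfit(x_samples, y_samples, deg=3)
    # Step 3: search the surrogate exhaustively, since it is cheap to evaluate.
    x_new = float(grid[np.argmin(np.polyval(coeffs, grid))])
    # Step 4: run the expensive simulation at the new location and add it to the sample.
    x_samples.append(x_new)
    y_samples.append(float(expensive_simulation(x_new)))

best = int(np.argmin(y_samples))
best_x, best_y = float(x_samples[best]), y_samples[best]
print(f"best design x = {best_x:.3f}, objective = {best_y:.4f}")
```

Only nine expensive evaluations are spent in total; all other evaluations are made on the polynomial. Practical implementations usually balance exploiting the surrogate's predicted optimum against exploring poorly sampled regions, which this sketch does not attempt.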

Depending on the type of surrogate used and the complexity of the problem, the process may converge on a local or global optimum, or perhaps none at all.^[9]

In design space approximation, one is not interested in finding the optimal parameter vector, but rather in the global behavior of the system. Here the surrogate is tuned to mimic the underlying model as closely as needed over the complete design space. Such surrogates are a useful, cheap way to gain insight into the global behavior of the system. Optimization can still occur as a post-processing step, although with no update procedure (see above), the optimum found cannot be validated.

Surrogate modeling software

Surrogate Modeling Toolbox (SMT: https://github.com/SMTorg/smt) is a Python package that contains a collection of surrogate modeling methods, sampling techniques, and benchmarking functions. This package provides a library of surrogate models that is simple to use and facilitates the implementation of additional methods. SMT is different from existing surrogate modeling libraries because of its emphasis on derivatives, including training derivatives used for gradient-enhanced modeling, prediction derivatives, and derivatives with respect to the training data. It also includes new surrogate models that are not available elsewhere: kriging by partial-least squares reduction and energy-minimizing spline interpolation.^[10]
SAMBO Optimization is a Python library that supports sequential optimization with arbitrary models, with tree-based models and Gaussian process models built in.^[11]
Surrogates.jl is a Julia package that offers tools such as random forests, radial basis methods and kriging.

References

[1] Ranftl, Sascha; von der Linden, Wolfgang (2021-11-13). "Bayesian Surrogate Analysis and Uncertainty Propagation". Physical Sciences Forum. 3 (1): 6. arXiv:2101.04038. doi:10.3390/psf2021003006. ISSN 2673-9984.
[2] J.W. Bandler, Q. Cheng, S.A. Dakroury, A.S. Mohamed, M.H. Bakr, K. Madsen and J. Søndergaard, "Space mapping: the state of the art," IEEE Trans. Microwave Theory Tech., vol. 52, no. 1, pp. 337-361, Jan. 2004.
[3] Cardenas, IC (2019). "On the use of Bayesian networks as a meta-modeling approach to analyse uncertainties in slope stability analysis". Georisk: Assessment and Management of Risk for Engineered Systems and Geohazards. 13 (1): 53–65. Bibcode:2019GAMRE..13...53C. doi:10.1080/17499518.2018.1498524. S2CID 216590427.
[4] Manzoni, L.; Papetti, D. M.; Cazzaniga, P.; Spolaor, S.; Mauri, G.; Besozzi, D.; Nobile, M. S. "Surfing on Fitness Landscapes: A Boost on Optimization by Fourier Surrogate Modeling". Entropy 2020, 22, 285.
[5] Bliek, L.; Verstraete, H. R.; Verhaegen, M.; Wahls, S. "Online optimization with costly and noisy measurements using random Fourier expansions". IEEE Transactions on Neural Networks and Learning Systems 2016, 29(1), 167-182.
[6] Dasari, S.K.; P. Andersson; A. Cheddad (2019). "Random Forest Surrogate Models to Support Design Space Exploration in Aerospace Use-Case". Artificial Intelligence Applications and Innovations (AIAI 2019). Springer. pp. 532–544. Retrieved 2019-06-02.
[7] J.E. Rayas-Sanchez, "Power in simplicity with ASM: tracing the aggressive space mapping algorithm over two decades of development and engineering applications", IEEE Microwave Magazine, vol. 17, no. 4, pp. 64-76, April 2016.
[8] Loshchilov, I.; M. Schoenauer; M. Sebag (2010). "Comparison-Based Optimizers Need Comparison-Based Surrogates" (PDF). Parallel Problem Solving from Nature (PPSN XI). Springer. pp. 364–373.
[9] Jones, D.R (2001), "A taxonomy of global optimization methods based on response surfaces," Journal of Global Optimization, 21:345–383.
[10] Bouhlel, M.A.; Hwang, J.H.; Bartoli, Nathalie; Lafage, R.; Morlier, J.; Martins, J.R.R.A. (2019). "A Python surrogate modeling framework with derivatives". Advances in Engineering Software. 135: 102662. doi:10.1016/j.advengsoft.2019.03.005. S2CID 128324330.
[11] Kernc, SAMBO: Sequential And Model-Based Optimization: Efficient global optimization in Python, doi:10.5281/zenodo.14461363

External links

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
