Verification and validation of computer simulation models

Verification and validation of computer simulation models is conducted during the development of a simulation model with the ultimate goal of producing an accurate and credible model. [1] [2] "Simulation models are increasingly being used to solve problems and to aid in decision-making. The developers and users of these models, the decision makers using information obtained from the results of these models, and the individuals affected by decisions based on such models are all rightly concerned with whether a model and its results are 'correct'." [3] This concern is addressed through verification and validation of the simulation model.

Simulation models are approximate imitations of real-world systems; they never exactly imitate the real-world system. Because of this, a model should be verified and validated to the degree needed for the model's intended purpose or application. [3]

The verification and validation of a simulation model starts after functional specifications have been documented and initial model development has been completed. [4] Verification and validation is an iterative process that takes place throughout the development of a model. [1] [4]

Verification

In the context of computer simulation, verification of a model is the process of confirming that it is correctly implemented with respect to the conceptual model (it matches specifications and assumptions deemed acceptable for the given purpose of application). [1] [4] During verification the model is tested to find and fix errors in the implementation of the model. [4] Various processes and techniques are used to assure the model matches specifications and assumptions with respect to the model concept. The objective of model verification is to ensure that the implementation of the model is correct.

There are many techniques that can be utilized to verify a model. These include, but are not limited to, having the model checked by an expert, making logic flow diagrams that include each logically possible action, examining the model output for reasonableness under a variety of settings of the input parameters, and using an interactive debugger. [1] Many software engineering techniques used for software verification are applicable to simulation model verification. [1]
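
As a concrete illustration, the sketch below (in Python, using a hypothetical single-server queue model written only for this example) shows how simple assertions can check an implementation against properties that follow directly from the conceptual model. The model, parameter values, and checks are illustrative assumptions, not part of any published procedure.

```python
import random

def simulate_single_server_queue(arrival_rate, service_rate, n_customers, seed=1):
    """Hypothetical single-server queue; returns the wait time of each customer."""
    rng = random.Random(seed)
    clock = 0.0            # time of the current customer's arrival
    server_free_at = 0.0   # time at which the server next becomes idle
    waits = []
    for _ in range(n_customers):
        clock += rng.expovariate(arrival_rate)        # next arrival
        service_start = max(clock, server_free_at)    # wait if the server is busy
        waits.append(service_start - clock)
        server_free_at = service_start + rng.expovariate(service_rate)
    return waits

# Verification checks: properties that must hold if the implementation matches
# the conceptual model, whatever the parameter values.
waits = simulate_single_server_queue(arrival_rate=0.5, service_rate=1.0, n_customers=10000)
assert len(waits) == 10000          # every generated customer is eventually served
assert all(w >= 0 for w in waits)   # waiting times can never be negative
assert waits[0] == 0.0              # the first customer finds the server idle
```

A failed assertion of this kind points to an implementation error and would typically be investigated with an interactive debugger.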

Validation

Validation checks the accuracy of the model's representation of the real system. Model validation is defined to mean "substantiation that a computerized model within its domain of applicability possesses a satisfactory range of accuracy consistent with the intended application of the model". [3] A model should be built for a specific purpose or set of objectives and its validity determined for that purpose. [3]

There are many approaches that can be used to validate a computer model. The approaches range from subjective reviews to objective statistical tests. One approach that is commonly used is to have the model builders determine validity of the model through a series of tests. [3]

Naylor and Finger [1967] formulated a three-step approach to model validation that has been widely followed: [1]

Step 1. Build a model that has high face validity.

Step 2. Validate model assumptions.

Step 3. Compare the model input-output transformations to corresponding input-output transformations for the real system. [5]

Face validity

A model that has face validity appears to be a reasonable imitation of a real-world system to people who are knowledgeable about the real-world system. [4] Face validity is tested by having users and people knowledgeable about the system examine model output for reasonableness and, in the process, identify deficiencies. [1] An added advantage of having the users involved in validation is that the model's credibility to the users and the users' confidence in the model increases. [1] [4] Sensitivity to model inputs can also be used to judge face validity. [1] For example, if a simulation of a fast food restaurant drive-through were run twice with customer arrival rates of 20 per hour and 40 per hour, then model outputs such as average wait time or maximum number of customers waiting would be expected to increase with the arrival rate.
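
A minimal sketch of such a sensitivity check is shown below. The drive-through model, mean service time, and run length are hypothetical values chosen only to illustrate the expectation that the average wait should not fall when the arrival rate doubles from 20 to 40 customers per hour.

```python
import random

def mean_wait_minutes(arrivals_per_hour, mean_service_minutes=1.0, n_customers=50000, seed=7):
    """Hypothetical single-server drive-through; returns the average wait in minutes."""
    rng = random.Random(seed)
    arrival_rate = arrivals_per_hour / 60.0   # customers per minute
    clock = server_free_at = total_wait = 0.0
    for _ in range(n_customers):
        clock += rng.expovariate(arrival_rate)
        start = max(clock, server_free_at)
        total_wait += start - clock
        server_free_at = start + rng.expovariate(1.0 / mean_service_minutes)
    return total_wait / n_customers

# Sensitivity check for face validity: doubling the arrival rate from 20 to 40
# customers per hour should not decrease the average wait in line.
low, high = mean_wait_minutes(20), mean_wait_minutes(40)
print(f"average wait at 20/h: {low:.2f} min, at 40/h: {high:.2f} min")
assert high >= low, "output moved in an implausible direction"
```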

Validation of model assumptions

Assumptions made about a model generally fall into two categories: structural assumptions about how the system works and data assumptions. A third category, simplification assumptions, covers assumptions made deliberately to simplify reality. [6]

Structural assumptions

Assumptions made about how the system operates and how it is physically arranged are structural assumptions. For example, how many servers are there in a fast food drive-through lane, and, if there is more than one, how are they used? Do the servers work in parallel, where a customer completes a transaction by visiting a single server, or does one server take orders and handle payment while the other prepares and serves the order? Many structural problems in the model come from poor or incorrect assumptions. [4] If possible, the workings of the actual system should be closely observed to understand how it operates. [4] The system's structure and operation should also be verified with users of the actual system. [1]

Data assumptions

There must be a sufficient amount of appropriate data available to build a conceptual model and validate a model. Lack of appropriate data is often the reason attempts to validate a model fail. [3] Data should be verified to come from a reliable source. A typical error is assuming an inappropriate statistical distribution for the data. [1] The assumed statistical model should be tested using goodness of fit tests and other techniques. [1] [3] Examples of goodness of fit tests are the Kolmogorov–Smirnov test and the chi-square test. Any outliers in the data should be checked. [3]
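
The sketch below illustrates how such goodness-of-fit checks might be coded with scipy.stats. The inter-arrival sample is synthetic and the exponential distribution is only an assumed example; in practice the data would come from observations of the real system.

```python
import numpy as np
from scipy import stats

# Hypothetical sample of observed inter-arrival times (minutes); in practice these
# would come from data collected on the real system.
rng = np.random.default_rng(42)
interarrival = rng.exponential(scale=3.0, size=200)

# Fit the assumed distribution (exponential, as an example) to the data.
loc, scale = stats.expon.fit(interarrival, floc=0)

# Kolmogorov-Smirnov test against the fitted exponential. Because the scale was
# estimated from the same data, the p-value is only approximate.
ks_stat, ks_p = stats.kstest(interarrival, "expon", args=(loc, scale))

# Chi-square test on (roughly) equal-probability bins of the fitted distribution.
n_bins = 10
edges = stats.expon.ppf(np.linspace(0, 1, n_bins + 1), loc=loc, scale=scale)
edges[-1] = interarrival.max() + 1.0            # make the top bin finite
observed, _ = np.histogram(interarrival, bins=edges)
expected = np.full(n_bins, len(interarrival) / n_bins)
chi2_stat, chi2_p = stats.chisquare(observed, expected, ddof=1)  # 1 estimated parameter

print(f"KS: statistic = {ks_stat:.3f}, p = {ks_p:.3f}")
print(f"Chi-square: statistic = {chi2_stat:.3f}, p = {chi2_p:.3f}")
```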

Simplification assumptions

Simplification assumptions are assumptions that are known not to be true but that are needed to simplify the problem being solved. [6] The use of these assumptions must be restricted so that the model remains accurate enough to answer the question it was built for.

Validating input-output transformations

The model is viewed as an input-output transformation for these tests. The validation test consists of comparing outputs from the system under consideration to model outputs for the same set of input conditions. Data recorded while observing the system must be available in order to perform this test. [3] The model output that is of primary interest should be used as the measure of performance. [1] For example, if the system under consideration is a fast food drive-through where the input to the model is the customer arrival times and the output measure of performance is the average customer time in line, then the actual arrival times and the time spent in line by customers at the drive-through would be recorded. The model would be run with the actual arrival times, and the model's average time in line would be compared with the actual average time spent in line using one or more tests.
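
A rough, trace-driven sketch of this comparison is given below. The recorded arrival times, the observed average, and the simulate_drive_through model are hypothetical placeholders, not data or code from a real study.

```python
import random

def simulate_drive_through(arrival_times, mean_service_minutes, seed=3):
    """Hypothetical trace-driven model: replay recorded arrival times through a
    single-server drive-through and return each customer's wait in line (minutes)."""
    rng = random.Random(seed)
    server_free_at = 0.0
    waits = []
    for t in sorted(arrival_times):
        start = max(t, server_free_at)
        waits.append(start - t)   # time spent waiting in line
        server_free_at = start + rng.expovariate(1.0 / mean_service_minutes)
    return waits

# Hypothetical field data: recorded arrival times (minutes after opening) and the
# observed average time in line for those same customers.
recorded_arrivals = [0.8, 2.1, 2.9, 4.5, 6.0, 6.3, 8.7, 9.9, 11.2, 12.8]
observed_average_wait = 2.4   # minutes (placeholder value)

model_waits = simulate_drive_through(recorded_arrivals, mean_service_minutes=1.5)
model_average = sum(model_waits) / len(model_waits)
print(f"model average wait: {model_average:.2f} min, observed: {observed_average_wait:.2f} min")
```

The model average and the observed average would then be compared with one or more of the statistical tests described below.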

Hypothesis testing

Statistical hypothesis testing using the t-test can be used as a basis to accept the model as valid or reject it as invalid.

The hypothesis to be tested is

H0 the model measure of performance = the system measure of performance

versus

H1 the model measure of performance ≠ the system measure of performance.

The test is conducted for a given sample size and level of significance, α. To perform the test, a number n of statistically independent runs of the model are conducted and an average or expected value, E(Y), for the variable of interest is produced, together with its sample standard deviation S. The test statistic

t0 = (E(Y) − μ0) / (S / √n)

is then computed for the given α, n, E(Y) and the observed value for the system, μ0, and the critical value t_{α/2, n−1} of the t-distribution for α and n−1 degrees of freedom is calculated.

If |t0| > t_{α/2, n−1}, reject H0; the model needs adjustment.

There are two types of error that can occur using hypothesis testing: rejecting a valid model, called a type I error or "model builder's risk", and accepting an invalid model, called a type II error, β, or "model user's risk". [3] The level of significance, α, is equal to the probability of a type I error. [3] If α is small then rejecting the null hypothesis is a strong conclusion. [1] For example, if α = 0.05 and the null hypothesis is rejected, there is only a 0.05 probability of rejecting a model that is valid. Decreasing the probability of a type II error is very important. [1] [3] The probability of correctly detecting an invalid model is 1 − β. The probability of a type II error depends on the sample size and the actual difference between the sample value and the observed value. Increasing the sample size decreases the risk of a type II error.
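
One possible way to carry out this t-test with scipy.stats is sketched below. The replication results and the observed system value mu0 are made-up numbers used only to show the mechanics of the test.

```python
import numpy as np
from scipy import stats

# Hypothetical results: average time in line (minutes) from n = 10 statistically
# independent replications of the model, and the observed system value mu0.
model_runs = np.array([4.2, 3.8, 4.5, 4.1, 3.9, 4.4, 4.0, 4.3, 3.7, 4.6])
mu0 = 4.3
alpha = 0.05

# One-sample t-test of H0: E(Y) = mu0 against H1: E(Y) != mu0.
t0, p_value = stats.ttest_1samp(model_runs, popmean=mu0)
t_crit = stats.t.ppf(1 - alpha / 2, df=len(model_runs) - 1)   # t_{alpha/2, n-1}

print(f"t0 = {t0:.3f}, critical value = {t_crit:.3f}, p = {p_value:.3f}")
if abs(t0) > t_crit:
    print("Reject H0: the model needs adjustment for this measure of performance.")
else:
    print("Fail to reject H0: no evidence, at this alpha, that the model is invalid.")
```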

Model accuracy as a range

A statistical technique in which the amount of model accuracy is specified as a range has more recently been developed. The technique uses hypothesis testing to accept a model if the difference between a model's variable of interest and a system's variable of interest is within a specified range of accuracy. [7] A requirement is that both the system data and model data be approximately Normally Independent and Identically Distributed (NIID). The t-test statistic is used in this technique. If the mean of the model is μm and the mean of the system is μs, then the difference between the model and the system is D = μm − μs. The hypothesis to be tested is whether D is within the acceptable range of accuracy. Let L = the lower limit for accuracy and U = the upper limit for accuracy. Then

H0 L ≤ D ≤ U

versus

H1 D < L or D > U

is to be tested.

The operating characteristic (OC) curve gives the probability that the null hypothesis is accepted as a function of the true difference between the model and the system; it characterizes the probabilities of both type I and type II errors. Risk curves for the model builder's risk and the model user's risk can be developed from the OC curves. For a fixed sample size, the tradeoff between model builder's risk and model user's risk can be seen easily by comparing the risk curves. [7] If the model builder's risk, the model user's risk, and the upper and lower limits for the range of accuracy are all specified, then the sample size needed can be calculated. [7]
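
The sketch below gives a rough illustration of checking whether D lies in [L, U]. It uses a simple pair of one-sided t statistics rather than Sargent's published procedure or its OC-curve and sample-size calculations, and the samples, accuracy limits, and degrees-of-freedom choice are illustrative assumptions.

```python
import numpy as np
from scipy import stats

# Hypothetical NIID samples of the variable of interest from the model and the system.
model_data = np.array([10.3, 9.8, 10.6, 10.1, 9.9, 10.4, 10.2, 10.0])
system_data = np.array([9.7, 10.1, 9.9, 10.3, 9.6, 10.0, 9.8, 10.2])
L, U = -0.5, 0.5          # acceptable range of accuracy for D = mu_m - mu_s
alpha = 0.05

d_hat = model_data.mean() - system_data.mean()
se = np.sqrt(model_data.var(ddof=1) / len(model_data)
             + system_data.var(ddof=1) / len(system_data))
df = len(model_data) + len(system_data) - 2     # simple choice of degrees of freedom

t_low = (d_hat - L) / se      # strongly negative values suggest D < L
t_high = (d_hat - U) / se     # strongly positive values suggest D > U
t_crit = stats.t.ppf(1 - alpha, df)

if t_low < -t_crit or t_high > t_crit:
    print("Reject H0: the difference D falls outside the acceptable range of accuracy.")
else:
    print("Fail to reject H0: D is consistent with the specified accuracy range.")
```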

Confidence intervals

Confidence intervals can be used to evaluate if a model is "close enough" [1] to a system for some variable of interest. The difference between the known model value, μ0, and the system value, μ, is checked to see if it is less than a value small enough that the model is valid with respect to that variable of interest. The value is denoted by the symbol ε. To perform the test, a number n of statistically independent runs of the model are conducted and a mean or expected value, E(Y) or μ, for the simulation output variable of interest Y, with a standard deviation S, is produced. A confidence level, 100(1−α)%, is selected. An interval, [a, b], is constructed as

[a, b] = [ E(Y) − t_{α/2, n−1} · S/√n , E(Y) + t_{α/2, n−1} · S/√n ],

where t_{α/2, n−1} is the critical value from the t-distribution for the given level of significance and n−1 degrees of freedom.

If |a-μ0| > ε and |b-μ0| > ε then the model needs to be calibrated since in both cases the difference is larger than acceptable.
If |a-μ0| < ε and |b-μ0| < ε then the model is acceptable as in both cases the error is close enough.
If |a-μ0| < ε and |b-μ0| > ε or vice versa then additional runs of the model are needed to shrink the interval.
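
These three cases can be coded directly, as in the sketch below. The model outputs, mu0, and epsilon are hypothetical values chosen only to show the decision logic.

```python
import numpy as np
from scipy import stats

# Hypothetical outputs Y from n independent model runs, the system value mu0,
# and the largest acceptable difference epsilon.
y = np.array([12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.5, 12.1])
mu0 = 12.3
epsilon = 0.5
alpha = 0.05

n = len(y)
half_width = stats.t.ppf(1 - alpha / 2, n - 1) * y.std(ddof=1) / np.sqrt(n)
a, b = y.mean() - half_width, y.mean() + half_width

if abs(a - mu0) > epsilon and abs(b - mu0) > epsilon:
    print("Both endpoints differ from mu0 by more than epsilon: calibrate the model.")
elif abs(a - mu0) < epsilon and abs(b - mu0) < epsilon:
    print("Both endpoints are within epsilon of mu0: the model is acceptable.")
else:
    print("The interval is inconclusive: make additional model runs to shrink it.")
```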

Graphical comparisons

If statistical assumptions cannot be satisfied, or there is insufficient data for the system, a graphical comparison of model outputs to system outputs can be used to make a subjective decision; however, other objective tests are preferable. [3]
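
A possible graphical comparison, using overlaid histograms and a quantile-quantile plot with matplotlib, is sketched below. The system and model samples are synthetic stand-ins for observed data.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical samples of the variable of interest from the real system and the model.
rng = np.random.default_rng(0)
system_output = rng.gamma(shape=2.0, scale=2.0, size=150)
model_output = rng.gamma(shape=2.1, scale=1.9, size=150)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Overlaid histograms give a quick visual check of location and spread.
ax1.hist(system_output, bins=20, alpha=0.5, label="system")
ax1.hist(model_output, bins=20, alpha=0.5, label="model")
ax1.set_title("Histogram comparison")
ax1.legend()

# Quantile-quantile plot: points near the 45-degree line suggest similar distributions.
q = np.linspace(0.01, 0.99, 50)
ax2.plot(np.quantile(system_output, q), np.quantile(model_output, q), "o")
lims = [0, max(system_output.max(), model_output.max())]
ax2.plot(lims, lims, "--")
ax2.set_xlabel("system quantiles")
ax2.set_ylabel("model quantiles")
ax2.set_title("Q-Q comparison")

plt.tight_layout()
plt.show()
```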

ASME Standards

Documents and standards involving verification and validation of computational modeling and simulation are developed by the American Society of Mechanical Engineers (ASME) Verification and Validation (V&V) Committee. ASME V&V 10 provides guidance in assessing and increasing the credibility of computational solid mechanics models through the processes of verification, validation, and uncertainty quantification. [8] ASME V&V 10.1 provides a detailed example to illustrate the concepts described in ASME V&V 10. [9] ASME V&V 20 provides a detailed methodology for validating computational simulations as applied to fluid dynamics and heat transfer. [10] ASME V&V 40 provides a framework for establishing model credibility requirements for computational modeling, and presents examples specific to the medical device industry. [11]


References

  1. Banks, Jerry; Carson, John S.; Nelson, Barry L.; Nicol, David M. (2010). Discrete-Event System Simulation (Fifth ed.). Upper Saddle River: Pearson Education. ISBN 0136062121.
  2. Schlesinger, S.; et al. (1979). "Terminology for model credibility". Simulation. 32 (3): 103–104. doi:10.1177/003754977903200304.
  3. Sargent, Robert G. (2011). "Verification and Validation of Simulation Models". Proceedings of the 2011 Winter Simulation Conference.
  4. Carson, John (2002). "Model Verification and Validation". Proceedings of the 2002 Winter Simulation Conference.
  5. Naylor, T. H.; Finger, J. M. (1967). "Verification of Computer Simulation Models". Management Science, Vol. 2, pp. B92–B101. Cited in Banks, Jerry; Carson, John S.; Nelson, Barry L.; Nicol, David M. (2010). Discrete-Event System Simulation (Fifth ed.). Upper Saddle River: Pearson Education. p. 396. ISBN 0136062121.
  6. Fonseca, P. (2011). "Simulation hypotheses: A proposed taxonomy for the hypotheses used in a simulation model". Proceedings of SIMUL 2011. pp. 114–119. https://www.researchgate.net/publication/262187532_Simulation_hypotheses_A_proposed_taxonomy_for_the_hypotheses_used_in_a_simulation_model
  7. Sargent, R. G. (2010). "A New Statistical Procedure for Validation of Simulation and Stochastic Models". Technical Report SYR-EECS-2010-06. Department of Electrical Engineering and Computer Science, Syracuse University, Syracuse, New York.
  8. "V&V 10 – 2006 Guide for Verification and Validation in Computational Solid Mechanics". Standards. ASME. Retrieved 2 September 2018.
  9. "V&V 10.1 – 2012 An Illustration of the Concepts of Verification and Validation in Computational Solid Mechanics". Standards. ASME. Retrieved 2 September 2018.
  10. "V&V 20 – 2009 Standard for Verification and Validation in Computational Fluid Dynamics and Heat Transfer". Standards. ASME. Retrieved 2 September 2018.
  11. "V&V 40 Industry Day". Verification and Validation Symposium. ASME. Retrieved 2 September 2018.