Line sampling

Line sampling is a method used in reliability engineering to compute small (i.e., rare event) failure probabilities encountered in engineering systems. The method is particularly suitable for high-dimensional reliability problems in which the performance function exhibits only moderate non-linearity with respect to the uncertain parameters. [1] The method is suitable for analyzing black box systems and, unlike the importance sampling method of variance reduction, does not require detailed knowledge of the system.

The basic idea behind line sampling is to refine estimates obtained from the first-order reliability method (FORM), which may be inaccurate due to the non-linearity of the limit state function. Conceptually, this is achieved by averaging the results of different FORM simulations. In practice, it is made possible by identifying an importance direction $\alpha$ in the input parameter space, which points towards the region that most strongly contributes to the overall failure probability. The importance direction can be closely related to the centre of mass of the failure region, or to the failure point with the highest probability density, which, once the random variables of the problem have been transformed into the standard normal space, often lies at the point on the limit state surface closest to the origin. Once the importance direction has been set to point towards the failure region, samples are randomly generated in the standard normal space and lines are drawn parallel to the importance direction in order to compute the distance to the limit state surface, which enables the probability of failure to be estimated for each sample. These partial failure probabilities are then averaged to obtain an improved estimate.

Mathematical approach

Firstly, the importance direction $\alpha$ must be determined. This can be achieved by finding the design point, or by computing the gradient of the limit state function in the standard normal space.
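
As a rough illustration of the gradient-based choice, the following sketch estimates $\alpha$ as the normalized negative gradient of a performance function at the origin of the standard normal space. This is a minimal sketch, not a reference implementation: the performance function g, the convention that g(x) ≤ 0 denotes failure, and the finite-difference step size are assumptions made for the example.

```python
import numpy as np

def importance_direction(g, dim, h=1e-6):
    """Normalized direction of steepest descent of g at the standard-normal origin."""
    x0 = np.zeros(dim)
    grad = np.zeros(dim)
    for i in range(dim):
        e = np.zeros(dim)
        e[i] = h
        grad[i] = (g(x0 + e) - g(x0 - e)) / (2.0 * h)  # central finite difference
    # With the (assumed) convention g(x) <= 0 on the failure domain, g decreases
    # towards failure, so the negative normalized gradient points towards it.
    return -grad / np.linalg.norm(grad)

# Hypothetical usage with an illustrative performance function:
# alpha = importance_direction(lambda x: 3.0 - x[0] + 0.1 * x[1] ** 2, dim=2)
```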

A set of samples is generated using Monte Carlo simulation in the standard normal space. For each sample $x^{(j)}$, the probability of failure along the line parallel to the importance direction is defined as

$$p_f^{(j)} = \int_{-\infty}^{+\infty} I_j(\beta)\,\varphi(\beta)\,\mathrm{d}\beta ,$$

where the indicator function $I_j(\beta)$ is equal to one for points on the line lying in the failure domain $F$, and is zero otherwise:

$$I_j(\beta) = \begin{cases} 1 & \text{if } x_\perp^{(j)} + \beta\,\alpha \in F \\ 0 & \text{otherwise.} \end{cases}$$

Here $\alpha$ is the importance direction, $x_\perp^{(j)}$ is the projection of the sample onto the hyperplane orthogonal to $\alpha$, $\varphi(\beta)$ is the probability density function of the standard Gaussian distribution, and $\beta$ is a real number. In practice the roots of a nonlinear function must be found to estimate the partial probabilities of failure along each line. This is done either by interpolating a few samples along the line, or by using the Newton–Raphson method.
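
The root search along a single line can be sketched as follows. This is only an illustration of the step described above: it uses Brent's method (scipy.optimize.brentq) rather than interpolation or Newton–Raphson, and the search bracket, the failure convention g(x) ≤ 0, and the assumption of a single crossing per line are choices made for the example.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def line_failure_probability(g, x, alpha, beta_max=8.0):
    """Partial failure probability p_f^(j) for one line, assuming g(x) <= 0 means failure."""
    x_perp = x - np.dot(alpha, x) * alpha         # project the sample orthogonally to alpha
    c = lambda beta: g(x_perp + beta * alpha)     # performance function restricted to the line
    if c(-beta_max) * c(beta_max) > 0.0:          # no sign change within the search bracket
        return 0.0 if c(0.0) > 0.0 else 1.0       # line treated as entirely safe or entirely failed
    beta_star = brentq(c, -beta_max, beta_max)    # distance along the line to the limit state
    return norm.cdf(-beta_star)                   # Gaussian tail probability beyond the crossing
```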

The global probability of failure is the mean of the probabilities of failure on the lines:

$$p_f = \frac{1}{N_L} \sum_{j=1}^{N_L} p_f^{(j)} ,$$

where $N_L$ is the total number of lines used in the analysis and the $p_f^{(j)}$ are the partial probabilities of failure estimated along the lines.
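
Putting the pieces together, the following self-contained toy example estimates the failure probability of an assumed two-dimensional performance function by averaging the partial probabilities over a set of lines. The function g, the fixed importance direction, the sample size and the root-search bracket are all illustrative assumptions, and the value Φ(−b) printed for comparison is the exact answer only in the linear case k = 0.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

rng = np.random.default_rng(0)
b, k = 3.0, 0.1
g = lambda x: b - x[0] + k * x[1] ** 2      # assumed failure domain F = {x : g(x) <= 0}
alpha = np.array([1.0, 0.0])                # importance direction pointing towards F

n_lines = 200
p_lines = np.empty(n_lines)
for j in range(n_lines):
    x = rng.standard_normal(2)
    x_perp = x - np.dot(alpha, x) * alpha              # component orthogonal to alpha
    c = lambda beta, xp=x_perp: g(xp + beta * alpha)   # g restricted to the line
    beta_star = brentq(c, -8.0, 8.0)                   # crossing of the limit state
    p_lines[j] = norm.cdf(-beta_star)                  # partial failure probability

p_f = p_lines.mean()                                   # global failure probability estimate
print(f"line sampling estimate: {p_f:.2e} (reference Phi(-b) = {norm.cdf(-b):.2e})")
```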

For problems in which the performance function is only moderately non-linear with respect to the parameters modeled as random variables, setting the importance direction as the gradient vector of the performance function in the underlying standard normal space leads to highly efficient line sampling. In general it can be shown that the variance obtained by line sampling is always smaller than that obtained by conventional Monte Carlo simulation, and hence the line sampling algorithm converges more quickly. [1] The rate of convergence is improved further by recent advancements that allow the importance direction to be repeatedly updated during the simulation; this is known as adaptive line sampling. [2]

[Figure: LineSamplingScheme.svg. An illustration of the line sampling algorithm. Two line samples are shown approaching the limit state surface.]

Industrial application

The algorithm is particularly useful for performing reliability analysis on computationally expensive industrial black box models, since the limit state function can be non-linear and the number of samples required is lower than for other reliability analysis techniques such as subset simulation. [3] The algorithm can also be used to efficiently propagate epistemic uncertainty in the form of probability boxes, or random sets. [4] [5] A numerical implementation of the method is available in the open source software OpenCOSSAN. [6]

References

  1. Schueller, G. I.; Pradlwarter, H. J.; Koutsourelakis, P. (2004). "A critical appraisal of reliability estimation procedures for high dimensions". Probabilistic Engineering Mechanics. 19 (4): 463–474. doi:10.1016/j.probengmech.2004.05.004.
  2. de Angelis, Marco; Patelli, Edoardo; Beer, Michael (2015). "Advanced Line Sampling for efficient robust reliability analysis". Structural Safety. 52: 170–182. doi:10.1016/j.strusafe.2014.10.002. ISSN 0167-4730.
  3. Zio, E.; Pedroni, N. (2009). "Subset simulation and line sampling for advanced Monte Carlo reliability analysis". Reliability, Risk, and Safety. doi:10.1201/9780203859759.ch94. ISBN 978-0-415-55509-8.
  4. De Angelis, Marco (2015). Efficient Random Set Uncertainty Quantification by means of Advanced Sampling Techniques (Ph.D. thesis). University of Liverpool.
  5. Patelli, E.; de Angelis, M. (2015). "Line sampling approach for extreme case analysis in presence of aleatory and epistemic uncertainties". Safety and Reliability of Complex Engineered Systems. pp. 2585–2593. doi:10.1201/b19094-339. ISBN 978-1-138-02879-1.
  6. Patelli, Edoardo (2016). "COSSAN: A Multidisciplinary Software Suite for Uncertainty Quantification and Risk Management". Handbook of Uncertainty Quantification. pp. 1–69. doi:10.1007/978-3-319-11259-6_59-1. ISBN 978-3-319-11259-6.