Line sampling

Line sampling is a method used in reliability engineering to compute small (i.e., rare event) failure probabilities encountered in engineering systems. The method is particularly suitable for high-dimensional reliability problems in which the performance function exhibits only moderate non-linearity with respect to the uncertain parameters. [1] The method is suitable for analyzing black box systems and, unlike the importance sampling method of variance reduction, does not require detailed knowledge of the system.

The basic idea behind line sampling is to refine estimates obtained from the first-order reliability method (FORM), which may be inaccurate due to the non-linearity of the limit state function. Conceptually, this is achieved by averaging the results of different FORM simulations. In practice, it is made possible by identifying an importance direction $\alpha$ in the input parameter space, which points towards the region that most strongly contributes to the overall failure probability. The importance direction can be closely related to the centre of mass of the failure region, or to the failure point with the highest probability density, which, once the random variables of the problem have been transformed into the standard normal space, often lies at the point on the limit state surface closest to the origin. Once the importance direction has been set to point towards the failure region, samples are randomly generated in the standard normal space and lines are drawn parallel to the importance direction in order to compute the distance to the limit state surface, which enables the probability of failure to be estimated for each sample. These partial failure probabilities are then averaged to obtain an improved estimate.

Mathematical approach

Firstly, the importance direction $\alpha$ must be determined. This can be achieved by finding the design point, or by computing the gradient of the limit state function in the standard normal space.
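
As a rough illustration of the gradient-based choice, the following sketch estimates $\alpha$ as the normalized negative gradient of a performance function at the origin of the standard normal space. This is a minimal sketch, not a reference implementation: the performance function g, the convention that g(x) ≤ 0 denotes failure, and the finite-difference step size are assumptions made for the example.

```python
import numpy as np

def importance_direction(g, dim, h=1e-6):
    """Normalized direction of steepest descent of g at the standard-normal origin."""
    x0 = np.zeros(dim)
    grad = np.zeros(dim)
    for i in range(dim):
        e = np.zeros(dim)
        e[i] = h
        grad[i] = (g(x0 + e) - g(x0 - e)) / (2.0 * h)  # central finite difference
    # With the (assumed) convention g(x) <= 0 on the failure domain, g decreases
    # towards failure, so the negative normalized gradient points towards it.
    return -grad / np.linalg.norm(grad)

# Hypothetical usage with an illustrative performance function:
# alpha = importance_direction(lambda x: 3.0 - x[0] + 0.1 * x[1] ** 2, dim=2)
```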

A set of samples is generated using Monte Carlo simulation in the standard normal space. For each sample $x^{(j)}$, the probability of failure along the line parallel to the importance direction is defined as

$$p_f^{(j)} = \int_{-\infty}^{+\infty} I_j(\beta)\,\varphi(\beta)\,\mathrm{d}\beta ,$$

where the indicator function $I_j(\beta)$ is equal to one for points on the line lying in the failure domain $F$, and is zero otherwise:

$$I_j(\beta) = \begin{cases} 1 & \text{if } x_\perp^{(j)} + \beta\,\alpha \in F \\ 0 & \text{otherwise.} \end{cases}$$

Here $\alpha$ is the importance direction, $x_\perp^{(j)}$ is the projection of the sample onto the hyperplane orthogonal to $\alpha$, $\varphi(\beta)$ is the probability density function of the standard Gaussian distribution, and $\beta$ is a real number. In practice the roots of a nonlinear function must be found to estimate the partial probabilities of failure along each line. This is done either by interpolating a few samples along the line, or by using the Newton–Raphson method.
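
The root search along a single line can be sketched as follows. This is only an illustration of the step described above: it uses Brent's method (scipy.optimize.brentq) rather than interpolation or Newton–Raphson, and the search bracket, the failure convention g(x) ≤ 0, and the assumption of a single crossing per line are choices made for the example.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def line_failure_probability(g, x, alpha, beta_max=8.0):
    """Partial failure probability p_f^(j) for one line, assuming g(x) <= 0 means failure."""
    x_perp = x - np.dot(alpha, x) * alpha         # project the sample orthogonally to alpha
    c = lambda beta: g(x_perp + beta * alpha)     # performance function restricted to the line
    if c(-beta_max) * c(beta_max) > 0.0:          # no sign change within the search bracket
        return 0.0 if c(0.0) > 0.0 else 1.0       # line treated as entirely safe or entirely failed
    beta_star = brentq(c, -beta_max, beta_max)    # distance along the line to the limit state
    return norm.cdf(-beta_star)                   # Gaussian tail probability beyond the crossing
```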

The global probability of failure is the mean of the probabilities of failure on the lines:

$$p_f = \frac{1}{N_L} \sum_{j=1}^{N_L} p_f^{(j)} ,$$

where $N_L$ is the total number of lines used in the analysis and the $p_f^{(j)}$ are the partial probabilities of failure estimated along the lines.
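
Putting the pieces together, the following self-contained toy example estimates the failure probability of an assumed two-dimensional performance function by averaging the partial probabilities over a set of lines. The function g, the fixed importance direction, the sample size and the root-search bracket are all illustrative assumptions, and the value Φ(−b) printed for comparison is the exact answer only in the linear case k = 0.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

rng = np.random.default_rng(0)
b, k = 3.0, 0.1
g = lambda x: b - x[0] + k * x[1] ** 2      # assumed failure domain F = {x : g(x) <= 0}
alpha = np.array([1.0, 0.0])                # importance direction pointing towards F

n_lines = 200
p_lines = np.empty(n_lines)
for j in range(n_lines):
    x = rng.standard_normal(2)
    x_perp = x - np.dot(alpha, x) * alpha              # component orthogonal to alpha
    c = lambda beta, xp=x_perp: g(xp + beta * alpha)   # g restricted to the line
    beta_star = brentq(c, -8.0, 8.0)                   # crossing of the limit state
    p_lines[j] = norm.cdf(-beta_star)                  # partial failure probability

p_f = p_lines.mean()                                   # global failure probability estimate
print(f"line sampling estimate: {p_f:.2e} (reference Phi(-b) = {norm.cdf(-b):.2e})")
```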

For problems in which the performance function is only moderately non-linear with respect to the parameters modeled as random variables, setting the importance direction as the gradient vector of the performance function in the underlying standard normal space leads to highly efficient line sampling. In general it can be shown that the variance obtained by line sampling is always smaller than that obtained by conventional Monte Carlo simulation, and hence the line sampling algorithm converges more quickly. [1] The rate of convergence is improved further by recent advancements that allow the importance direction to be repeatedly updated during the simulation; this is known as adaptive line sampling. [2]

[Figure: LineSamplingScheme.svg. An illustration of the line sampling algorithm. Two line samples are shown approaching the limit state surface.]

Industrial application

The algorithm is particularly useful for performing reliability analysis on computationally expensive industrial black box models, since the limit state function can be non-linear and the number of samples required is lower than for other reliability analysis techniques such as subset simulation. [3] The algorithm can also be used to efficiently propagate epistemic uncertainty in the form of probability boxes, or random sets. [4] [5] A numerical implementation of the method is available in the open source software OpenCOSSAN. [6]

References

  1. Schueller, G. I.; Pradlwarter, H. J.; Koutsourelakis, P. (2004). "A critical appraisal of reliability estimation procedures for high dimensions". Probabilistic Engineering Mechanics. 19 (4): 463–474. doi:10.1016/j.probengmech.2004.05.004.
  2. de Angelis, Marco; Patelli, Edoardo; Beer, Michael (2015). "Advanced Line Sampling for efficient robust reliability analysis". Structural Safety. 52: 170–182. doi:10.1016/j.strusafe.2014.10.002. ISSN 0167-4730.
  3. Zio, E.; Pedroni, N. (2009). "Subset simulation and line sampling for advanced Monte Carlo reliability analysis". Reliability, Risk, and Safety. doi:10.1201/9780203859759.ch94. ISBN 978-0-415-55509-8.
  4. De Angelis, Marco (2015). Efficient Random Set Uncertainty Quantification by means of Advanced Sampling Techniques (Ph.D. thesis). University of Liverpool.
  5. Patelli, E.; de Angelis, M. (2015). "Line sampling approach for extreme case analysis in presence of aleatory and epistemic uncertainties". Safety and Reliability of Complex Engineered Systems. pp. 2585–2593. doi:10.1201/b19094-339. ISBN 978-1-138-02879-1.
  6. Patelli, Edoardo (2016). "COSSAN: A Multidisciplinary Software Suite for Uncertainty Quantification and Risk Management". Handbook of Uncertainty Quantification. pp. 1–69. doi:10.1007/978-3-319-11259-6_59-1. ISBN 978-3-319-11259-6.