5 OpenTURNS' methods for Step C': ranking uncertainty sources / sensitivity analysis

Ranking methods can be used to analyse the respective importance of each uncertainty source with respect to a probabilistic criterion. OpenTURNS proposes ranking methods for the two probabilistic criteria defined in the [global methodology guide]: the probabilistic criterion on central dispersion (expectation and variance), and the probability of exceeding a threshold (failure probability).

5.1 Probabilistic criteria

5.1.1 Central dispersion probabilistic criterion

Each propagation method available for this criterion (see step C) leads to one or several ranking methods.

5.1.2 Probability of exceeding a threshold / failure probability


5.2 Methods description

5.2.1 Step C'  – Importance Factors derived from Taylor Variance Decomposition Method

Mathematical description

Goal

The importance factors derived from a quadratic combination method are used to discriminate the influence of the different inputs on the output variable in a central dispersion analysis.

Principles

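The importance factors are derived from the following expression. It can be shown, by a first-order Taylor expansion of the output variable $z$ ($n_Z = 1$) around $\underline{x} = \underline{\mu}_X$ and computation of the variance, that:

$$\operatorname{Var} Z \approx \underline{\nabla} h(\underline{\mu}_X) \cdot \operatorname{Cov}\underline{X} \cdot {}^t\underline{\nabla} h(\underline{\mu}_X)$$

which can be rewritten as:

$$1 \approx \sum_{i=1}^{n_X} \frac{\partial h(\underline{\mu}_X)}{\partial x_i} \times \frac{\sum_{j=1}^{n_X} \frac{\partial h(\underline{\mu}_X)}{\partial x_j} \, (\operatorname{Cov}\underline{X})_{ij}}{\operatorname{Var} Z} = \eta_1 + \eta_2 + \dots + \eta_{n_X}$$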

Vectorial definition

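$$\underline{\eta} = \underline{\nabla} h(\underline{\mu}_X) \times \frac{\operatorname{Cov}\underline{X} \cdot {}^t\underline{\nabla} h(\underline{\mu}_X)}{\operatorname{Var} Z}$$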

Scalar definition

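$$\eta_i = \frac{\partial h(\underline{\mu}_X)}{\partial x_i} \times \frac{\sum_{j=1}^{n_X} \frac{\partial h(\underline{\mu}_X)}{\partial x_j} \, (\operatorname{Cov}\underline{X})_{ij}}{\operatorname{Var} Z}$$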

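where $\partial h(\underline{\mu}_X)/\partial x_i$ is the partial derivative of $h$ with respect to $x_i$ evaluated at the mean point $\underline{\mu}_X$, $(\operatorname{Cov}\underline{X})_{ij}$ is the covariance of $X_i$ and $X_j$, and $\times$ in the vectorial definition denotes the component-wise product.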

Interpretation of the importance factors

Note that this interpretation assumes that the $(X_i)_i$ are independent.

Each coefficient $\partial h(\underline{x})/\partial x_i$ is a linear estimate of the change, in units of $z = h(\underline{x})$, resulting from a unit change in the variable $x_i$. This first term depends on the physical units of the variables and is meaningful only when the units of the model are known. In general, since the variables have different physical units, these sensitivities $\partial h(\underline{x})/\partial x_i$ cannot be compared with one another. This is why the importance factors used within OpenTURNS are normalized: normalization makes the results comparable regardless of the original units of the model inputs. The second term, $\sum_{j=1}^{n_X} \frac{\partial h(\underline{\mu}_X)}{\partial x_j} (\operatorname{Cov}\underline{X})_{ij} / \operatorname{Var} Z$, is the renormalization factor.

To summarize, the coefficients $(\eta_i)_{i=1,\dots,n_X}$ represent a linear estimate of the percentage change in the variable $z = h(\underline{x})$ caused by a one percent change in the variable $x_i$. The importance factors are independent of the original units of the model and are comparable with each other.
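The vectorial definition above can be illustrated with a short NumPy sketch. It is not the OpenTURNS implementation: the function name `taylor_importance_factors`, the central finite-difference gradient and its step `eps` are assumptions made for illustration (an analytical gradient could be supplied instead). With independent inputs the covariance matrix is diagonal, so each factor reduces to $\eta_i = (\partial h/\partial x_i)^2 \operatorname{Var} X_i / \operatorname{Var} Z$; in every case the factors sum to one.

```python
import numpy as np


def taylor_importance_factors(h, mu, cov, eps=1e-6):
    """First-order (Taylor) importance factors eta_i of a scalar model h.

    h   : callable taking a 1-D array of length n_X and returning a float
    mu  : mean vector of the inputs (the expansion point)
    cov : covariance matrix of the inputs
    eps : finite-difference step (illustrative choice)
    """
    mu = np.asarray(mu, dtype=float)
    cov = np.asarray(cov, dtype=float)
    n_x = mu.size
    # Central finite differences approximate the gradient of h at the mean point.
    grad = np.empty(n_x)
    for i in range(n_x):
        step = np.zeros(n_x)
        step[i] = eps
        grad[i] = (h(mu + step) - h(mu - step)) / (2.0 * eps)
    # Var Z ~ grad . Cov X . grad^T
    var_z = grad @ cov @ grad
    # eta_i = dh/dx_i * sum_j dh/dx_j (Cov X)_ij / Var Z (component-wise product)
    eta = grad * (cov @ grad) / var_z
    return eta, var_z


# Hypothetical linear model z = 2 x_1 + x_2, independent inputs, Var X_1 = 1, Var X_2 = 4.
eta, var_z = taylor_importance_factors(
    lambda x: 2.0 * x[0] + x[1], mu=[0.0, 0.0], cov=np.diag([1.0, 4.0])
)
print(eta, var_z)  # eta close to [0.5, 0.5], var_z close to 8.0
```

In this example both inputs contribute $4$ to the approximate variance $8$, so they receive equal importance factors even though their derivatives differ by a factor of two.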

Other notations

Importance Factors derived from Perturbation Methods

Link with OpenTURNS methodology

These computations are part of step C' of the global methodology. They require steps A, B and C to have been performed.

References and theoretical basics

Computing these importance factors makes it possible to rank the influence of the input variables on the output variable. These factors are computed 'near' the mean value of the output. They should therefore not be used to evaluate the importance of the input variables in the tail of the output distribution (for example, for a high-level quantile).

Examples


5.2.2 Step C'  – Uncertainty ranking using Pearson's correlation

Mathematical description

Goal

This method analyses the influence that the random vector $\underline{X} = (X_1, \dots, X_{n_X})$ has on a random variable $Y^j$ which is being studied for uncertainty. Here we attempt to measure the linear relationships that exist between $Y^j$ and the different components $X_i$.

Principle

Pearson's correlation coefficient $\rho_{Y^j, X_i}$, defined in [Pearson's Coefficient], measures the strength of a linear relation between two random variables $Y^j$ and $X_i$. From a sample of $N$ pairs $(y^j_1, x^i_1), (y^j_2, x^i_2), \dots, (y^j_N, x^i_N)$, an estimate $\hat\rho_{Y^j, X_i}$ of Pearson's coefficient can be computed. The hierarchical ordering of Pearson's coefficients is of interest when the relationship between $Y^j$ and the $n_X$ variables $X_1, \dots, X_{n_X}$ is close to linear:

$$Y^j \approx a_0 + \sum_{i=1}^{n_X} a_i X_i$$

To obtain an indication of the role played by each $X_i$ in the dispersion of $Y^j$, the idea is to estimate Pearson's correlation coefficient $\hat\rho_{X_i, Y^j}$ for each $i$. The $n_X$ variables $X_1, \dots, X_{n_X}$ can then be ordered by the absolute values of the correlation coefficients: the higher $|\hat\rho_{X_i, Y^j}|$, the greater the impact of $X_i$ on the dispersion of $Y^j$.
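A minimal NumPy sketch of this ranking on a synthetic Monte Carlo sample follows. The function name `pearson_ranking` and the linear test model are hypothetical, and this is not the OpenTURNS implementation. The negative coefficient on the second input shows why the ordering uses absolute values.

```python
import numpy as np


def pearson_ranking(x, y):
    """Estimate Pearson's rho(X_i, Y) for each input column and rank inputs by |rho|.

    x : (N, n_X) array, row k holds the input values of the k-th simulation
    y : (N,) array of the corresponding output values
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # np.corrcoef of two 1-D arrays is a 2x2 matrix; entry [0, 1] is rho.
    rho = np.array([np.corrcoef(x[:, i], y)[0, 1] for i in range(x.shape[1])])
    # Input indices (0-based: index 0 is X_1) sorted by decreasing |rho|.
    order = np.argsort(-np.abs(rho))
    return rho, order


# Synthetic sample: Y depends linearly on X_1 and X_2, not on X_3.
rng = np.random.default_rng(0)
x = rng.standard_normal((10_000, 3))
y = 3.0 * x[:, 0] - 1.0 * x[:, 1] + 0.1 * rng.standard_normal(10_000)
rho, order = pearson_ranking(x, y)
print(rho, order)
```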

Other notations

-

Link with OpenTURNS methodology

After a propagation of uncertainty (step C) using [Standard Monte Carlo] simulation, a hierarchy of uncertainty sources can be obtained from Pearson's correlation coefficients. Indeed, the $N$ simulations generate the pairs $(y^j_1, x^i_1), (y^j_2, x^i_2), \dots, (y^j_N, x^i_N)$, where:
  • $\underline{X} = (X_1, \dots, X_{n_X})$ is the input vector specified in step A "Specifying Criteria and the Case Study",

  • $Y^j$ is a variable of interest (output variable) defined in the same step.

The output of this method is the set of estimated Pearson's correlation coefficients $\hat\rho_{X_i, Y^j}$, whose absolute values the user may use to order the variables $X_i$ hierarchically.

References and theoretical basics
This method of uncertainty ranking is particularly useful:
  • when the study of uncertainty deals with the central dispersion of the variable of interest $Y^j$ and not with its extreme values,

  • when the relationships between $Y^j$ and each of the components of $\underline{X}$ are close to linear (so that Pearson's correlation coefficient can be interpreted),

  • when this linear relationship is close to $Y^j = a_0 + \sum_{i=1}^{n_X} a_i X_i$ (i.e. with no product terms of the form $X_i X_k$), and when the components of $\underline{X}$ are statistically independent. Otherwise, $\hat\rho_{X_i, Y^j}$ reflects not only the influence of $X_i$ on $Y^j$ but also the influence of other variables $X_k$ related to $X_i$ (e.g. an unimportant variable $X_i$ could show a strong correlation with $Y^j$ only because it is related, statistically or through a product term, to another variable $X_k$ that has a large impact on $Y^j$).

Readers interested in other methods of uncertainty ranking that can be applied after Monte-Carlo simulation when the assumptions of linearity and/or independence are violated are also referred to [Uncertainty ranking using Spearman] , [Hierarchical Ordering using SRC] , [Uncertainty ranking with Pearson's Partial Correlation Coefficients] and [Uncertainty ranking using Spearman's Partial Correlation Coefficients] .

The following references provide an interesting bibliographic starting point to further study of the method described here:

  • Saltelli, A., Chan, K., Scott, M. (2000). "Sensitivity Analysis", John Wiley & Sons publishers, Probability and Statistics series

  • J.C. Helton, F.J. Davis (2003). "Latin Hypercube sampling and the propagation of uncertainty analyses of complex systems". Reliability Engineering and System Safety 81, p.23-69

  • J.P.C. Kleijnen, J.C. Helton (1999). "Statistical analyses of scatterplots to identify factors in large-scale simulations, part 1 : review and comparison of techniques". Reliability Engineering and System Safety 65, p.147-185


5.2.3 Step C'  – Uncertainty ranking using Spearman's correlation

Mathematical description

Goal

This method analyses the influence that the random vector $\underline{X} = (X_1, \dots, X_{n_X})$ has on a random variable $Y^j$ which is being studied for uncertainty. Here we attempt to measure the monotonic relationships that exist between $Y^j$ and the different components $X_i$.

Principle

Spearman's correlation coefficient $\rho^S_{Y^j, X_i}$, defined in [Spearman's Coefficient], measures the strength of a monotonic relation between two random variables $Y^j$ and $X_i$. From a sample of $N$ pairs $(y^j_1, x^i_1), (y^j_2, x^i_2), \dots, (y^j_N, x^i_N)$, an estimate $\hat\rho^S_{Y^j, X_i}$ of Spearman's coefficient can be computed.

Hierarchical ordering using Spearman's coefficients deals with the case where the variable $Y^j$ depends monotonically on the $n_X$ variables $X_1, \dots, X_{n_X}$. To obtain an indication of the role played by each $X_i$ in the dispersion of $Y^j$, the idea is to estimate the Spearman correlation coefficient $\hat\rho^S_{X_i, Y^j}$ for each $i$. The $n_X$ variables $X_1, \dots, X_{n_X}$ can then be ordered by the absolute values of the Spearman coefficients: the higher $|\hat\rho^S_{X_i, Y^j}|$, the greater the impact of $X_i$ on the dispersion of $Y^j$.
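Spearman's coefficient is Pearson's coefficient computed on ranks, which a short NumPy sketch can show directly. The helper names `ranks` and `spearman_ranking` are hypothetical. Ties are assumed absent (which holds almost surely for continuous inputs); a production implementation would assign average ranks to tied values.

```python
import numpy as np


def ranks(v):
    """0-based ranks of a 1-D array (ties assumed absent, as with continuous data)."""
    return np.argsort(np.argsort(v)).astype(float)


def spearman_ranking(x, y):
    """Spearman's rho^S(X_i, Y): Pearson's coefficient computed on the ranks."""
    x = np.asarray(x, dtype=float)
    ry = ranks(np.asarray(y, dtype=float))
    rho_s = np.array(
        [np.corrcoef(ranks(x[:, i]), ry)[0, 1] for i in range(x.shape[1])]
    )
    order = np.argsort(-np.abs(rho_s))  # most influential input first (0-based)
    return rho_s, order


rng = np.random.default_rng(1)
x = rng.standard_normal((10_000, 2))
# Strictly increasing but non-linear in X_1: the ranks of Y equal those of X_1.
rho_s, _ = spearman_ranking(x, np.exp(x[:, 0]))
print(rho_s[0])  # 1.0 up to rounding
# Monotonic model with a much weaker second input.
rho_s, order = spearman_ranking(x, np.exp(2.0 * x[:, 0]) + 0.1 * x[:, 1])
print(rho_s, order)
```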

Other notations

Link with OpenTURNS methodology

After a propagation of uncertainty (step C) using [Standard Monte Carlo] simulation, a hierarchy of uncertainty sources can be obtained from Spearman's correlation coefficients. Indeed, the $N$ simulations generate the pairs $(y^j_1, x^i_1), (y^j_2, x^i_2), \dots, (y^j_N, x^i_N)$, where:
  • $\underline{X} = (X_1, \dots, X_{n_X})$ is the input vector specified in step A "Specifying Criteria and the Case Study",

  • $Y^j$ is the final variable of interest (output variable) defined in the same step.

The output of this method is the set of estimated Spearman's correlation coefficients $\hat\rho^S_{X_i, Y^j}$, whose absolute values the user may use to order the variables $X_i$ hierarchically.

References and theoretical basics
This method of hierarchical ordering is particularly useful:
  • when the study of uncertainty deals with the central dispersion of the variable of interest $Y^j$ and not with its extreme values,

  • when the relationships between $Y^j$ and each of the components of $\underline{X}$ are monotonic (so that Spearman's correlation coefficient can be interpreted),

  • when the components of $\underline{X}$ are statistically independent. Otherwise, $\hat\rho^S_{X_i, Y^j}$ reflects not only the influence of $X_i$ on $Y^j$ but also the influence of other variables $X_k$ related to $X_i$ (e.g. an unimportant variable $X_i$ could show a strong correlation with $Y^j$ only because it is related to another variable $X_k$ that has a large impact on $Y^j$).

Readers interested in other methods of uncertainty ranking that can be applied after Monte-Carlo simulation when the assumptions of independence are violated are also referred to [Uncertainty ranking using SRC] , [Uncertainty ranking with Pearson's Partial Correlation Coefficients] and [Uncertainty ranking using Spearman's Partial Correlation Coefficients] .

The following references provide an interesting bibliographic starting point to further study of the method described here:

  • Saltelli, A., Chan, K., Scott, M. (2000). "Sensitivity Analysis", John Wiley & Sons publishers, Probability and Statistics series

  • J.C. Helton, F.J. Davis (2003). "Latin Hypercube sampling and the propagation of uncertainty analyses of complex systems". Reliability Engineering and System Safety 81, p.23-69

  • J.P.C. Kleijnen, J.C. Helton (1999). "Statistical analyses of scatterplots to identify factors in large-scale simulations, part 1 : review and comparison of techniques". Reliability Engineering and System Safety 65, p.147-185


5.2.4 Step C'  – Uncertainty Ranking using Standard Regression Coefficients

Mathematical description

Goal

This method analyses the influence that the random vector $\underline{X} = (X_1, \dots, X_{n_X})$ has on a random variable $Y^j$ which is being studied for uncertainty. Here we attempt to measure the linear relationships that exist between $Y^j$ and the different components $X_i$.

Principle

The principle of the multiple linear regression model (see [Linear Regression] for more details) is to find the function that links the variable $Y^j$ to the $n_X$ variables $X_1, \dots, X_{n_X}$ by means of a linear model:

$$Y^j = a^j_0 + \sum_{i=1}^{n_X} a^j_i X_i + \varepsilon^j$$

where $\varepsilon^j$ is a random variable with zero mean and standard deviation $\sigma_{\varepsilon^j}$, independent of the input variables $X_i$. If the random variables $X_1, \dots, X_{n_X}$ are independent with finite variances $\operatorname{Var} X_k = \sigma_k^2$, the variance of $Y^j$ can be written as follows:

$$\operatorname{Var} Y^j = \sum_{i=1}^{n_X} (a^j_i)^2 \operatorname{Var} X_i + \sigma_{\varepsilon^j}^2$$

The estimators of the regression coefficients $a^j_0, \dots, a^j_{n_X}$ and of the standard deviation $\sigma_{\varepsilon^j}$ are obtained from a sample of $(Y^j, X_1, \dots, X_{n_X})$. Uncertainty ranking by linear regression ranks the $n_X$ variables $X_1, \dots, X_{n_X}$ according to the estimated contribution of each $X_k$ to the variance of $Y^j$:

$$C^j_k = \frac{(a^j_k)^2 \operatorname{Var} X_k}{\operatorname{Var} Y^j}$$

which is estimated by:

$$\hat C^j_k = \frac{(\hat a^j_k)^2 \hat\sigma_k^2}{\sum_{i=1}^{n_X} (\hat a^j_i)^2 \hat\sigma_i^2 + \hat\sigma_{\varepsilon^j}^2}$$

where $\hat\sigma_i$ is the empirical standard deviation of the sample of the input variable $X_i$. By construction, this estimated contribution lies between 0 and 1: the closer it is to 1, the greater the impact of $X_k$ on the dispersion of $Y^j$.
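The estimator $\hat C^j_k$ can be sketched with an ordinary least-squares fit in NumPy. The function name `src_contributions`, the intercept column and the unbiased residual variance with $N - n_X - 1$ degrees of freedom are choices made for this illustration, not taken from OpenTURNS. On the synthetic model used here, $\operatorname{Var} Y = 9 + 1 + 1 = 11$, so the theoretical contributions are $9/11$ and $1/11$, the remaining $1/11$ coming from the error term.

```python
import numpy as np


def src_contributions(x, y):
    """Estimated variance contributions C_k = a_k^2 s_k^2 / (sum_i a_i^2 s_i^2 + s_eps^2)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, n_x = x.shape
    # Least-squares fit of Y = a_0 + sum_i a_i X_i + eps (first column = intercept).
    design = np.column_stack([np.ones(n), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    a = coef[1:]
    residuals = y - design @ coef
    # Unbiased residual variance with n - n_x - 1 degrees of freedom.
    sigma_eps2 = residuals @ residuals / (n - n_x - 1)
    sigma2 = x.var(axis=0, ddof=1)  # empirical input variances
    explained = a**2 * sigma2
    return explained / (explained.sum() + sigma_eps2), a


# Hypothetical test model: independent standard normal inputs, unit-variance error.
rng = np.random.default_rng(2)
x = rng.standard_normal((20_000, 2))
y = 1.0 + 3.0 * x[:, 0] + 1.0 * x[:, 1] + rng.standard_normal(20_000)
contrib, a = src_contributions(x, y)
print(contrib)  # near [9/11, 1/11], i.e. about [0.82, 0.09]
```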

Other notations
The contribution to the variance $C_i$ is sometimes called the "importance factor" in the literature, because of the similarity between this linear regression approach and the quadratic combination method, which also uses the term importance factor (see [Quadratic combination – Perturbation method] and [Importance Factors]).

Link with OpenTURNS methodology

After a propagation of uncertainty (step C) using [Standard Monte Carlo] simulation, a hierarchy of uncertainty sources can be obtained by linear regression. Indeed, the $N$ simulations generate the vectors $(y^j_k, x^1_k, \dots, x^{n_X}_k)$, $k = 1, \dots, N$, where:
  • $\underline{X} = (X_1, \dots, X_{n_X})$ is the input vector specified in step A "Specifying Criteria and the Case Study",

  • $Y^j$ is the final variable of interest (output variable) defined in the same step.

The output of this method is the set of estimated variance contributions $\hat C_i$, which the user may use to order the variables $X_i$ hierarchically.

References and theoretical basics
This method of hierarchical ordering is particularly useful:
  • when the study of uncertainty deals with the central dispersion of the variable of interest $Y^j$ and not with its extreme values,

  • when the relationships between $Y^j$ and the components of $\underline{X}$ are close to linear, and more generally when all the underlying assumptions of the multiple linear regression model hold,

  • when the components of $\underline{X}$ are independent, because otherwise the decomposition of the variance of $Y^j$ given above is no longer exact,

  • when the number $N$ of Monte-Carlo simulations is much larger than the number $n_X$ of input random variables (preferably $N/n_X \geq 10$, so that the estimates of the $n_X$ regression coefficients provide a reasonable picture of reality).

Readers interested in the assumptions made for multiple linear regression models and in the tests needed to validate these assumptions are referred to [Linear Regression] .

Other uncertainty ranking methods that can be applied after Monte-Carlo simulation, and that require fewer simulations or can deal with non-linear or non-independent cases, are described in [Uncertainty Ranking using Pearson], [Uncertainty Ranking using Spearman], [Uncertainty Ranking using Pearson's Partial Correlation Coefficients] and [Uncertainty Ranking using Spearman's Partial Correlation Coefficients].

The following references provide an interesting bibliographic starting point to further study of the method described here:

  • Saltelli, A., Chan, K., Scott, M. (2000). "Sensitivity Analysis", John Wiley & Sons publishers, Probability and Statistics series

  • J.C. Helton, F.J. Davis (2003). "Latin Hypercube sampling and the propagation of uncertainty analyses of complex systems". Reliability Engineering and System Safety 81, p.23-69

  • J.P.C. Kleijnen, J.C. Helton (1999). "Statistical analyses of scatterplots to identify factors in large-scale simulations, part 1 : review and comparison of techniques". Reliability Engineering and System Safety 65, p.147-185


5.2.5 Step C'  – Uncertainty Ranking using Pearson's Partial Correlation Coefficients

Mathematical description

Goal

This method analyses the influence that the random vector $\underline{X} = (X_1, \dots, X_{n_X})$ has on a random variable $Y^j$ which is being studied for uncertainty. Here we attempt to measure the linear relationships that exist between $Y^j$ and the different components $X_i$.

Principle

The basic method of hierarchical ordering using Pearson's coefficients (see [Uncertainty Ranking using Pearson]) deals with the case where the variable $Y^j$ depends linearly on the $n_X$ variables $X_1, \dots, X_{n_X}$, but it can be misleading when statistical dependencies or interactions between the variables $X_i$ exist (e.g. a crossed term $X_i \times X_k$). In such a situation, partial correlation coefficients can be more useful for ordering the uncertainty sources hierarchically: the partial correlation coefficient $\mathrm{PCC}_{X_i, Y^j}$ between the variables $Y^j$ and $X_i$ attempts to measure the residual influence of $X_i$ on $Y^j$ once the influences of all the other variables $X_k$, $k \neq i$, have been eliminated.

The estimation of each partial correlation coefficient $\mathrm{PCC}_{X_i, Y^j}$ uses a set of $N$ values $(y^j_1, x^1_1, \dots, x^{n_X}_1), \dots, (y^j_N, x^1_N, \dots, x^{n_X}_N)$ of the vector $(Y^j, X_1, \dots, X_{n_X})$. It requires the following three steps:

  1. Determine the effect of the other variables $X_k, k \neq i$ on $Y^j$ by linear regression (see [Linear Regression]); when the values of the variables $X_k, k \neq i$ are known, the average forecast of $Y^j$ is then given by:

    $$\widehat{Y^j} = \sum_{k \neq i,\ 1 \leq k \leq n_X} \hat a_k X_k$$
  2. Determine the effect of the other variables $X_k, k \neq i$ on $X_i$ by linear regression; when the values of the variables $X_k, k \neq i$ are known, the average forecast of $X_i$ is then given by:

    $$\hat X_i = \sum_{k \neq i,\ 1 \leq k \leq n_X} \hat b_k X_k$$
  3. $\mathrm{PCC}_{X_i, Y^j}$ is then equal to Pearson's correlation coefficient $\hat\rho_{Y^j - \widehat{Y^j},\, X_i - \hat X_i}$ estimated for the variables $Y^j - \widehat{Y^j}$ and $X_i - \hat X_i$ on the $N$-sample of simulations (see [Pearson's Coefficient]).

The $n_X$ variables $X_1, \dots, X_{n_X}$ can then be ranked according to the absolute values of the partial correlation coefficients: the higher $|\mathrm{PCC}_{X_i, Y^j}|$, the greater the impact of $X_i$ on $Y^j$.
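The three steps above can be sketched in NumPy as follows. The names `pcc` and `_residuals` are hypothetical, and each regression includes an intercept (an assumption; the forecast formulas above omit the constant term, which is equivalent when the variables are centred). The synthetic example reproduces the situation described in the Principle: an input with no effect on $Y^j$ that is strongly correlated with an influential input.

```python
import numpy as np


def _residuals(target, others):
    """Residuals of the least-squares regression of target on others (plus intercept)."""
    design = np.column_stack([np.ones(len(target)), others])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return target - design @ coef


def pcc(x, y):
    """Partial correlation coefficients PCC(X_i, Y), one per input column."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n_x = x.shape[1]
    out = np.empty(n_x)
    for i in range(n_x):
        others = np.delete(x, i, axis=1)          # all X_k with k != i
        res_y = _residuals(y, others)             # step 1: Y - Y_hat
        res_x = _residuals(x[:, i], others)       # step 2: X_i - X_i_hat
        out[i] = np.corrcoef(res_y, res_x)[0, 1]  # step 3: Pearson on residuals
    return out


# X_2 is strongly correlated with X_1 but has no effect on Y.
rng = np.random.default_rng(3)
n = 5_000
x1 = rng.standard_normal(n)
x2 = 0.9 * x1 + np.sqrt(1.0 - 0.81) * rng.standard_normal(n)
x = np.column_stack([x1, x2])
y = 2.0 * x1 + 0.5 * rng.standard_normal(n)
print(np.corrcoef(x2, y)[0, 1])  # large, although X_2 plays no role
print(pcc(x, y))                 # second coefficient close to 0
```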

Other notations

-

Link with OpenTURNS methodology

After a propagation of uncertainty (step C) using [Standard Monte Carlo] simulation, a hierarchy of uncertainty sources can be obtained from Pearson's partial correlation coefficients. Indeed, the $N$ simulations generate the vectors $(y^j_k, x^1_k, \dots, x^{n_X}_k)$, $k = 1, \dots, N$, where:
  • $\underline{X} = (X_1, \dots, X_{n_X})$ is the input vector specified in step A "Specifying Criteria and the Case Study",

  • $Y^j$ is the final variable of interest (output variable) defined in the same step.

The output of this method is the set of Pearson's partial correlation coefficients $\mathrm{PCC}_{X_i, Y^j}$, whose absolute values the user may use to order the variables $X_i$ hierarchically.

References and theoretical basics
This method of hierarchical ordering is particularly useful:
  • when the study of uncertainty deals with the central dispersion of the variable of interest $Y^j$ and not with its extreme values,

  • when the relationships between $Y^j$ and each of the components of $\underline{X}$ are close to linear (so that Pearson's correlation coefficient can be interpreted),

  • when the number $N$ of Monte-Carlo simulations is much larger than the number $n_X$ of input random variables (preferably $N/n_X \geq 10$, so that the estimates of the $n_X$ partial correlation coefficients provide a reasonable picture of reality).

Readers interested in the assumptions made for multiple linear regression models and in the tests needed to validate these assumptions are referred to [Linear Regression] .

Other uncertainty ranking methods that can be applied after Monte-Carlo simulation, and that require fewer simulations or can deal with non-linear cases, are described in [Uncertainty Ranking using Pearson], [Uncertainty ranking using Spearman] and [Uncertainty Ranking using Spearman's Partial Correlation Coefficients].

The following references provide an interesting bibliographic starting point to further study of the method described here:

  • Saltelli, A., Chan, K., Scott, M. (2000). "Sensitivity Analysis", John Wiley & Sons publishers, Probability and Statistics series

  • J.C. Helton, F.J. Davis (2003). "Latin Hypercube sampling and the propagation of uncertainty analyses of complex systems". Reliability Engineering and System Safety 81, p.23-69

  • J.P.C. Kleijnen, J.C. Helton (1999). "Statistical analyses of scatterplots to identify factors in large-scale simulations, part 1 : review and comparison of techniques". Reliability Engineering and System Safety 65, p.147-185


5.2.6 Step C'  – Uncertainty Ranking using Partial Rank Correlation Coefficients

Mathematical description

Goal

This method analyses the influence that the random vector $\underline{X} = (X_1, \dots, X_{n_X})$ has on the random variable $Y^j$ which is being studied for uncertainty. Here we attempt to measure the monotonic relationships that exist between $Y^j$ and the different components $X_i$.

Principle

The basic method of hierarchical ordering using Spearman's coefficients (see [Uncertainty Ranking using Spearman]) deals with the case where the variable $Y^j$ depends monotonically on the $n_X$ variables $X_1, \dots, X_{n_X}$, but it can be misleading when statistical dependencies between the variables $X_i$ exist. In such a situation, partial rank correlation coefficients can be more useful for ordering the uncertainty sources hierarchically: the partial rank correlation coefficient $\mathrm{PRCC}_{X_i, Y^j}$ between the variables $Y^j$ and $X_i$ attempts to measure the residual influence of $X_i$ on $Y^j$ once the influences of all the other variables $X_k$, $k \neq i$, have been eliminated.

The estimation of each partial rank correlation coefficient $\mathrm{PRCC}_{X_i, Y^j}$ uses a set of $N$ values $(y^j_1, x^1_1, \dots, x^{n_X}_1), \dots, (y^j_N, x^1_N, \dots, x^{n_X}_N)$ of the vector $(Y^j, X_1, \dots, X_{n_X})$. It requires the following three steps:

  1. Determine the effect of the other variables $X_k, k \neq i$ on $Y^j$ by linear regression (see [Linear Regression]); when the values of the variables $X_k, k \neq i$ are known, the average forecast of $Y^j$ is then given by:

    $$\widehat{Y^j} = \sum_{k \neq i,\ 1 \leq k \leq n_X} \hat a_k X_k$$
  2. Determine the effect of the other variables $X_k, k \neq i$ on $X_i$ by linear regression; when the values of the variables $X_k, k \neq i$ are known, the average forecast of $X_i$ is then given by:

    $$\hat X_i = \sum_{k \neq i,\ 1 \leq k \leq n_X} \hat b_k X_k$$
  3. $\mathrm{PRCC}_{X_i, Y^j}$ is then equal to Spearman's correlation coefficient $\hat\rho^S_{Y^j - \widehat{Y^j},\, X_i - \hat X_i}$ estimated for the variables $Y^j - \widehat{Y^j}$ and $X_i - \hat X_i$ on the $N$-sample of simulations (see [Spearman's Coefficient]).

The $n_X$ variables $X_1, \dots, X_{n_X}$ can then be ranked according to the absolute values of the partial rank correlation coefficients: the higher $|\mathrm{PRCC}_{X_i, Y^j}|$, the greater the impact of $X_i$ on $Y^j$.
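The sketch below uses the formulation commonly found in the sensitivity analysis literature: every variable is first replaced by its ranks, and the partial correlation procedure is then applied to the ranks. This differs slightly from the step list above, which applies the rank correlation only at the last step. The function names `column_ranks`, `partial_corr` and `prcc` are hypothetical, and ties are assumed absent. Because only ranks are used, the result is unchanged by any strictly increasing transform of $Y^j$.

```python
import numpy as np


def column_ranks(v):
    """0-based ranks along axis 0 (ties assumed absent, as with continuous data)."""
    return np.argsort(np.argsort(v, axis=0), axis=0).astype(float)


def partial_corr(x, y):
    """Partial correlation of each column of x with y, the other columns regressed out."""
    n, n_x = x.shape
    out = np.empty(n_x)
    for i in range(n_x):
        # Regressors: intercept plus every X_k with k != i.
        design = np.column_stack([np.ones(n), np.delete(x, i, axis=1)])
        res_y = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
        res_x = x[:, i] - design @ np.linalg.lstsq(design, x[:, i], rcond=None)[0]
        out[i] = np.corrcoef(res_y, res_x)[0, 1]
    return out


def prcc(x, y):
    """PRCC: replace every variable by its ranks, then compute partial correlations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return partial_corr(column_ranks(x), column_ranks(y))


# X_2 is correlated with X_1 but has no effect; Y is monotonic, non-linear in X_1.
rng = np.random.default_rng(4)
n = 5_000
x1 = rng.standard_normal(n)
x2 = 0.9 * x1 + np.sqrt(1.0 - 0.81) * rng.standard_normal(n)
x = np.column_stack([x1, x2])
y = np.exp(x1) + 0.01 * rng.standard_normal(n)
print(prcc(x, y))  # first coefficient close to 1, second close to 0
```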

Other notations
-

Link with OpenTURNS methodology

After a propagation of uncertainty (step C) using [Standard Monte Carlo] simulation, a hierarchy of uncertainty sources can be obtained from partial rank correlation coefficients. Indeed, the $N$ simulations generate the vectors $(y^j_k, x^1_k, \dots, x^{n_X}_k)$, $k = 1, \dots, N$, where:
  • $\underline{X} = (X_1, \dots, X_{n_X})$ is the input vector specified in step A "Specifying Criteria and the Case Study",

  • $Y^j$ is the final variable of interest (output variable) defined in the same step.

The output of this method is the set of partial rank correlation coefficients $\mathrm{PRCC}_{X_i, Y^j}$, whose absolute values the user may use to order the variables $X_i$ hierarchically.

References and theoretical basics
This method of hierarchical ordering is particularly useful:
  • when the study of uncertainty deals with the central dispersion of the variable of interest $Y^j$ and not with its extreme values,

  • when the relationships between $Y^j$ and each of the components of $\underline{X}$ are monotonic (so that Spearman's correlation coefficient can be interpreted),

  • when the number $N$ of Monte-Carlo simulations is much larger than the number $n_X$ of input random variables (preferably $N/n_X \geq 10$, so that the estimates of the $n_X$ partial rank correlation coefficients provide a reasonable picture of reality).

Readers interested in the assumptions made for multiple linear regression models and in the tests needed to validate these assumptions are referred to [Linear Regression] .

Other uncertainty ranking methods that can be applied after Monte-Carlo simulation and that require fewer simulations are described in [Uncertainty Ranking using Pearson] and [Uncertainty ranking using Spearman].

The following references provide an interesting bibliographic starting point to further study of the method described here:

  • Saltelli, A., Chan, K., Scott, M. (2000). "Sensitivity Analysis", John Wiley & Sons publishers, Probability and Statistics series

  • J.C. Helton, F.J. Davis (2003). "Latin Hypercube sampling and the propagation of uncertainty analyses of complex systems". Reliability Engineering and System Safety 81, p.23-69

  • J.P.C. Kleijnen, J.C. Helton (1999). "Statistical analyses of scatterplots to identify factors in large-scale simulations, part 1 : review and comparison of techniques". Reliability Engineering and System Safety 65, p.147-185


5.2.7 Step C'  – Sensitivity analysis using Sobol indices

Mathematical description

Goal

This method analyses the influence that the random vector $\underline{X} = (X_1, \dots, X_{n_X})$ has on a random variable $Y$ which is being studied for uncertainty. Here we attempt to evaluate the part of the variance of $Y$ due to the different components $X_i$.

Principle

Sobol index estimation requires two samples of the input vector $(X_1, \dots, X_{n_X})$, that is two sets of $N$ vectors of dimension $n_X$: $(x^{(1)}_{11}, \dots, x^{(1)}_{1n_X}), \dots, (x^{(1)}_{N1}, \dots, x^{(1)}_{Nn_X})$ and $(x^{(2)}_{11}, \dots, x^{(2)}_{1n_X}), \dots, (x^{(2)}_{N1}, \dots, x^{(2)}_{Nn_X})$. The estimators $m_Y$ of the mean and $\hat\sigma$ of the standard deviation of $Y$ can be obtained from the first sample.

Estimating the first-order sensitivity indices consists in estimating the quantity

$$V_i = \operatorname{Var}\left[\mathbb{E}[Y \mid X_i]\right] = \mathbb{E}\left[\mathbb{E}[Y \mid X_i]^2\right] - \mathbb{E}\left[\mathbb{E}[Y \mid X_i]\right]^2 = U_i - \mathbb{E}[Y]^2$$

Sobol proposed to estimate the quantity $U_i = \mathbb{E}\left[\mathbb{E}[Y \mid X_i]^2\right]$ by swapping every variable between the two samples except $X_i$, between the two calls of the function:

$$\hat U_i = \frac{1}{N} \sum_{k=1}^{N} Y\left(x^{(1)}_{k1}, \dots, x^{(1)}_{k(i-1)}, x^{(1)}_{ki}, x^{(1)}_{k(i+1)}, \dots, x^{(1)}_{kn_X}\right) \times Y\left(x^{(2)}_{k1}, \dots, x^{(2)}_{k(i-1)}, x^{(1)}_{ki}, x^{(2)}_{k(i+1)}, \dots, x^{(2)}_{kn_X}\right)$$

The $n_X$ first-order indices are then estimated by

$$\hat S_i = \frac{\hat V_i}{\hat\sigma^2} = \frac{\hat U_i - m_Y^2}{\hat\sigma^2}$$

For the second order, the two variables $X_i$ and $X_j$ are both left unswapped to estimate $U_{ij}$, and so on for higher orders (for orders lower than $n_X$). The $\binom{n_X}{2}$ second-order indices are then estimated by

$$\hat S_{ij} = \frac{\hat V_{ij}}{\hat\sigma^2} = \frac{\hat U_{ij} - m_Y^2 - \hat V_i - \hat V_j}{\hat\sigma^2}$$

For the $n_X$ total-order indices $T_i$, only the variable $X_i$ is swapped between the two samples.
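A vectorized NumPy sketch of the first-order and total-order estimators follows; second-order indices are not covered. The function name `sobol_first_and_total` and the test model are hypothetical. For the total index, the sketch uses the classical relation $T_i = 1 - \operatorname{Var}[\mathbb{E}[Y \mid X_{\sim i}]]/\operatorname{Var} Y$ (Homma and Saltelli), where $X_{\sim i}$ denotes all inputs except $X_i$; the text above describes the swap but does not spell out this formula. For $Y = X_1 + 2X_2$ with independent standard normal inputs, the exact values are $S = T = (0.2, 0.8)$.

```python
import numpy as np


def sobol_first_and_total(model, sample_1, sample_2):
    """First-order indices S_i and total indices T_i from two independent samples.

    model    : vectorized function mapping an (N, n_X) array to an (N,) array
    sample_1 : (N, n_X) first input sample
    sample_2 : (N, n_X) second input sample, independent of the first
    """
    a = np.asarray(sample_1, dtype=float)
    b = np.asarray(sample_2, dtype=float)
    y_a = model(a)
    m, var = y_a.mean(), y_a.var()  # m_Y and sigma^2 estimated on the first sample
    n_x = a.shape[1]
    s, t = np.empty(n_x), np.empty(n_x)
    for i in range(n_x):
        # U_i: every variable is taken from the second sample except X_i.
        c = b.copy()
        c[:, i] = a[:, i]
        s[i] = (np.mean(y_a * model(c)) - m**2) / var
        # Total index: only X_i is swapped, which estimates Var[E[Y | X_~i]].
        d = a.copy()
        d[:, i] = b[:, i]
        t[i] = 1.0 - (np.mean(y_a * model(d)) - m**2) / var
    return s, t


def linear_model(x):
    """Hypothetical test model Y = X_1 + 2 X_2."""
    return x[:, 0] + 2.0 * x[:, 1]


rng = np.random.default_rng(5)
n = 100_000
s, t = sobol_first_and_total(
    linear_model, rng.standard_normal((n, 2)), rng.standard_normal((n, 2))
)
print(s, t)  # both close to [0.2, 0.8]
```

Each index costs one extra vectorized call of the model per input, so the total cost here is $N(2 n_X + 1)$ model evaluations.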

Other notations

Link with OpenTURNS methodology

The output of this method is the set of estimated relative variance contributions of subsets of variables, $\hat S_i$, $\hat S_{ij}$ and $\hat T_i$ (index values lie in $[0, 1]$), which the user may use to order the variables $X_i$ hierarchically.

This method of hierarchical ordering is particularly useful:

  • when the study of uncertainty deals with the central dispersion of the variable of interest $Y$ and not with its extreme values,

  • when there is no particular hypothesis on the model other than the independence of the input variables $X_i$,

  • when the size $N$ of both samples is large enough to provide a 'reasonable' picture of reality (as a Monte Carlo method, the estimation converges at the rate $N^{-1/2}$).

References and theoretical basics

The following references provide an interesting bibliographic starting point to further study of the method described here:
  • Saltelli, A. (2002). "Making best use of model evaluations to compute sensitivity indices", Computer Physics Communications, 145, 280-297

Examples


5.2.8 Step C'  – Sensitivity analysis for models with correlated inputs

Mathematical description

Goal

The ANCOVA (ANalysis of COVAriance) method is a variance-based method that generalizes the ANOVA (ANalysis Of VAriance) decomposition to models with correlated input parameters.

Principle

Let us consider a model $Y = h(\underline{X})$ without making any hypothesis on the dependence structure of $\underline{X} = \{X_1, \dots, X_{n_X}\}$, an $n_X$-dimensional random vector. The covariance decomposition requires a functional decomposition of the model. The model response $Y$ is thus expanded as a sum of functions of increasing dimension as follows:

$$h(\underline{X}) = h_0 + \sum_{\emptyset \neq u \subseteq \{1, \dots, n_X\}} h_u(\underline{X}_u) \qquad (138)$$

$h_0$ is the mean of $Y$. For any non-empty set $u \subseteq \{1, \dots, n_X\}$, each function $h_u$ represents the combined contribution of the variables $\underline{X}_u$ to $Y$.

Using the properties of the covariance, the variance of Y can be decomposed into a variance part and a covariance part as follows:

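$$\begin{aligned}
\operatorname{Var}[Y] &= \operatorname{Cov}\Big[h_0 + \sum_{u} h_u(\underline{X}_u),\ h_0 + \sum_{u} h_u(\underline{X}_u)\Big] \\
&= \sum_{u} \operatorname{Cov}\Big[h_u(\underline{X}_u),\ \sum_{v} h_v(\underline{X}_v)\Big] \\
&= \sum_{u} \Big(\operatorname{Var}[h_u(\underline{X}_u)] + \operatorname{Cov}\Big[h_u(\underline{X}_u),\ \sum_{v \cap u = \emptyset} h_v(\underline{X}_v)\Big]\Big)
\end{aligned}$$

where the sums run over non-empty subsets $u, v \subseteq \{1, \dots, n_X\}$.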

The total part of variance of Y due to X u reads:

$$S_u = \frac{\operatorname{Cov}[Y, h_u(\underline{X}_u)]}{\operatorname{Var}[Y]}$$

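The variance decomposition above makes it possible to write each sensitivity measure $S_u$ as the sum of a physical (or uncorrelated) part and a correlated part:

$$S_u = S_u^U + S_u^C$$

where $S_u^U$ is the uncorrelated part of the variance of $Y$ due to $\underline{X}_u$:

$$S_u^U = \frac{\operatorname{Var}[h_u(\underline{X}_u)]}{\operatorname{Var}[Y]}$$

and $S_u^C$ is the contribution of the correlation of $\underline{X}_u$ with the other parameters:

$$S_u^C = \frac{\operatorname{Cov}\Big[h_u(\underline{X}_u),\ \sum_{v \subseteq \{1, \dots, n_X\},\ v \cap u = \emptyset} h_v(\underline{X}_v)\Big]}{\operatorname{Var}[Y]}$$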

As computing these indices with the numerical model $h$ can be very expensive, it is suggested to approximate the model response with a polynomial chaos expansion. However, for the sake of computational simplicity, this expansion is built assuming independent components $\{X_1, \dots, X_{n_X}\}$. The chaos basis is therefore not orthogonal with respect to the correlated inputs under consideration; it is only used as a metamodel to generate approximate evaluations of the model response and of its summands in Eq. (138):

$$Y \approx \hat h = \sum_{j=0}^{P-1} \alpha_j \Psi_j(\underline{x})$$

The component functions can then be identified. For instance, for $u = \{1\}$:

$$h_1(X_1) = \sum_{\alpha \,\mid\, \alpha_1 \neq 0,\ \alpha_{i \neq 1} = 0} y_\alpha \Psi_\alpha(\underline{X})$$

where $\alpha = (\alpha_1, \dots, \alpha_{n_X})$ is the multi-index of degrees associated with the $n_X$ univariate polynomials $\psi^i_{\alpha_i}(X_i)$, and $y_\alpha$ is the corresponding chaos coefficient.

The model response $Y$ is then evaluated on a sample $\mathcal{X} = \{\underline{x}_k, k = 1, \dots, N\}$ of the correlated joint distribution. Finally, the indices are computed from the model response and its component functions identified on the polynomial chaos.
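The final step (computing the indices from sampled component functions) can be sketched in NumPy. As a simplifying assumption, the polynomial chaos step is replaced by a hypothetical linear model whose first-order component functions are known exactly, and only singleton sets $u = \{i\}$ are considered; among singletons, $v \cap u = \emptyset$ simply means $v \neq u$. The function name `ancova_indices` is hypothetical. For $Y = X_1 + 2X_2$ with standard normal inputs of correlation $0.5$, $\operatorname{Var} Y = 7$, so $S^U = (1/7, 4/7)$ and $S^C = (1/7, 1/7)$.

```python
import numpy as np


def ancova_indices(components):
    """ANCOVA indices from sampled component functions.

    components : (N, p) array whose column u holds h_u(X_u) evaluated on a sample
                 of the correlated inputs (the constant h_0 is excluded)
    Returns (S_u, S_u^U, S_u^C) for each of the p components.
    """
    h = np.asarray(components, dtype=float)
    cov = np.atleast_2d(np.cov(h, rowvar=False, bias=True))  # Cov[h_u, h_v]
    var_y = cov.sum()                # Var[Y] = sum of all covariances of the summands
    s_unc = np.diag(cov) / var_y     # Var[h_u] / Var[Y]
    s_tot = cov.sum(axis=1) / var_y  # Cov[Y, h_u] / Var[Y]
    s_cor = s_tot - s_unc            # Cov[h_u, sum_{v != u} h_v] / Var[Y]
    return s_tot, s_unc, s_cor


# Linear model Y = X_1 + 2 X_2, standard normal inputs with correlation 0.5,
# so h_0 = 0, h_1(X_1) = X_1 and h_2(X_2) = 2 X_2.
rng = np.random.default_rng(6)
x = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=20_000)
s_tot, s_unc, s_cor = ancova_indices(np.column_stack([x[:, 0], 2.0 * x[:, 1]]))
print(s_unc, s_cor)  # theoretical values: [1/7, 4/7] and [1/7, 1/7]
```

The correlated parts are equal here because the single covariance term $\operatorname{Cov}[X_1, 2X_2] = 1$ is shared by both inputs.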

Other notations

Link with OpenTURNS methodology

The ANCOVA method generalizes the well-established Sobol sensitivity indices to models whose input parameters are correlated. The Sobol indices measure the contribution of the input variables $X_i$ to the variance of the output $Y$. The ANCOVA decomposition distinguishes which part of this contribution is due to the variable itself and which part is due to its correlation with the other input parameters. So if a variable has a high contribution, this method reveals whether this is due to its physical role in the model $h$ or to its correlation with variables that have high contributions.
References and theoretical basics
The following reference provides more details on the ANCOVA method:
  • Caniou, Y. (2012). "Global sensitivity analysis for nested and multiscale modelling." PhD thesis. Blaise Pascal University-Clermont II, France.

Examples


5.2.9 Step C'  – Sensitivity analysis by Fourier decomposition

Mathematical description

Goal

FAST is a sensitivity analysis method based upon the ANOVA decomposition of the variance of the model response $y = f(\underline{X})$, the latter being represented by its Fourier expansion. $\underline{X} = \{X_1,\dots,X_{n_X}\}$ is an input random vector of $n_X$ independent components.

Principle

OpenTURNS implements the extended FAST method, which computes alternately the first-order and the total-effect indices of each input. This approach relies upon a Fourier decomposition of the model response. Its key idea is to recast this representation as a function of a scalar parameter s, by defining parametric curves $s \mapsto x_i(s)$, $i=1,\dots,n_X$, that explore the support of the input random vector $\underline{X}$.

For each input, the same three-step procedure is carried out:

  • Sampling:

    Deterministic space-filling paths with random starting points are defined, i.e. each input $X_i$ is transformed as follows:

    $$x_j^i = \frac{1}{2} + \frac{1}{\pi}\arcsin\left(\sin(\omega_i s_j + \varphi_i)\right), \quad i=1,\dots,n_X, \quad j=1,\dots,N$$

    where $n_X$ is the number of input variables and N is the length of the discretization of the s-space, s varying in $(-\pi,\pi)$ by steps of $2\pi/N$. $\varphi_i$ is a random phase shift chosen uniformly in $[0,2\pi]$, which lets the curves start anywhere within the unit hypercube $K^{n_X} = \{\underline{x} \mid 0 \le x_i \le 1,\ i=1,\dots,n_X\}$. The choice of the set $\{\varphi_1,\dots,\varphi_{n_X}\}$ introduces some randomness into the procedure. The procedure can therefore be repeated $N_r$ times and the results averaged over the $N_r$ estimates; this operation is called resampling.

    $\{\omega_i\}$, $i=1,\dots,n_X$, is a set of integer frequencies assigned to the inputs $X_i$. The frequency associated with the input of interest is set to the maximum admissible frequency satisfying the Nyquist criterion (which avoids aliasing effects):

    $$\omega_i = \frac{N-1}{2M}$$

    where M is the interference factor, usually equal to 4 or higher. It corresponds to the truncation level of the Fourier series, i.e. the number of harmonics retained in the decomposition carried out in the third step of the procedure.

    For large sample sizes, Saltelli et al. (1999) suggest choosing $16 \le \omega_i/N_r \le 64$.

    The maximum frequency of the complementary set of frequencies is:

    $$\max(\omega_{-i}) = \frac{\omega_i}{2M} = \frac{N-1}{4M^2}$$

    where the index $-i$ means "all but i".

    The other frequencies are distributed uniformly between 1 and max(ω -i ). The set of frequencies is the same whatever the number of resamplings is.

    For example, with eight input factors, N = 513 and M = 4, we get $\omega_i = \frac{N-1}{2M} = 64$ and $\max(\omega_{-i}) = \frac{N-1}{4M^2} = 8$, where i is the index of the input of interest.

    When computing the sensitivity indices for the first input, the considered set of frequencies is : {64,1,2,3,4,5,6,8}.

    When computing the sensitivity indices for the second input, the considered set of frequencies is : {1,64,2,3,4,5,6,8}.

    etc.

    The transformation defined above provides a uniformly distributed sample of the $x_i$, $i=1,\dots,n_X$, oscillating between 0 and 1. To take into account the actual distributions of the inputs, an isoprobabilistic transformation is applied to each $x_i$ before the next step of the procedure.

  • Simulations:

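    The output is computed as $y = f(s) = f(x_1(s),\dots,x_{n_X}(s))$.

    Then $f(s)$ is expanded into a Fourier series:

    $$f(s) = \sum_{k \in \mathbb{Z}} \left(A_k \cos(ks) + B_k \sin(ks)\right)$$

    where $A_k$ and $B_k$ are the Fourier coefficients:

    $$A_k = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(s)\cos(ks)\,ds, \qquad B_k = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(s)\sin(ks)\,ds$$

    These coefficients are estimated with the following discrete formulas:

    $$\hat{A}_k = \frac{1}{N}\sum_{j=1}^{N} f(x_j^1,\dots,x_j^{n_X})\cos\left(\frac{2k\pi(j-1)}{N}\right), \quad -\frac{N}{2} \le k \le \frac{N}{2}$$

    $$\hat{B}_k = \frac{1}{N}\sum_{j=1}^{N} f(x_j^1,\dots,x_j^{n_X})\sin\left(\frac{2k\pi(j-1)}{N}\right), \quad -\frac{N}{2} \le k \le \frac{N}{2}$$
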
  • Estimations by frequency analysis:

    The first order indices are estimated as follows:

    $$\hat{S}_i = \frac{\hat{D}_i}{\hat{D}} = \frac{2\sum_{p=1}^{M}\left(\hat{A}_{p\omega_i}^2 + \hat{B}_{p\omega_i}^2\right)}{2\sum_{n=1}^{(N-1)/2}\left(\hat{A}_n^2 + \hat{B}_n^2\right)}$$

    where $\hat{D}$ is the total variance, $\hat{D}_i$ the portion of $\hat{D}$ arising from the uncertainty of the i-th input, N the size of the sample used to compute the Fourier series and M the interference factor. Saltelli et al. (1999) recommended setting M to a value in the range [4, 6].

    The total-order indices are estimated as follows:

    $$\hat{T}_i = 1 - \frac{\hat{D}_{-i}}{\hat{D}} = 1 - \frac{2\sum_{k=1}^{\omega_i/2}\left(\hat{A}_k^2 + \hat{B}_k^2\right)}{2\sum_{n=1}^{(N-1)/2}\left(\hat{A}_n^2 + \hat{B}_n^2\right)}$$

    where $\hat{D}_{-i}$ is the part of the variance due to all the inputs except the i-th input.
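The three steps above (sampling, simulation, frequency analysis) can be sketched in pure Python. This is a minimal illustration of the extended FAST estimators, not the OpenTURNS implementation, and it rests on several assumptions: the function names are hypothetical; integer division is used for $\omega_i$; the rule spreading the complementary frequencies, $1 + \lfloor k(\max(\omega_{-i})-1)/(n_X-2) \rfloor$, is an assumed choice that happens to reproduce the example set {64,1,2,3,4,5,6,8}; resampling ($N_r > 1$) is omitted; and inputs are uniform on [0, 1] unless inverse CDFs are supplied for the isoprobabilistic step.

```python
import math
import random


def efast_frequencies(i, n_inputs, n_samples, m=4):
    """Frequency set when input i (0-based) is the input of interest."""
    omega_i = (n_samples - 1) // (2 * m)      # omega_i = (N - 1) / (2M)
    max_other = omega_i // (2 * m)            # max(omega_{-i}) = (N - 1) / (4M^2)
    # Assumed spreading rule for the complementary frequencies in [1, max_other]
    others = [1 + (k * (max_other - 1)) // max(n_inputs - 2, 1)
              for k in range(n_inputs - 1)]
    return others[:i] + [omega_i] + others[i:]


def efast_indices(model, n_inputs, n_samples=513, m=4, inverse_cdfs=None, seed=0):
    """First-order and total indices estimated with the extended FAST method."""
    rng = random.Random(seed)
    if inverse_cdfs is None:
        inverse_cdfs = [lambda p: p] * n_inputs   # U(0,1) inputs: identity transform
    n_half = (n_samples - 1) // 2
    first, total = [], []
    for i in range(n_inputs):
        omega = efast_frequencies(i, n_inputs, n_samples, m)
        phi = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_inputs)]
        # Sampling along the search curve, then isoprobabilistic transformation
        ys = []
        for j in range(n_samples):
            s = -math.pi + 2.0 * math.pi * j / n_samples
            x = [0.5 + math.asin(math.sin(omega[k] * s + phi[k])) / math.pi
                 for k in range(n_inputs)]
            ys.append(model([inverse_cdfs[k](x[k]) for k in range(n_inputs)]))
        # Discrete Fourier coefficients: power[k] = A_k^2 + B_k^2, k = 1..(N-1)/2
        power = [0.0] * (n_half + 1)
        for k in range(1, n_half + 1):
            a_k = sum(y * math.cos(2.0 * k * math.pi * j / n_samples)
                      for j, y in enumerate(ys)) / n_samples
            b_k = sum(y * math.sin(2.0 * k * math.pi * j / n_samples)
                      for j, y in enumerate(ys)) / n_samples
            power[k] = a_k ** 2 + b_k ** 2
        d_total = 2.0 * sum(power[1:])
        d_i = 2.0 * sum(power[p * omega[i]] for p in range(1, m + 1)
                        if p * omega[i] <= n_half)
        d_not_i = 2.0 * sum(power[1:omega[i] // 2 + 1])
        first.append(d_i / d_total)
        total.append(1.0 - d_not_i / d_total)
    return first, total


s_hat, t_hat = efast_indices(lambda x: x[0] + 2.0 * x[1] + 3.0 * x[2], n_inputs=3)
```

For the linear test model $y = x_1 + 2x_2 + 3x_3$ with independent U(0, 1) inputs, the exact first-order and total indices coincide and equal 1/14, 4/14 and 9/14, so the estimates should be close to these values. Each input of interest requires its own N model evaluations, which yield both $\hat{S}_i$ and $\hat{T}_i$.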

Other notations

Link with OpenTURNS methodology

The results produced by this method are the estimated relative variance contributions $\hat{S}_i$ and $\hat{T}_i$ (index values lie in [0, 1]), which the user may use to rank the variables $X_i$ hierarchically.

This method of hierarchical ordering is particularly useful :

  • when the problem focuses on the central dispersion of the variable of interest Y j and not on its extreme values.

  • when no particular hypothesis is made on the model other than the independence of the input variables $X_i$.

The extended FAST method is a convenient alternative to the method of Sobol'. Moreover, its computational cost tends to be lower than that of the method of Sobol', since FAST estimates both the first-order and the total indices from the same set of model evaluations.

References and theoretical basics
The following reference provides more details on the FAST method:
  • Saltelli, A., Tarantola, S. & Chan, K. (1999). "A quantitative, model independent method for global sensitivity analysis of model output." Technometrics, 41(1), 39-56.

Examples


5.2.10 Step C'  – Importance Factors from FORM-SORM methods

Mathematical description

Goal

Importance Factors are evaluated in the following context: $\underline{X}$ denotes a random input vector representing the sources of uncertainties, $f_{\underline{X}}(\underline{x})$ its joint probability density, $\underline{d}$ a deterministic vector representing the fixed variables, $g(\underline{X},\underline{d})$ the limit state function of the model, $\mathcal{D}_f = \{\underline{X} \in \mathbb{R}^n \mid g(\underline{X},\underline{d}) \le 0\}$ the event considered here and $g(\underline{X},\underline{d}) = 0$ its boundary (also called limit state surface).

The probability content of the event $\mathcal{D}_f$ is $P_f$:

$$P_f = \int_{g(\underline{x},\underline{d}) \le 0} f_{\underline{X}}(\underline{x})\,d\underline{x} \tag{139}$$

In this context, the probability P f can often be efficiently estimated by FORM or SORM approximations (refer to [FORM] and [SORM] ).

The FORM importance factors offer a way to rank the importance of the input components with respect to the realization of the event. They are also often interpreted as indicators of the impact of modeling the input components as random variables rather than as fixed values. The FORM importance factors are defined as follows.

Principle

The isoprobabilistic transformation T used in the FORM and SORM approximations (refer to [Iso Probabilistic Transformation]) is a diffeomorphism from $\operatorname{supp}(\underline{X})$ into $\mathbb{R}^n$ such that the distribution of the random vector $\underline{U} = T(\underline{X})$ has the following property: $\underline{U}$ and $\underline{\underline{R}}\,\underline{U}$ have the same distribution for all rotations $\underline{\underline{R}} \in \mathcal{SO}_n(\mathbb{R})$.

In the standard space, the design point $\underline{u}^*$ is the point of the limit state surface nearest to the origin. Its image in the physical space is $\underline{x}^* = T^{-1}(\underline{u}^*)$. We denote by $\beta_{HL}$ the Hasofer-Lind reliability index: $\beta_{HL} = \|\underline{u}^*\|$.

When the U-space is normal, the literature proposes to calculate the importance factor $\alpha_i^2$ of the variable $X_i$ as the squared i-th coordinate of the design point in the U-space, normalized by $\beta_{HL}^2$:

$$\alpha_i^2 = \frac{(u_i^*)^2}{\beta_{HL}^2} \tag{140}$$

This definition guarantees the relation $\sum_i \alpha_i^2 = 1$.

This definition raises the following difficulties:

  • What is the meaning of $\alpha_i$ when the variables $X_i$ are correlated? In that case, the isoprobabilistic transformation does not associate $U_i$ with $X_i$ alone but with a set of variables $X_j$.

  • When the variables $X_i$ are dependent, the shape of the limit state function in the U-space depends on the isoprobabilistic transformation, and in particular on the order of the variables $X_i$ within the random vector $\underline{X}$. Thus, changing this order has an impact on the location of the design point in the U-space and, consequently, on the importance factors (see [R. Lebrun, A. Dutfoy, 2008] for a comparison of the different isoprobabilistic transformations).

Another definition of the importance factors can be given in the elliptical space of the isoprobabilistic transformation, where the marginal distributions are all elliptical, with cumulative distribution function denoted E, and not yet decorrelated:

$$\underline{y}^* = \begin{pmatrix} E^{-1}\circ F_1(x_1^*) \\ E^{-1}\circ F_2(x_2^*) \\ \vdots \\ E^{-1}\circ F_n(x_n^*) \end{pmatrix} \tag{141}$$

The importance factor $\alpha_i^2$ is then defined as:

$$\alpha_i^2 = \frac{(y_i^*)^2}{\|\underline{y}^*\|^2} \tag{142}$$

This definition still guarantees the relation $\sum_i \alpha_i^2 = 1$.
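Definitions (140) and (142) can be compared on a small example with a minimal Python sketch. It assumes two normal inputs linked by a Gaussian copula with correlation $\rho$: with $E = \Phi$, the map $E^{-1}\circ F_i$ reduces to standardization, and the decorrelation step is taken as the lower Cholesky factor of the correlation matrix (one common convention). The design point $\underline{x}^* = (1, 1)$, with zero means, unit standard deviations and $\rho = 0.5$, is a hypothetical value, not the result of a FORM computation; the function name is also hypothetical.

```python
import math


def importance_factors(x_star, mu, sigma, rho):
    """Importance factors (140) and (142) for two normal inputs with correlation rho."""
    # Elliptical space, not yet decorrelated: y_i = E^{-1}(F_i(x_i)) = (x_i - mu_i) / sigma_i
    y = [(x - m) / s for x, m, s in zip(x_star, mu, sigma)]
    # Decorrelation with the Cholesky factor L of R = [[1, rho], [rho, 1]]: y = L u
    l22 = math.sqrt(1.0 - rho ** 2)
    u = [y[0], (y[1] - rho * y[0]) / l22]
    beta2 = sum(c * c for c in u)      # beta_HL^2 = ||u*||^2
    norm_y2 = sum(c * c for c in y)    # ||y*||^2
    alpha2_140 = [c * c / beta2 for c in u]
    alpha2_142 = [c * c / norm_y2 for c in y]
    return alpha2_140, alpha2_142


a140, a142 = importance_factors([1.0, 1.0], [0.0, 0.0], [1.0, 1.0], rho=0.5)
```

With these values, (140) attributes 75 % to whichever variable is placed first and 25 % to the second, although both variables play symmetric roles, whereas (142) attributes 50 % to each. This illustrates the dependence of (140) on the ordering of the variables discussed above.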

Other notations

Here, the event considered is expressed directly from the limit state function $g(\underline{X},\underline{d})$: this is the classical structural reliability formulation.

However, if the event is a threshold exceedance, it is useful to make explicit the variable of interest $Z = \tilde{g}(\underline{X},\underline{d})$, evaluated from the model $\tilde{g}(\cdot)$. In that case, the event associated with the threshold $z_s$ reads $\mathcal{D}_f = \{\underline{X} \in \mathbb{R}^n \mid Z = \tilde{g}(\underline{X},\underline{d}) > z_s\}$ and the limit state function is $g(\underline{X},\underline{d}) = z_s - Z = z_s - \tilde{g}(\underline{X},\underline{d})$. $P_f$ is then the threshold exceedance probability, defined as $P_f = \mathbb{P}(Z \ge z_s) = \int_{g(\underline{x},\underline{d}) \le 0} f_{\underline{X}}(\underline{x})\,d\underline{x}$. The FORM importance factors thus offer a way to rank the importance of the input components with respect to the exceedance of the threshold by the quantity of interest Z. They can be seen as a specific sensitivity analysis technique dedicated to the behaviour of Z around a particular threshold rather than to its variance.

Link with OpenTURNS methodology

Within the global methodology, these importance factors are used in the step C': "Ranking sources of uncertainty" in the case of the evaluation of the probability of an event by an approximation method.

It requires the following steps to have been completed beforehand:

  • step A: identification of an input vector $\underline{X}$ of sources of uncertainties and of an output variable of interest $Z = \tilde{g}(\underline{X},\underline{d})$, result of the model $\tilde{g}(\cdot)$; identification of a probabilistic criterion such as a threshold exceedance $Z > z_s$ or, equivalently, a failure event $g(\underline{X},\underline{d}) \le 0$,

  • step B: selection of one of the proposed techniques to estimate a probabilistic model of the input vector $\underline{X}$,

  • step C: selection of an approximation method to evaluate the event probability (FORM or SORM), together with an appropriate optimization algorithm among those proposed.

When not specified, OpenTURNS evaluates the importance factors according to relation (140). Otherwise, OpenTURNS evaluates them according to (142).

Note that the relevance of FORM importance factors as a means to rank the importance of the sources of uncertainty closely depends on the validity of the FORM approximation (refer to [FORM] and [SORM]).

The sensitivity factors (refer to [Sensitivity Factors]) measure the influence, on the Hasofer-Lind reliability index (refer to [Reliability Index]), of the values of the parameters used to define the distribution of the random vector $\underline{X}$.

References and theoretical basics

Relevant literature on the subject:
  • Madsen, H.O. (1988). "Omission sensitivity factors." Structural Safety, 5, 35-45.

  • Lebrun, R. & Dutfoy, A. (2008). "Do Rosenblatt and Nataf isoprobabilistic transformations really differ?" Submitted to Probabilistic Engineering Mechanics in August 2008, tentatively accepted at the time of writing.

Examples

Let us apply this method to the following analytical example, which considers a cantilever beam with Young's modulus E, length L and second moment of area I. A concentrated bending force F is applied at the free end of the beam. The vertical displacement y of the free end is:
$$y(E,F,L,I) = \frac{F L^3}{3 E I}$$

The objective is to propagate the uncertainties of the variables (E, F, L, I) to y.

The input random vector is $\underline{X} = (E,F,L,I)$, whose probabilistic model is (units are not provided):

E = Normal(50, 1), F = Normal(1, 1), L = Normal(10, 1), I = Normal(5, 1)

The four random variables are independent.

The event considered is the threshold exceedance $\mathcal{D}_f = \{(E,F,L,I) \in \mathbb{R}^4 \mid y(E,F,L,I) \ge 3\}$.

The importance factors obtained are:

$\alpha_E^2 = 9.456 \times 10^{-2}\,\%$, $\alpha_F^2 = 69.59\,\%$, $\alpha_L^2 = 19.48\,\%$, $\alpha_I^2 = 10.84\,\%$
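These values can be approximately recomputed with a short, self-contained Python sketch. Because the four inputs are independent normal variables, the isoprobabilistic transformation is assumed to reduce to $u_k = (x_k - \mu_k)/\sigma_k$, and the design point is searched with the classical Hasofer-Lind-Rackwitz-Fiessler (HLRF) iteration using central finite-difference gradients. This is an illustration under these assumptions, not the optimization algorithm used by OpenTURNS; the function names are hypothetical, and the importance factors follow relation (140).

```python
import math

# Cantilever beam: independent normal inputs given as (mean, standard deviation)
PARAMS = {"E": (50.0, 1.0), "F": (1.0, 1.0), "L": (10.0, 1.0), "I": (5.0, 1.0)}
NAMES = list(PARAMS)
THRESHOLD = 3.0  # event: y(E, F, L, I) >= 3


def limit_state_u(u):
    """G(u) = z_s - y(x), with x_k = mu_k + sigma_k * u_k; failure when G <= 0."""
    e, f, length, inertia = (PARAMS[n][0] + PARAMS[n][1] * uk for n, uk in zip(NAMES, u))
    return THRESHOLD - f * length ** 3 / (3.0 * e * inertia)


def gradient(fun, u, h=1e-6):
    """Central finite-difference gradient of fun at u."""
    grad = []
    for k in range(len(u)):
        up, um = list(u), list(u)
        up[k] += h
        um[k] -= h
        grad.append((fun(up) - fun(um)) / (2.0 * h))
    return grad


def hlrf_design_point(fun, dim, tol=1e-8, max_iter=100):
    """HLRF iteration: u_{k+1} = [(grad . u_k - G(u_k)) / |grad|^2] * grad."""
    u = [0.0] * dim
    for _ in range(max_iter):
        g = fun(u)
        dg = gradient(fun, u)
        factor = (sum(a * b for a, b in zip(dg, u)) - g) / sum(c * c for c in dg)
        new_u = [factor * c for c in dg]
        if max(abs(a - b) for a, b in zip(new_u, u)) < tol:
            return new_u
        u = new_u
    return u


u_star = hlrf_design_point(limit_state_u, len(NAMES))
beta_hl = math.sqrt(sum(c * c for c in u_star))
# Importance factors (140): alpha_i^2 = (u_i*)^2 / beta_HL^2
alpha2 = {n: uk ** 2 / beta_hl ** 2 for n, uk in zip(NAMES, u_star)}
for name, value in alpha2.items():
    print(f"alpha_{name}^2 = {100.0 * value:.4g} %")
```

The printed percentages should be close to those listed above. F dominates because the displacement is proportional to F and its coefficient of variation is by far the largest of the four inputs.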

5.2.11 Step C'  – Sensitivity Factors from FORM method

Mathematical description

Goal

Sensitivity Factors are evaluated in the following context: $\underline{X}$ denotes a random input vector representing the sources of uncertainties, $f_{\underline{X}}(\underline{x})$ its joint probability density, $\underline{d}$ a deterministic vector representing the fixed variables, $g(\underline{X},\underline{d})$ the limit state function of the model, $\mathcal{D}_f = \{\underline{X} \in \mathbb{R}^n \mid g(\underline{X},\underline{d}) \le 0\}$ the event considered here and $g(\underline{X},\underline{d}) = 0$ its boundary (also called limit state surface).

The probability content of the event $\mathcal{D}_f$ is $P_f$:

$$P_f = \int_{g(\underline{x},\underline{d}) \le 0} f_{\underline{X}}(\underline{x})\,d\underline{x} \tag{143}$$

In this context, the probability P f can often be efficiently estimated by FORM or SORM approximations (refer to [FORM] and [SORM] ).

The FORM sensitivity factors offer a way to analyse the sensitivity of the probability of the event with respect to the parameters of the probability distribution of $\underline{X}$.

Principle

A sensitivity factor is defined as the derivative of the Hasofer-Lind reliability index with respect to a parameter θ of the distribution of the random vector $\underline{X}$.

Let $\underline{\theta}$ denote the vector of all the parameters of the distribution of $\underline{X}$ that appear in the definition of the isoprobabilistic transformation T (refer to [IsoProbabiliticFunction]), $\underline{U}^*_{\underline{\theta}}$ the design point associated with the event in the U-space, and $G(\underline{U},\underline{\theta}) = g[T^{-1}(\underline{U},\underline{\theta}),\underline{d}]$ the limit state function mapped into the U-space by T. The vector of sensitivity factors is then defined as:

$$\nabla_{\underline{\theta}}\, \beta_{HL} = \frac{+1}{\|\nabla_{\underline{u}} G(\underline{U}^*_{\underline{\theta}},\underline{\theta})\|}\, \nabla_{\underline{\theta}} G(\underline{U}^*_{\underline{\theta}},\underline{\theta})$$

The sensitivity factors measure the influence, on the Hasofer-Lind reliability index (refer to [Reliability Index]), of the values of the parameters used to define the distribution of the random vector $\underline{X}$.

Other notations
Here, the event considered is expressed directly from the limit state function $g(\underline{X},\underline{d})$: this is the classical structural reliability formulation.

However, if the event is a threshold exceedance, it is useful to make explicit the variable of interest $Z = \tilde{g}(\underline{X},\underline{d})$, evaluated from the model $\tilde{g}(\cdot)$. In that case, the event associated with the threshold $z_s$ reads $\mathcal{D}_f = \{\underline{X} \in \mathbb{R}^n \mid Z = \tilde{g}(\underline{X},\underline{d}) > z_s\}$ and the limit state function is $g(\underline{X},\underline{d}) = z_s - Z = z_s - \tilde{g}(\underline{X},\underline{d})$. $P_f$ is then the threshold exceedance probability, defined as $P_f = \mathbb{P}(Z \ge z_s) = \int_{g(\underline{x},\underline{d}) \le 0} f_{\underline{X}}(\underline{x})\,d\underline{x}$. The FORM sensitivity factors thus offer a way to rank the importance of the parameters of the input components with respect to the exceedance of the threshold by the quantity of interest Z. They can be seen as a specific sensitivity analysis technique dedicated to the behaviour of Z around a particular threshold rather than to its variance.

Link with OpenTURNS methodology

Within the global methodology, sensitivity factors are evaluated in step C': "Ranking sources of uncertainty", when the probability of an event is evaluated by an approximation method.

It requires the following steps to have been completed beforehand:

  • step A: input vector $\underline{X}$, final variable of interest (result of a model), probabilistic criterion (the event considered) $g(\underline{X},\underline{d}) \le 0$,

  • step B: one of the proposed techniques to describe the probabilistic model of the input vector $\underline{X}$,

  • step C: one method to evaluate the probability content of the event: the FORM or SORM approximation.

References and theoretical basics

The standard version of OpenTURNS takes into account only the sensitivity with respect to the parameters of the distribution of $\underline{X}$ that appear in the definition of the isoprobabilistic transformation T. It does not calculate the sensitivity with respect to the other parameters, in particular those of the limit state function, $\underline{d}$.

The FORM importance factors (refer to [Importance Factors]) offer a way to rank the importance of the input components with respect to the realization of the event. They are also often interpreted as indicators of the impact of modeling the input components as random variables rather than as fixed values.

Useful references:

  • Ditlevsen, O. & Madsen, H.O. (2004). "Structural reliability methods." Department of Mechanical Engineering, Technical University of Denmark - Maritime Engineering, internet publication.

Examples

Let us apply this method to the following analytical example, which considers a cantilever beam with Young's modulus E, length L and second moment of area I. A concentrated bending force F is applied at the free end of the beam. The vertical displacement y of the free end is:
$$y(E,F,L,I) = \frac{F L^3}{3 E I}$$

The objective is to propagate the uncertainties of the variables (E, F, L, I) to y.

The input random vector is $\underline{X} = (E,F,L,I)$, whose probabilistic model is (units are not provided):

E = Normal(50, 1), F = Normal(1, 1), L = Normal(10, 1), I = Normal(5, 1)

The event considered is the threshold exceedance $\mathcal{D}_f = \{(E,F,L,I) \in \mathbb{R}^4 \mid y(E,F,L,I) \ge 3\}$.

Denoting by μ the mean and by σ the standard deviation of each random variable, the results obtained are gathered in the following tables.

Sensitivity factors of the Hasofer-Lind reliability index β_HL:

Variable   ∂β_HL/∂μ      ∂β_HL/∂σ
E           0.0307508    -0.000954364
F          -0.834221     -0.000954364
L          -0.441319     -0.000954364
I           0.329191     -0.000954364

Sensitivity factors of the FORM probability P_f,FORM:

Variable   ∂P_f,FORM/∂μ   ∂P_f,FORM/∂σ
E          -0.00737194     0.000228791
F           0.199989       0.000228791
L           0.105798       0.000228791
I          -0.0789175      0.000228791
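The mean-parameter columns of these tables can be approximately recomputed with the following Python sketch, applying the sensitivity factor formula of the Principle paragraph. It assumes independent normal inputs, so that $T^{-1}(\underline{u},\underline{\theta})_k = \mu_k + \sigma_k u_k$; an HLRF search for the design point; central finite differences for $\nabla_{\underline{u}} G$ and $\nabla_{\underline{\theta}} G$; and the FORM approximation $P_{f,FORM} = \Phi(-\beta_{HL})$, so that $\partial P_{f,FORM}/\partial\theta = -\varphi(\beta_{HL})\,\partial\beta_{HL}/\partial\theta$. Only the means are treated as parameters θ here. The function names are hypothetical and this is not the OpenTURNS implementation.

```python
import math

# Cantilever beam: independent normal inputs given as (mean, standard deviation)
PARAMS = {"E": (50.0, 1.0), "F": (1.0, 1.0), "L": (10.0, 1.0), "I": (5.0, 1.0)}
NAMES = list(PARAMS)
SIGMA = [PARAMS[n][1] for n in NAMES]
THRESHOLD = 3.0  # event: y(E, F, L, I) >= 3


def big_g(u, mu):
    """G(u, theta) = z_s - y(T^{-1}(u, theta)), with theta = vector of means."""
    e, f, length, inertia = (m + s * uk for m, s, uk in zip(mu, SIGMA, u))
    return THRESHOLD - f * length ** 3 / (3.0 * e * inertia)


def partials(fun, v, h=1e-6):
    """Central finite-difference gradient of fun at v."""
    out = []
    for k in range(len(v)):
        vp, vm = list(v), list(v)
        vp[k] += h
        vm[k] -= h
        out.append((fun(vp) - fun(vm)) / (2.0 * h))
    return out


mu0 = [PARAMS[n][0] for n in NAMES]


def g_u(v):
    """Limit state in the U-space at the nominal parameters."""
    return big_g(v, mu0)


# HLRF search for the design point U*_theta
u_star = [0.0] * len(NAMES)
for _ in range(100):
    dg = partials(g_u, u_star)
    factor = (sum(a * b for a, b in zip(dg, u_star)) - g_u(u_star)) / sum(c * c for c in dg)
    u_star = [factor * c for c in dg]

beta_hl = math.sqrt(sum(c * c for c in u_star))
norm_grad_u = math.sqrt(sum(c * c for c in partials(g_u, u_star)))
grad_theta = partials(lambda m: big_g(u_star, m), mu0)

# Sensitivity factors: grad_theta beta_HL = (+1 / ||grad_u G||) * grad_theta G
dbeta_dmu = {n: d / norm_grad_u for n, d in zip(NAMES, grad_theta)}
# FORM probability P_f = Phi(-beta_HL), hence dP_f/dtheta = -phi(beta_HL) * dbeta/dtheta
pdf_beta = math.exp(-0.5 * beta_hl ** 2) / math.sqrt(2.0 * math.pi)
dpf_dmu = {n: -pdf_beta * d for n, d in dbeta_dmu.items()}
```

The resulting `dbeta_dmu` and `dpf_dmu` values should be close to the μ columns of the two tables above. Their signs follow directly from the limit state: increasing the mean of F or L pushes the design point towards the origin and lowers $\beta_{HL}$.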
 
