6 OpenTURNS' methods for the construction of response surfaces

The current section is dedicated to the construction of an analytical approximation of the response of a given model. Such an approximation is commonly referred to as a response surface in the literature. This methodology is relevant whenever each model evaluation is time-consuming. Indeed, once a response surface has been built up, the various methods in Step C and Step C' may be applied at a negligible cost. A special focus will be given to polynomial response surfaces.

6.1 Deterministic response surfaces

First, methods related to the approximation of a function of deterministic variables are considered.

6.1.1 Polynomial response surfaces

6.1.2 Error estimation based on cross-validation

6.1.3 Kriging

6.2 Stochastic response surfaces

In this section, the approximation of a model depending on random variables is of interest.

6.2.1 Functional chaos expansions

6.2.2 Polynomial chaos basis

6.2.3 Truncation schemes for the polynomial chaos expansions

6.2.4 Computation of the polynomial chaos coefficients

6.3 Methods description

6.3.1 Step Resp. Surf.  – Linear and Quadratic Taylor Expansions

Mathematical description

Goal

The approximation of the model response $\underline{y} = h(\underline{x})$ around a specific set $\underline{x}_0 = (x_{0,1}, \dots, x_{0,n_X})$ of input parameters may be of interest. One may then substitute the Taylor expansion of $h$ at point $\underline{x}_0$ for $h$ itself. Hence $h$ is replaced with a first- or second-order polynomial $\hat{h}$ whose evaluation is inexpensive, allowing the analyst to apply the uncertainty analysis methods (Step C).

Principles

We consider the first- and second-order Taylor expansions around $\underline{x} = \underline{x}_0$.

First order Taylor expansion

$$\underline{y} \approx \hat{h}(\underline{x}) = h(\underline{x}_0) + \sum_{i=1}^{n_X} \frac{\partial h}{\partial x_i}(\underline{x}_0)\,\left(x_i - x_{0,i}\right)$$

Introducing a vector notation, the previous equation rewrites:

$$\underline{y} \approx \underline{y}_0 + \underline{\underline{L}}\,\left(\underline{x} - \underline{x}_0\right)$$

where $\underline{y}_0 = h(\underline{x}_0)$ is the response at the expansion point and $\underline{\underline{L}} = \left( \frac{\partial y_{0,k}}{\partial x_i},\; i = 1, \dots, n_X,\; k = 1, \dots, n_Y \right)$ is the Jacobian matrix evaluated at $\underline{x}_0$.

Second order Taylor expansion

$$\underline{y} \approx \hat{h}(\underline{x}) = h(\underline{x}_0) + \sum_{i=1}^{n_X} \frac{\partial h}{\partial x_i}(\underline{x}_0)\,\left(x_i - x_{0,i}\right) + \frac{1}{2} \sum_{i,j=1}^{n_X} \frac{\partial^2 h}{\partial x_i\, \partial x_j}(\underline{x}_0)\,\left(x_i - x_{0,i}\right)\left(x_j - x_{0,j}\right)$$

Introducing a vector notation, the previous equation rewrites:

$$\underline{y} \approx \underline{y}_0 + \underline{\underline{L}}\,\left(\underline{x} - \underline{x}_0\right) + \frac{1}{2}\, \left\langle \underline{\underline{\underline{Q}}},\, \underline{x} - \underline{x}_0,\, \underline{x} - \underline{x}_0 \right\rangle$$

where $\underline{\underline{\underline{Q}}} = \left( \frac{\partial^2 y_{0,k}}{\partial x_i\, \partial x_j},\; i, j = 1, \dots, n_X,\; k = 1, \dots, n_Y \right)$ is the transposed Hessian matrix.
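
For illustration, here is a minimal Python/NumPy sketch (not the OpenTURNS implementation) that builds such first- and second-order surrogates of a scalar model; the model $h$, the expansion point and the finite-difference step are arbitrary assumptions.

```python
import numpy as np

def taylor_surrogates(h, x0, eps=1e-5):
    """Return first- and second-order Taylor approximations of h around x0,
    with gradient and Hessian estimated by central finite differences."""
    x0 = np.asarray(x0, dtype=float)
    n = x0.size
    y0 = h(x0)
    # Gradient by central differences
    grad = np.zeros(n)
    for i in range(n):
        e = np.zeros(n); e[i] = eps
        grad[i] = (h(x0 + e) - h(x0 - e)) / (2 * eps)
    # Hessian by central differences
    hess = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = eps
            ej = np.zeros(n); ej[j] = eps
            hess[i, j] = (h(x0 + ei + ej) - h(x0 + ei - ej)
                          - h(x0 - ei + ej) + h(x0 - ei - ej)) / (4 * eps**2)
    linear = lambda x: y0 + grad @ (np.asarray(x) - x0)
    quadratic = lambda x: (linear(x)
                           + 0.5 * (np.asarray(x) - x0) @ hess @ (np.asarray(x) - x0))
    return linear, quadratic

# Usage on an arbitrary model (assumption): h(x) = x1 * exp(x2)
h = lambda x: x[0] * np.exp(x[1])
lin, quad = taylor_surrogates(h, x0=[1.0, 0.0])
print(lin([1.1, 0.1]), quad([1.1, 0.1]), h(np.array([1.1, 0.1])))
```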

Other notations

The described method is similar to the perturbation method presented in:

[Taylor variance decomposition – Perturbation Method] .

Link with OpenTURNS methodology

This method is aimed at building a response surface prior to tackling Step C “Uncertainty Propagation”. Then the various uncertainty propagation techniques may be applied at a negligible computational cost. In particular, the Taylor expansion allows the analytical derivation of approximate moments (e.g. the mean and covariance) of the model response, as in the perturbation method referenced above.

References and theoretical basics

A Taylor expansion-based response surface is well suited when a local approximation of the model under consideration is sufficient. It has to be noticed, though, that this may be a strong assumption for a large class of models, possibly leading to erratic results. In particular, when the estimation of the probability of exceeding a threshold is of interest, the quality of the Taylor response surface has to be carefully justified. However, this method often allows the analyst to study the central trend of the model response at a low computational cost.

6.3.2 Step Resp. Surf.  – Numerical methods to solve least squares problems

Mathematical description

Goal
This section presents numerical methods that can be used in order to solve least squares problems, which can be encountered when the construction of a response surface (i.e. of a meta-model) is of interest, or when one wishes to perform a statistical regression.

Least squares problem

Given a matrix $\underline{\underline{\Psi}} \in \mathbb{R}^{N \times P}$, $N > P$, and a vector $\underline{y} \in \mathbb{R}^N$, we want to find a vector $\underline{a} \in \mathbb{R}^P$ such that $\underline{\underline{\Psi}}\, \underline{a}$ is the best approximation to $\underline{y}$ in the least squares sense. Mathematically speaking, we want to solve the following minimization problem:

$$\min_{\underline{a} \in \mathbb{R}^P} \left\| \underline{\underline{\Psi}}\, \underline{a} - \underline{y} \right\|_2$$

In the following, it is assumed that the rank of matrix $\underline{\underline{\Psi}}$ is equal to $P$.

Several algorithms can be applied to compute the least squares solution, as shown in the sequel.

Numerical solving schemes

Method of normal equations

It is shown that the solution of the least squares problem satisfies the so-called normal equations, which read using a matrix notation:

$$\underline{\underline{\Psi}}^{\mathsf{T}}\, \underline{\underline{\Psi}}\, \underline{a} = \underline{\underline{\Psi}}^{\mathsf{T}}\, \underline{y}$$

The matrix $\underline{\underline{C}} \equiv \underline{\underline{\Psi}}^{\mathsf{T}}\, \underline{\underline{\Psi}}$ is symmetric and positive definite. The system can be solved using the following Cholesky factorization:

$$\underline{\underline{C}} = \underline{\underline{R}}^{\mathsf{T}}\, \underline{\underline{R}}$$

where $\underline{\underline{R}}$ is an upper triangular matrix with positive diagonal entries. Solving the normal equations is then equivalent to solving the two following triangular systems, which can be easily handled by forward and backward substitution:

$$\underline{\underline{R}}^{\mathsf{T}}\, \underline{z} = \underline{\underline{\Psi}}^{\mathsf{T}}\, \underline{y}, \qquad \underline{\underline{R}}\, \underline{a} = \underline{z}$$

It has to be noted that this theoretical approach is seldom used in practice though. Indeed the resulting least squares solution is quite sensitive to small changes in the data (i.e. in $\underline{y}$ and $\underline{\underline{\Psi}}$). More precisely, the normal equations are always worse conditioned than the original overdetermined system, since their condition number is the square of that of the original problem:

$$\kappa\left(\underline{\underline{\Psi}}^{\mathsf{T}}\, \underline{\underline{\Psi}}\right) = \kappa\left(\underline{\underline{\Psi}}\right)^2$$

As a consequence more robust numerical methods should be adopted.
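
For completeness, a minimal NumPy/SciPy sketch of the normal-equations approach is given below (the design matrix and observations are random placeholders, not data from OpenTURNS); the two triangular solves mirror the factorization above.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

rng = np.random.default_rng(0)
N, P = 100, 5
Psi = rng.normal(size=(N, P))          # design matrix (assumed full rank)
y = rng.normal(size=N)                 # observations

C = Psi.T @ Psi                        # normal matrix, symmetric positive definite
R = cholesky(C, lower=False)           # C = R^T R, R upper triangular
z = solve_triangular(R.T, Psi.T @ y, lower=True)   # forward substitution
a = solve_triangular(R, z, lower=False)            # backward substitution

print(np.allclose(a, np.linalg.lstsq(Psi, y, rcond=None)[0]))  # sanity check
```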

Method based on QR factorization

It is shown that the matrix $\underline{\underline{\Psi}}$ can be factorized as follows:

$$\underline{\underline{\Psi}} = \underline{\underline{Q}}\, \underline{\underline{R}}$$

where $\underline{\underline{Q}}$ is an $N$-by-$P$ matrix with orthonormal columns and $\underline{\underline{R}}$ is a $P$-by-$P$ upper triangular matrix. Such a QR decomposition may be constructed using several schemes, such as Gram-Schmidt orthogonalization, Householder reflections or Givens rotations.

In this setup the least squares problem is equivalent to solving:

$$\underline{\underline{R}}\, \underline{a} = \underline{\underline{Q}}^{\mathsf{T}}\, \underline{y}$$

This upper triangular system can be solved using backwards substitution.

The solving scheme based on Householder QR factorization leads to a relative error that is proportional to:

$$\kappa\left(\underline{\underline{\Psi}}\right) + \left\|\underline{r}\right\|_2\, \kappa\left(\underline{\underline{\Psi}}\right)^2$$

where $\underline{r} = \underline{\underline{\Psi}}\, \underline{a} - \underline{y}$ is the residual. Thus this error is expected to be much smaller than the one associated with the normal equations provided that the residual $\underline{r}$ is “small”.
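
A corresponding sketch of the QR-based solve, under the same placeholder-data assumptions:

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(1)
N, P = 100, 5
Psi = rng.normal(size=(N, P))
y = rng.normal(size=N)

# Thin (economy) QR factorization: Psi = Q R, Q (N x P) with orthonormal columns, R (P x P)
Q, R = np.linalg.qr(Psi, mode='reduced')
a = solve_triangular(R, Q.T @ y, lower=False)   # backward substitution

print(np.allclose(a, np.linalg.lstsq(Psi, y, rcond=None)[0]))
```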

Method based on singular value decomposition

The so-called singular value decomposition (SVD) of matrix $\underline{\underline{\Psi}}$ reads:

$$\underline{\underline{\Psi}} = \underline{\underline{U}}\, \underline{\underline{S}}\, \underline{\underline{V}}^{\mathsf{T}}$$

where $\underline{\underline{U}} \in \mathbb{R}^{N \times N}$ and $\underline{\underline{V}} \in \mathbb{R}^{P \times P}$ are orthogonal matrices, and $\underline{\underline{S}} \in \mathbb{R}^{N \times P}$ can be cast as:

$$\underline{\underline{S}} = \begin{pmatrix} \underline{\underline{S}}_1 \\ \underline{\underline{0}} \end{pmatrix}$$

In the previous equation, $\underline{\underline{S}}_1 \in \mathbb{R}^{P \times P}$ is a diagonal matrix containing the singular values $\sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_P > 0$ of $\underline{\underline{\Psi}}$.

It can be shown that the least squares solution is equal to:

$$\hat{\underline{a}} = \underline{\underline{V}}\, \begin{pmatrix} \underline{\underline{S}}_1^{-1} & \underline{\underline{0}} \end{pmatrix}\, \underline{\underline{U}}^{\mathsf{T}}\, \underline{y}$$

In practice it is not common to compute a “full” SVD as shown above. Instead, it is often sufficient and more economical in terms of time and memory to compute a reduced version of the SVD, which reads:

$$\underline{\underline{\Psi}} = \underline{\underline{U}}_1\, \underline{\underline{S}}_1\, \underline{\underline{V}}^{\mathsf{T}}$$

where $\underline{\underline{U}}_1$ is obtained by extracting the first $P$ columns of $\underline{\underline{U}}$.

Note that it is also possible to perform the SVD of a rank-deficient matrix $\underline{\underline{\Psi}}$. In this case the resulting vector $\hat{\underline{a}}$ corresponds to the minimum norm least squares solution.

The computational cost of the method is proportional to $(NP^2 + P^3)$ with a factor ranging from 4 to 10, depending on the numerical scheme used to compute the SVD. This cost is higher than those associated with the normal equations and with QR factorization. However the SVD is relevant insofar as it provides very valuable information, namely the singular values of matrix $\underline{\underline{\Psi}}$.
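
A sketch of the SVD-based solve, again on placeholder data; note how the singular values come for free and give, e.g., the condition number:

```python
import numpy as np

rng = np.random.default_rng(2)
N, P = 100, 5
Psi = rng.normal(size=(N, P))
y = rng.normal(size=N)

# Reduced SVD: Psi = U1 S1 V^T with U1 (N x P), s1 the singular values, V (P x P)
U1, s1, Vt = np.linalg.svd(Psi, full_matrices=False)
a = Vt.T @ ((U1.T @ y) / s1)      # a = V S1^{-1} U1^T y

# The singular values are obtained as a by-product, e.g. for the condition number:
print("condition number:", s1[0] / s1[-1])
print(np.allclose(a, np.linalg.lstsq(Psi, y, rcond=None)[0]))
```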

Comparison of the methods

Several conclusions may be drawn concerning the various methods considered so far:

  • If $N \approx P$, the normal equations and the Householder QR factorization require about the same computational work. If $N \gg P$, then the QR approach requires about twice as much work as the normal equations.

  • However QR appears to be more accurate than the normal equations, so it should almost always be preferred in practice.

  • SVD is also robust but it turns out to be the most computationally expensive scheme. Nonetheless the singular values are obtained as a by-product, which may be particularly useful for analytical and computational purposes.

Other notations

-

Link with OpenTURNS methodology

Numerical methods to solve least squares problems can be used at several stages of the global methodology, such as:

References and theoretical basics

Details related to the least squares methods may be found in:
  • Å. Bjorck, 1996, “Numerical methods for least squares problems”, SIAM Press, Philadelphia, PA.


6.3.3 Step Resp. Surf.  – Polynomial response surface based on least squares

Mathematical description

Goal

Instead of replacing the model response $h(\underline{x})$ with a local approximation around a given set $\underline{x}_0$ of input parameters as in [Taylor approximations], one may seek a global approximation of $h(\underline{x})$ over its whole domain of definition. A common choice to this end is global polynomial approximation. For the sake of simplicity, a scalar model response $y = h(\underline{x})$ will be considered from now on. Nonetheless, the following derivations hold for a vector-valued response.

Principles

In the sequel, one considers global approximations of the model response using:

  • a linear function, i.e. a polynomial of degree one;

  • a quadratic function, i.e. a polynomial of degree two.

Linear approximation

$$y \approx \hat{h}(\underline{x}) = a_0 + \sum_{i=1}^{n_X} a_i\, x_i$$

where $\left(a_j,\; j = 0, \dots, n_X\right)$ is a set of unknown coefficients.

Quadratic approximation

$$y \approx \hat{h}(\underline{x}) = a_0 + \sum_{i=1}^{n_X} a_i\, x_i + \sum_{i=1}^{n_X} \sum_{j=1}^{n_X} a_{i,j}\, x_i\, x_j$$

General expression

The two previous equations may be recast using a unique formalism as follows:

$$y \approx \hat{h}(\underline{x}) = \sum_{j=0}^{P-1} a_j\, \psi_j(\underline{x})$$

where $P$ denotes the number of terms, which is equal to $(n_X + 1)$ (resp. $(1 + 2 n_X + n_X (n_X - 1)/2)$) when using a linear (resp. a quadratic) approximation, and the family $\left(\psi_j,\; j = 0, \dots, P-1\right)$ gathers the constant monomial $1$, the monomials of degree one $x_i$ and possibly the cross-terms $x_i x_j$ as well as the monomials of degree two $x_i^2$. Using the vector notation $\underline{a} = (a_0, \dots, a_{P-1})^{\mathsf{T}}$ and $\underline{\psi}(\underline{x}) = (\psi_0(\underline{x}), \dots, \psi_{P-1}(\underline{x}))^{\mathsf{T}}$, this rewrites:

$$y \approx \hat{h}(\underline{x}) = \underline{a}^{\mathsf{T}}\, \underline{\psi}(\underline{x})$$

Computation of the coefficients

A global approximation of the model response over its whole definition domain is sought. To this end, the coefficients $a_j$ may be computed using a least squares regression approach. In this context, an experimental design $\underline{\mathcal{X}} = (\underline{x}^{(1)}, \dots, \underline{x}^{(N)})$, i.e. a set of realizations of the input parameters, is required, as well as the corresponding model evaluations $\underline{\mathcal{Y}} = (y^{(1)}, \dots, y^{(N)})$. Note that the experimental design $\underline{\mathcal{X}}$ may be built using the various techniques introduced in [Min-Max Approach using Design of Experiments].

The following minimization problem has to be solved:

$$\text{Find } \hat{\underline{a}} \text{ that minimizes } \mathcal{J}(\underline{a}) = \sum_{i=1}^{N} \left( y^{(i)} - \underline{a}^{\mathsf{T}}\, \underline{\psi}(\underline{x}^{(i)}) \right)^2$$

The solution is given by:

$$\hat{\underline{a}} = \left( \underline{\underline{\Psi}}^{\mathsf{T}}\, \underline{\underline{\Psi}} \right)^{-1}\, \underline{\underline{\Psi}}^{\mathsf{T}}\, \underline{\mathcal{Y}}$$

where:

$$\underline{\underline{\Psi}} = \left( \psi_j(\underline{x}^{(i)}),\; i = 1, \dots, N,\; j = 0, \dots, P-1 \right)$$

It is clear that the above equation is only valid for a full rank information matrix. A necessary condition is that the size $N$ of the experimental design is not less than the number $P$ of coefficients to estimate. In practice, it is not recommended to directly invert $\underline{\underline{\Psi}}^{\mathsf{T}}\, \underline{\underline{\Psi}}$ since the solution may be particularly sensitive to an ill-conditioning of the matrix. The least squares problem is rather solved using more robust numerical methods such as singular value decomposition (SVD) or QR decomposition (see [Numerical methods to solve least squares problems] for more details).
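
The following sketch (illustrative only; the model, the design and the two-dimensional quadratic basis are arbitrary assumptions) assembles the information matrix and estimates the coefficients with an SVD-based least squares solver rather than by inverting $\underline{\underline{\Psi}}^{\mathsf{T}}\,\underline{\underline{\Psi}}$.

```python
import numpy as np

def quadratic_basis(x):
    """Basis (1, x1, x2, x1*x2, x1^2, x2^2) for a 2-dimensional input (P = 6)."""
    x1, x2 = x
    return np.array([1.0, x1, x2, x1 * x2, x1**2, x2**2])

rng = np.random.default_rng(3)
h = lambda x: np.sin(x[0]) + x[1] ** 2          # arbitrary "true" model (assumption)
X = rng.uniform(-1.0, 1.0, size=(50, 2))        # experimental design (N = 50)
y = np.array([h(x) for x in X])                 # model evaluations

Psi = np.array([quadratic_basis(x) for x in X]) # N x P information matrix
a_hat, *_ = np.linalg.lstsq(Psi, y, rcond=None) # SVD-based least squares solve

x_new = np.array([0.3, -0.2])
print("surrogate:", a_hat @ quadratic_basis(x_new), "model:", h(x_new))
```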

Other notations

Link with OpenTURNS methodology

This method is aimed at building a response surface prior to tackling Step C “Uncertainty Propagation”. It requires an experimental design together with the corresponding model evaluations. Then the various uncertainty propagation techniques may be applied at a negligible computational cost.

References and theoretical basics

Provided that the model under consideration is sufficiently smooth, linear or quadratic polynomials may be accurate approximations. This is especially true for studying the model response not too far from its mean value, i.e. for a central trend analysis. Nonetheless, one should be careful when the estimation of the probability of exceeding a threshold is of interest, since the polynomial approximation in the tails of the response distribution may prove poor.

Numerical details related to the least-square method may be found in:

  • Å. Bjorck, 1996, “Numerical methods for least squares problems”, SIAM Press, Philadelphia, PA.


6.3.4 Step Resp. Surf.  – Polynomial response surface based on sparse least squares

Mathematical description

Goal

The response of the model under consideration may be globally approximated by a multivariate polynomial. In this setup, the response is characterized by a finite number of coefficients which have to be estimated. This may be achieved by least squares (see [Polynomial response surface based on least squares] ). However, this method cannot be applied if the number of unknown coefficients is of similar size to the number of model evaluations, or even possibly greater. Indeed, the resulting underdetermined least squares problem would be ill-posed.

To overcome this difficulty, sparse least squares schemes may be employed (they are also known as variable selection techniques in statistics). These methods are aimed at identifying a small set of significant coefficients in the model response approximation. Eventually, a sparse polynomial response surface, i.e. a polynomial which only contains a small number of non-zero coefficients, is obtained by means of a reduced number of possibly costly model evaluations. In the sequel, we focus on sparse regression schemes based on $L_1$-penalization.

Actually the proposed approaches do not provide only one approximation, but a collection of less and less sparse approximations. Eventually an optimal approximation may be retained with regard to a given criterion.

Principles

Context

Consider the mathematical model $h$ of a physical system depending on $n_X$ input parameters $\underline{x} = (x_1, \dots, x_{n_X})^{\mathsf{T}}$. Note that these input variables are assumed to be deterministic in this section. The model response may be approximated by a polynomial as follows:

$$h(\underline{x}) \approx \hat{h}(\underline{x}) = \sum_{j=0}^{P-1} a_j\, \psi_j(\underline{x}) \qquad (144)$$

Let us consider a set of values taken by the input vector (i.e. an experimental design) $\underline{\underline{\mathcal{X}}} = (\underline{x}^{(1)}, \dots, \underline{x}^{(N)})^{\mathsf{T}}$, as well as the vector $\underline{\mathcal{Y}} = (h(\underline{x}^{(1)}), \dots, h(\underline{x}^{(N)}))^{\mathsf{T}} = (y^{(1)}, \dots, y^{(N)})^{\mathsf{T}}$ of the corresponding model evaluations. It is assumed that the number of terms $P$ in the polynomial basis is of similar size to $N$, or even possibly significantly larger than $N$. In such a situation it is not possible to compute the polynomial coefficients by ordinary least squares, since the corresponding system is ill-posed. Methods that may be employed as an alternative are outlined in the sequel.

LASSO

The so-called LASSO (for Least absolute shrinkage and selection operator) method is based on an $L_1$-penalized regression as follows:

$$\text{Minimize} \quad \sum_{i=1}^{N} \left( h(\underline{x}^{(i)}) - \underline{a}^{\mathsf{T}}\, \underline{\psi}(\underline{x}^{(i)}) \right)^2 + C\, \left\| \underline{a} \right\|_1^2 \qquad (145)$$

where $\left\|\underline{a}\right\|_1 = \sum_{j=0}^{P-1} |a_j|$ and $C$ is a non-negative constant. A nice feature of LASSO is that it provides a sparse metamodel, i.e. it discards insignificant variables from the set of predictors. The larger the value of the tuning parameter $C$, the sparser the resulting response surface.

For a given $C \geq 0$, the solving procedure may be implemented via quadratic programming. Obtaining the whole set of coefficient estimates for $C$ varying from $0$ to a maximum value may be computationally expensive though, since it requires solving the optimization problem for a fine grid of values of $C$.

Forward stagewise regression

Another procedure, known as forward stagewise regression, appears to be different from LASSO, but turns out to provide similar results. The procedure first picks the predictor that is most correlated with the vector of observations. However, the value of the corresponding coefficient is only increased by a small amount. Then the predictor with the largest correlation with the current residual (possibly the same term as in the previous step) is picked, and so on. Let us introduce the vector notation:

$$\underline{\psi}_j = \left( \psi_j(\underline{x}^{(1)}), \dots, \psi_j(\underline{x}^{(N)}) \right)^{\mathsf{T}} \qquad (146)$$

The forward stagewise algorithm is outlined below:

  1. Start with $\underline{R} = \underline{\mathcal{Y}}$ and $a_0 = \dots = a_{P-1} = 0$.

  2. Find the predictor $\underline{\psi}_j$ that is most correlated with $\underline{R}$.

  3. Update $\hat{a}_j = \hat{a}_j + \delta_j$, where $\delta_j = \varepsilon \cdot \mathrm{sign}\left( \underline{\psi}_j^{\mathsf{T}}\, \underline{R} \right)$.

  4. Update $\underline{R} = \underline{R} - \delta_j\, \underline{\psi}_j$ and repeat Steps 2 and 3 until no predictor has any correlation with $\underline{R}$.

Note that parameter ε has to be set equal to a small value in practice, say ε=0.01. This procedure is known to be more stable than traditional stepwise regression.
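
A naive sketch of the forward stagewise iteration described above (the stopping rule, step size and test data are arbitrary assumptions; this is not the OpenTURNS implementation):

```python
import numpy as np

def forward_stagewise(Psi, y, eps=0.01, n_steps=2000, tol=1e-10):
    """Forward stagewise regression on the columns of Psi: repeatedly bump the
    coefficient of the predictor most correlated with the current residual
    by +/- eps, until no predictor has any remaining correlation."""
    N, P = Psi.shape
    a = np.zeros(P)
    r = y.copy()                         # current residual
    for _ in range(n_steps):
        corr = Psi.T @ r                 # correlations with the residual
        j = np.argmax(np.abs(corr))
        if np.abs(corr[j]) < tol:        # stop when correlations vanish
            break
        delta = eps * np.sign(corr[j])
        a[j] += delta
        r -= delta * Psi[:, j]
    return a

# Arbitrary illustrative data (assumption)
rng = np.random.default_rng(4)
Psi = rng.normal(size=(100, 8))
y = Psi[:, 0] - 2.0 * Psi[:, 3] + 0.05 * rng.normal(size=100)
print(np.round(forward_stagewise(Psi, y), 2))
```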

Least Angle Regression

Least Angle Regression (LAR) may be viewed as a version of forward stagewise that uses mathematical derivations to speed up the computations. Indeed, instead of taking many small steps with the basis term most correlated with current residual R ̲, the related coefficient is directly increased up to the point where some other basis predictor has as much correlation with R ̲. Then the new predictor is entered, and the procedure is continued. The LAR algorithm is detailed below:

  1. Initialize the coefficients to $a_0 = \dots = a_{P-1} = 0$. Set the initial residual equal to the vector of observations $\underline{\mathcal{Y}}$.

  2. Find the vector $\underline{\psi}_j$ which is most correlated with the current residual.

  3. Move $a_j$ from $0$ toward the least squares coefficient of the current residual on $\underline{\psi}_j$, until some other predictor $\underline{\psi}_k$ has as much correlation with the current residual as does $\underline{\psi}_j$.

  4. Move jointly $(a_j, a_k)^{\mathsf{T}}$ in the direction defined by their joint least squares coefficient of the current residual on $(\underline{\psi}_j, \underline{\psi}_k)$, until some other predictor $\underline{\psi}_l$ has as much correlation with the current residual.

  5. Continue this way until $m = \min(P, N-1)$ predictors have been entered.

Steps 2 and 3 correspond to a “move” of the active coefficients toward their least squares value. It corresponds to an update of the form $\hat{\underline{a}}^{(k+1)} = \hat{\underline{a}}^{(k)} + \gamma^{(k)}\, \tilde{\underline{w}}^{(k)}$. The vector $\tilde{\underline{w}}^{(k)}$ and the coefficient $\gamma^{(k)}$ are referred to as the LAR descent direction and step, respectively. Both quantities may be derived algebraically.

Note that if $N \geq P$, then the last step of LAR provides the ordinary least squares solution. It is shown that LAR is noticeably efficient since it only requires the computational cost of an ordinary least squares regression on $P$ predictors to produce a collection of $m$ metamodels.
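
If scikit-learn is available in the reader's environment (an assumption; it is not part of OpenTURNS), its lars_path routine exposes the collection of less and less sparse solutions produced by LAR, as well as a LASSO variant:

```python
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(5)
Psi = rng.normal(size=(60, 20))                  # N = 60 observations, P = 20 predictors
y = 2.0 * Psi[:, 1] - Psi[:, 7] + 0.1 * rng.normal(size=60)

# method='lar' gives the plain LAR path; method='lasso' gives the LASSO variant
alphas, active, coefs = lars_path(Psi, y, method='lar')

# coefs[:, k] is the coefficient vector after k steps: a collection of
# increasingly dense metamodels, from which an optimal one may be selected
# (e.g. by the cross-validation error of a later subsection).
for k in range(coefs.shape[1]):
    print(k, np.flatnonzero(coefs[:, k]))
```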

LASSO and Forward Stagewise as variants of LAR

It has been shown that, with only one modification, the LAR procedure provides in one shot the entire path of LASSO solution coefficients as the tuning parameter $C$ in Eq.(145) is increased from $0$ up to a maximum value. The modified algorithm is as follows:

  • Run the LAR procedure from Steps 1 to 4.

  • If a non-zero coefficient hits zero, discard it from the current metamodel and recompute the current joint least squares direction.

  • Continue this way until m=min(P,N-1) predictors have been entered.

Note that the LAR-based LASSO procedure may take more than $m$ steps since predictors are allowed to be discarded and introduced again later into the metamodel. In a similar fashion, a limiting version of the forward stagewise method when $\varepsilon \to 0$ may be obtained by slightly modifying the original LAR algorithm. In the literature, one commonly uses the label LARS when referring to all these LAR-based algorithms (with S referring to Stagewise and LASSO).

Selection of the optimal LAR solution

The LAR-based approaches (i.e. original LAR, LASSO and forward stagewise) provide a collection of less and less sparse PC approximations. The accuracy of each PC metamodel may be assessed by cross validation [Error estimation based on cross-validation] . Eventually the PC representation associated with the lowest error estimate is retained.

Other notations
Link with OpenTURNS methodology
Within the global methodology, the method is used to build up a deterministic polynomial approximation of the model response prior to tackling the step C: “Uncertainty propagation”. It requires an experimental design together with the corresponding model evaluations. Then the various uncertainty propagation techniques may be applied at a negligible computational cost.
References and theoretical basics
Provided that the model under consideration is sufficiently smooth, linear or quadratic polynomials may be accurate approximations. This is especially true for studying the model response not too far from its mean value, i.e. for a central trend analysis. Nonetheless, one should be careful when the estimation of the probability of exceeding a threshold is of interest, since the polynomial approximation in the tails of the response distribution may prove poor.

The principles of the original LAR procedure are detailed in:

  • B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, 2004, “Least angle regression”, Annals of Statistics 32, 407–499.

Furthermore, the modifications of LAR allowing one to solve the LASSO and the forward stagewise problems are presented in:

  • T. Hastie, J. Taylor, R. Tibshirani, and G. Walther, 2007, “Forward stagewise regression and the monotone Lasso”, Electronic J. Stat. 1, 1–29.


6.3.5 Step Resp. Surf.  – Kriging

Mathematical description

Goal

Kriging (also known as Gaussian process regression) is a Bayesian technique that aims at approximating functions (most often in order to surrogate them because they are expensive to evaluate). In the following it is assumed that we aim at surrogating a scalar-valued model $h: \underline{x} \mapsto y$. Note that the OpenTURNS implementation of Kriging can deal with vector-valued functions ($h: \underline{x} \mapsto \underline{y}$), but it simply loops over each output. It is also assumed that the model was run over a design of experiments in order to produce a set of observations gathered in the following dataset: $\left( \left(\underline{x}^{(i)}, y^{(i)}\right),\; i = 1, \dots, m \right)$. Ultimately Kriging aims at producing a predictor (also known as a response surface or metamodel) denoted as $\tilde{h}$.

Mathematical framework

We put the following Gaussian process prior on the model $h$:

$$Y(\underline{x}) = \underline{f}(\underline{x})^{\mathsf{T}}\, \underline{\beta} + Z(\underline{x}) \qquad (147)$$

where:

  • $\underline{f}(\underline{x})^{\mathsf{T}}\, \underline{\beta}$ is a generalized linear model based upon a functional basis $\underline{f} = \left( f_j,\; j = 1, \dots, p \right)$ and a vector of coefficients $\underline{\beta} = \left( \beta_j,\; j = 1, \dots, p \right)$,

  • $Z$ is a zero-mean stationary Gaussian process whose covariance function reads:

    $$\mathbb{E}\left[ Z(\underline{x})\, Z(\underline{x}') \right] = \sigma^2\, R\left( \underline{x} - \underline{x}',\, \underline{\theta} \right) \qquad (148)$$

    where $\sigma^2 > 0$ is the variance and $R$ is the correlation function that solely depends on the Manhattan distance between input points $\underline{x} - \underline{x}'$ and a vector of parameters $\underline{\theta} \in \mathbb{R}^{n_\theta}$.

Under the Gaussian process prior assumption, the observations $\underline{Y} = \left( Y_i,\; i = 1, \dots, m \right)$ and a prediction $Y(\underline{x})$ at some unobserved input $\underline{x}$ are jointly normally distributed:

$$\begin{pmatrix} \underline{Y} \\ Y(\underline{x}) \end{pmatrix} \sim \mathcal{N}_{m+1}\left( \begin{pmatrix} \underline{\underline{F}}\, \underline{\beta} \\ \underline{f}(\underline{x})^{\mathsf{T}}\, \underline{\beta} \end{pmatrix},\; \sigma^2 \begin{pmatrix} \underline{\underline{R}} & \underline{r}(\underline{x}) \\ \underline{r}(\underline{x})^{\mathsf{T}} & 1 \end{pmatrix} \right) \qquad (149)$$

where:

$$\underline{\underline{F}} = \left( f_j\left(\underline{x}^{(i)}\right),\; i = 1, \dots, m,\; j = 1, \dots, p \right) \qquad (150)$$

is the regression matrix,

$$\underline{\underline{R}} = \left( R\left( \underline{x}^{(i)} - \underline{x}^{(j)},\, \underline{\theta} \right),\; i, j = 1, \dots, m \right) \qquad (151)$$

is the observations' correlation matrix, and:

$$\underline{r}(\underline{x}) = \left( R\left( \underline{x} - \underline{x}^{(i)},\, \underline{\theta} \right),\; i = 1, \dots, m \right)^{\mathsf{T}} \qquad (152)$$

is the vector of cross-correlations between the prediction and the observations.

As such, the Kriging predictor is defined as the following conditional distribution:

$$\tilde{Y}(\underline{x}) = \left[ Y(\underline{x}) \,\middle|\, \underline{Y} = \underline{y},\; \underline{\theta} = \underline{\theta}^*,\; \sigma^2 = (\sigma^2)^* \right] \qquad (153)$$

where $\underline{\theta}^*$ and $(\sigma^2)^*$ are the maximum likelihood estimates of the correlation parameters $\underline{\theta}$ and of the variance $\sigma^2$ (see references).

It can be shown (see references) that the predictor Y ˜(x ̲) is also Gaussian:

$$\tilde{Y}(\underline{x}) \sim \mathcal{N}_1\left( \mu_{\tilde{Y}}(\underline{x}),\; \sigma^2_{\tilde{Y}}(\underline{x}) \right) \qquad (154)$$
  • with mean:

    $$\mu_{\tilde{Y}}(\underline{x}) = \underline{f}(\underline{x})^{\mathsf{T}}\, \tilde{\underline{\beta}} + \underline{r}(\underline{x})^{\mathsf{T}}\, \underline{\gamma} \qquad (155)$$

    where $\tilde{\underline{\beta}}$ is the generalized least squares solution of the underlying linear regression problem:

    $$\tilde{\underline{\beta}} = \left( \underline{\underline{F}}^{\mathsf{T}}\, \underline{\underline{R}}^{-1}\, \underline{\underline{F}} \right)^{-1}\, \underline{\underline{F}}^{\mathsf{T}}\, \underline{\underline{R}}^{-1}\, \underline{y} \qquad (156)$$

    and

    $$\underline{\gamma} = \underline{\underline{R}}^{-1}\, \left( \underline{y} - \underline{\underline{F}}\, \tilde{\underline{\beta}} \right) \qquad (157)$$
  • and variance:

    $$\sigma^2_{\tilde{Y}}(\underline{x}) = \sigma^2\, \left( 1 - \underline{r}(\underline{x})^{\mathsf{T}}\, \underline{\underline{R}}^{-1}\, \underline{r}(\underline{x}) + \underline{u}(\underline{x})^{\mathsf{T}}\, \left( \underline{\underline{F}}^{\mathsf{T}}\, \underline{\underline{R}}^{-1}\, \underline{\underline{F}} \right)^{-1}\, \underline{u}(\underline{x}) \right) \qquad (158)$$

    where:

    $$\underline{u}(\underline{x}) = \underline{\underline{F}}^{\mathsf{T}}\, \underline{\underline{R}}^{-1}\, \underline{r}(\underline{x}) - \underline{f}(\underline{x}) \qquad (159)$$

For now only the Kriging mean is returned as the metamodel: $\tilde{h} \equiv \mu_{\tilde{Y}}$.
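
The following sketch computes the Kriging mean and variance of Eqs. (155)-(159) for fixed parameters (a squared-exponential correlation, a constant trend and given values of $\underline{\theta}$ and $\sigma^2$ are assumptions; the maximum likelihood estimation step is omitted):

```python
import numpy as np

def kriging_predict(x, X, y, theta, sigma2, f=lambda x: np.array([1.0])):
    """Kriging mean and variance at point x, given observations (X, y), a
    squared-exponential correlation (an assumption; any valid R would do),
    fixed parameters theta and sigma2, and a constant trend basis f."""
    corr = lambda d: np.exp(-np.sum((d / theta) ** 2))
    m = len(y)
    R = np.array([[corr(X[i] - X[j]) for j in range(m)] for i in range(m)])
    r = np.array([corr(x - X[i]) for i in range(m)])
    F = np.array([f(xi) for xi in X])                    # regression matrix
    Ri = np.linalg.inv(R)
    beta = np.linalg.solve(F.T @ Ri @ F, F.T @ Ri @ y)   # generalized least squares
    gamma = Ri @ (y - F @ beta)
    u = F.T @ Ri @ r - f(x)
    mean = f(x) @ beta + r @ gamma                       # Eq. (155)
    var = sigma2 * (1.0 - r @ Ri @ r + u @ np.linalg.solve(F.T @ Ri @ F, u))  # Eq. (158)
    return mean, var

# Arbitrary 1-d example (assumption): 5 observations of a sine function
X = np.linspace(0.0, 1.0, 5).reshape(-1, 1)
y = np.sin(4.0 * X[:, 0])
print(kriging_predict(np.array([0.35]), X, y, theta=np.array([0.2]), sigma2=1.0))
```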

Other notations

Kriging may also be referred to as Gaussian process regression.

Link with OpenTURNS methodology

References and theoretical basics
More resources in Kriging may be found in:
  • V. Dubourg, 2011, “Adaptative surrogate models for reliability and reliability-based design optimization”, University Blaise Pascal - Clermont II. http://bruno.sudret.free.fr/dubourg.html

  • S. Lophaven, H. Nielsen and J. Sondergaard, 2002, “DACE, A Matlab kriging toolbox”, Technical University of Denmark. http://www2.imm.dtu.dk/ hbni/dace/

  • T. Santner, B. Williams and W. Notz, 2003. “The design and analysis of computer experiments”, Springer, New York.

  • C. Rasmussen and C. Williams, 2006, T. Dietterich (Ed.), “Gaussian processes for machine learning”, MIT Press.


6.3.6 Step Resp. Surf.  – Assessment of the polynomial approximations by cross validation

Mathematical description

Goal

Once a polynomial response surface $\hat{h}$ has been built up, it is crucial to estimate the approximation error, i.e. the discrepancy between the response surface and the true model response in terms of a suitable norm such as the $L^2$-norm:

$$Err = \left\| h(\underline{x}) - \hat{h}(\underline{x}) \right\|_{L^2}^2 = \int_{\mathcal{D}} \left( h(\underline{x}) - \hat{h}(\underline{x}) \right)^2\, d\underline{x} \qquad (160)$$

where 𝒟 denotes the domain of variation of the input parameters. As any model evaluation may be time-consuming, error estimates that are only based on already performed computer experiments are of interest. The so-called cross validation techniques may be used in this context.

Principles

Any cross-validation scheme consists in dividing the data sample (i.e. the experimental design) into two subsamples. A metamodel $\hat{h}$ is built from one subsample, i.e. the training set, and its performance is assessed by comparing its predictions to the other subsample, i.e. the test set. A refinement of the method is the $\nu$-fold cross-validation, in which the observations are randomly assigned to one of $\nu$ partitions of nearly equal size. In turn, the learning set contains all but one of the partitions, the remaining partition being used as the test set. The generalization error is estimated for each of the $\nu$ partitions and then averaged over $\nu$.

Classical leave-one-out error estimate

The leave-one-out (LOO) cross-validation corresponds to the special case of $\nu$-fold cross-validation with $\nu = N$. It is recalled that a $P$-term polynomial approximation of the model response reads:

$$\hat{h}(\underline{x}) = \sum_{j=0}^{P-1} \hat{a}_j\, \psi_j(\underline{x}) \qquad (161)$$

where the $\hat{a}_j$'s are estimates of the coefficients obtained by a specific method, e.g. least squares.

Let us denote by $\hat{h}^{(-i)}$ the approximation that has been built from the experimental design $\mathcal{X} \setminus \{\underline{x}^{(i)}\}$, i.e. when removing the $i$-th observation. The predicted residual is defined as the difference between the model evaluation at $\underline{x}^{(i)}$ and its prediction based on $\hat{h}^{(-i)}$:

$$\Delta^{(i)} \equiv h(\underline{x}^{(i)}) - \hat{h}^{(-i)}(\underline{x}^{(i)}) \qquad (162)$$

The expected risk is then estimated by the following LOO error:

$$Err_{LOO} \equiv \frac{1}{N} \sum_{i=1}^{N} \left( \Delta^{(i)} \right)^2 \qquad (163)$$

A nice feature of the LOO method is that the quantity $Err_{LOO}$ may be computed without performing further regression calculations when the PC coefficients have been estimated by regression. Indeed, the predicted residuals are then given by:

$$\Delta^{(i)} = \frac{h(\underline{x}^{(i)}) - \hat{h}(\underline{x}^{(i)})}{1 - h_i} \qquad (164)$$

where $h_i$ is the $i$-th diagonal term of the matrix $\underline{\underline{\Psi}}\, \left( \underline{\underline{\Psi}}^{\mathsf{T}}\, \underline{\underline{\Psi}} \right)^{-1}\, \underline{\underline{\Psi}}^{\mathsf{T}}$, using the notation:

$$\underline{\underline{\Psi}}_{ij} \equiv \psi_j(\underline{x}^{(i)}), \quad i = 1, \dots, N,\; j = 0, \dots, P-1 \qquad (165)$$

In practice, one often computes the following normalized LOO error:

$$\varepsilon_{LOO} \equiv \frac{Err_{LOO}}{\widehat{\mathrm{Cov}}\left[ \underline{\mathcal{Y}} \right]} \qquad (166)$$

where $\widehat{\mathrm{Cov}}\left[ \underline{\mathcal{Y}} \right]$ denotes the empirical covariance of the response sample $\underline{\mathcal{Y}}$:

$$\widehat{\mathrm{Cov}}\left[ \underline{\mathcal{Y}} \right] \equiv \frac{1}{N-1} \sum_{i=1}^{N} \left( y^{(i)} - \bar{\mathcal{Y}} \right)^2, \qquad \bar{\mathcal{Y}} \equiv \frac{1}{N} \sum_{i=1}^{N} y^{(i)} \qquad (167)$$

Corrected leave-one-out error estimate

A penalized variant of $\varepsilon_{LOO}$ may be used in order to increase its robustness with respect to overfitting, i.e. to penalize a large number of terms in the PC expansion compared to the size of the experimental design:

$$\varepsilon^*_{LOO} \equiv \varepsilon_{LOO}\; T(P, N) \qquad (168)$$

The penalty factor is defined by:

$$T(P, N) = \frac{N}{N - P}\, \left( 1 + \frac{\mathrm{tr}\left( \underline{\underline{C}}_{emp}^{-1} \right)}{N} \right) \qquad (169)$$

where:

$$\underline{\underline{C}}_{emp} \equiv \frac{1}{N}\, \underline{\underline{\Psi}}^{\mathsf{T}}\, \underline{\underline{\Psi}}, \qquad \underline{\underline{\Psi}} = \left( \psi_j(\underline{x}^{(i)}),\; i = 1, \dots, N,\; j = 0, \dots, P-1 \right) \qquad (170)$$
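
A short sketch of Eqs. (163)-(170) on arbitrary illustrative data (assumption), computing the normalized and corrected LOO errors from the hat matrix without any refitting:

```python
import numpy as np

def loo_errors(Psi, y):
    """Leave-one-out and corrected leave-one-out error estimates of a
    least squares polynomial metamodel, computed from the hat matrix."""
    N, P = Psi.shape
    a_hat, *_ = np.linalg.lstsq(Psi, y, rcond=None)
    H = Psi @ np.linalg.inv(Psi.T @ Psi) @ Psi.T        # hat matrix
    delta = (y - Psi @ a_hat) / (1.0 - np.diag(H))      # predicted residuals, Eq. (164)
    err_loo = np.mean(delta ** 2)                       # Eq. (163)
    eps_loo = err_loo / np.var(y, ddof=1)               # normalized LOO error, Eq. (166)
    C_emp = Psi.T @ Psi / N
    T = N / (N - P) * (1.0 + np.trace(np.linalg.inv(C_emp)) / N)   # Eq. (169)
    return eps_loo, eps_loo * T                         # (eps_LOO, corrected eps_LOO*)

# Arbitrary illustrative data (assumption)
rng = np.random.default_rng(6)
x = rng.uniform(-1, 1, size=40)
Psi = np.vander(x, 4, increasing=True)                  # basis 1, x, x^2, x^3
y = np.sin(2 * x) + 0.05 * rng.normal(size=40)
print(loo_errors(Psi, y))
```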
Other notations
Leave-one-out cross validation is also known as jackknife in statistics.

Link with OpenTURNS methodology

Within the global methodology, the method is used to assess the accuracy of a polynomial response surface of a model output prior to tackling the step C: “Uncertainty propagation”.
References and theoretical basics
The use of leave-one-out error goes back to:
  • D. Allen, 1971, “The prediction sum of squares as a criterion for selecting prediction variables”, Technical Report 23, Dept. of Statistics, University of Kentucky.

The definition of the penalized variant has been inspired from:

  • O. Chapelle, V. Vapnik, and Y. Bengio, 2002, “Model selection for small sample regression”, Machine Learning 48 (1), 9–23.


6.3.7 Step Resp. Surf.  – Functional Chaos Expansion

Mathematical description

Goal

Accounting for the joint probability density function (PDF) $f_{\underline{X}}(\underline{x})$ of the input random vector $\underline{X}$, one seeks the joint PDF of the model response $\underline{Y} = h(\underline{X})$. This may be achieved using Monte Carlo (MC) simulation ( [Estimating the mean and variance using the Monte Carlo Method] ), i.e. by evaluating the model $h$ at a large number of realizations $\underline{x}^{(i)}$ of $\underline{X}$ and then by estimating the empirical distribution of the corresponding sample of model outputs $h(\underline{x}^{(i)})$. However it is well known that the MC method requires a large number of model evaluations, i.e. a great computational cost, in order to obtain accurate results.

In fact, when using MC simulation, each model run is performed independently. Thus, whereas it is expected that $h(\underline{x}^{(i)}) \approx h(\underline{x}^{(j)})$ if $\underline{x}^{(i)} \approx \underline{x}^{(j)}$, the model is evaluated twice without accounting for this information. In other words, the functional dependence between $\underline{X}$ and $\underline{Y}$ is lost.

A possible solution to overcome this problem, and thereby to reduce the computational cost of MC simulation, is to represent the random response $\underline{Y}$ in a suitable functional space, such as the Hilbert space $L^2$ of square-integrable functions with respect to the PDF $f_{\underline{X}}(\underline{x})$. Precisely, an expansion of the model response onto an orthonormal basis of $L^2$ is of interest.

Mathematical framework

The principles of the construction of a (countably infinite) basis of this space, i.e. the PC basis, are described in the sequel.

Principle of the functional chaos expansion

Consider a model $h$ depending on a set of random variables $\underline{X} = (X_1, \dots, X_{n_X})^{\mathsf{T}}$. We call functional chaos expansion the class of spectral methods which gathers all types of response surfaces that can be seen as a projection of the physical model onto an orthonormal basis. This class of methods uses the Hilbert space of square-integrable functions ($L^2$) to construct the response surface.

Assuming that the physical model has a finite second-order moment (i.e. $\mathbb{E}\left[ h(\underline{X})^2 \right] < +\infty$), it may be uniquely represented as a converging series onto an orthonormal basis as follows:

$$h(\underline{x}) = \sum_{i=0}^{+\infty} \underline{y}_i\, \Phi_i(\underline{x})$$

where the $\underline{y}_i = (y_{i,1}, \dots, y_{i,n_Y})^{\mathsf{T}}$'s are deterministic vectors that fully characterize the random vector $\underline{Y}$, and the $\Phi_i$'s are given basis functions (e.g. orthonormal polynomials, wavelets).

The orthonormality property of the functional chaos basis reads:

$$\left\langle \Phi_i, \Phi_j \right\rangle = \int_{\mathcal{D}} \Phi_i(\underline{x})\, \Phi_j(\underline{x})\, f_{\underline{X}}(\underline{x})\, d\underline{x} = \delta_{i,j}$$

where $\delta_{i,j} = 1$ if $i = j$ and $0$ otherwise. The metamodel $\hat{h}(\underline{x})$ is represented by a finite subset of coefficients $\{ \underline{y}_i,\; i \in \mathcal{A} \subset \mathbb{N} \}$ in a truncated basis $\{ \Phi_i,\; i \in \mathcal{A} \subset \mathbb{N} \}$ as follows:

$$\hat{h}(\underline{x}) = \sum_{i \in \mathcal{A}} \underline{y}_i\, \Phi_i(\underline{x})$$

As examples of this type of expansion, one can mention expansions onto wavelet bases, polynomial chaos expansions, etc.

Other notations
-

Link with OpenTURNS methodology

References and theoretical basics
The chaos representation is also known as orthogonal series decomposition. An elegant mathematical framework of chaos representations may be found in:
  • C. Soize and R. Ghanem, 2004, “Physical systems with random uncertainties: chaos representations with arbitrary probability measure”, SIAM J. Sci. Comput. 26(2), 395-410.


6.3.8 Step Resp. Surf.  – Orthogonal polynomials

Mathematical description

Goal

This section provides some mathematical details on sequences of orthogonal polynomials. Some of these sequences will be used to construct the basis of the so-called polynomial chaos expansion.

Mathematical framework

The orthogonal polynomials are associated with an inner product, defined as follows:

Given an interval of orthogonality $[\alpha, \beta]$ ($\alpha \in \mathbb{R} \cup \{-\infty\}$, $\beta \in \mathbb{R} \cup \{+\infty\}$, $\alpha < \beta$) and a weight function $w(x) > 0$, two polynomials $P$ and $Q$ are orthogonal iff:

$$\left\langle P, Q \right\rangle = \int_{\alpha}^{\beta} P(x)\, Q(x)\, w(x)\, dx = 0$$

Therefore, a sequence of orthogonal polynomials $(P_n)_{n \geq 0}$ ($P_n$: polynomial of degree $n$) verifies:

$$\left\langle P_m, P_n \right\rangle = 0 \quad \text{for every } m \neq n$$

The chosen inner product induces a norm on polynomials in the usual way:

$$\left\| P_n \right\| = \left\langle P_n, P_n \right\rangle^{1/2}$$

In the following, we consider weight functions w(x) corresponding to probability density functions, which satisfy:

$$\int_{\alpha}^{\beta} w(x)\, dx = 1$$

Moreover, we consider families of orthonormal polynomials $(P_n)_{n \geq 0}$, that is polynomials with a unit norm:

$$\left\| P_n \right\| = 1$$

Recurrence: Any sequence of orthogonal polynomials has a recurrence formula relating any three consecutive polynomials as follows:

$$P_{n+1} = (a_n\, x + b_n)\, P_n + c_n\, P_{n-1}$$

Orthonormal polynomials with respect to usual probability distributions

The table below shows particular (normalized) orthogonal polynomials associated with continuous weight functions.

Note that the orthonormal polynomials determined by OpenTURNS are orthonormal with respect to the standard representative distribution of the given distribution. Refer to [Standard Parametric Models] to get information on the parameters and the moments of the standard representative of each distribution.

Ortho. poly.   $P_n(x)$   Weight $w(x)$   Recurrence coefficients $(a_n, b_n, c_n)$
Hermite   $He_n(x)$   $\frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$   $a_n = \frac{1}{\sqrt{n+1}}$, $b_n = 0$, $c_n = -\sqrt{\frac{n}{n+1}}$
Legendre   $Le_n(x)$   $\frac{1}{2}\, \mathbb{1}_{[-1,1]}(x)$   $a_n = \frac{\sqrt{(2n+1)(2n+3)}}{n+1}$, $b_n = 0$, $c_n = -\frac{n}{n+1}\, \sqrt{\frac{2n+3}{2n-1}}$
Generalized Laguerre   $L_n^{(k)}(x)$   $\frac{x^{k-1}}{\Gamma(k)}\, e^{-x}\, \mathbb{1}_{[0,+\infty)}(x)$   $\omega_n = \left( (n+1)(n+k+1) \right)^{-1/2}$, $a_n = \omega_n$, $b_n = -(2n+k+1)\, \omega_n$, $c_n = -\sqrt{(n+k)\, n}\, \omega_n$
Jacobi   $J_n^{(\alpha,\beta)}(x)$, $\alpha, \beta > -1$   $\frac{(1-x)^{\alpha} (1+x)^{\beta}}{2^{\alpha+\beta+1}\, B(\beta+1, \alpha+1)}\, \mathbb{1}_{[-1,1]}(x)$   $K_{1,n} = \frac{2n+\alpha+\beta+3}{(n+1)(n+\alpha+1)(n+\beta+1)(n+\alpha+\beta+1)}$, $K_{2,n} = \frac{1}{2}\, \sqrt{(2n+\alpha+\beta+1)\, K_{1,n}}$, $a_n = K_{2,n}\, (2n+\alpha+\beta+2)$, $b_n = K_{2,n}\, \frac{(\alpha-\beta)(\alpha+\beta)}{2n+\alpha+\beta}$, $c_n = -\frac{2n+\alpha+\beta+2}{2n+\alpha+\beta}\, \left[ (n+\alpha)(n+\beta)(n+\alpha+\beta)\, n\, \frac{K_{1,n}}{2n+\alpha+\beta-1} \right]^{1/2}$

Furthermore, two families of orthonormal polynomials with respect to discrete probability distributions are available in OpenTURNS, namely the so-called Charlier and Krawtchouk polynomials:

Ortho. poly.   $P_n(x)$   Probability mass function   Recurrence coefficients $(a_n, b_n, c_n)$
Charlier   $Ch_n^{(\lambda)}(x)$, $\lambda > 0$   $\frac{\lambda^k}{k!}\, e^{-\lambda}$, $k = 0, 1, 2, \dots$   $a_n = -\frac{1}{\sqrt{\lambda (n+1)}}$, $b_n = \frac{n+\lambda}{\sqrt{\lambda (n+1)}}$, $c_n = -\sqrt{1 - \frac{1}{n+1}}$
Krawtchouk   $Kr_n^{(m,p)}(x)$, $m \in \mathbb{N}$, $p \in [0,1]$   $\binom{m}{k}\, p^k (1-p)^{m-k}$, $k = 0, 1, \dots, m$   $a_n = -\frac{1}{\sqrt{(n+1)(m-n)\, p (1-p)}}$, $b_n = \frac{p(m-n) + n(1-p)}{\sqrt{(n+1)(m-n)\, p (1-p)}}$, $c_n = -\sqrt{\left( 1 - \frac{1}{n+1} \right) \left( 1 + \frac{1}{m-n} \right)}$

The Krawtchouk polynomials are only defined up to a degree n equal to m-1. Indeed, for n=m, some factors of the denominators of the recurrence coefficients would be equal to zero.

To sum up, the distribution types handled by OpenTURNS are reported in the table below together with the associated families of orthonormal polynomials.

Distribution   Support   Polynomial
Normal $\mathcal{N}(0,1)$   $\mathbb{R}$   Hermite
Uniform $\mathcal{U}(-1,1)$   $[-1,1]$   Legendre
Gamma $\Gamma(k,1,0)$   $(0,+\infty)$   Laguerre
Beta $B(\alpha,\beta,-1,1)$   $(-1,1)$   Jacobi
Poisson $\mathcal{P}(\lambda)$   $\mathbb{N}$   Charlier
Binomial $\mathcal{B}(m,p)$   $\{0, \dots, m\}$   Krawtchouk

It is recalled that the Krawtchouk polynomials are only defined up to a degree n equal to m-1.
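
As a small check of the recurrence mechanism, the sketch below evaluates the orthonormal Hermite polynomials with the coefficients of the continuous table above and verifies their orthonormality with respect to the standard normal weight by Gauss-Hermite quadrature (the quadrature size is an arbitrary choice):

```python
import numpy as np

def hermite_orthonormal(n, x):
    """Evaluate the orthonormal Hermite polynomials He_0..He_n at x using the
    three-term recurrence P_{n+1} = (a_n x + b_n) P_n + c_n P_{n-1} with
    a_n = 1/sqrt(n+1), b_n = 0, c_n = -sqrt(n/(n+1))."""
    x = np.asarray(x, dtype=float)
    P = [np.ones_like(x), x]                       # He_0 = 1, He_1 = x (orthonormal)
    for k in range(1, n):
        a_k = 1.0 / np.sqrt(k + 1.0)
        c_k = -np.sqrt(k / (k + 1.0))
        P.append(a_k * x * P[k] + c_k * P[k - 1])
    return np.array(P[: n + 1])

# Check orthonormality w.r.t. the standard normal weight by Gauss-Hermite quadrature
nodes, weights = np.polynomial.hermite_e.hermegauss(30)
weights = weights / np.sqrt(2.0 * np.pi)           # normalize to a probability weight
V = hermite_orthonormal(4, nodes)                  # shape (5, 30)
gram = V @ np.diag(weights) @ V.T                  # should be close to the identity
print(np.round(gram, 6))
```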

Orthogonal polynomials with respect to arbitrary probability distributions

It is also possible to generate a family of orthonormal polynomials with respect to an arbitrary probability distribution w(x). The well-known Gram-Schmidt algorithm can be used to this end. Note that this algorithm gives a constructive proof of the existence of orthonormal bases.

However it is known to be numerically unstable, so alternative procedures are often used in practice. The two algorithms available in OpenTURNS so far are the so-called modified Gram-Schmidt algorithm and the modified Chebyshev algorithm. Another procedure based on Cholesky decomposition should also be available soon.

Other notations
-

Link with OpenTURNS methodology

Within the global methodology, the orthogonal polynomial families are generated to construct the basis of the polynomial chaos expansion. The latter corresponds to a specific approach in order to tackle step C: “Uncertainty propagation”. Then the various uncertainty propagation techniques may be applied at a negligible computational cost. The method requires that the two following steps have been carried out:
  • step A1: identification of an input vector $\underline{X}$ of sources of uncertainties and of an output variable of interest $\underline{Y} = h(\underline{X})$;

  • step B: selection of one of the proposed techniques to estimate a probabilistic model of the input vector $\underline{X}$.

References and theoretical basics

-

6.3.9 Step Resp. Surf.  – Polynomial chaos basis

Mathematical description

Goal

The current section is focused on a specific kind of functional chaos representation (see [Functional Chaos Expansion] ) that has been implemented in OpenTURNS, namely polynomial chaos expansions.

Mathematical framework

Throughout this section, the model response is assumed to be a scalar random variable $Y = h(\underline{X})$. However, the following derivations hold in case of a vector-valued response.

Let us suppose that:

  • $Y = h(\underline{X})$ has a finite variance, i.e. $\mathrm{Var}\left[ h(\underline{X}) \right] < +\infty$;

  • $\underline{X}$ has independent components.

Then it is shown that $Y$ may be expanded onto the PC basis as follows:

$$Y \equiv h(\underline{X}) = \sum_{j=0}^{\infty} a_j\, \psi_j(\underline{X}) \qquad (171)$$

where the $\psi_j$'s are multivariate polynomials that are orthonormal with respect to the joint PDF $f_{\underline{X}}(\underline{x})$, that is:

$$\left\langle \psi_j(\underline{X}),\, \psi_k(\underline{X}) \right\rangle \equiv \mathbb{E}\left[ \psi_j(\underline{X})\, \psi_k(\underline{X}) \right] = \delta_{j,k} \qquad (172)$$

where $\delta_{j,k} = 1$ if $j = k$ and $0$ otherwise, and the $a_j$'s are deterministic coefficients that fully characterize the response $Y$.

Building of the PC basis – independent random variables

We first consider the case of independent input random variables. In practice, the components $X_i$ of the random vector $\underline{X}$ are rescaled using a specific mapping $T_i$, usually referred to as an isoprobabilistic transformation (see [Isoprobabilistic transformation] ). The set of scaled random variables reads:

$$U_i = T_i(X_i), \quad i = 1, \dots, n \qquad (173)$$

Common choices for $U_i$ are standard distributions such as a standard normal distribution or a uniform distribution over $[-1,1]$. For simplicity, it is assumed from now on that the components of the original input random vector $\underline{X}$ have already been scaled, i.e. $X_i = U_i$.

Let us first notice that due to the independence of the input random variables, the input joint PDF may be cast as:

$$f_{\underline{X}}(\underline{x}) = \prod_{i=1}^{n} f_{X_i}(x_i) \qquad (174)$$

where $f_{X_i}(x_i)$ is the marginal PDF of $X_i$. Let us consider a family $\{ \pi_j^{(i)},\; j \in \mathbb{N} \}$ of orthonormal polynomials with respect to $f_{X_i}$, i.e.:

$$\left\langle \pi_j^{(i)}(X_i),\, \pi_k^{(i)}(X_i) \right\rangle \equiv \mathbb{E}\left[ \pi_j^{(i)}(X_i)\, \pi_k^{(i)}(X_i) \right] = \delta_{j,k} \qquad (175)$$

The reader is referred to [Orthogonal polynomials] for details on the selection of suitable families of orthogonal polynomials. It is assumed that the degree of $\pi_j^{(i)}$ is $j$ for $j > 0$ and that $\pi_0^{(i)} \equiv 1$ ($i = 1, \dots, n$). Upon tensorizing the $n$ resulting families of univariate polynomials, one gets a set of orthonormal multivariate polynomials $\{ \psi_{\underline{\alpha}},\; \underline{\alpha} \in \mathbb{N}^n \}$ defined by:

$$\psi_{\underline{\alpha}}(\underline{x}) \equiv \pi_{\alpha_1}^{(1)}(x_1) \times \dots \times \pi_{\alpha_n}^{(n)}(x_n) \qquad (176)$$

where the multi-index notation $\underline{\alpha} \equiv \{ \alpha_1, \dots, \alpha_n \}$ has been introduced.

Building of the PC basis – dependent random variables

In case of dependent variables, it is possible to build up an orthonormal basis as follows:

$$\psi_{\underline{\alpha}}(\underline{x}) = K(\underline{x})\, \prod_{i=1}^{n} \pi_{\alpha_i}^{(i)}(x_i) \qquad (177)$$

where $K(\underline{x})$ is a function of the copula of $\underline{X}$. Note that such a basis is no longer polynomial. When dealing with independent random variables, one gets $K(\underline{x}) = 1$ and each basis element may be recast as in Eq.(176). Determining $K(\underline{x})$ is usually computationally expensive though, hence an alternative strategy is used for specific types of input random vectors.

If $\underline{X}$ has an elliptical copula instead of an independent one, it may be recast as a random vector $\underline{U}$ with independent components using a suitable mapping $T: \underline{X} \mapsto \underline{U}$ such as the Nataf transformation [Generalized Nataf Transformation]. The so-called Rosenblatt transformation may also be applied in case of a Gaussian copula [Rosenblatt Transformation]. Thus the model response $Y$ may be regarded as a function of $\underline{U}$ and expanded onto a polynomial chaos expansion with basis elements cast as in Eq.(176).

Link with classical deterministic polynomial approximation

In a deterministic setting (i.e. when the input parameters are considered to be deterministic), it is common practice to substitute a polynomial approximation for the model function $h$ over its whole domain of definition, as shown in [Polynomial response surface based on least squares]. Actually this approach is strictly equivalent to:

  1. Regarding the input parameters as uniform random variables

  2. Expanding any quantity of interest provided by the model onto a PC expansion made of Legendre polynomials

Other notations

Link with OpenTURNS methodology

Within the global methodology, the polynomial chaos expansion of the model under consideration is built up prior to tackling the step C: “Uncertainty propagation”. Then the various uncertainty propagation techniques may be applied at a negligible computational cost. The method requires that the two following steps have been carried out:
  • step A1: identification of an input vector $\underline{X}$ of sources of uncertainties and of an output variable of interest $\underline{Y} = h(\underline{X})$;

  • step B: selection of one of the proposed techniques to estimate a probabilistic model of the input vector $\underline{X}$.

References and theoretical basics

The method is sometimes also known as orthogonal series decomposition. The reader is referred to the following pioneering work in mechanical engineering:
  • R. Ghanem and P. Spanos, 1991, “Stochastic finite elements – A spectral approach”, Springer Verlag. (Reedited by Dover Publications, 2003).

Examples

Let us consider the so-called (scalar valued) Ishigami function defined by:
$$Y = h(\underline{X}) = \sin X_1 + 7 \sin^2 X_2 + 0.1\, X_3^4\, \sin X_1 \qquad (178)$$

where the $X_i$'s ($i = 1, \dots, 3$) are independent random variables that are uniformly distributed over $[-\pi, \pi]$.

The input random variables are first transformed into uniform variables over [-1,1] by applying Eq.(173):

$$U_i = T_i(X_i) = \frac{X_i}{\pi}, \quad i = 1, \dots, 3 \qquad (179)$$

Then the model response Y is expanded onto the following PC basis made of normalized Legendre polynomials:

$$Y = \sum_{(\alpha_1, \alpha_2, \alpha_3) \in \mathbb{N}^3} a_{\alpha_1, \alpha_2, \alpha_3}\, \psi_{\alpha_1, \alpha_2, \alpha_3}(U_1, U_2, U_3) \qquad (180)$$

where:

$$\psi_{\alpha_1, \alpha_2, \alpha_3}(U_1, U_2, U_3) = Le_{\alpha_1}(U_1) \times Le_{\alpha_2}(U_2) \times Le_{\alpha_3}(U_3) \qquad (181)$$

denoting by $Le_j$ the (univariate) normalized Legendre polynomial of degree $j$.

The basis polynomials of total degree not greater than 2 in the above representation are reported in the following table.

Total degree $\sum_{i=1}^3 \alpha_i$   Multi-index $\underline{\alpha}$   Polynomial $\psi_{\underline{\alpha}}(\underline{U})$
0   $\{0,0,0\}$   $1$
1   $\{1,0,0\}$   $\sqrt{3}\, U_1$
1   $\{0,1,0\}$   $\sqrt{3}\, U_2$
1   $\{0,0,1\}$   $\sqrt{3}\, U_3$
2   $\{2,0,0\}$   $(3 U_1^2 - 1)\, \sqrt{5}/2$
2   $\{0,2,0\}$   $(3 U_2^2 - 1)\, \sqrt{5}/2$
2   $\{0,0,2\}$   $(3 U_3^2 - 1)\, \sqrt{5}/2$
2   $\{1,1,0\}$   $3\, U_1 U_2$
2   $\{1,0,1\}$   $3\, U_1 U_3$
2   $\{0,1,1\}$   $3\, U_2 U_3$
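
A short sketch reproducing these basis polynomials by tensorization of normalized Legendre polynomials (limited to degree 2 for brevity; the evaluation point is an arbitrary assumption):

```python
import numpy as np
from math import sqrt

def legendre_normalized(j, u):
    """Normalized Legendre polynomial of degree j (w.r.t. the uniform
    density 1/2 on [-1, 1]), implemented up to degree 2 only."""
    if j == 0:
        return 1.0
    if j == 1:
        return sqrt(3.0) * u
    if j == 2:
        return sqrt(5.0) / 2.0 * (3.0 * u ** 2 - 1.0)
    raise NotImplementedError

def psi(alpha, u):
    """Tensorized multivariate basis polynomial psi_alpha(U), Eq. (181)."""
    return np.prod([legendre_normalized(a, ui) for a, ui in zip(alpha, u)])

u = np.array([0.5, -0.3, 0.1])
for alpha in [(0, 0, 0), (1, 0, 0), (2, 0, 0), (1, 1, 0)]:
    print(alpha, psi(alpha, u))
```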

6.3.10 Step Resp. Surf.  – Strategies for enumerating the polynomial chaos basis

Mathematical description

Goal

The polynomial chaos (PC) expansion allows one to obtain an explicit representation of the random response Y ̲ of the model under consideration, see  [PC basis] . More precisely, the response is cast as a converging series featuring orthonormal polynomials. For computational purpose, it is necessary though to retain a finite number of terms by truncating the expansion. First of all, a specific strategy for enumerating the infinite PC series has to be defined. This is the scope of the current section.

Enumerating strategies

Given an input random vector $\underline{X}$ with prescribed probability density function (PDF) $f_{\underline{X}}(\underline{x})$, it is possible to build up a polynomial chaos (PC) basis $\{ \psi_{\underline{\alpha}},\; \underline{\alpha} \in \mathbb{N}^{n_X} \}$ [Polynomial chaos basis]. Of interest is the definition of enumeration strategies for exploring this basis, i.e. of suitable enumeration functions $\tau$ from $\mathbb{N}$ to $\mathbb{N}^{n_X}$, which create a one-to-one mapping between an integer $j$ and a multi-index $\underline{\alpha}$.

Linear enumeration strategy

Let us first define the total degree of any multi-index $\underline{\alpha}$ in $\mathbb{N}^{n_X}$ by $\sum_{i=1}^{n_X} \alpha_i$. A natural choice to sort the PC basis (i.e. the multi-indices $\underline{\alpha}$) is the lexicographical order with a constraint of increasing total degree. Mathematically speaking, a bijective enumeration function $\tau$ is defined by:

$$\tau: \mathbb{N} \to \mathbb{N}^{n_X}, \quad j \mapsto \{ \alpha_1, \dots, \alpha_{n_X} \} \equiv \{ \tau_1(j), \dots, \tau_{n_X}(j) \} \qquad (182)$$

such that:

$$\tau(0) = \{0, \dots, 0\} \qquad (183)$$

and

$$\forall\, 1 \leq j < k, \quad \sum_{i=1}^{n_X} \tau_i(j) < \sum_{i=1}^{n_X} \tau_i(k) \quad \text{or} \quad \exists\, m \in \{1, \dots, n_X\} : \left( \forall\, i < m,\; \tau_i(j) = \tau_i(k) \right) \text{ and } \tau_m(j) < \tau_m(k) \qquad (184)$$

Such an enumeration strategy is illustrated below in a two-dimensional case (i.e. $n_X = 2$).

This corresponds to the following enumeration of the multi-indices:

j   $\underline{\alpha} = \{\alpha_1, \alpha_2\}$
0   {0,0}  
1   {0,1}  
2   {1,0}  
3   {2,0}  
4   {1,1}  
5   {0,2}  
6   {3,0}  
7   {2,1}  
8   {1,2}  
9   {0,3}  
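
A small sketch of such an enumeration (illustrative only; ties within a given total degree are broken lexicographically here, which may differ from the ordering used by the reference implementation within a degree):

```python
import itertools

def linear_enumeration(n_x, max_degree):
    """Enumerate multi-indices of N^n_x by increasing total degree,
    ties broken lexicographically (one possible tie-breaking convention)."""
    indices = []
    for degree in range(max_degree + 1):
        layer = [alpha for alpha in itertools.product(range(degree + 1), repeat=n_x)
                 if sum(alpha) == degree]
        indices.extend(sorted(layer))
    return indices

for j, alpha in enumerate(linear_enumeration(2, 3)):
    print(j, alpha)
```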

Hyperbolic enumeration strategy

The hyperbolic truncation strategy is inspired by the so-called sparsity-of-effects principle, which states that most models are principally governed by main effects and low-order interactions. Accordingly, one wishes to define an enumeration strategy which first selects those multi-indices related to main effects, i.e. with a reasonably small number of nonzero components, prior to selecting those associated with higher-order interactions.

For any real number $q$ in $(0,1]$, one defines the $q$-hyperbolic norm (or $q$-norm for short) of a multi-index $\underline{\alpha}$ by:

$$\left\| \underline{\alpha} \right\|_q = \left( \sum_{i=1}^{n_X} \alpha_i^{\,q} \right)^{1/q} \qquad (185)$$

Strictly speaking, $\|\cdot\|_q$ is not properly a norm but rather a quasi-norm, since it does not satisfy the triangle inequality. However this abuse of language will be used in the following. Note that the case $q = 1$ corresponds to the definition of the total degree.

Let $\lambda$ be a real positive number. One defines the set of multi-indices with $q$-norm not greater than $\lambda$ as follows:

$$\mathcal{A}_\lambda = \left\{ \underline{\alpha} \in \mathbb{N}^{n_X} : \left\| \underline{\alpha} \right\|_q \leq \lambda \right\} \qquad (186)$$

Moreover, one defines the front $\partial\mathcal{A}_\lambda$ of $\mathcal{A}_\lambda$ by:

$$\partial\mathcal{A}_\lambda = \left\{ \underline{\alpha} \in \mathcal{A}_\lambda : \exists\, i \in \{1, \dots, n_X\},\; \underline{\alpha} + \underline{e_i} \notin \mathcal{A}_\lambda \right\} \qquad (187)$$

where $\underline{e_i}$ is a multi-index with a unit $i$-entry and zero $k$-entries, $k \neq i$.

The idea consists in exploring the space $\mathbb{N}^{n_X}$ by progressively increasing the $q$-norm of its elements. For this purpose, one wants to construct an enumeration function that relies upon (1) the bijection $\tau$ defined in the previous paragraph and (2) an appropriate increasing sequence $(\lambda_n)_{n \in \mathbb{N}}$ that tends to infinity. Such a sequence can be used to define a specific partition of $\mathbb{N}^{n_X}$ into strata $(\Delta_n)_{n \in \mathbb{N}}$. Precisely, the enumeration of the multi-indices is achieved by sorting the elements of $\Delta_n$ in ascending order of the $q$-norm, and then by sorting the possible elements having the same $q$-norm using the bijection $\tau$. Several examples of partition are given in the sequel.

Partition based on the total degree. We can simply define the sequence $(\lambda_n)$ as the sequence of natural integers $\mathbb{N}$. Thus we build up a sequence $(\Delta_n)$ of strata as follows:

$$\Delta_0 = \{\underline{0}\}, \qquad \forall\, n \geq 1, \; \Delta_n = \mathcal{A}_n \setminus \mathcal{A}_{n-1} = \left\{ \underline{\alpha} \in \mathbb{N}^{n_X} : n-1 < \left\| \underline{\alpha} \right\|_q \leq n \right\} \qquad (188)$$

The progressive exploration of $\mathbb{N}^{n_X}$ can be depicted in the two-dimensional case for various values of the parameter $q$ (figure not reproduced here).

As expected, the hyperbolic norms penalize the indices associated with high-order interactions all the more strongly as $q$ is low. Note that setting $q$ equal to 1 corresponds to the usual linear enumeration strategy: the retained polynomials are then located under a straight line, hence the label linear enumeration strategy. In contrast, when $q < 1$, the retained basis polynomials are located under a hyperbola, hence the name hyperbolic enumeration strategy.

Partition based on disjoint fronts. Instead of considering the sequence of natural integers, we define the sequence $(\lambda_n)$ recursively by:

$$\lambda_0 = 0, \qquad \forall\, n \geq 1, \; \lambda_n = \inf \left\{ \lambda \in \mathbb{R}^+ : \lambda \geq \lambda_{n-1} \;\text{and}\; \partial\mathcal{A}_\lambda \cap \partial\mathcal{A}_{\lambda_{n-1}} = \emptyset \right\} \qquad (189)$$

In other words, $\lambda_n$ is the infimum of the real numbers $\lambda$ for which the new front contains only elements which do not belong to the former one. Hence the sequence of strata:

$$\Delta_0 = \{\underline{0}\}, \qquad \forall\, n \geq 1, \; \Delta_n = \mathcal{A}_{\lambda_n} \setminus \mathcal{A}_{\lambda_{n-1}} \qquad (190)$$

Note that this partition of $\mathbb{N}^{n_X}$ is finer than the one based on total degrees, since the cardinality of the strata is smaller.

Anisotropic hyperbolic enumeration strategy

One might also consider enumeration strategies based on an anisotropic hyperbolic norm defined by:

$$\left\| \underline{\alpha} \right\|_{\underline{w}, q} = \left( \sum_{i=1}^{n_X} w_i\, \alpha_i^{\,q} \right)^{1/q} \qquad (191)$$

where the $w_i$'s are real positive numbers. This would lead to first selecting the basis polynomials depending on a specific subset of input variables.

In this setup, it is also possible to explore the space $\mathbb{N}^{n_X}$ of the multi-indices by partitioning it according to one of the two schemes outlined in the previous paragraph (it is only necessary to replace the isotropic $q$-norm in Eq.(186) with the $(\underline{w}, q)$-anisotropic one).

We may also build up an alternative partition related to the partial degree of the most important variable, i.e. the one associated with the smallest weight $w_i$. Then the sequence $(\lambda_n)$ is equal to $\mathbb{N}$ and the sets $\mathcal{A}_\lambda$ are defined by:

$$\mathcal{A}_\lambda = \left\{ \underline{\alpha} \in \mathbb{N}^{n_X} : \alpha_{i^*} \leq \lambda \right\}, \qquad i^* = \arg\min \left\{ w_i,\; 1 \leq i \leq n_X \right\} \qquad (192)$$

If strata with larger cardinalities are of interest, one may rather consider the partial degree of the least significant variable, i.e. the one associated with the greatest weight $w_i$. To this end, the index $i^*$ in the previous formula has to be defined by:

$$i^* = \arg\max \left\{ w_i,\; 1 \leq i \leq n_X \right\} \qquad (193)$$

Other notations

Link with OpenTURNS methodology

Within the global methodology, the polynomial chaos expansion of the model under consideration is built up prior to tackling the step C: “Uncertainty propagation”. Then the various uncertainty propagation techniques may be applied at a negligible computational cost. The method requires that the two following steps have been carried out:
  • step A1: identification of an input vector $\underline{X}$ of sources of uncertainties and of an output variable of interest $\underline{Y} = h(\underline{X})$;

  • step B: selection of one of the proposed techniques to estimate a probabilistic model of the input vector $\underline{X}$.

References and theoretical basics

The hyperbolic enumeration strategy has been inspired by the following work:
  • G. Blatman, 2009, “Adaptive sparse polynomial chaos expansions for uncertainty propagation and sensitivity analysis”, PhD thesis, Clermont Université.


6.3.11 Step Resp. Surf.  – Truncation schemes for the polynomial chaos expansion

Mathematical description

Goal

The polynomial chaos (PC) expansion allows one to obtain an explicit representation of the random response $\underline{Y}$ of the model under consideration, see [PC basis]. More precisely, the response $Y$ is cast as a converging series featuring orthonormal polynomials. For computational purposes, it is necessary though to retain a finite number of terms by truncating the expansion. Strategies for enumerating the infinite PC basis have been introduced in [Strategies for enumerating the polynomial chaos basis]. It is then possible to devise several strategies in order to truncate the representation, as shown in the sequel.

Principle

As shown in [PC basis], the random model response $\underline{Y}$ may be expanded onto the PC basis as follows:

$$\underline{Y} = h(\underline{X}) = \sum_{\underline{\alpha} \in \mathbb{N}^{n_X}} \underline{a}_{\underline{\alpha}}\, \psi_{\underline{\alpha}}(\underline{X}) \qquad (194)$$

where $n_X$ denotes the dimension of the vector $\underline{X}$. For practical implementation, finite dimensional polynomial chaoses have to be built. In other words, a suitable finite subset $\mathcal{A}$ of $\mathbb{N}^{n_X}$ has to be selected. Using a specific enumeration strategy, a one-to-one mapping between the natural integers $j$ and the multi-indices $\underline{\alpha}$ in $\mathbb{N}^{n_X}$ is created and the PC series rewrites:

$$\underline{Y} = h(\underline{X}) = \sum_{j=0}^{\infty} \underline{a}_j\, \psi_j(\underline{X}) \qquad (195)$$

Fixed strategy

The so-called fixed strategy simply consists in retaining the first P elements of the PC basis, the latter being ordered according to a given enumeration scheme (hyperbolic or not). The retained set is built in a single pass. The truncated PC expansion is given by:

$$\hat{h}(\underline{X}) = \sum_{j=0}^{P-1} \underline{a}_j\, \psi_j(\underline{X}) \qquad (196)$$

In case of a linear enumeration strategy, a usual choice is to set $P$ equal to $\binom{n_X + p}{p} = \frac{(n_X + p)!}{n_X!\; p!}$, for a given natural integer $p$. This way the set of retained basis functions $\{ \psi_j,\; j = 0, \dots, P-1 \}$ gathers all the polynomials with total degree not greater than $p$. The number of terms $P$ grows polynomially both in $n_X$ and $p$ though, which may lead to difficulties in terms of computational efficiency and memory requirements when dealing with high-dimensional problems.
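
As a quick illustration of this growth, the basis size $P$ can be tabulated for a few values of $n_X$ and $p$ (arbitrary choices):

```python
from math import comb

# Size P of the full basis of total degree <= p in n_X variables (fixed strategy)
for n_x in (3, 10, 50):
    for p in (2, 3, 5):
        print(f"n_X={n_x:3d}, p={p}:  P = {comb(n_x + p, p)}")
```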

Sequential strategy

The sequential strategy consists in constructing the basis of the truncated PC iteratively. Precisely, one begins with the first term $\psi_0$, that is $\mathcal{A}_0 = \{0\}$, and one complements the current basis as follows: $\mathcal{A}_{k+1} = \mathcal{A}_k \cup \{\psi_{k+1}\}$. The construction process is stopped when a given accuracy criterion is reached, or when $P$ is equal to a prescribed maximum basis size $P_{max}$.

Cleaning strategy

The cleaning strategy is aimed at building a PC expansion containing at most P significant coefficients, i.e. at most P significant basis functions. It proceeds as follows:

  • Generate an initial PC basis made of the $P$ first polynomials (according to the adopted enumeration strategy), or equivalently an initial set of indices $\mathcal{A} = \{0, \dots, P-1\}$

  • Discard from the basis all those polynomials $\psi_j$ associated with insignificant coefficients, i.e. the coefficients that satisfy:

    $$|a_j| \leq \varepsilon \cdot \max_{k \in \mathcal{A}} |a_k| \qquad (197)$$

    where $\varepsilon$ is a given tolerance, e.g. $\varepsilon = 10^{-4}$.

  • Add the next basis term $\psi_P$ to the current basis $\mathcal{A}$

  • Reiterate the procedure until either $P$ terms have been retained or a given number $P_{max}$ of candidate terms has been considered

Other notations

Link with OpenTURNS methodology

Within the global methodology, the polynomial chaos expansion of the model under consideration is built up prior to tackling the step C: “Uncertainty propagation”. Then the various uncertainty propagation techniques may be applied at a negligible computational cost. The method requires that the two following steps have been carried out:
  • step A1: identification of an input vector $\underline{X}$ of sources of uncertainties and of an output variable of interest $\underline{Y} = h(\underline{X})$;

  • step B: selection of one of the proposed techniques to estimate a probabilistic model of the input vector $\underline{X}$.

References and theoretical basics


6.3.12 Step Resp. Surf.  – Computation of the polynomial chaos coefficients

Mathematical description

Goal

The polynomial chaos (PC) expansion allows one to obtain an explicit representation of the random response $\underline{Y} = h(\underline{X})$ of the model under consideration, see [Polynomial chaos basis]. More precisely, the response $\underline{Y}$ is cast as a converging series featuring orthonormal polynomials. For computational purposes, it is necessary though to retain a finite number of terms by truncating the expansion as shown in [Truncation schemes for the polynomial chaos expansion]. Then it is necessary to estimate the PC coefficients in order to characterize $\underline{Y}$. This may be achieved using a projection strategy as shown in the sequel.

Projection strategy

The model response is assumed to be scalar for the sake of simplicity, i.e. $\underline{Y} = Y$. However the following derivations hold componentwise in case of a vector-valued random response. Let us consider the following truncated PC representation of the model response:

$$Y \equiv h(\underline{X}) \approx \sum_{j \in \mathcal{A}} a_j\, \psi_j(\underline{X}) \qquad (198)$$

where $\mathcal{A}$ is a non-empty finite set of indices, whose cardinality is denoted by $P$. Using the vector notation $\underline{a} = \{a_0, \dots, a_{P-1}\}^{\mathsf{T}}$ and $\underline{\psi}(\underline{X}) = \{\psi_0(\underline{X}), \dots, \psi_{P-1}(\underline{X})\}^{\mathsf{T}}$, the PC representation in Eq.(198) rewrites:

$$Y \equiv h(\underline{X}) \approx \underline{a}^{\mathsf{T}}\, \underline{\psi}(\underline{X}) \qquad (199)$$

The coefficients may be computed by an $L^2$-projection onto the PC basis as follows:

$$\underline{a} = \arg\min_{\underline{b} \in \mathbb{R}^P} \mathbb{E}\left[ \left( h(\underline{X}) - \underline{b}^{\mathsf{T}}\, \underline{\psi}(\underline{X}) \right)^2 \right] \qquad (200)$$

Due to the orthonormality of the PC basis, it may be shown that the previous equation rewrites:

$$\underline{a} = \mathbb{E}\left[ h(\underline{X})\, \underline{\psi}(\underline{X}) \right] \qquad (201)$$

Two kinds of estimates can be derived, depending on whether Eq.(200) or Eq.(201) is discretized.

Least squares strategy

The PC coefficients can be estimated by solving a discretized version of Eq.(200). An experimental design $\underline{\underline{\mathcal{X}}} = \{\underline{x}^{(1)}, \dots, \underline{x}^{(N)}\}$, i.e. a set of realizations of the input parameters, is required, as well as the corresponding model evaluations $\underline{\mathcal{Y}} = \{y^{(1)}, \dots, y^{(N)}\}$. Note that the experimental design $\underline{\underline{\mathcal{X}}}$ may be built using the various techniques introduced in [Min-Max Approach using Design of Experiments].

The following least squares problem has to be solved:

$$\hat{\underline{a}} = \arg\min_{\underline{b} \in \mathbb{R}^P} \sum_{i=1}^{N} \left( y^{(i)} - \underline{b}^{\mathsf{T}}\, \underline{\psi}(\underline{x}^{(i)}) \right)^2 \qquad (202)$$

A closed-form solution may be obtained as shown in [Polynomial response surface based on least squares] . The numerical solving schemes for the least squares problem are studied in  [Numerical methods to solve least squares problems] .

Sparse least squares strategy

If the size P of the PC basis is of similar size to N, or even possibly significantly larger than N, then the ordinary least squares problem in Eq.(202) is ill-posed. The sparse least squares approaches outlined in [Polynomial response surface based on sparse least squares] , namely LAR, LASSO and Forward Stagewise, may be employed instead. Eventually a sparse PC representation is obtained, that is an approximation which only contains a small number of active basis functions.

Integration strategy

As an alternative, the PC coefficients may be estimated by discretizing the “simplified” expression (201). Although this approach has not been implemented in OpenTURNS yet, we provide the basic principles of the solving scheme.

By definition of the mathematical expectation operator $\mathbb{E}[\cdot]$, this corresponds to calculating the following multi-dimensional integral:

$$\underline{a} = \mathbb{E}\left[ h(\underline{X})\, \underline{\psi}(\underline{X}) \right] = \int_{\mathcal{D}} h(\underline{x})\, \underline{\psi}(\underline{x})\, f_{\underline{X}}(\underline{x})\, d\underline{x} \qquad (203)$$

where $\mathcal{D}$ and $f_{\underline{X}}(\underline{x})$ are respectively the support and the probability density function of the random vector $\underline{X}$. The integral can be approximated by numerical quadrature as follows:

$$\hat{\underline{a}} = \sum_{i=1}^{N} w^{(i)}\, h(\underline{x}^{(i)})\, \underline{\psi}(\underline{x}^{(i)}) \qquad (204)$$

where the $w^{(i)}$'s and the $\underline{x}^{(i)}$'s are referred to as the quadrature weights and nodes, respectively. Both quantities may be selected according to well-known quadrature rules, such as Gauss quadrature or Clenshaw-Curtis quadrature. Otherwise, it is commonplace to generate independent realizations $\underline{x}^{(i)}$ of $\underline{X}$ and to set all the $w^{(i)}$'s equal to $1/N$. Such a scheme is known as Monte Carlo simulation (see [Standard Monte Carlo simulation] ).
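
As an illustration of this integration strategy, the sketch below estimates the coefficients by crude Monte Carlo with weights $w^{(i)} = 1/N$, assuming a one-dimensional standard normal input, an orthonormal Hermite basis and the arbitrary model $h(x) = e^x$ (all assumptions; this is not the OpenTURNS implementation):

```python
import numpy as np
from math import factorial

def hermite_basis(x, P):
    """Orthonormal Hermite polynomials psi_0..psi_{P-1} evaluated at x
    (three-term recurrence, standard normal weight)."""
    x = np.asarray(x, dtype=float)
    psi = [np.ones_like(x), x]
    for k in range(1, P - 1):
        psi.append(x * psi[k] / np.sqrt(k + 1.0) - np.sqrt(k / (k + 1.0)) * psi[k - 1])
    return np.array(psi[:P])                     # shape (P, N)

h = lambda x: np.exp(x)                          # arbitrary scalar model (assumption)
rng = np.random.default_rng(7)
N, P = 100000, 5
x = rng.standard_normal(N)                       # realizations of X ~ N(0, 1)
Psi = hermite_basis(x, P)

a_hat = Psi @ h(x) / N                           # Eq. (204) with w_i = 1/N
print(np.round(a_hat, 3))
# For h(x) = exp(x), the exact coefficients are exp(1/2)/sqrt(j!)
print(np.round([np.exp(0.5) / np.sqrt(factorial(j)) for j in range(P)], 3))
```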

Other notations

Link with OpenTURNS methodology

Within the global methodology, the method is used to build up a PC approximation of the model response prior to tackling the step C: “Uncertainty propagation”. It requires either a set of output values obtained with the initial model at different input values, or the set of inputs together with the initial model itself. Then the various uncertainty propagation techniques may be applied at a negligible computational cost. The method requires that the two following steps have been carried out:
  • step A1: identification of an input vector $\underline{X}$ of sources of uncertainties and of an output variable of interest $\underline{Y} = h(\underline{X})$;

  • step B: selection of one of the proposed techniques to estimate a probabilistic model of the input vector $\underline{X}$.

References and theoretical basics

The computation of the polynomial chaos coefficients by ordinary least squares was proposed in:
  • M. Berveiller, 2005, “Eléments finis stochastiques : approches intrusive et non intrusive pour des analyses de fiabilité.”, PhD thesis, Clermont Université.

Besides, the use of sparse least squares approaches in the same purpose was advocated in:

  • G. Blatman, 2009, “Adaptive sparse polynomial chaos expansions for uncertainty propagation and sensitivity analysis”, PhD thesis, Clermont Université.

