1 Probabilistic input vector modelling

The objective of this section is to model the probabilistic input vector, which can be described in different ways depending on the available data.

It corresponds to the step "Step B : quantification of the uncertainty sources" of the global methodology (see Reference Guide - Global Methodology for an uncertainty study - Part B : quantification of the uncertainty sources).

1.0.1 UC : Creation of the random input vector from a distribution

The objective of this Use Case is to model a random vector described by its joint probability density function. This random vector is called a UsualRandomVector. This UC is particularly adapted to the input random vector.

Requirements  

the input distribution : inputDistribution

type:

Distribution

 
Results  

the random input vector : inputRandomVector

type:

RandomVector whose implementation is a UsualRandomVector

 

Python script for this UseCase :

# Create the UsualRandomVector 'inputRandomVector' from
# the Distribution 'inputDistribution'
inputRandomVector = RandomVector(inputDistribution)
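
As a hedged usage sketch only (the 2D standard Normal distribution below is an arbitrary illustrative choice, not part of the use case), the resulting random vector can then be sampled:

# Minimal sketch, assuming for illustration a 2D standard Normal input distribution
inputDistribution = Normal(2)
inputRandomVector = RandomVector(inputDistribution)

# One realization of the random input vector
oneRealization = inputRandomVector.getRealization()

# A sample of 10 realizations
aSample = inputRandomVector.getSample(10)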


1.1 Without samples of data

1.1.1 UC : List of usual distributions

The objective of this Use Case is to list all the usual distributions proposed by OpenTURNS and to specify how each distribution is created, with its different arguments.

The different distributions proposed by OpenTURNS are listed hereafter.

For each distribution, the entry below gives the probability density function (PDF), the conditions on the parameters, and the available parameter sets (param. 1, param. 2, possibly param. 3).

  • Arcsine : PDF $\frac{1}{\pi\,\frac{b-a}{2}\sqrt{1-\left(\frac{x-\frac{a+b}{2}}{\frac{b-a}{2}}\right)^2}}$ ; conditions: $a<b$ ; param. 1: $(a,b)$ ; param. 2: $(\mu,\sigma)$ with $\mu=\frac{a+b}{2}$, $\sigma=\frac{b-a}{2\sqrt{2}}$
  • Beta : PDF $\frac{(x-a)^{r-1}(b-x)^{t-r-1}}{(b-a)^{t-1}B(r,t-r)}\,1_{[a,b]}(x)$ ; conditions: $r>0$, $t>r$, $a<b$ ; param. 1: $(r,t,a,b)$ ; param. 2: $(\mu,\sigma,a,b)$ with $\mu=a+\frac{(b-a)r}{t}$, $\sigma=\frac{b-a}{t}\sqrt{\frac{r(t-r)}{t+1}}$
  • Burr : PDF $\frac{c\,k\,x^{c-1}}{(1+x^c)^{k+1}}\,1_{]0,+\infty[}(x)$ ; conditions: $c>0$, $k>0$ ; param. 1: $(c,k)$
  • Chi : PDF $\frac{2^{1-\nu/2}\,x^{\nu-1}e^{-x^2/2}}{\Gamma(\nu/2)}\,1_{[0,+\infty[}(x)$ ; conditions: $\nu>0$ ; param. 1: $\nu$
  • ChiSquare : PDF $\frac{2^{-\nu/2}}{\Gamma(\nu/2)}\,x^{\nu/2-1}e^{-x/2}\,1_{[0,+\infty[}(x)$ ; conditions: $\nu>0$ ; param. 1: $\nu$
  • Dirichlet : PDF $A\left(1-\sum_{j=1}^{d}x_j\right)^{\theta_{d+1}-1}\prod_{j=1}^{d}x_j^{\theta_j-1}\,1_{\Delta}(\underline{x})$ with $A=\frac{\Gamma\left(\sum_{j=1}^{d+1}\theta_j\right)}{\prod_{j=1}^{d+1}\Gamma(\theta_j)}$ and $\Delta=\{\underline{x}\in\mathbb{R}^d :\ x_i\geq 0,\ \sum_{i=1}^{d}x_i\leq 1\}$ ; conditions: $d\geq 1$, $\theta_i>0$ ; param. 1: $(\theta_1,\ldots,\theta_{d+1})$
  • Epanechnikov : PDF $\frac{3}{4}(1-x^2)\,1_{[-1,1]}(x)$ ; no parameter
  • Exponential : PDF $\lambda e^{-\lambda(x-\gamma)}\,1_{[\gamma,+\infty[}(x)$ ; conditions: $\lambda>0$ ; param. 1: $(\lambda,\gamma)$
  • Fisher-Snedecor : PDF $\left(\frac{d_1 x}{d_1 x+d_2}\right)^{d_1/2}\left(\frac{d_2}{d_1 x+d_2}\right)^{d_2/2}\frac{1_{x\geq 0}}{A\,x}$ with $A=B(d_1/2,d_2/2)$ ; conditions: $d_i>0$ ; param. 1: $(d_1,d_2)$
  • Gamma : PDF $\frac{\lambda^k}{\Gamma(k)}(x-\gamma)^{k-1}e^{-\lambda(x-\gamma)}\,1_{[\gamma,+\infty[}(x)$ ; conditions: $k>0$, $\lambda>0$ ; param. 1: $(k,\lambda,\gamma)$ ; param. 2: $(\mu,\sigma,\gamma)$ with $\mu=\frac{k}{\lambda}+\gamma$, $\sigma=\frac{\sqrt{k}}{\lambda}$
  • Gumbel : PDF $\alpha\,e^{-\alpha(x-\beta)-e^{-\alpha(x-\beta)}}$ ; conditions: $\alpha>0$ ; param. 1: $(\alpha,\beta)$ ; param. 2: $(\mu,\sigma)$ with $\mu=\frac{\gamma^*}{\alpha}+\beta$, $\sigma=\frac{\pi}{\sqrt{6}\,\alpha}$, where $\gamma^*=-\int_0^{\infty}\log(t)e^{-t}dt$ is Euler's constant ; param. 3: $(a,b)$ with $a=\beta$, $b=\frac{1}{\alpha}$
  • Histogram : PDF $\frac{1}{S}\sum_{i=1}^{n}h_i\,1_{[x_i,x_{i+1}]}(x)$ with $l_i=x_{i+1}-x_i$, $S=\sum_{i=1}^{n}h_i l_i$, $l_i\geq 0$ ; param. 1: $(x_1,(l_i,h_i)_{1\leq i\leq n})$
  • Inverse ChiSquare : PDF $\frac{\exp\left(-\frac{1}{2x}\right)}{\Gamma\left(\frac{\nu}{2}\right)\,2^{\nu/2}\,x^{\nu/2+1}}\,1_{x>0}$ ; conditions: $\nu>0$ ; param. 1: $(\nu)$
  • Inverse Gamma : PDF $\frac{\exp\left(-\frac{1}{\lambda x}\right)}{\Gamma(k)\,\lambda^k\,x^{k+1}}\,1_{x>0}$ ; conditions: $k>0$, $\lambda>0$ ; param. 1: $(k,\lambda)$
  • Inverse Normal : PDF $\left(\frac{\lambda}{2\pi x^3}\right)^{1/2}e^{-\lambda(x-\mu)^2/(2\mu^2 x)}\,1_{x>0}$ ; conditions: $\lambda>0$, $\mu>0$ ; param. 1: $(\lambda,\mu)$
  • InverseWishart : PDF $\frac{|\underline{\underline{V}}|^{\nu/2}\,e^{-\mathrm{tr}(\underline{\underline{V}}\,\underline{\underline{X}}^{-1})/2}}{2^{\nu p/2}\,|\underline{\underline{X}}|^{(\nu+p+1)/2}\,\Gamma_p\left(\frac{\nu}{2}\right)}\,1_{\mathcal{M}_p^+(\mathbb{R})}(\underline{\underline{X}})$, where $\mathcal{M}_p^+(\mathbb{R})$ is the set of symmetric positive matrices of dimension $p$ ; conditions: $\underline{\underline{V}}\in\mathcal{M}_p^+(\mathbb{R})$, $\nu>p-1$ ; param. 1: $(\underline{\underline{V}},\nu)$
  • Laplace : PDF $\frac{\lambda}{2}\,e^{-\lambda|x-\mu|}$ ; conditions: $\lambda>0$ ; param. 1: $(\lambda,\mu)$
  • Logistic : PDF $\frac{e^{-\frac{x-\alpha}{\beta}}}{\beta\left(1+e^{-\frac{x-\alpha}{\beta}}\right)^2}$ ; conditions: $\beta>0$ ; param. 1: $(\alpha,\beta)$
  • LogNormal : PDF $\frac{e^{-\frac{1}{2}\left(\frac{\log(x-\gamma)-\mu_\ell}{\sigma_\ell}\right)^2}}{\sqrt{2\pi}\,\sigma_\ell\,(x-\gamma)}\,1_{[\gamma,+\infty[}(x)$ ; conditions: $\sigma_\ell>0$ ; param. 1: $(\mu_\ell,\sigma_\ell,\gamma)$ ; param. 2: $(\mu,\sigma,\gamma)$ or param. 3: $(\mu,\frac{\sigma}{\mu},\gamma)$, with $\mu>\gamma$, $\sigma>0$ and $\mu=e^{\mu_\ell+\frac{1}{2}\sigma_\ell^2}+\gamma$, $\sigma=e^{\mu_\ell+\frac{1}{2}\sigma_\ell^2}\sqrt{e^{\sigma_\ell^2}-1}$
  • LogUniform : PDF $\frac{1}{x\,(b_\ell-a_\ell)}\,1_{[a_\ell,b_\ell]}(\log(x))$ ; conditions: $b_\ell>a_\ell$ ; param. 1: $(a_\ell,b_\ell)$
  • Meixner : PDF $\frac{\left(2\cos(\beta/2)\right)^{2\delta}}{2\alpha\pi\Gamma(2\delta)}\,e^{\frac{\beta(x-\mu)}{\alpha}}\left|\Gamma\left(\delta+i\,\frac{x-\mu}{\alpha}\right)\right|^2$ with $i^2=-1$ ; conditions: $\alpha>0$, $\beta\in]-\pi,\pi[$, $\delta>0$ ; param. 1: $(\alpha,\beta,\delta,\mu)$
  • Non Central Chi Square : PDF $\sum_{j=0}^{\infty}\frac{e^{-\lambda}\lambda^j}{j!}\,p_{\chi^2(\nu+2j)}(x)$ where $p_{\chi^2(q)}$ is the PDF of a $\chi^2(q)$ random variate ; conditions: $\nu>0$, $\lambda\geq 0$ ; param. 1: $(\nu,\lambda)$
  • Triangular : PDF $\frac{2(x-a)}{(m-a)(b-a)}$ if $a\leq x\leq m$, $\frac{2(b-x)}{(b-m)(b-a)}$ if $m\leq x\leq b$, $0$ otherwise ; conditions: $a<m<b$ ; param. 1: $(a,m,b)$
  • Truncated Normal : PDF $\frac{\frac{1}{\sigma_n}\,\varphi\left(\frac{x-\mu_n}{\sigma_n}\right)}{\Phi\left(\frac{b-\mu_n}{\sigma_n}\right)-\Phi\left(\frac{a-\mu_n}{\sigma_n}\right)}\,1_{[a,b]}(x)$ ; conditions: $\sigma_n>0$ ; param. 1: $(\mu_n,\sigma_n,a,b)$
  • Uniform : PDF $\frac{1}{b-a}\,1_{[a,b]}(x)$ ; conditions: $a<b$ ; param. 1: $(a,b)$
  • Weibull : PDF $\frac{\beta}{\alpha}\left(\frac{x-\gamma}{\alpha}\right)^{\beta-1}e^{-\left(\frac{x-\gamma}{\alpha}\right)^{\beta}}\,1_{[\gamma,+\infty[}(x)$ ; conditions: $\alpha>0$, $\beta>0$ ; param. 1: $(\alpha,\beta,\gamma)$ ; param. 2: $(\mu,\sigma,\gamma)$ with $\mu=\alpha\Gamma\left(1+\frac{1}{\beta}\right)+\gamma$, $\sigma=\alpha\sqrt{\Gamma\left(1+\frac{2}{\beta}\right)-\Gamma^2\left(1+\frac{1}{\beta}\right)}$
  • Wishart : PDF $\frac{|\underline{\underline{X}}|^{(\nu-p-1)/2}\,e^{-\mathrm{tr}(\underline{\underline{V}}^{-1}\underline{\underline{X}})/2}}{2^{\nu p/2}\,|\underline{\underline{V}}|^{\nu/2}\,\Gamma_p\left(\frac{\nu}{2}\right)}\,1_{\mathcal{M}_p^+(\mathbb{R})}(\underline{\underline{X}})$, where $\mathcal{M}_p^+(\mathbb{R})$ is the set of symmetric positive matrices of dimension $p$ ; conditions: $\underline{\underline{V}}\in\mathcal{M}_p^+(\mathbb{R})$, $\nu>p-1$ ; param. 1: $(\underline{\underline{V}},\nu)$

Let's note that a random variable X is said to have a standard non-central student distribution 𝒯(ν,δ) if it can be written as:
$X=\dfrac{N}{\sqrt{C/\nu}}$ (1)

where $N$ has the normal distribution $\mathcal{N}(\delta,1)$ and $C$ has the $\chi^2(\nu)$ distribution, $N$ and $C$ being independent.

The non-central Student distribution in OpenTURNS has an additional parameter γ such that the random variable X is said to have a non-central Student distribution 𝒯(ν,δ,γ) if X-γ has a standard 𝒯(ν,δ) distribution.

The probability density function of the Non Central Student writes:

$p_{NCS}(x)=\dfrac{\exp(-\delta^2/2)}{\sqrt{\nu\pi}\,\Gamma(\nu/2)}\left(\dfrac{\nu}{\nu+(x-\gamma)^2}\right)^{\frac{\nu+1}{2}}\sum_{j=0}^{\infty}\dfrac{\Gamma\left(\frac{\nu+j+1}{2}\right)}{\Gamma(j+1)}\left(\delta(x-\gamma)\sqrt{\dfrac{2}{\nu+(x-\gamma)^2}}\right)^j$

The Student distribution has the following probability density function, written in dimension d :

$p_T(\underline{x})=\dfrac{\Gamma\left(\frac{\nu+d}{2}\right)}{(\pi\nu)^{d/2}\,\Gamma\left(\frac{\nu}{2}\right)}\,\dfrac{|\underline{\underline{R}}|^{-1/2}}{\prod_{k=1}^{d}\sigma_k}\left(1+\dfrac{\underline{z}^t\,\underline{\underline{R}}^{-1}\,\underline{z}}{\nu}\right)^{-\frac{\nu+d}{2}}$

where $\underline{z}=\underline{\underline{\Delta}}^{-1}\left(\underline{x}-\underline{\mu}\right)$ with $\underline{\underline{\Delta}}=\mathrm{diag}(\underline{\sigma})$.

In dimension d=1 we have the following expression :

$p_T(x)=\dfrac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)}\,\dfrac{1}{\sigma}\left(1+\dfrac{(x-\mu)^2}{\nu\sigma^2}\right)^{-\frac{\nu+1}{2}}$

The Normal Gamma distribution is the distribution of the random vector $(X,Y)$ where $Y$ follows the distribution $\Gamma(\alpha,\beta)$ with $\alpha>0$ and $\beta>0$, and $X|Y$ follows the distribution $\mathcal{N}\left(\mu,\frac{1}{\sqrt{\kappa Y}}\right)$. Its probability density function writes:

$p_{NG}(x,y)=\dfrac{\beta^{\alpha}}{\Gamma(\alpha)}\sqrt{\dfrac{\kappa}{2\pi}}\,y^{\alpha-1/2}\exp\left(-\dfrac{y}{2}\left[\kappa(x-\mu)^2+2\beta\right]\right)$

The Inverse ChiSquare distribution parameterized by $\nu$, with $\nu>0$, is the distribution of the random variable $X$ such that $\frac{1}{X}$ follows the ChiSquare($\nu$) distribution.

Note also that the Inverse ChiSquare distribution parameterized by $\nu$ is exactly the InverseGamma($\frac{\nu}{2}$, 2) distribution.

The Inverse Gamma distribution parameterized by $(k,\lambda)$, with $k>0$ and $\lambda>0$, is the distribution of the random variable $X$ such that $\frac{1}{X}$ follows the $\Gamma(k,\frac{1}{\lambda})$ distribution.
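
As a small, hedged sanity check (the numerical values below are arbitrary, and the parameter orders follow the conventions of this guide), the relation between the Inverse Gamma and Gamma distributions can be verified through the CDFs, since $P(X\leq x)=P(1/X\geq 1/x)$:

# Hedged sketch: if X ~ InverseGamma(k, lambda), then 1/X ~ Gamma(k, 1/lambda)
k = 2.5
lam = 2.0
inverseGamma = InverseGamma(k, lam)
gammaDist = Gamma(k, 1.0 / lam)

# P(X <= x) must equal P(1/X >= 1/x) = 1 - F_Gamma(1/x)
x = 1.3
print(inverseGamma.computeCDF(x))
print(1.0 - gammaDist.computeCDF(1.0 / x))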

For each discrete distribution, the entry below gives the distribution of X, the conditions on the parameters, and the parameter set (param. 1).

  • Bernoulli : $P(X=1)=p$, $P(X=0)=1-p$ ; conditions: $p\in[0,1]$ ; param. 1: $p$
  • Binomial : $P(X=k)=C_n^k\,p^k(1-p)^{n-k}$ ; conditions: $k\in\{0,\ldots,n\}$, $n\in\mathbb{N}$, $p\in[0,1]$ ; param. 1: $(n,p)$
  • Dirac : $P(\underline{X}=\underline{point})=1$ ; param. 1: $\underline{point}$
  • Geometric : $P(X=k)=p(1-p)^{k-1}$ ; conditions: $k\in\mathbb{N}^*$ ; param. 1: $p$
  • KPermutations Distribution : $P(\underline{X}=\underline{point})=1/d$ with $d=A_n^k=\frac{n!}{(n-k)!}$ and $\underline{point}$ an injective function $(i_0,\ldots,i_{k-1})$ from $\{0,\ldots,k-1\}$ into $\{0,\ldots,n-1\}$ ; conditions: $k\geq 1$, $n\geq 1$ ; param. 1: $(k,n)$
  • Multinomial (nD) : $P(\underline{X}=\underline{x})=\frac{N!}{x_1!\cdots x_n!\,(N-s)!}\,p_1^{x_1}\cdots p_n^{x_n}(1-q)^{N-s}$ ; conditions: $0\leq p_i\leq 1$, $x_i\in\mathbb{N}$, $q=\sum_{k=1}^{n}p_k\leq 1$, $s=\sum_{k=1}^{n}x_k\leq N$ ; param. 1: $((p_k)_{1\leq k\leq n},N)$
  • Negative Binomial : $P(X=k)=\frac{\Gamma(k+r)}{\Gamma(r)\,\Gamma(k+1)}\,p^k(1-p)^r$ ; conditions: $k\in\mathbb{N}$, $r\in(0,+\infty)$, $p\in(0,1)$ ; param. 1: $(r,p)$
  • Poisson : $P(X=k)=\frac{\lambda^k}{k!}\,e^{-\lambda}$ ; conditions: $k\in\mathbb{N}$, $\lambda>0$ ; param. 1: $\lambda$
  • Skellam : $P(X=k)=P(2Y=2\lambda_1)$ where $Y$ follows the non-central $\chi^2$ distribution with $\nu=2(k+1)$ and $\delta=2\lambda_2$ ; conditions: $\lambda_i>0$ ; param. 1: $(\lambda_1,\lambda_2)$
  • User defined (nD) : $P(\underline{X}=\underline{x}_k)=p_k$, $1\leq k\leq N$ ; conditions: $0\leq p_k\leq 1$, $\sum_{k=1}^{N}p_k=1$ ; param. 1: $(\underline{x}_k,p_k)_{1\leq k\leq N}$
  • Zipf-Mandelbrot : $P(X=k)=\frac{1}{(k+q)^s}\,\frac{1}{H(N,q,s)}$, $k\in[1,N]$, where $H(N,q,s)=\sum_{i=1}^{N}\frac{1}{(i+q)^s}$ (Generalized Harmonic Number) ; conditions: $N\geq 1$, $q\geq 0$, $s>0$ ; param. 1: $(N,q,s)$

Let's note that in dimension 1, the Multinomial distribution is the Binomial distribution B(n,p), described as :

$\forall k\in\{0,\ldots,n\},\quad P(X=k)=C_n^k\,p^k(1-p)^{n-k}.$

Furthermore, for all these 1D usual distributions, it is possible to truncate them within $[a,b]$, $[a,+\infty[$ or $]-\infty,b]$ (see UC.1.1.2).

Requirements  

none

 
Results  

the random input distribution

type:

Distribution

 

Python script for this UseCase :

script_docUC_InputNoData_UsualDist.py

########## # Arcsine ##########   # Ppal Param: Arcsine(a, b) arcsine = Arcsine(5.0, 11.0) # Param 1 : (mu, sigma) is coded by 1 arcsine = Arcsine(0.0, 1.0, 1) # It is also possible to write : arcsine = Arcsine(0.0, 1.0, Arcsine.MUSIGMA) # Default construction => Arcsine() = Arcsine(-1.0, 1.0) arcsine = Arcsine()   ########## # Beta ##########   # Ppal Param : Beta(r, t, a, b) beta = Beta(2.0, 3.0, 0.0, 2.0) # Param 1 : (mu, sigma, a, b) is coded by 1 beta = Beta(1.0, 0.5, 0.0, 2.0, 1) # It is also possible to write : beta = Beta(1.0, 0.5, 0.0, 2.0, Beta.MUSIGMA) # Default construction ==> Beta(r, t, a, b)= Beta(2, 4, -1, 1) beta = Beta()   ########## # Burr ##########   # Ppal Param: Burr(c,k) burr = Burr(1.5, 2.7) # Default construction =>  Burr(c,k) = (1.0, 1.0) burr = Burr()   ########## # Chi ##########   # Ppal Param: Chi(nu) chi = Chi(1.5) # Default construction => Chi(nu) = Chi(1.0) chi = Chi()   ########## # ChiSquare ##########   # Ppal Param: ChiSquare(nu) chiSquare = ChiSquare(1.5) # Default construction => ChiSquare(nu) = ChiSquare(1.0) chiSquare = ChiSquare()   ########## # Dirac 1D ##########   # Ppal Param: Dirac(p) = Dirac([p]) dirac = Dirac(1.5)  # = Dirac([1.5]) # Default construction => Dirac() = Dirac(0.0) dirac = Dirac() # Dirac 2D # Ppal Param: Dirac(point) = Dirac([p_1, ..., p_n]) dirac = Dirac([0.5, 1.2])   ########## # Dirichlet bivariate ##########   # Ppal Param: Dirichlet([teta_1, theta_2, theta_3]) dirichlet = Dirichlet([1.5, 2.3, 3.4]) # Default construction => Dirichlet() = Dirichlet([1.0, 1.0]) dirichlet = Dirichlet()   ########## # Epanechnikov ##########   # No parameter # Default construction epanechnikov = Epanechnikov()   ########## # Exponential ##########   # Ppal Param : Exponential(lambda, gamma) exponential = Exponential(1.0, 2.0) # Default construction ==> Exponential(lambda, gamma) = Exponential(1.0, 0.0) exponential = Exponential()   ########## # FisherSnedecor ##########   # Ppal Param : FisherSnedecor(d1,d2) fisherSnedecor = FisherSnedecor(100, 100) # Default construction ==> FisherSnedecor(d1,d2) = FisherSnedecor(1,1) fisherSnedecor = FisherSnedecor()   ########## # Gamma ##########   # Ppal Param : Gamma(k, lambda, gamma) gamma = Gamma(3.0, 1.0, 2.0) # Param 1 : (mu, sigma, gamma) is coded by 1 gamma = Gamma(3.0, 1.0, 2.0, 1) # It is also possible to write : gamma = Gamma(3.0, 1.0, 2.0, Gamma.MUSIGMA) # Default construction ==> Gamma(k, lambda, gamma) = Gamma(1.0, 1.0, 0.0) gamma = Gamma()   ########## # Generalized Pareto Distribution ##########   # Ppal Param : GeneralizedPareto(sigma, xi) gpd = GeneralizedPareto(2.4, 0.3)   ########## # Gumbel ##########   # Ppal Param : Gumbel(alpha, beta) gumbel = Gumbel(1.0, 2.0) # Param 1 : (mu, sigma) is coded by 1 gumbel = Gumbel(1.0, 2.0, 1) # It is also possible to write : gumbel = Gumbel(1.0, 2.0, Gumbel.MUSIGMA) # Default construction ==> Gumbel(alpha, beta) = Gumbel(1.0, 1.0) gumbel = Gumbel()   ########## # Histogram ##########   # Example : n = 3, x1 = 0.0 and # (li,hi)_{i=1, ..., 3} = (1.0, 1.0), (4.0, 2.0), (2.0, 3.0) # The heights (hi) are automatically renormalized # Ppal Param : Histogram(x1, (li,hi)_{i=1, ..., n}) collection = HistogramPairCollection(((1.0, 1.0), (4.0, 2.0), (2.0, 3.0), )) histogram = Histogram(0.0, collection)   ########## # InverseGamma ##########   # Ppal Param : InverseGamma(k,lambda) inverseGamma = InverseGamma(1, 2) # Default construction ==> InverseGamma(k, lambda) = InverseGamma(1.0, 1.0) inverseGamma = InverseGamma()   ########## # InverseNormal ##########   # 
Ppal Param : InverseNormal(lambda, mu) inverseNormal = InverseNormal(1, 2) # Default construction ==> InverseNormal(lambda, mu) = InverseNormal(1.0, 1.0) inverseNormal = InverseNormal()   ########## # InverseWishart ##########   # Ppal Param : InverseWishart(V, nu) V = CovarianceMatrix(2) V[0, 0] = 2.0 V[1, 0] = 1.0 V[1, 1] = 3.0 nu = 3.5 inverseWishart = InverseWishart(V, nu) # Default construction ==> InverseWishart(V, nu) = InverseWishart(Id_1, 1.0) inverseWishart = InverseWishart()   ########## # Laplace ##########   # Ppal Param : Laplace(lambda, mu) laplace = Laplace(2, 0.5) # Default construction ==> Laplace(lambda, mu) = Laplace(1.0, 0.0) laplace = Laplace()   ########## # Logistic ##########   # Ppal Param : (alpha, beta) logistic = Logistic(1.0, 2.0) # Default construction ==> Logistic(alpha, beta) = Logistic(0.0, 1.0) logistic = Logistic()   ########## # LogNormal ##########   # Ppal Param : LogNormal(mu_l, sigma_l,gamma) lognormal = LogNormal(3.0, 2.0, 1.0) # Param 1 : (mu, sigma, gamma) is coded by 1 lognormal = LogNormal(3.0, 2.0, 1.0, 1) # It is also possible to write : lognormal = LogNormal(3.0, 2.0, 1.0, LogNormal.MUSIGMA) # Param 2 : (mu, sigma/mu, gamma, 2) is coded by 2 lognormal = LogNormal(3.0, 2.0, 1.0, 2) # It is also possible to write : lognormal = LogNormal(3.0, 2.0, 1.0, LogNormal.MU_SIGMAOVERMU) # Default construction ==> LogNormal(mu_l, sigma_l,gamma) = LogNormal(0.0, # 1.0, 0.0) logNormal = LogNormal()   # LogUniform # Ppal Param : LogUniform(mu_l, sigma_l,gamma) logUniform = LogUniform(1.0, 2.0) # Default construction ==> LogNormal(mu_l, sigma_l,gamma) = LogNormal(0.0, # 1.0, 0.0) logUniform = LogUniform()   ########## # MeixnerDistribution ##########   # Ppal Param : MeixnerDistribution(alpha, beta, delta, mu) meixnerDistribution = MeixnerDistribution(2.0, 0.5, 1.3, 0.1) # Default construction ==> MeixnerDistribution(alpha, beta, delta, mu) = # MeixnerDistribution(1,0,1,0) meixnerDistribution = MeixnerDistribution()   ########## # Non Central Chi Square ##########   # Ppal Param : NonCentralChiSquare(nu, lambda) nonCentralChiSquare = NonCentralChiSquare(3.0, 2.0) # Default construction ==> NonCentralChiSquare(nu, delta, gamma) = # NonCentralChiSquare(5,0) nonCentralChiSquare = NonCentralChiSquare()   ########## # Non Central Student ##########   # Ppal Param : NonCentralStudent(nu, delta, gamma) = # NonCentralStudent(3.0, 1.0, 0.0) nonCentralStudent = NonCentralStudent(3.0, 1.0, 0.0) # Default construction ==> NonCentralStudent(nu, delta, gamma) = # NonCentralStudent(5.0, 0.0, 0.0) nonCentralStudent = NonCentralStudent()   ########## # Normal(1D) ##########   # Ppal Param : Normal(mu, sigma) = Normal(2.0, 1.0) normal1D = Normal(2.0, 1.0) # Default construction ==> 1D Normal distribution with zero mean and unit # variance : normal1D_standard = Normal()   ########## # NormalGamma(2D) ##########   # Ppal Param : NormalGamma(mu, kappa, alpha, beta) = NormalGamma(1.0, 2.0, # 1.0, 1.0) myNormalGamma = NormalGamma(1.0, 2.0, 1.0, 1.0) # Default construction ==> (mu, kappa, alpha, beta) = (0, 1, 1, 1) normal1D_standard = NormalGamma()   ########## # Normal (nD) ##########   # Ppal Param  : Normal(mu, sigma, R) normal2D_1 = Normal(     NumericalPoint(2, 1.0), NumericalPoint(2, 2.0), CorrelationMatrix(2)) # Param 2 : Normal(mu, C) myCovarianceMatrix = CovarianceMatrix(2) myCovarianceMatrix[0, 0] = 1.0 myCovarianceMatrix[0, 1] = 0.7 myCovarianceMatrix[1, 0] = 0.7 myCovarianceMatrix[1, 1] = 2.0 normal2D_2 = Normal(NumericalPoint(2, 1.0), myCovarianceMatrix) # 2D Normal 
distribution with zero mean and identity covariance matrix: normal2D_standard = Normal(2)   # In order to create a Normal of dimension n # with 0 mean and Identity variance matrix n = 5 normalStandardnD = Normal(n)   ########## # Rayleigh ##########   # Ppal Param : Rayleigh(sigma, gamma) rayleigh = Rayleigh(2.5, -0.5) # Default construction ==> Rayleigh(sigma, gamma) = Rayleigh(1.0, 0.0) rayleigh = Rayleigh()   ########## # Rice ##########   # Ppal Param : Rice(sigma, nu) rice = Rice(0.5, 2.5) # Default construction ==> Rice(sigma, nu)= Rice(1.0, 0.0) rice = Rice()   ########## # Student (1D) ##########   # Ppal Param = Student(nu, mu, sigma) student = Student(3.0, 2.0, 4.0) # Default construction ==> Student(nu, mu, sigma) = Student(3.0, 0.0, 1.0) student = Student()   ########## # Student (nD) ##########   # Ppal Param = Student(nu, mu, sigma, R) student = Student(     3.0, NumericalPoint(2, 1.0), NumericalPoint(2, 3.0), CorrelationMatrix(2)) # Default construction nD ==> Student(nu, mu) = Student(nu, # NumericalPoint(d, 1.0), NumericalPoint(d, 1.0), CorrelationMatrix(d)) student = Student(4.0, 3)   ########## # Trapezoidal ##########   trapezoidal = Trapezoidal(1.0, 2.0, 3.0, 4.0) # Default construction ==> Trapezoidal(a, b, c, d) = Trapezoidal(-2.0, # -1.0, 1.0, 2.0) trapezoidal = Trapezoidal()   ########## # Triangular ##########   # Ppal Param = Triangular(a,m,b) triangular = Triangular(1.0, 2.0, 4.0) # Default construction ==> Triangular(a, m, b) = Triangular(-1.0, 0.0, 1.0) triangular = Triangular()   ########## # TruncatedNormal ##########   # Ppal Param  = TruncatedNormal(mu_n, sigma_n, a, b) truncatednormal = TruncatedNormal(1.0, 2.0, -1.0, 5.0) # Default construction ==> TruncatedNormal(mu_n, sigma_n, a, b) = # TruncatedNormal(0.0, 1.0, -1.0, 1.0) TruncatedNormal = TruncatedNormal()   ########## # Uniform ##########   # Ppal Param = Uniform(a,b) uniform = Uniform(1.0, 2.0) # Default construction ==> Uniform(a,b) = Uniform(-1.0, 1.0) uniform = Uniform()   ########## # Weibull ##########   # Ppal Param = Weibull(e, beta, gamma) weibull = Weibull(1.0, 2.0, 3.0) # Param 1 = (mu, sigma, gamma) is coded by 1 weibull = Weibull(4.0, 2.0, 3.0, 1) # It is also possible to write : weibull = Weibull(4.0, 2.0, 3.0, Weibull.MUSIGMA) # Default construction ==> Weibull(e, beta, gamma) = Weibull(1.0, 1.0, 0.0) weibull = Weibull()   ########## # Wishart ##########   # Ppal Param : Wishart(V, nu) V = CovarianceMatrix(2) V[0, 0] = 2.0 V[1, 0] = 1.0 V[1, 1] = 3.0 nu = 3.5 wishart = Wishart(V, nu) # Default construction ==> Wishart(V, nu) = Wishart(Id_1, 1.0) wishart = Wishart()   # DISCRETE distributions   ########## # Bernoulli ##########   # Ppal Param : Bernoulli(p) # Default construction ==> Bernoulli(p) =  Bernoulli(0.5) bernoulli = Bernoulli(0.3)   ########## # Binomial ##########   # Ppal Param : Binomial(n,p) # Default construction ==>  Binomial(n,p) =  Binomial(1,0.5) binomial = Binomial(5, 0.7)   ########## # KPermutationsDistribution ##########   # Ppal Param : KPermutationsDistribution(k,n) # Default construction ==>  KPermutationsDistribution() = # KPermutationsDistribution(1,1) kpermutations = KPermutations(2, 3)   ########## # Geometric ##########   # Ppal Param : Geometric(p) # Default construction ==>  Geometric(p) =  Geometric(0.5) geometric = Geometric(0.3)   ########## # Multinomial ##########   # Ppal Param : Multinomial(N, (p_i)_{i=1, ..., n}) multinomial = Multinomial(5, NumericalPoint(4, 0.25))   ########## # NegativeBinomial ##########   # Ppal Param : 
NegativeBinomial(r,p) # Default construction ==>  NegativeBinomial(r,p) =  NegativeBinomial(1,0.5) negativeBinomial = NegativeBinomial(5, 0.7)   ########## # Poisson ##########   # Ppal Param : Poisson(lambda) # Default construction ==>  Poisson(lambda) =  Poisson(1) poisson = Poisson(0.3)   ########## # Skellam ##########   # Ppal Param : Skellan(lambda1, lambda2) # Default construction ==>  Skellan(lambda1, lambda2)=Skellan(1.,1.) skellam = Skellam(1.2, 3.4)   ########## # UserDefined (nD), n=2 ##########   # We create a collection of pair (xi, pi), i=1,2,3, each xi in R^2 # Weights are not complusory normalized to 1 # OpenTURNS automatically normalizes them   # Give the range of the distribution # First pair : (x1 = (1.0, 1.5), p1 = 3) # Second pair : (x2 = (2.0, 2.5), p2 = 3) # Third pair : (x3 = (3.0, 3.5), p3 = 4) range = NumericalSample([[1.0, 1.5], [2.0, 2.5], [3.0, 3.5]])   # Give the list of the respective weights # First pair :  p1 = 3 # Second pair : p2 = 3 # Third pair : p3 = 4 weights = NumericalPoint([3, 3, 4])   # Create the UserDefined distribution distribution = UserDefined(range, weights)   ########## # ZipfMandelbrot ##########   # Ppal Param : ZipfMandelbrot(N,q,s) # Default construction ==>  ZipfMandelbrot(N,q,s) = ZipfMandelbrot(1,0,1) zipfMandelbrot = ZipfMandelbrot(3, 0.3, 2.4)

Refer to the Reference Guide to get the graphs of the distributions' PDFs.


1.1.2 UC : Creation of a truncated distribution

The objective of the Use Case is to truncate a 1D distribution already defined. OpenTURNS makes it possible to truncate the distribution in its lower area, its upper area, or both. After having truncated a distribution, the initial distribution can be recovered with the method getDistribution().

Let's consider $X$ a random variable with $F_X$ and $p_X$ its cumulative distribution and probability density functions respectively, and $(a,b)\in\bar{\mathbb{R}}^2$. The random variable $Y=X\mid X\in[a,b]$, which is the random variable $X$ given that $X\in[a,b]$, is defined by the following cumulative distribution and probability density functions $F_Y$ and $p_Y$:

$\forall y\in\mathbb{R},\quad F_Y(y)=P(X\leq y\mid X\in[a,b])=\begin{cases}0 & \text{for } y\leq a,\\ \dfrac{F_X(y)-F_X(a)}{F_X(b)-F_X(a)} & \text{for } y\in[a,b],\\ 1 & \text{for } y\geq b,\end{cases}$
$\forall y\in\mathbb{R},\quad p_Y(y)=\begin{cases}0 & \text{for } y<a \text{ or } y>b,\\ \dfrac{p_X(y)}{F_X(b)-F_X(a)} & \text{for } y\in[a,b].\end{cases}$
Requirements  

some lower and upper bounds : myLowerBound, myUpperBound

type:

reals

a scalar distribution : myEntireDist

type:

a Distribution whose implementation is a UsualDistribution, a ComposedDistribution or a Mixture

 
Results  

a distribution : myTruncatedDistribution

type:

a TruncatedDistribution

 

Python script for this UseCase :

script_docUC_InputNoData_TruncatedDist.py

#####################################
# CASE 1 : Truncate the distribution
# within the range [myLowerBound, myUpperBound]
#####################################

myTruncatedDistribution = TruncatedDistribution(
    myEntireDist, myLowerBound, myUpperBound)

# See the result on the PDF
myTruncatedDistribution.setName('my truncated dist')
# View(myTruncatedDistribution.drawPDF()).show()

#####################################
# CASE 2 : Truncate the distribution
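
The script above is truncated after CASE 1; as a hedged sketch only (not the original continuation of this script), one-sided truncations and the recovery of the initial distribution could look as follows, assuming myEntireDist, myLowerBound and myUpperBound are defined as in the Requirements:

# Hedged sketch: truncate only in the lower or only in the upper area
myLowerTruncatedDist = TruncatedDistribution(
    myEntireDist, myLowerBound, TruncatedDistribution.LOWER)
myUpperTruncatedDist = TruncatedDistribution(
    myEntireDist, myUpperBound, TruncatedDistribution.UPPER)

# Recover the initial (non truncated) distribution
myInitialDist = myTruncatedDistribution.getDistribution()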

Figures 1 and 2 show the PDF and CDF of the truncated distributions of a Logistic($\alpha=1.0$, $\beta=2.0$) respectively within the ranges $[4.0,+\infty[$, $[-2.0,5.0]$ and $]-\infty,3.0]$.

PDF of several truncated Logistic distributions
Figure 1
CDF of several truncated Logistic distributions
Figure 2

1.1.3 UC : List of Copula

The objective of this Use Case is to manipulate copulas of OpenTURNS.

A copula may be considered as the restriction to $[0,1]^n$ of a distribution with uniform 1D marginals on $[0,1]$ and this copula as its copula. That's why an object of type Copula offers the same methods as an object of type Distribution (see UC 1.1.7 for the list of the methods).

Details on copula may be found in the Reference Guide ( see file Reference Guide - Step B – Copula ).

OpenTURNS proposes the copulas listed in Table 3. Table 2 gives the main copula features, where AMH means Ali-Mikhail-Haq and FGM means Farlie-Gumbel-Morgenstern.

For each copula, the list below indicates whether it is Archimedean, when it reduces to the Independent copula, when it is Elliptical, and whether it belongs to another family (Other).

  • AMH : Archimedean; Independent for $\theta=0$
  • Clayton : Archimedean; Independent for $\theta=0$
  • ComposedCopula : Independent if each copula is Independent; Elliptical if each copula is Normal; otherwise Other
  • FGM : Independent for $\theta=0$; otherwise Other
  • Frank : Archimedean; Independent for $\theta=0$
  • Gumbel : Archimedean; Independent for $\theta=1$
  • Independent : Archimedean; Independent; Elliptical
  • Min : Other
  • Normal : Independent for $\underline{\underline{R}}=\underline{\underline{I}}$; Elliptical
  • SklarCopula : Archimedean, Independent or Elliptical if it comes from a distribution whose copula has that property; otherwise Other
Features of the OpenTURNS copula.
Table 2

For each copula, the list below gives its dimension, its expression $C(u_1,\ldots,u_n)$ and its parameters.

  • AMH : dimension 2; $C(u_1,u_2)=\dfrac{u_1 u_2}{1-\theta(1-u_1)(1-u_2)}$ ; parameter: $|\theta|<1$
  • Clayton : dimension 2; $C(u_1,u_2)=\left(u_1^{-\theta}+u_2^{-\theta}-1\right)^{-1/\theta}$ ; parameter: $\theta\geq 0$
  • FGM : dimension 2; $C(u_1,u_2)=u_1 u_2\left(1+\theta(1-u_1)(1-u_2)\right)$ ; parameter: $\theta\in[-1,1]$
  • Frank : dimension 2; $C(u_1,u_2)=-\dfrac{1}{\theta}\log\left(1+\dfrac{(e^{-\theta u_1}-1)(e^{-\theta u_2}-1)}{e^{-\theta}-1}\right)$ for $\theta\neq 0$, $u_1 u_2$ for $\theta=0$ ; parameter: $\theta\in\mathbb{R}$
  • Gumbel : dimension 2; $C(u_1,u_2)=\exp\left(-\left[(-\log(u_1))^{\theta}+(-\log(u_2))^{\theta}\right]^{1/\theta}\right)$ ; parameter: $\theta\geq 1$
  • Independent : dimension n; $C(u_1,\ldots,u_n)=\prod_{i=1}^{n}u_i$ ; parameter: $n$
  • Min : dimension n; $C(u_1,\ldots,u_n)=\min(u_1,\ldots,u_n)$ ; parameter: $n$
  • Normal : dimension 2; $C(u_1,u_2)=\displaystyle\int_{-\infty}^{\Phi^{-1}(u_1)}\int_{-\infty}^{\Phi^{-1}(u_2)}\dfrac{1}{2\pi\sqrt{1-\rho^2}}\exp\left(-\dfrac{s^2-2\rho st+t^2}{2(1-\rho^2)}\right)ds\,dt$ ; parameter: $\underline{\underline{R}}=\begin{pmatrix}1&\rho\\ \rho&1\end{pmatrix}$, $\rho\in(-1,1)$
  • Normal : dimension n; $C(u_1,\ldots,u_n)=\displaystyle\int_{-\infty}^{\Phi^{-1}(u_1)}\cdots\int_{-\infty}^{\Phi^{-1}(u_n)}\dfrac{1}{(2\pi)^{n/2}\sqrt{\det\underline{\underline{R}}}}\exp\left(-\dfrac{1}{2}\underline{x}^t\underline{\underline{R}}^{-1}\underline{x}\right)d\underline{x}$ ; parameter: $\underline{\underline{R}}$ SPD
  • Sklar : dimension n; $C(u_1,\ldots,u_n)=F\left(F_1^{-1}(u_1),\ldots,F_n^{-1}(u_n)\right)$ ; parameter: the distribution $F$
Expressions of the copulas of OpenTURNS.
Table 3

Furthermore, OpenTURNS makes it possible to extract the copula C from any n-dimensional distribution, thanks to the inverse of the Sklar theorem :

$C(u_1,\ldots,u_n)=F\left(F_1^{-1}(u_1),\ldots,F_n^{-1}(u_n)\right)$

where $F$ is the cumulative distribution function of the distribution and $F_i$ its marginals. This copula is denoted Sklar Copula within OpenTURNS.

OpenTURNS also makes it possible to create a copula as the product of other copulas : if $C_1$ and $C_2$ are the copulas of random vectors of $\mathbb{R}^{n_1}$ and $\mathbb{R}^{n_2}$ respectively, we can create the copula of a random vector of $\mathbb{R}^{n_1+n_2}$, noted $C$, as follows :

$C(u_1,\ldots,u_n)=C_1(u_1,\ldots,u_{n_1})\,C_2(u_{n_1+1},\ldots,u_{n_1+n_2})$

It means that the subvectors $(u_1,\ldots,u_{n_1})$ and $(u_{n_1+1},\ldots,u_{n_1+n_2})$, of $\mathbb{R}^{n_1}$ and $\mathbb{R}^{n_2}$ respectively, are independent.

OpenTURNS also creates mixtures of copulas (the density is a linear combination of copula densities) thanks to the object Mixture (see the corresponding use case). Note that the result is still a copula.

OpenTURNS also builds ordinal sums of copulas from a collection of $n$ copulas $(C_1,\ldots,C_n)$, each one being of the same dimension $d$, and a collection of bounds $(\alpha_1,\ldots,\alpha_{n-1})$ in $]0,1[$. The ordinal sum of copulas is defined by:

$C(\underline{u})=\begin{cases}\alpha_i+(\alpha_{i+1}-\alpha_i)\,C_i\left(\dfrac{u_1-\alpha_i}{\alpha_{i+1}-\alpha_i},\ldots,\dfrac{u_d-\alpha_i}{\alpha_{i+1}-\alpha_i}\right) & \text{if } \underline{u}\in[\alpha_i,\alpha_{i+1}[^d,\\ M_d(\underline{u}) & \text{otherwise,}\end{cases}\qquad(2)$

with $M_d$ the Min-copula, $M_d(\underline{u})=\min_{k=1,\ldots,d}u_k$, and where, for convenience, we noted $\alpha_0=0$ and $\alpha_n=1$.

Note that if $\alpha_i=\alpha_{i+1}$ then the copula $C_{i+1}$ is erased from the list, for $i=0,\ldots,n-1$.

Requirements  

none

 
Results  

all the copulas, named according to the rule : typeCopula

type:

usual copula (for example NormalCopula,...)

a composed copula : finalCopula

type:

ComposedCopula

 

Python script for this UseCase :

script_docUC_InputNoData_Copula.py

###################### # ALIMIKHAILHAQ copula ######################   # For example, theta = -0.5 theta = -0.5 amhCopula = AliMikhailHaqCopula(theta)   ################ # CLAYTON copula ################   # Only for dimension = 2 theta = -0.5 claytonCopula = ClaytonCopula(theta)   ################# # COMPOSED copula #################   # For example, the GumbelCopula concatenated to a Clayton one # Create the collection of copulas with default parameters finalCopula = ComposedCopula([GumbelCopula(), ClaytonCopula()])   ########################### # FARLIE MORGENSTEIN copula ###########################   # For example, theta = 0.5 theta = 0.5 fgm = FarlieGumbelMorgensternCopula(theta)   ############## # FRANK copula ##############   # Only for dimension = 2 # Frank copula is parameterized by theta without restriction # For example, theta = 9.2 theta = 9.2 frankCopula = FrankCopula(theta)   ############### # GUMBEL copula ###############   # Only for dimension = 2 # Gumbel copula is parameterized by theta without restriction # For example, theta = 2.5 theta = 2.5 gumbelCopula = GumbelCopula(theta)   #################### # INDEPENDENT copula ####################   # Independent Copula parameterized by its dimension # For example, dimension = 3 dim = 3 independentCopula = IndependentCopula(dim)   ############ # MIN Copula ############   # For example, dimension = 3 dim = 3 minCopula = MinCopula(dim)   ################ # NORMAL copula ################   ################ # Case 1 : # Normal Copula parameterized by its correlation matrix R   # For example, dimension = 3 dim = 3 R = CorrelationMatrix(dim) for i in range(dim - 1):     R[i, i + 1] = 0.8   # It is possible to define the correlation matrix through a map # and then use the function getCorrelationMatrixFromMap to convert # the map into a CorrelationMatrix # if input variables are noted X,Y and Z vars = ['X', 'Y', 'Z'] correlationMap = {} correlationMap['X'] = {} correlationMap['X']['X'] = 1.0 correlationMap['X']['Y'] = 0.8 correlationMap['X']['Z'] = 0.8 correlationMap['Y'] = {} correlationMap['Y']['X'] = 0.8 correlationMap['Y']['Y'] = 1.0 correlationMap['Y']['Z'] = 0.8 correlationMap['Z'] = {} correlationMap['Z']['X'] = 0.8 correlationMap['Z']['Y'] = 0.8 correlationMap['Z']['Z'] = 1.0 R = getCorrelationMatrixFromMap(vars, correlationMap)   # Create a normal copula from the correlation matrix R normalCopula = NormalCopula(R) normalCopula.setName("a normal copula")   ################ # Case 2 : # Create a normal copula from the Spearman rank correlation matrix S   # For example, dimension = 3 dim = 3 S = CorrelationMatrix(dim) for i in range(1, dim):     S[i, i - 1] = 0.25   # Create the correlation matrix R of the  normal copula # from the Spearman correlation matrix S R = NormalCopula.GetCorrelationFromSpearmanCorrelation(S)   # Create the normal copula from the R correlation matrix normalCopula = NormalCopula(R) normalCopula.setName("another normal copula")   ################ # Case 3 : # Normal Copula parameterized by its dimension   # Correlation matrix R is equal to identity dim = 3 normalCopula = NormalCopula(dim)   ############## # SKLAR Copula ##############   # For example, the copula of a 2d Student distribution # Student(nu, mu, sigma, R) myStudent = Student(     3.0, NumericalPoint(2, 1.0), NumericalPoint(2, 3.0), CorrelationMatrix(2)) sklarCopula = SklarCopula(myStudent)   #################### # MIXTURE of copulas ####################   # For example, the mixture of a GumbelCopula and a Clayton one # with parameters 0.2 and 
0.6 copulaList = [GumbelCopula(4.5), ClaytonCopula(2.3)] parameters = [0.2, 0.6] # Create the mixture of copulas finalCopula = Mixture(copulaList, parameters) print(finalCopula.isCopula())   ######################## # ORDINAL SUM of copulas ########################   # For example, a GumbelCopula and a NormalCopula # in dimension 2 with default parameters # where the GumbelCopula is squeezed in [0,0.3] # where the NormalCopula is squeezed in [0.3, 1] ordinalSumCop = OrdinalSumCopula([GumbelCopula(2), NormalCopula(2)], [0.3])

Refer to the Reference Guide to get the graphs of the PDF iso-curves of each copula.


1.1.4 UC : Creation of nD distribution from (marginals, copula)

The objective of the Use Case is to model a distribution, described by its marginal distributions and its dependence structure (a particular copula). A simplified way is proposed when the copula is the independent one.

Details on copula may be found in the Reference Guide ( see file Reference Guide - Step B – Copula ).

This Use Case is particularly adapted to the modelling of the distribution of the input random vector.

The example here is a distribution of dimension 3 defined by :

  • Weibull, Triangular and Uniform marginals,

  • an independent copula (case 1) or a normal copula built from a Spearman rank correlation matrix (case 2).

Requirements  

none

 
Results  

a nD distribution : myDistribution

type:

Distribution whose implementation is a ComposedDistribution

 

Python script for this UseCase :

script_docUC_InputNoData_ComposedDistribution.py

# Create the first marginal : Weibull(mu, sigma, gamma) = Weibull(2.0, 1.0, 0.0)
weibDist = Weibull(2.0, 1.0, 0.0, Weibull.MUSIGMA)
weibDist.setName("myWeibull")

# Create the second marginal : Triangular(a,m,b) = Triangular(1.0, 3.0, 5.0)
triangularDist = Triangular(1.0, 3.0, 5.0)
triangularDist.setName("myTriangular")

# Create the third marginal : Uniform(a,b) = Uniform(2.0, 4.0)
uniformDist = Uniform(2.0, 4.0)
uniformDist.setName("myUniform")

# Create a collection of distributions of dimension 3
# using a python List
aCollection = [weibDist, triangularDist, uniformDist]

#############################
# CASE 1 : independent copula
#############################
# It is not necessary to specify the copula
# Create the distribution
myDistribution = ComposedDistribution(aCollection)

#############################
# CASE 2 : other copula
#############################
# Create a copula : Normal copula of dimension 3 from a Spearman rank
# correlation matrix
spearmanMatrix = CorrelationMatrix(3)
spearmanMatrix[0, 1] = 0.25
spearmanMatrix[1, 2] = 0.25
aCopula = NormalCopula(
    NormalCopula.GetCorrelationFromSpearmanCorrelation(spearmanMatrix))
aCopula.setName("Normal copula")

# Create the distribution
myDistribution = ComposedDistribution(aCollection, aCopula)

# Give a Description to the Distribution
myDistribution.setDescription(("dist1", "dist2", "dist3"))


1.1.5 UC : Creation of a nD distribution from a Mixture

In OpenTURNS, a Mixture is a distribution whose probability density function is a linear combination of probability density functions.

The objective of this Use Case is to model a distribution, defined as a mixture :

$p(\underline{x})=\sum_{i=1}^{n}a_i\,p_i(\underline{x})$ (3)

where

$\sum_{i=1}^{n}a_i=1$ (4)

and

$\forall i,\ a_i\geq 0$ (5)

OpenTURNS automatically normalizes the weights, so the user can provide weights $a_i$ that do not satisfy the constraint (4).

By default, the weights are taken equal to $\frac{1}{n}$.
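
As a hedged sketch (dist1 and dist2 stand for the distributions listed in the Requirements below), the default equal weights correspond to omitting the weight argument:

# Hedged sketch: no weights given => equal weights 1/n
myEquallyWeightedMixture = Mixture([dist1, dist2])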

The example here is a mixture of three univariate distributions: Triangular(1.0, 2.0, 4.0), Normal(-1.0, 1.0) and Uniform(5.0, 6.0), with respective weights : (0.2, 0.3, 0.5).

The PDF and CDF graphs of the mixture distribution are drawn in Figures 3 and 4.

Note that it is possible to create a mixture of copulas; the result is still a copula.

Requirements  

some distributions : dist1, dist2

type:

Distribution

some copulas: cop1, cop2

type:

Distribution

 
Results  

a mixture distribution : myMixtDist

type:

Mixture

a mixture copula : myMixtCop

type:

Mixture

 

Python script for this UseCase :

script_docUC_InputNoData_Mixture.py

############################################
# CASE 1: Create a mixture of distributions

# Create the collection of distributions
aCollection = [dist1, dist2]

# Create the collection of the weights
myWeights = [0.20, 0.80]

# Create the mixture
myMixture = Mixture(aCollection, myWeights)

# Alternate definition of the weights, not normalized
myWeights2 = [2.0, 5.0]
# The normalization is done automatically
myMixture2 = Mixture(aCollection, myWeights2)

############################################
# CASE 2: Create a mixture of copulas

# Create the collection of copulas
aCollection = [cop1, cop2]

# Create the collection of the weights
myWeights = [0.20, 0.80]

# Create the mixture
myMixture = Mixture(aCollection, myWeights)

PDF of the Mixture distribution = 0.2*Triangular(1.0, 2.0, 4.0) + 0.5*Normal(-1.0, 1.0) + 0.3*Uniform(5.0, 6.0)
Figure 3
CDF of the Mixture distribution = 0.2*Triangular(1.0, 2.0, 4.0) + 0.5*Normal(-1.0, 1.0) + 0.3*Uniform(5.0, 6.0)
Figure 4

1.1.6 UC : Creation of 1D distribution from a 1D distribution

The objective of the Use Case is to create a distribution defined as the push-forward distribution of a scalar distribution by a transformation.

If we note $\mathcal{L}_0$ a scalar distribution and $f : \mathbb{R}\rightarrow\mathbb{R}$ a mapping, then OpenTURNS makes it possible to easily create the push-forward distribution $\mathcal{L}$ defined by:

$\mathcal{L}=f(\mathcal{L}_0)$ (6)

which means that for any scalar random variable $X$ following the distribution $\mathcal{L}_0$, $Y=f(X)$ follows the distribution $\mathcal{L}$.

Note that for the following usual transformations f, a simplified notation has been implemented: cos, sin, tan, acos, asin, atan, cosh, sinh, tanh, acosh, asinh, atanh, exp, log, ln, pow, inverse, sqr, sqrt, cbrt, abs. If the operators are chained, they are applied from left to right. See the following script for some examples.

Requirements  

a scalar distribution : myAntecedentDist

type:

Distribution

a mapping : f

type:

NumericalMathFunction

 
Results  

a scalar distribution : myImageDist

type:

CompositeDistribution

 

Python script for this UseCase :

script_docUC_InputNoData_CompositeDistribution.py

# Create the push-forward distribution
myDistribution = CompositeDistribution(f, myAntecedentDist)

# Use the simplified construction
myDistribution2 = myAntecedentDist.exp()

# Use chained operators
myDistribution3 = myAntecedentDist.abs().sqrt()

The following figures illustrate the cases where $\mathcal{L}_0$ is the Normal() distribution and $f$ is the function defined by:

  • f(x)=sin(x)+cos(x) in Figure 5;

  • f(x)=exp(x) in Figure 5;

  • f(x)=|x| in Figure 6.

PDF of the push-forward distribution of a Normal() mapped through f(x)=sin(x)+cos(x)
PDF of the push-forward distribution of a Normal() mapped through f(x)=exp(x)
Figure 5
PDF of the push-forward distribution of a Normal() mapped through f(x)=|x|
Figure 6

1.1.7 UC : Manipulation of a distribution

The objective of this Use Case is to describe the main functionalities that OpenTURNS provides to manipulate a distribution of dimension $n\geq 1$.

Let's note $\underline{X}=(X_1,\ldots,X_n)$ the random vector associated with that distribution, whose PDF is noted $p$. OpenTURNS enables :

  • to ask for the dimension, with the method getDimension;

  • if $n>1$, to extract the distribution of dimension $k<n$ corresponding to $k$ of the 1D marginals, with the method getMarginal;

  • to get the copula, with the method getCopula: if the distribution is of type ComposedDistribution, the copula is the one specified at the creation of the ComposedDistribution. If the distribution is not of that type (for example a KernelMixture, a Mixture or a RandomMixture), the copula is computed from the Sklar theorem;

  • to ask for some properties of the copula, with the methods hasIndependentCopula and hasEllipticalCopula, only for the types Usual Distribution and ComposedDistribution (defined from the 1D marginals and a copula);

  • to evaluate the mean vector (potentially of dimension 1), the covariance matrix (potentially of dimension 1×1), the standard deviation, skewness and kurtosis vectors (potentially of dimension 1), with the methods getMean, getStandardDeviation, getCovariance, getKurtosis, getSkewness, defined by the following expressions :

    $\underline{E}[\underline{X}]=(E[X_1],\ldots,E[X_n])$
    $\underline{StdDev}[\underline{X}]=\left(\sqrt{E\left[(X_1-E[X_1])^2\right]},\ldots,\sqrt{E\left[(X_n-E[X_n])^2\right]}\right)$
    $\underline{\underline{Cov}}[\underline{X}]=\left(E\left[(X_i-E[X_i])(X_j-E[X_j])\right]\right)_{i,j}$
    $\underline{skewness}[\underline{X}]=\left(\dfrac{E\left[(X_1-E[X_1])^3\right]}{(\mathrm{Var}\,X_1)^{3/2}},\ldots,\dfrac{E\left[(X_n-E[X_n])^3\right]}{(\mathrm{Var}\,X_n)^{3/2}}\right)$
    $\underline{kurtosis}[\underline{X}]=\left(\dfrac{E\left[(X_1-E[X_1])^4\right]}{(\mathrm{Var}\,X_1)^{2}},\ldots,\dfrac{E\left[(X_n-E[X_n])^4\right]}{(\mathrm{Var}\,X_n)^{2}}\right)$
  • to evaluate the roughness, with the method getRoughness, defined by :

    $\mathrm{roughness}(\underline{X})=\|p\|^2=\displaystyle\int_{\underline{x}\in\mathbb{R}^n}p^2(\underline{x})\,d\underline{x}$
  • to get one realization or simultaneously n realizations, with the method getRealization, getSample,

  • to evaluate the Cumulative Distribution Function (CDF), the complementary CDF, the Probability Density Function (PDF) on a scalar (1D distribution only), on a point or on a numerical sample with the method computeCDF, computePDF,

  • to evaluate the probability content of a given interval, with the method computeProbability,

  • to evaluate a quantile (or a group of quantiles) or a complementary quantile (or a group of complementary quantiles) with the method computeQuantile. If the distribution is of dimension $n>1$, the $p$-quantile is the hyper-surface of $\mathbb{R}^n$ defined by $\{\underline{x}\in\mathbb{R}^n,\ F(x_1,\ldots,x_n)=p\}$ where $F$ is the CDF. OpenTURNS makes the choice to return one particular point among these points: $(x_1^p,\ldots,x_n^p)$ such that $\forall i,\ F_i(x_i^p)=\tau$, where $F_i$ is the marginal of component $X_i$ and $F(x_1^p,\ldots,x_n^p)=C(\tau,\ldots,\tau)$ where $C$ is the copula of the distribution. Thus, OpenTURNS solves the equation $C(\tau,\ldots,\tau)=p$ and then computes $F_i^{-1}(\tau)=x_i^p$ (a small check of this definition is sketched below).

    Note that the point $(x_1^p,\ldots,x_n^p)$ belongs to the curve $s\mapsto(F_1^{-1}(s),\ldots,F_n^{-1}(s))$. In dimension 2, OpenTURNS draws the curve $s\mapsto(F_1^{-1}(s),F_2^{-1}(s))$ and marks the quantiles of order $p$, with $p\in[p_{min},p_{max}]$ discretized with $N$ points: see the method drawQuantile.

  • to evaluate the derivative of the CDF or PDF with respect to the parameters of the distribution at a particular point, with the methods computeCDFGradient, computePDFGradient,

  • to evaluate the characteristic function of the distribution, only for the following distributions : Chi2, Gamma, Laplace, Logistic, univariate Normal, Rayleigh, Triangular, univariate TruncatedNormal, Uniform, KernelMixture (which includes a KernelSmoothing without bound treatment), Mixture, RandomMixture;

  • to draw :

    • for a 1D distribution : the PDF and CDF curves, with the methods drawPDF, drawCDF,

    • for a 2D distribution : the PDF and CDF iso-curves, with the methods drawPDF, drawCDF, and the PDF and CDF curves of its 1D marginals, with the methods drawMarginal1DPDF, drawMarginal1DCDF ,

    • for a nD with n3 distribution : the PDF and CDF of each 1D marginal, with the methods drawMarginal1DPDF, drawMarginal1DCDF and the PDF and CDF iso-curves for a specified 2D marginal, with the methods drawMarginal2DPDF, drawMarginal2DCDF.

Let's note that it is possible to visualize a graph within the TUI without creating the associated files, thanks to the command Show.
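
As a hedged sketch of the quantile definition given above (using the 2D distribution dist_2 from the Requirements below), the CDF evaluated at the returned quantile point should give back the order p:

# Hedged sketch: check the multivariate quantile definition F(x_1^p, ..., x_n^p) = p
p = 0.90
quantilePoint = dist_2.computeQuantile(p)
# This value should be close to p
print(dist_2.computeCDF(quantilePoint))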

Requirements  

three distributions of dimension 1, 2 and 3 : dist_1, dist_2, dist_3

type:

Distribution

 
Results  

graphs, numerical points, ...

 

Python script for this UseCase :

script_docUC_InputNoData_DistManipulation.py

# Get the dimension fo the distribution dim = dist_2.getDimension() print("Dimension of the distribution = ", dim)   # Get one marginal distribution # Care : the numerotation begins at 0 marginal_2 = dist_2.getMarginal(1)   # The marginal distribution of several components # Put the indices of the concerned components together # for example, the two first components of dist_3 marginal_12 = dist_3.getMarginal((0, 1))   # Get the copula copula = dist_2.getCopula()   # Ask some properties on the copula print("hasIndependentCopula ? ", dist_2.hasIndependentCopula()) print("hasEllipticalCopula ? ", dist_2.hasEllipticalCopula())   # Get the mean vector of the distribution meanVector = dist_2.getMean()   # Get the covariance matrix of the distribution meanVector = dist_2.getCovariance()   # Get the kurtosis vector of the distribution kurtosisVector = dist_2.getKurtosis()   # Get the standard deviation vector of the distribution standardDeviationVector = dist_2.getStandardDeviation()   # Get the skewness vector of the distribution skewnessVector = dist_2.getSkewness()   # Get one realization of the distribution oneRealisationVector = dist_2.getRealization()   # Get several realizations of the distribution # For example, 100 ones OneHundred_realizations = dist_2.getSample(100)   # Evaluate the CDF and PDF # For example, at the mean point dist_2.computeCDF(dist_2.getMean()) dist_2.computePDF(dist_2.getMean()) dist_2.computeComplementaryCDF(dist_2.getMean())   # For the evaluation on a NumericalSample dist_2.computeCDF(dist_2.getSample(10)) dist_2.computePDF(dist_2.getSample(10))   # Evaluate the probability content of an interval # in dimension 1 interval = Interval(-2.0, 3.0) probability = dist_1.computeProbability(interval)   # in dimension 2 : [xmin, ymin], [ymin, ymax] interval = Interval([1.2, -1], [3.4, 2]) probability = dist_2.computeProbability(interval)   # Evaluate the quantile of order p # For example, the quantile 90% quantile_Vector_90 = dist_2.computeQuantile(0.90) # and the quantile of order 1-p quantile_Vector_10 = dist_2.computeQuantile(0.90, True)   # Evaluate the quantiles of order p et q # For example, the quantile 90% and 95% quantile_List = dist_1.computeQuantile([0.90, 0.95]) # and the quantile of order 1-p and 1-q compQuantile_List = dist_1.computeQuantile([0.90, 0.95], True)   # Evaluate the characteristic function of the distribution at pointVector # Only for dimension 1 complexResult = dist_1.computeCharacteristicFunction(dist_1.getMean()[0])   # Evaluate the derivatives of the PDF/CDF with respect to the parameters at a particular point # For example, with dimension 2, at the mean point derivatives_PDF_Vector = dist_2.computePDFGradient(dist_2.getMean()) derivatives_CDF_Vector = dist_2.computeCDFGradient(dist_2.getMean())   ##################################### # GRAPH 1 : Draw the PDF (CDF) # for a distribution of dimension 1 #####################################   # No specification of support PDF_1D_graph = dist_1.drawPDF() # Or on [-3, 3] PDF_1D_graph = dist_1.drawPDF(-3., 3.)   
# Or impose a bounding box : x-range and y-range # boundingBox = [xmin, xmax, ymin, ymax] myBoundingBox = NumericalPoint((-3., 3., 0, 0.5)) PDF_1D_graph.setBoundingBox(myBoundingBox)   # In order to see the graph without creating the associated files # View(PDF_1D_graph).show()   # Create the files corresponding to the graph # the files .EPS, .PNG and .FIG are created in the current python session PDF_1D_graph.draw("PDF_graph")   # Or only the .EPS file # 640 and 480 are the pixels number in both axes PDF_1D_graph.draw("PDF_graph", 640, 480, GraphImplementation.EPS)   ##################################### # GRAPH 2 : Draw the PDF (CDF) iso-curves # for a distribution of dimension 2 #####################################   # No specification of support PDF_graph = dist_2.drawPDF()   # Or Specify the support pointMin and pointMax # the graph will be drawn in the box with # low-left corner : pointMin= (xmin, ymin) # and up-right corner : pointMax = (xmax, ymax) pointMin = NumericalPoint((-3., 0.0)) pointMax = NumericalPoint((3., 3.5))   # Specify the point number in each direction (curve look) pointNumber = Indices((201, 201))   PDF_graph = dist_2.drawPDF(pointMin, pointMax, pointNumber)   # To impose a bounding box : see Graph 1   # Change the default levels where the contours are drawn # Define the levels for example nlev = 31 levels = NumericalPoint(nlev) for i in range(nlev):     levels[i] = 0.25 * nlev / (1.0 + i)     # Change them in the contour drawable     PDF_graph_contour = PDF_graph.getDrawable(0)     PDF_graph_contour.setLevels(levels)     # If you don't need the labels on the iso curves     PDF_graph_contour.setDrawLabels(False)   PDF_graph.setDrawable(PDF_graph_contour, 0)   # To visualize the graph : see Graph 1   ##################################### # GRAPH 3 : Draw the PDF (CDF) # of the 1D marginals for a distribution of dimension >=2 #####################################   # For example, marginal 1 # Care : the numerotation begins at 0 PDF_graph = dist_3.drawMarginal1DPDF(1, -5.0, 5.0, 101)   # To specify the support  : see Graph 1 # To impose a bounding box : see Graph 1   # To visualize the graph : see Graph 1   ##################################### # GRAPH 4 : Draw the PDF (CDF) # iso-curves for a distribution of dimension n>2 #####################################   # For example, the marginals i and j # Care : the numerotation begins at 0   # Specify the support pointMin = (xmin, ymin) # and  pointMax = (xmax, ymax) # and the number of points of the curve (all vectors) pointMin = NumericalPoint((-3., 0.0)) pointMax = NumericalPoint((3., 3.5)) pointNumber = Indices((101, 101)) PDF_graph = dist_3.drawMarginal2DPDF(0, 1, pointMin, pointMax, pointNumber)   # To specify the support  : see Graph 1   # To impose a bounding box : see Graph 1   #  To Change the default levels where # the contours are drawn : see Graph 2   ##################################### # GRAPH 4 : Draw the quantile curve #####################################   # Define the range and the number of points qMin = 0.2 qMax = 0.6 nbrPoints = 101 quantileGraph = dist_2.drawQuantile(qMin, qMax, nbrPoints)  

We draw respectively in Figures 7 and 8 the iso-curves of the PDF of the two following distributions :

  • Distribution 1 : Mixture of Normal distributions of dimension 2

  • Distribution 2 : Composed Distribution, with a Gumbel copula and each marginal some mixture of normals of dimension 1.

Iso-curves of the PDF of Distribution 1 : Mixture of Normal distributions of dimension 2.
Figure 7
Iso-curves of the PDF of Distribution 2 : Composed Distribution, with a Gumbel copula and each marginal some mixture of normals of dimension 1.
Figure 8

In Figure 9, we draw the quantile curve $p\mapsto F_1^{-1}(p)$, where $p\in[0.1,0.9]$, of the univariate $\mathcal{N}(0,1)$ distribution.

In Figure 9, we also draw the curve $s\mapsto(F_1^{-1}(s),F_2^{-1}(s))$ and mark the quantiles of order $p\in[0.1,0.9]$, regularly discretized into 101 points, of the bivariate distribution defined by: the margins are respectively $\mathcal{N}(0,1)$ and Triangular(0.0, 2.0, 3.0) and the copula is the Clayton copula ($\theta=2.3$).

A quantile curve in dimension 1.
A quantile curve in dimension 2.
Figure 9

1.1.8 UC : Creation of a custom distribution or copula from the python script

The objective of this Use Case is to describe how to create one's own distribution or copula from a Python script.

The principle is to inherit from the PythonDistribution class and overload the methods of the Distribution object.

To create a copula, the method isCopula() has to return True. Care: this is under the user's responsibility.

Then an instance of the new class can be passed to a Distribution object.

At least computeCDF should be overridden.

Requirements  

at least a CDF function expressed in python.

 
Results  

an object proposing the same services as Distribution does.

type:

Distribution

 

Python script for this UseCase :

script_docUC_InputNoData_CustomDistFromPython.py

from math import *


class PDIST(PythonDistribution):

    def __init__(self, a=[0.0], b=[1.0]):
        super(PDIST, self).__init__(len(a))
        self.a = a
        self.b = b

    def getRange(self):
        return [self.a, self.b, [True] * len(self.a), [True] * len(self.a)]

    def getRealization(self):
        X = []
        for i in range(len(self.a)):
            X.append(
                self.a[i] + (self.b[i] - self.a[i]) * RandomGenerator.Generate())
        return X

    def getSample(self, size):
        X = []
        for i in range(size):
            X.append(self.getRealization())
        return X

    def computeCDF(self, X):
        cdf = 1.0
        for i in range(len(self.a)):
            cdf *= min(1.0,
                       max(0.0, (X[i] - self.a[i]) / (self.b[i] - self.a[i])))
        return cdf

    def computePDF(self, X):
        pdf = 1.0
        for i in range(len(self.a)):
            if (X[i] < self.a[i]) or (X[i] > self.b[i]):
                return 0.0
            pdf /= (self.b[i] - self.a[i])
        return pdf

    def getMean(self):
        mu = []
        for i in range(len(self.a)):
            mu.append(0.5 * (self.a[i] + self.b[i]))
        return mu

    def getStandardDeviation(self):
        stdev = []
        for i in range(len(self.a)):
            stdev.append((self.b[i] - self.a[i]) / sqrt(12.))
        return stdev

    def getSkewness(self):
        return [0.] * len(self.a)

    def getKurtosis(self):
        return [1.8] * len(self.a)

    def getStandardMoment(self, n):
        if n % 2 == 1:
            return [0.] * len(self.a)
        return [1. / (n + 1.)] * len(self.a)

    def getMoment(self, n):
        return [-0.1 * n] * len(self.a)

    def getCenteredMoment(self, n):
        return [0.] * len(self.a)

    def computeCharacteristicFunction(self, x):
        if len(self.a) > 1:
            raise ValueError('dim>1')
        ax = self.a[0] * x
        bx = self.b[0] * x
        return (sin(bx) - sin(ax) + 1j * (cos(ax) - cos(bx))) / (bx - ax)

    def isElliptical(self):
        return (len(self.a) == 1) and (self.a[0] == -self.b[0])

    def isCopula(self):
        for i in range(len(self.a)):
            if self.a[i] != 0.0:
                return False
            if self.b[i] != 1.0:
                return False
        return True


R = PDIST([1.0, 1.5], [2.0, 3.5])
myDist = Distribution(R)
# Show(myDist.drawPDF([0.0, 0.5], [3.0, 4.5]))
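
Once wrapped into a Distribution object, the new distribution offers the generic services of UC 1.1.7; a short hedged usage sketch:

# Hedged usage sketch of the wrapped 2D distribution myDist created above
print(myDist.getMean())
print(myDist.computeCDF([1.5, 2.0]))
oneRealization = myDist.getRealization()
aSample = myDist.getSample(5)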


1.1.9 UC : Creation of a custom random vector from the python script

The objective of this Use Case is to describe the main functionalities enabling to define a random vector from a Python script.

The principle is to inherit from the PythonRandomVector class and overload the methods of the RandomVector object.

Then an instance of the new class can be passed to a RandomVector object.

At least getRealization should be overridden.

Requirements  

at least a realization function expressed in python.

 
Results  

an object proposing the same services as RandomVector does.

type:

RandomVector

 

Python script for this UseCase :

script_docUC_InputNoData_CustomRandVectFromPython.py

class RVEC(PythonRandomVector):

    def __init__(self):
        PythonRandomVector.__init__(self, 2)
        self.setDescription(['R', 'S'])

    def getRealization(self):
        X = [RandomGenerator.Generate(), 2 + RandomGenerator.Generate()]
        return X

    def getSample(self, size):
        X = []
        for i in range(size):
            X.append(
                [RandomGenerator.Generate(), 2 + RandomGenerator.Generate()])
        return X

    def getMean(self):
        return [0.5, 2.5]

    def getCovariance(self):
        return [[0.0833333, 0.0], [0.0, 0.0833333]]


R = RVEC()
myRV = RandomVector(R)
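
A short hedged usage sketch of the resulting RandomVector object:

# Hedged usage sketch of the wrapped random vector myRV created above
oneRealization = myRV.getRealization()
aSample = myRV.getSample(5)
print(myRV.getMean())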


1.2 With samples of data : manipulation of data

It is important to note that all the Use Cases described in this section are useful for fitting a distribution to a sample in order to model the random input vector. However, it is also possible to apply them to fit a distribution to the output variable of interest when it is described by a sample.

1.2.1 UC : Import / Export data from a file at format CSV (Comma Separated Value)

The objective of this Use Case is to import a CSV file containing a list of data and to export a NumericalSample to a CSV file.

To be a proper sample file, the following rules must be respected :

  • Data are presented in lines: each line corresponds to one realization of the random vector. The number of lines is the size of the sample. The number of data on each line is the dimension of the sample.

  • Data must be separated by the same specific character, ";" by default. To change the separator, either use the ResourceMap class or specify it in the export or import methods.

  • If a line does not have the same number of data as the first valid line in the file, it is disregarded.

  • The format of a data is either an integer value (2 or -5 for example), a floating-point value in decimal notation (-1.23 or 4.56 for example) or in scientific notation (-1.2e3 or 3.4e-5 for example).

When a line presents an error, the line is ignored but all the correct ones are taken into account. The lines which do not follow the previous rules are signaled and the reason for the discard is given in the logs. To see them, you must use the Log class. There can be any number of white spaces or tabulations between the data and the separator, and the lines can be ended in a UNIX-like fashion or a Windows-like fashion.
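
For illustration only, a hypothetical well-formed file (the name sampleFile.csv and the values are arbitrary) describing 3 realizations of a 2D random vector with the default separator ";" could contain:

0.12;3.4
-0.5;2.0e-3
7;-1.1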

Requirements  

a file containing data : sampleFile.csv

type:

a CSV format file respecting the rules stated above

or a numerical sample to be stored : mySampleToBeStored

type:

a NumericalSample

 
Results  

the sample issued from the data file sampleFile.csv: aSample

type:

a NumericalSample

a file containing mySampleToBeStored: mySampleStoredFile.csv

type:

a CSV format file respecting the rules stated above

 

Python script for this UseCase :

script_docUC_InputWithData_CSV.py

###################
# Case 1 : import a CSV file
###################

# We give in argument of the static method ImportFromCSVFile()
# the absolute address of the file sampleFile.csv
# for example : /tmp/sampleFile.csv
# if only the name sampleFile.csv is fulfilled,
# OpenTURNS looks for the file in the current directory

# Specify the separator at import time: for example "|"
aSample = NumericalSample.ImportFromCSVFile("sampleFile.csv", "|")

# Or change the separator from its default value ";" to the value "|"
# inside the ResourceMap
ResourceMap.Set("csv-file-separator", "|")

# Then Import with the default separator
aSample = NumericalSample.ImportFromCSVFile("sampleFile.csv")

# To see the warning messages
Log.Show(Log.INFO)

###################
# Case 2 : export to a CSV file
###################

# We give in argument of the dynamic method exportToCSVFile
# the absolute path where the storing file mySampleStoredFile.csv
# will be created
# for example : /tmp/mySampleStoredFile.csv
# if only the name mySampleStoredFile.csv is given
# OpenTURNS creates the file in the current directory
# with the separator declared in the ResourceMap
aSample.exportToCSVFile("mySampleStoredFile.csv")

# Export with a specified separator: for example ":"
aSample.exportToCSVFile("mySampleStoredFile.csv", ":")


1.2.2 UC : Drawing Empirical CDF, Histogram, Clouds / PDF or superposition of two clouds from data

The objective of this Use Case is to draw :

  • the empirical cumulative distribution function (CDF) from data : GRAPH 1,

  • the histogram from data : GRAPH 2 (with imposed number of bars) and GRAPH 3 (with free number of bars) ,

  • the superposition of two 2D samples where the first sample is given as a sample and the second one is generated from a given 2D distribution : GRAPH 4,

  • the superposition of two 2D samples where both samples are given as samples : GRAPH 5.

Details on empirical cumulative distribution function may be found in the Reference Guide ( see file Reference Guide - Step B – Empirical cumulative distribution function ).

To draw a histogram, it is possible :

  • to fix the number of bars,

  • or not to mention it: OpenTURNS will then automatically determine the bandwidth of the histogram according to the Silverman rule (Gaussian empirical rule).

Requirements  

one scalar numerical sample : sample

two 2D numerical samples : sample2, sample3

type:

NumericalSample

one 2D distribution : dist2D

type:

Distribution

 
Results  

the files containing the empirical CDF graph : sampleCDF.png, sampleCDF.eps, sampleCDFZoom.png, sampleCDFZoom.eps

type:

files at format PNG or EPS or FIG

the files containing the histogram graph : sampleHist.png, sampleHist.eps, sampleHistOpt.png, sampleHistOpt.eps

type:

files at format PNG or EPS or FIG

the files containing the superposed samples (sample2 and the sample generated from dist2D) : sampleCloudPdf.png, sampleCloudPdf.eps

type:

files at format PNG or EPS or FIG

the files containing the superposed samples (sample2 and sample3) : sampleClouds.png, sampleClouds.eps

type:

files at format PNG or EPS or FIG

 

Python script for this UseCase :

script_docUC_InputWithData_EmpiricalDrawing.py

###############################
# GRAPH 1 : Empirical CDF graph
###############################

# Generate the Graph structure for the empirical CDF graph
# graph range : min(sample) - 1, max(sample) + 1
# CARE : sample must be of dimension 1
sampleCDF = VisualTest.DrawEmpiricalCDF(
    sample, sample.getMin()[0] - 1.0, sample.getMax()[0] + 1.0)

# Or impose a bounding box : x-range and y-range
# boundingBox = [xmin, xmax, ymin, ymax]
myBoundingBox = [-5.0, 5.0, 0.0, 1.0]
sampleCDF.setBoundingBox(myBoundingBox)

# In order to see the graph without creating the associated files
# View(sampleCDF).show()

# Draw the graph on the files sampleCDF.png and sampleCDF.eps
# if the file address is not given, the files are created in the current
# directory
sampleCDF.draw("sampleCDF")

###############################
# GRAPH 2 : Histogram graph with number of bars fixed by the user
###############################

# Generate the Graph structure for the histogram graph
# Number of bars fixed to 10
# CARE : sample must be of dimension 1
sampleHist = VisualTest.DrawHistogram(sample, 10)

# In order to visualize the graph
# View(sampleHist).show()

###############################
# GRAPH 3 : Histogram graph with free number of bars
###############################

# (automatically determined by OpenTURNS according to the Silverman rule)
# Generate the Graph structure for the histogram graph
# CARE : sample must be of dimension 1
sampleHistOpt = VisualTest.DrawHistogram(sample)

# In order to visualize the graph
# View(sampleHistOpt).show()

###############################
# GRAPH 4 : Superposition of two 2D samples
###############################

# the first sample is given as a sample
# the second sample is generated from a 2D distribution
# CARE : sample2 must be of dimension 2
# and dist2D is of dimension 2
# the sample generated from dist2D has the same size as sample2
cloudPdfGraph = VisualTest.DrawClouds(sample2, dist2D)
cloudPdfGraph.setLegendPosition('bottomright')

# In order to visualize the graph
# View(cloudPdfGraph).show()

###############################
# GRAPH 5 : Superposition of the two 2D samples : sample2 and sample3
###############################

# CARE : sample2 and sample3 must be of dimension 2
cloudPdfGraph2 = VisualTest.DrawClouds(sample2, sample3)
cloudPdfGraph2.setLegendPosition('bottomright')

# In order to visualize the graph
# View(cloudPdfGraph2).show()

For example, Figure 10 shows GRAPH 3 obtained with a sample of size 1000 from a Normal(0.0, 1.0) distribution.

Figure 10 also shows GRAPH 4 obtained by giving :

  • a sample (actually generated from a 2D Normal distribution with mean (2.0, 2.0), standard deviation (1.0, 1.0) and correlation coefficient ρ = -0.8),

  • a 2D Normal distribution with mean (2.0, 2.0), standard deviation (1.0, 1.0) and correlation coefficient ρ = +0.8.

Histogram from a sample.
Superposition of two 2D clouds.
Figure 10

1.2.3 UC : Do two samples have the same distribution : QQ-plot visual test, Smirnov numerical test

The objective of this Use Case is to decide whether both samples follow the same distribution or not.

To help the decision, OpenTURNS proposes one visual test and one numerical test :

  • the QQ-plot visual test : OpenTURNS matches the empirical quantiles of both numerical samples,

  • the Smirnov test : it tests whether both samples (continuous ones only) follow the same distribution. If F*_n1 and F*_n2 are the empirical cumulative distribution functions of both samples, of sizes n1 and n2, the Smirnov test evaluates the decision variable (a minimal sketch of this statistic is given after this list) :

    D_2 = \sqrt{\frac{n_1 n_2}{n_1 + n_2}} \, \sup_x \left| F^*_{n_1}(x) - F^*_{n_2}(x) \right|

    which tends towards the Kolmogorov distribution. The hypothesis of same distribution is rejected if D_2 is too high (depending on the p-value threshold).
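
The decision variable can be computed directly from the two empirical CDFs. The following minimal sketch is plain Python, not the OpenTURNS implementation; the helper names and the toy data are illustrative assumptions. It evaluates the supremum distance at the data points, where it is attained :

def empirical_cdf(sample, x):
    # proportion of points lower than or equal to x
    return sum(1.0 for v in sample if v <= x) / len(sample)

def smirnov_statistic(sample1, sample2):
    # sup_x |F*_n1(x) - F*_n2(x)| is attained at one of the data points
    n1, n2 = len(sample1), len(sample2)
    sup = max(abs(empirical_cdf(sample1, x) - empirical_cdf(sample2, x))
              for x in sample1 + sample2)
    # normalisation whose distribution tends towards the Kolmogorov one
    return (n1 * n2 / float(n1 + n2)) ** 0.5 * sup

# toy data, for illustration only
print(smirnov_statistic([0.1, 0.4, 0.5, 0.9], [0.2, 0.3, 0.8, 1.1]))

In practice the statistic and its p-value are obtained with HypothesisTest.Smirnov, as in the script below.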

Details on the QQ-plot and Kolmogorov-Smirnov tests may be found in the Reference Guide ( see files Reference Guide - Step B – Using QQ-plot to compare two samples and Step B – Comparison of two samples using Kolmogorov-Smirnov test ).

Requirements  

two scalar numerical continuous samples : sample1, sample2

type:

NumericalSample

 
Results  

test result : resultSmirnov

type:

TestResult

 

Python script for this UseCase :

script_docUC_InputWithData_estSameDist.py

#########################
# GRAPH 1 : QQ-plot graph
#########################

# Generate the Graph structure for the QQ-plot graph
# number of points of the graph fixed to 100 (20 by default)
twoSamplesQQPlot = VisualTest.DrawQQplot(sample1, sample2, 100)

# To visualize the graph
# View(twoSamplesQQPlot).show()

# Draw the graph on the files twoSamplesQQPlot.png and twoSamplesQQPlot.eps
# if the file address is not given, the files are created in the current
# directory
twoSamplesQQPlot.draw("twoSamplesQQPlot")

#########################
# Smirnov Test :
# do both samples follow the same distribution ?
#########################

# H0 : same continuous distribution
# Test = True <=> same continuous distribution
# p-value threshold : probability of the H0 reject zone : 1-0.90
# p-value : probability (test variable decision > test variable decision evaluated on the samples)
# Test = True <=> p-value > p-value threshold
resultSmirnov = HypothesisTest.Smirnov(sample1, sample2, 0.90)


1.2.4 UC : Are two scalar samples independent : ChiSquared test, Pearson test, Spearman test

The objective of this Use Case is to decide whether two samples are independent or not.

To help the decision, OpenTURNS proposes the following tests :

  • the ChiSquared test : it tests if both scalar samples (discrete ones only) are independent.

    If n_ij is the number of values of sample i (i = 1, 2) falling in modality j (1 ≤ j ≤ m), n_i. the sum of the n_ij over j, n_.j the sum of the n_ij over i, and n the total number of values, the ChiSquared test evaluates the decision variable :

    D^2 = \sum_{i=1}^{2} \sum_{j=1}^{m} \frac{\left( n_{ij} - \frac{n_{i.} n_{.j}}{n} \right)^2}{\frac{n_{i.} n_{.j}}{n}}

    which tends towards the χ²(m-1) distribution. The hypothesis of independence is rejected if D² is too high (depending on the p-value threshold).

  • the Pearson test : it tests if there exists a linear relation between two scalar samples forming a gaussian vector (which is equivalent to having a non-zero linear correlation coefficient).

    If both samples are x̲ = (x_i)_{1 ≤ i ≤ n} and y̲ = (y_i)_{1 ≤ i ≤ n}, with x̄ and ȳ their respective empirical means, the Pearson test evaluates the decision variable :

    D = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}

    Under the hypothesis of normality of both samples, the quantity D \sqrt{n-2} / \sqrt{1 - D^2} follows a Student t(n-2) distribution. The hypothesis of a linear correlation coefficient equal to 0 (which, for a gaussian vector, is equivalent to the independence of the samples) is rejected if |D| is too high (depending on the p-value threshold).

  • the Spearman test : it tests if there exists a monotonous relation between two scalar samples.

    If both samples are x̲ = (x_i)_{1 ≤ i ≤ n} and y̲ = (y_i)_{1 ≤ i ≤ n}, the Spearman test evaluates the decision variable :

    D = 1 - \frac{6 \sum_{i=1}^{n} (r_i - s_i)^2}{n (n^2 - 1)}

    where r_i = rank(x_i) and s_i = rank(y_i). Under the hypothesis of independence, \sqrt{n-1} \, D approximately follows the standard gaussian N(0, 1) distribution. Minimal sketches of the Pearson and Spearman statistics are given after this list.
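
To make the two continuous-sample statistics concrete, here is a minimal plain-Python sketch (the helper names and the toy data are illustrative assumptions, not OpenTURNS code) that evaluates the Pearson and Spearman decision variables defined above :

def pearson_d(x, y):
    # linear correlation coefficient D
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    den = (sum((xi - mx) ** 2 for xi in x) * sum((yi - my) ** 2 for yi in y)) ** 0.5
    return num / den

def ranks(x):
    # rank of each value, starting at 1 (ties are ignored in this sketch)
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0] * len(x)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman_d(x, y):
    # D = 1 - 6 * sum (r_i - s_i)^2 / (n (n^2 - 1))
    n = len(x)
    r, s = ranks(x), ranks(y)
    return 1.0 - 6.0 * sum((ri - si) ** 2 for ri, si in zip(r, s)) / (n * (n ** 2 - 1))

x = [0.5, 1.2, 2.0, 3.1, 4.4]
y = [0.7, 1.0, 2.5, 2.9, 5.0]
print(pearson_d(x, y), spearman_d(x, y))

In practice the tests and their p-values are obtained with HypothesisTest.Pearson, HypothesisTest.Spearman and HypothesisTest.ChiSquared, as in the script below.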

Details on the independence tests may be found in the Reference Guide ( see files Reference Guide - Step B – Chi-squared test for independence, Step B – Pearson correlation test, Step B – Spearman correlation test ).

Requirements  

two continuous scalar numerical samples of dimension 1 : contSample1, contSample2

type:

NumericalSample

two discrete scalar numerical samples : discSample1, discSample2

type:

NumericalSample

 
Results  

tests results : resultChiSquared, resultPearson, resultSpearman

type:

TestResult

 

Python script for this UseCase :

script_docUC_InputWithData_IndependenceTest.py

#############################
# ChiSquared independence test
# Are both scalar samples independent ?
#############################

# CARE : discrete distributions only
# H0 = independent samples
# p-value threshold : probability of the H0 reject zone : 1-0.90
# p-value : probability (test variable decision > test variable decision evaluated on the samples)
# Test = True <=> p-value > p-value threshold
resultChiSquared = HypothesisTest.ChiSquared(discSample1, discSample2, 0.90)

# Print the result of the ChiSquared Test
print("Test Success ? ", (resultChiSquared.getBinaryQualityMeasure() == 1))

# Get the p-value of the ChiSquared Test
print("p-value of the ChiSquared Test = ", resultChiSquared.getPValue())

# Get the p-value threshold of the ChiSquared Test
print("p-value threshold = ", resultChiSquared.getThreshold())

#############################
# Pearson Test
# Are both scalar samples independent ?
#############################

# CARE : the samples are supposed to form a gaussian vector
# The test is based on the evaluation of the linear correlation coefficient
# H0 : independent samples (linear correlation coefficient = 0)
resultPearson = HypothesisTest.Pearson(contSample1, contSample2, 0.90)

#############################
# Spearman Test
# Do both scalar samples have a monotonous relation ?
#############################

# H0 : no monotonous relation between both samples
resultSpearman = HypothesisTest.Spearman(contSample1, contSample2, 0.90)


1.2.5 UC : Particular manipulations of the Pearson and Spearman tests when the first sample is of dimension greater than 1

The objective of this Use Case is to decide whether two samples follow a monotonous or linear relation in the case where the first sample is of dimension >1.

The Pearson and Spearman tests are evaluated successively between some (or all) coordinates of the first sample and the second one, which must be of dimension 1.

Details on the Pearson and Spearman tests may be found in the Reference Guide ( see files Reference Guide - Step B – Pearson correlation test, Step B – Spearman correlation test ).

Requirements  

one continuous numerical sample of dimension n : sample1

type:

NumericalSample

one continuous scalar numerical sample of dimension 1 : sample2

type:

NumericalSample

 
Results  

tests results : resultPartialPearson, resultFullPearson, resultPartialSpearman, resultFullSpearman

type:

TestResultCollection

 

Python script for this UseCase :

script_docUC_InputWithData_PearsonSpearmanTests.py

#########################
# Partial Pearson Test :
# Are both samples independent ?
#########################

# Supposes that both samples form a gaussian vector
# (based on the evaluation of the linear correlation coefficient)
# H0 : independent samples (linear correlation coefficient = 0)
# Test = True <=> independent samples (linear correlation coefficient = 0)
# p-value threshold : probability of the H0 reject zone : 1-0.90
# p-value : probability (test variable decision > test variable decision evaluated on the samples)
# Test = True <=> p-value > p-value threshold

# selection of the components of sample1 to be tested against sample2
# for example, the first two components if sample1.getDimension() >= 2
selection = list(range(2))

# Perform the Partial Pearson Test
resultPartialPearson = HypothesisTest.PartialPearson(
    sample1, sample2, selection, 0.90)

# Print the global result of the Pearson Test
print("Test global result : ", resultPartialPearson)

# Print the result of the Pearson Test for each coordinate tested
for i in selection:
    print("Test Success for Coordinate = ", selection[i], "? ",
          (resultPartialPearson[i].getBinaryQualityMeasure() == 1))

# Get the p-value of the Pearson Test
print("p-value of the Pearson Test = ", resultPartialPearson[i].getPValue())

# Get the p-value threshold of the Test
i = 0
print("p-value threshold for Coordinate = ",
      selection[i], " = ", resultPartialPearson[i].getThreshold())

#########################
# Full Pearson Test
#########################

# It performs the partial Pearson test on all the
# components of the first sample
resultFullPearson = HypothesisTest.FullPearson(sample1, sample2, 0.90)

#########################
# Partial Spearman Test :
# Do two scalar samples have a monotonous relation ?
#########################

# H0 : no monotonous relation between both samples

# selection of the components of sample1 to be tested against sample2
# for example, the first two components if sample1.getDimension() >= 2
selection = list(range(2))

# Perform the Partial Spearman Test
resultPartialSpearman = HypothesisTest.PartialSpearman(
    sample1, sample2, selection, 0.90)

#########################
# Full Spearman Test
#########################

# It performs the partial Spearman test on all the
# components of the first sample

# Perform the Full Spearman Test
resultFullSpearman = HypothesisTest.FullSpearman(sample1, sample2, 0.90)


1.2.6 UC : Regression test between two scalar numerical samples

The objective of this Use Case is to detect a linear relation between two scalar numerical samples.

Details on the linear regression model may be found in the Reference Guide ( see file Reference Guide - Step B – Linear regression ).

Requirements  

one continuous numerical sample of dimension n : Xsample

type:

NumericalSample

one continuous scalar numerical sample of dimension 1 : Ysample

type:

NumericalSample

 
Results  

tests results : resultPartialRegression, resultFullRegression

type:

TestResultCollection

 

Python script for this UseCase :

script_docUC_InputWithData_RegressionTest.py

###########################################
# Partial Regression Test between 2 samples
###########################################

# firstSample is of dimension n and secondSample of dimension 1. If
# firstSample[i] is the i-th marginal of firstSample, PartialRegression
# performs the linear regression test simultaneously on all firstSample[i]
# and secondSample, for i in the selection. The linear regression test
# tests if the regression model between two scalar numerical samples is
# significant. It is based on the deviation analysis of the regression.
# The Fisher distribution is used.

# selection of the components of Xsample to be tested against Ysample
# for example, the first two components (suppose n > 5)
selection = list(range(2))

# Perform the Partial Regression Test
resultPartialRegression = HypothesisTest.PartialRegression(
    Xsample, Ysample, selection, 0.90)

# Print the global result of the Regression Test
print("Test global result : ", resultPartialRegression)

# Print the result of the Regression Test for each coordinate tested
for i in selection:
    print("Test Success for Coordinate = ", selection[i], "? ",
          (resultPartialRegression[i].getBinaryQualityMeasure() == 1))

# Get the p-value of the Regression Test
print("p-value of the Regression Test = ",
      resultPartialRegression[i].getPValue())

# Get the p-value threshold of the Test
print("p-value threshold for Coordinate = ",
      selection[i], " = ", resultPartialRegression[i].getThreshold())

###########################################
# Full Regression Test
###########################################

# It performs the partial Regression test on all the components of the
# first sample

# Perform the Full Regression Test
resultFullRegression = HypothesisTest.FullRegression(Xsample, Ysample, 0.90)


1.2.7 UC : Distribution fitting tests, numerical and visual validation tests : Chi-squared test, Kolmogorov test, QQ-plot graph

The objective of this Use Case is to :

  • perform some parametric fitting tests on a numerical sample in dimension 1, with the maximum likelihood principle or the moment based method,

  • validate these estimations with numerical tests : the Kolmogorov test (continuous distributions) or the Chi-squared test (discrete distributions),

  • validate these estimations with a visual test : the QQ-plot graph.

The QQ-plot visual validation test is used with a numerical sample (representing the data) and a distribution (representing the fitted one). For each point of the numerical sample used in the graph, OpenTURNS evaluates its empirical quantile and associates to it the corresponding quantile of the fitted distribution. A sketch of this construction is given below.
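
The following minimal sketch illustrates that construction. It is a plain-Python illustration, not the OpenTURNS implementation : the helper name qq_points is an assumption, the empirical quantile is computed crudely (without the interpolation OpenTURNS uses), and the fitted distribution is passed as a generic quantile function :

def qq_points(sample, quantile_func, point_number=20):
    # pairs (empirical quantile of the data, quantile of the fitted distribution)
    data = sorted(sample)
    n = len(data)
    pairs = []
    for k in range(point_number):
        q = (k + 0.5) / point_number                 # probability level in ]0, 1[
        empirical = data[min(int(q * n), n - 1)]     # crude empirical quantile
        pairs.append((empirical, quantile_func(q)))
    return pairs

# usage with a fitted OpenTURNS distribution (assumption) :
# pairs = qq_points(values, lambda q: estimatedBeta.computeQuantile(q)[0])

If the fitted distribution is adequate, the pairs lie close to the first diagonal, which is exactly what the QQ-plot graph shows.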

Details on the Maximum Likelihood Principle may be found in the Reference Guide ( see files Reference Guide - Step B – Maximum Likelihood Principle ).

Details on the Parametric estimators used to evaluate the parameters of the copula may be found in the Reference Guide ( see files Reference Guide - Step B – Parametric Estimation ).

Details on the QQ-plot, Kolmogorov-Smirnov and Chi-squared tests may be found in the Reference Guide ( see files Reference Guide - Step B – Graphical goodness-of-fit tests : QQ-plot and Henry line and Step B – Kolmogorov-Smirnov goodness-of-fit test ).

The example here presents :

  • the fitting of a numerical sample of dimension 1 with a Beta distribution, its validation with the Kolmogorov test and the QQ-Plot graph,

  • the fitting of a numerical sample of dimension 1 with a Geometric distribution, its validation with the Chi-squared test and the QQ-Plot graph.

Requirements  

a continuous and a discrete scalar numerical sample (data) : sample_cont, sample_disc

type:

NumericalSample

 
Results  

a Beta and Geometric estimated distribution : estimatedBeta, estimatedGeom

type:

Beta, Geometric

results of fitting test : resultKolmogorov, resultChiSquared

type:

TestResult

 

Python script for this UseCase :

script_docUC_InputWithData_FittingTests.py

################################
# Case 1 : Continuous distribution
################################

# Fit a Beta distribution to the sample
estimatedBeta = BetaFactory().build(sample_cont)

# Get the parameters of the estimated distribution
estimatedParam = estimatedBeta.getParametersCollection()

# Display the resulting distribution with its parameters
print("Estimated Beta distribution=", estimatedBeta)

# Validate the Beta fitted distribution with the Kolmogorov Test
resultKolmogorov = FittingTest.Kolmogorov(sample_cont, estimatedBeta, 0.95)

# Print the result of the Kolmogorov Test
print("Test Success ? ", (resultKolmogorov.getBinaryQualityMeasure() == 1))

# Get the p-value of the Kolmogorov Test
print("p-value of the Kolmogorov Test = ", resultKolmogorov.getPValue())

# Get the p-value threshold of the Kolmogorov Test
print("p-value threshold = ", resultKolmogorov.getThreshold())

# Validate the Beta fitting with a visual test : QQ-plot test
# Generate the Graph structure for the QQ-plot graph
sampleBetaQQPlot = VisualTest.DrawQQplot(sample_cont, estimatedBeta)

# To visualize the graph
# View(sampleBetaQQPlot).show()

# Draw the graph on files
sampleBetaQQPlot.draw("SampleBetaQQPlot")

################################
# Case 2 : Discrete distribution
################################

# Fit a Geometric distribution to the sample
estimatedGeom = GeometricFactory().build(sample_disc)

# Display the resulting distribution with its parameters
print("Estimated Geometric distribution=", estimatedGeom)

# Validate the Geometric fitted distribution with the Chi-squared Test
resultChiSquared = FittingTest.ChiSquared(sample_disc, estimatedGeom, 0.95)

Figures 11 and 12 show a QQ-Plot graph testing the fit of a sample generated from a Beta(r = 1.2, t = 3.4, a = 1.0, b = 2.0) distribution to :

  • the Beta(r = 1.2, t = 3.4, a = 1.0, b = 2.0) distribution : visual validation of the fitting,

  • the Weibull(μ = 1.5, σ = 1.0, γ = 1.0) distribution : visual invalidation of the fitting.

Fitting validation by the QQ-Plot graph : Beta fitting to a Beta-sample.
Figure 11
Fitting invalidation by the QQ-Plot graph : Weibull fitting to a Beta sample.
Figure 12

1.2.8 UC : Normal distribution fitting test, visual validation tests (Henry line) and numerical validation tests in extreme zones (Anderson Darling test and Cramer Von Mises test)

The objective of this Use Case is to fit a normal distribution to a scalar numerical sample, with the maximum likelihood principle or the moment based method, and to validate it with visual and numerical tests.

Details on the Maximum Likelihood Principle may be found in the Reference Guide ( see files Reference Guide - Step B – Maximum Likelihood Principle ).

Details on the Anderson Darling, Cramer Von Mises and Henry line tests may be found in the Reference Guide ( see file Reference Guide - Step B – Anderson Darling goodness-of-fit test, Step B – Cramer Von Mises goodness-of-fit test and Step B – Graphical goodness-of-fit tests : QQ-plot and Henry line ).

To help this decision, OpenTURNS proposes the following tests :

  • the Henry line visual test, which is the QQ-Plot graph adapted to the normal distribution,

  • the Anderson Darling test : this test gives more importance to extreme values. If F_n is the empirical cumulative distribution function of the sample (x_i)_{1 ≤ i ≤ n} and (x_(i))_{1 ≤ i ≤ n} is the ordered sample, the Anderson Darling test evaluates the decision variable :

    AD^2 = n \int \frac{(F_n(x) - F(x))^2}{F(x) (1 - F(x))} \, dF(x) = -n - \frac{1}{n} \sum_{i=1}^{n} (2i - 1) \left[ \log F(x_{(i)}) + \log\left( 1 - F(x_{(n-i+1)}) \right) \right]

    Under the hypothesis of normality of the distribution F, the decision variable has a tabulated distribution.

  • the Cramer Von Mises test : this test also gives more importance to extreme values. With the same notations, the Cramer Von Mises test evaluates the decision variable :

    CM = n \int (F_n(x) - F(x))^2 \, dF(x) = \frac{1}{12n} + \sum_{i=1}^{n} \left( \frac{2i - 1}{2n} - F(x_{(i)}) \right)^2

    Under the hypothesis of normality of the distribution F, the decision variable has a tabulated distribution. A minimal sketch of both statistics is given after this list.
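
For reference, both decision variables can be written down directly from the ordered sample and a candidate CDF F. The following minimal plain-Python sketch is an illustration only (the helper names and the use of a generic cdf callable are assumptions, not the OpenTURNS implementation) :

from math import log

def anderson_darling(sample, cdf):
    # AD^2 = -n - (1/n) * sum (2i-1) [log F(x_(i)) + log(1 - F(x_(n-i+1)))]
    x = sorted(sample)
    n = len(x)
    s = sum((2 * i - 1) * (log(cdf(x[i - 1])) + log(1.0 - cdf(x[n - i])))
            for i in range(1, n + 1))
    return -n - s / n

def cramer_von_mises(sample, cdf):
    # CM = 1/(12n) + sum ((2i-1)/(2n) - F(x_(i)))^2
    x = sorted(sample)
    n = len(x)
    return 1.0 / (12 * n) + sum(((2 * i - 1) / (2.0 * n) - cdf(x[i - 1])) ** 2
                                for i in range(1, n + 1))

# usage with an OpenTURNS Normal distribution fitted beforehand (assumption) :
# ad = anderson_darling(values, lambda t: Normal(mu, sigma).computeCDF(t))

In practice the normality tests and their p-values are obtained with NormalityTest.AndersonDarlingNormal and NormalityTest.CramerVonMisesNormal, as in the script below.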

Requirements  

a scalar numerical sample (data) : sample

type:

NumericalSample

 
Results  

a normal fitted distribution : estimatedNormalDistribution

type:

Distribution

the files containing the Henry line graph : HenryPlot.png, HenryPlot.eps

type:

files at format PNG or EPS or FIG

a numerical validation by the Anderson Darling test (p-value) : resultAndersonDarling

type:

TestResult

a numerical validation by the Cramer Von Mises test (p-value) : resultCramerVonMises

type:

TestResult

 

Python script for this UseCase :

script_docUC_InputWithData_NormalFittingTests.py

###################
# Henry line graph
###################

# Generate the Graph structure for the Henry line graph
henryPlot = VisualTest.DrawHenryLine(sample)

# To visualize the graph
# View(henryPlot).show()

# Draw the graph on files
# if the file address is not given, the files are created in the current
# directory
henryPlot.draw("HenryPlot")

###################
# Anderson Darling Test
###################

# Test = True <=> the sample follows a Normal distribution (H0 hypothesis)
# p-value threshold : probability of the H0 reject zone = 1-0.95
# p-value : probability (test variable decision > test variable decision evaluated on the samples)
# Test = True (=1) <=> p-value > p-value threshold
# Number of parameters estimated from sample : 4
resultAndersonDarling = NormalityTest.AndersonDarlingNormal(sample, 0.95)

# Print the result of the Anderson Darling Test
print("Test Success ? ", (resultAndersonDarling.getBinaryQualityMeasure() == 1))

# Get the p-value of the Anderson Darling Test
print("p-value of the Anderson Darling Test = ",
      resultAndersonDarling.getPValue())

# Get the p-value threshold of the Anderson Darling Test
print("p-value threshold = ", resultAndersonDarling.getThreshold())

###################
# Cramer Von Mises Test
###################

# H0 hypothesis : the sample follows a Normal distribution
resultCramerVonMises = NormalityTest.CramerVonMisesNormal(sample, 0.95)

Figure 13 shows the Henry line of samples coming from a :

  • Normal(μ = 0.0, σ = 1.0) distribution : visual validation of the normality,

  • Beta(r = 0.7, t = 1.6, a = 0.0, b = 2.0) distribution : visual invalidation of the normality.

Validation of the hypothesis of normality by the Henry Line for a Normal-sample.
Invalidation of the hypothesis of normality by the Henry Line for a Beta-sample.
Figure 13

1.2.9 UC : Making a choice between multiple fitted distributions : Kolmogorov ranking, ChiSquared ranking and BIC ranking

The objective of this Use Case is to help to make a choice between several distributions fitted to a numerical sample. This choice can be motivated by :

  • the ranking by the Kolmogorov p-values (for continuous distributions),

  • the ranking by the ChiSquared p-values (for discrete distributions),

  • the ranking by the BIC values.

Details on the BIC criterion may be found in the Reference Guide ( see file Reference Guide - Step B – Bayesian Information Criterion (BIC) ).

It is not required to specify the parameters of the different models tested. It is possible to specify :

  • the model only : in that case, OpenTURNS builds a factory for each model. OpenTURNS first evaluates the parameters of the distribution (through the maximum likelihood rule or the moment based one) and then ranks the distributions according to the selected criterion,

  • the complete distributions with their parameters : OpenTURNS only evaluates the selected criterion on each of them and ranks them.

Note that it is possible to specify a KernelSmoothing distribution in the list of the models tested.

Requirements  

some continuous or discrete numerical samples (data) : sample_cont, sample_disc

type:

NumericalSample

 
Results  

a continuous distribution which ranks first by the Kolmogorov test : bestContModelKol, bestcontDistKol

type:

Distribution

a continuous distribution which ranks first by the BIC test : bestContModelBIC

type:

Distribution

a discrete distribution which ranks first by the Chi Squared test : bestDiscModelChiSq

type:

Distribution

a discrete distribution which ranks first by the BIC test : bestDiscDistBIC

type:

Distribution

 

Python script for this UseCase :

script_docUC_InputWithData_ChoiceFittedDistributions.py

##################################
# CASE 1 : Continuous distributions
##################################

##########################
# Specify the models only
# Then, OpenTURNS finds the best parameters and tests the distribution

# Create a collection of factories for all the models to be tested
collContFact = [BetaFactory()]

# Rank the continuous models by the Kolmogorov p-values :
bestContModelKol = FittingTest.BestModelKolmogorov(sample_cont, collContFact)

# Get all information on that distribution
print("best continuous distribution by Kolmogorov = ", bestContModelKol)

# Rank the continuous models wrt the BIC criterion :
bestContModelBIC = FittingTest.BestModelBIC(sample_cont, collContFact)

# Get all information on that distribution
print("best continuous distribution wrt BIC = ", bestContModelBIC)

##########################
# Specify the entire distributions (model + parameters)
# Then OpenTURNS tests the model

# Create a collection of the distributions to be tested
collContDist = [Beta(2.0, 4.0, 0.0, 1.), Triangular(0., 0.5, 1.)]

# Rank the continuous distributions by the Kolmogorov p-values :
bestcontDistKol = FittingTest.BestModelKolmogorov(sample_cont, collContDist)

# Get all information on that distribution
print("best continuous distribution by Kolmogorov = ", bestcontDistKol)

# Rank the continuous distributions wrt the BIC criterion :
bestContDistBIC = FittingTest.BestModelBIC(sample_cont, collContDist)

# Get all information on that distribution
print("best continuous distribution wrt BIC = ", bestContDistBIC)

##################################
# CASE 2 : Discrete distributions
##################################

##########################
# Specify the models only
# Then, OpenTURNS finds the best parameters and tests the distribution

# Create a collection of factories for all the models to be tested
# (only one discrete model is tested here)
collDiscFact = [PoissonFactory()]

# Rank the discrete models wrt the ChiSquared p-values :
bestDiscModelChiSq = FittingTest.BestModelChiSquared(sample_disc, collDiscFact)

# Get all information on that distribution
print("best discrete distribution by ChiSquared = ", bestDiscModelChiSq)

# Rank the discrete models wrt the BIC criterion :
bestDiscModelBIC = FittingTest.BestModelBIC(sample_disc, collDiscFact)

# Get all information on that distribution
print("best discrete distribution wrt BIC = ", bestDiscModelBIC)

##########################
# Specify the entire distributions (model + parameters)
# Then OpenTURNS only tests the distributions

# Create a collection of the distributions to be tested
collDiscDist = [Poisson(2), Geometric(0.1)]

# Rank the discrete distributions wrt the ChiSquared p-values :
bestDiscDistChiSq = FittingTest.BestModelChiSquared(sample_disc, collDiscDist)

# Get all information on that distribution
print("best discrete distribution by ChiSquared = ", bestDiscDistChiSq)

# Rank the discrete distributions wrt the BIC criterion :
bestDiscDistBIC = FittingTest.BestModelBIC(sample_disc, collDiscDist)

# Get all information on that distribution
print("best discrete distribution wrt BIC = ", bestDiscDistBIC)


1.2.10 UC : PDF fitting by kernel smoothing and graphical validation : superposition of the empirical and kernel smoothing CDF

The objective of this Use Case is to model the PDF of a random vector, described by a numerical sample, thanks to the kernel smoothing method, and to superpose on the same graph the kernel smoothed CDF and the empirical CDF built from the same numerical sample.

Details on the kernel smoothing method may be found in the Reference Guide ( see file Reference Guide - Step B – Kernel Smoothing ).

In dimension 1, the kernel smoothed probability density function p̂ has the following expression, where K is the univariate kernel, n the numerical sample size and (X_1, …, X_n) the univariate random sample, with X_i ∈ ℝ for all i :

\hat{p}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left( \frac{x - X_i}{h} \right) \qquad (7)

The kernel K is a function satisfying \int K(x) \, dx = 1. Usually, K is chosen to be a unimodal probability density function that is symmetric about 0.

The parameter h is called the bandwidth.
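
To make formula (7) concrete, here is a minimal plain-Python sketch of the 1D kernel smoothed PDF with a standard gaussian kernel (the function names and the toy bandwidth are illustrative assumptions, not the OpenTURNS implementation) :

from math import exp, pi, sqrt

def gaussian_kernel(u):
    # standard normal density, a usual choice for K
    return exp(-0.5 * u * u) / sqrt(2.0 * pi)

def kde_pdf(x, data, h):
    # p_hat(x) = 1/(n h) * sum K((x - X_i) / h)   -- formula (7)
    return sum(gaussian_kernel((x - xi) / h) for xi in data) / (len(data) * h)

data = [0.1, 0.3, 0.35, 0.8, 1.2]
print(kde_pdf(0.5, data, h=0.2))   # h fixed arbitrarily for the example

In OpenTURNS the whole construction, including the automatic choice of h, is handled by the KernelSmoothing class shown in the script below.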

In dimension d>1, OpenTURNS uses the product kernel K d , defined by:

K_d(\underline{x}) = \prod_{j=1}^{d} K(x_j)

where x̲ = (x_1, …, x_d) ∈ ℝ^d. Thus, the kernel smoothed probability density function in dimension d writes :

\hat{p}(\underline{x}) = \frac{1}{n \prod_{j=1}^{d} h_j} \sum_{i=1}^{n} K_d\left( \frac{x_1 - X_{i1}}{h_1}, \dots, \frac{x_d - X_{id}}{h_d} \right)

where (X̲_1, …, X̲_n) is the d-variate random sample whose components are denoted X̲_i = (X_{i1}, …, X_{id}).

Note that the bandwidth is the vector h̲ = (h_1, …, h_d).

The default kernel of OpenTURNS is the product of standard Normal distributions. The dimension of the product is automatically evaluated from the random sample.

In dimension 1, the boundary effects may be taken into account in OpenTURNS : the boundaries are automatically detected from the numerical sample (with the min and max functions) and the kernel smoothed PDF is corrected in the boundary areas to remain within the boundaries, according to the mirroring technique :

  • the Scott bandwidth h is evaluated from the numerical sample,

  • two subsamples are extracted from the initial numerical sample, containing all the points within the ranges [min, min+h[ and ]max-h, max],

  • both subsamples are transformed into their symmetric samples with respect to their respective boundary : this results in two samples within the ranges ]min-h, min] and [max, max+h[,

  • a kernel smoothed PDF is built from the new numerical sample composed of the initial one and the two new ones, with the previous bandwidth h,

  • this last kernel smoothed PDF is truncated within the initial range [min, max] (conditional PDF).

The choice of the kind of kernel is free in OpenTURNS : it is possible to select any 1D distribution and to define it as a kernel. However, in order to optimize the efficiency of the kernel smoothing fitting (that is, to minimise the AMISE error), it is recommended to select a symmetric distribution for the kernel.

All the distribution default constructors of OpenTURNS create a symmetric default distribution when possible. It is also possible to work with the Epanechnikov kernel, which is a Beta(r=2,t=4,a=-1,b=1).

The bandwidth h ̲ may be fixed by the User. However, it is recommended to let OpenTURNS evaluate it automatically from the numerical sample according to the following rules :

  • In dimension d, OpenTURNS automatically applies the Scott rule.

  • In dimension 1, the automatic bandwidth selection method depends on the size n of the numerical sample. As a matter of fact, the computational bottleneck is the estimation of the estimators Φ̂_r, as it requires the evaluation of a double summation over the numerical sample, which has a cost of O(n²) :

    • if n ≤ 250, the Solve-the-equation plug-in method is used on the entire numerical sample,

    • if n > 250, the Solve-the-equation plug-in method is too computationally expensive, so OpenTURNS uses the mixed method instead. Refer to the Reference Guide for more details on these methods.

It is possible to specify a particular bandwidth, evaluated according to one of the previous rules as follows :

  • computeSilvermanBandwidth applies the Silverman rule,

  • computePluginBandwidth applies the plug-in method on the entire numerical sample,

  • computeMixedBandwidth applies the mixed plug-in method on the entire numerical sample, as described above.

Requirements  

a nD-sample : sample

type:

NumericalSample

 
Results  

a kernel smoothed distribution : kernelSmoothedDist

type:

Distribution

 

Python script for this UseCase :

script_docUC_InputWithData_KernelSmoothing.py

##################################
# STEP 1 : Creation of the kernel
##################################

# Create the default kernel : kernel product of N(0.0, 1.0)
kernel = KernelSmoothing()

# Create a specified kernel
# for example, a Uniform one with default parameters
kernel2 = KernelSmoothing(Uniform())

# Specify the kernel completely
# CARE : the kernel smoothing is more efficient
# when the kernel support is symmetric with respect to 0
kernel3 = KernelSmoothing(Triangular(-2.0, 0.0, 2.0))

#####################################################
# STEP 2 : Creation of the kernel smoothed distribution
#####################################################

# The dimension of the distribution is automatically
# detected from the numerical sample

# With no bandwidth specification : the method employed is
# adapted to the size of the numerical sample,
# with no boundary treatment
kernelSmoothedDist = kernel.build(sample)

# Add a boundary treatment in dimension 1 only
kernelSmoothedDist2 = kernel.build(sample, True)

# Check the bandwidth used
print("kernel bandwidth=", kernel.getBandwidth())

# Specify a particular bandwidth evaluated
# according to the Silverman rule
myBandwidth = kernel.computeSilvermanBandwidth(sample)

# or according to the plug-in method
# applied on the entire numerical sample
myBandwidth2 = kernel.computePluginBandwidth(sample)

# or according to the mixed plug-in method
myBandwidth3 = kernel.computeMixedBandwidth(sample)

# Then build the estimation with the specified bandwidth
kernelSmoothedDist = kernel.build(sample, myBandwidth)

############################################
# GRAPH : In dimension 1, superposition of the kernel smoothed CDF
# and the empirical CDF
############################################

# Create the graph containing the kernel smoothed CDF
kernelSmoothedCDF = kernelSmoothedDist.drawCDF()

# Draw the empirical CDF of the sample on the same graph
empiricalCDF = VisualTest.DrawEmpiricalCDF(
    sample, sample.getMin()[0], sample.getMax()[0])
drawableEmpiricalCDF = empiricalCDF.getDrawable(0)
drawableEmpiricalCDF.setColor('blue')

# Add the second drawable on the first graph
kernelSmoothedCDF.add(drawableEmpiricalCDF)

# To visualize the graph
# View(kernelSmoothedCDF).show()

# Draw the final graph on the file smoothedCDF-EmpiricalCDF
# at format .eps, .png and .fig
# if the address is not given, the files are created in the current directory
kernelSmoothedCDF.draw("smoothedCDF-EmpiricalCDF")

Figure 14 shows a 1D kernel smoothing of a Mixture distribution whose PDF is defined by 0.2 * Triangular(1.0, 2.0, 4.0) + 0.5 * Normal(-1.0, 1.0) + 0.3 * Exponential(1.0, 3.0), from a numerical sample of size 10^4, with a Normal, a Triangular and an Epanechnikov kernel, when the bandwidth is selected according to the mixed plug-in method.

Figure 15 shows the difference when the previous distribution is estimated with a Normal kernel and when the bandwidth is selected according to the Silverman rule, the plug-in method or the mixed plug-in method, from a numerical sample of size 10^3.

Figure 16 shows the effect of the boundary treatment in the kernel smoothing through the example of the exponential distribution Exp(λ = 2.0, γ = 0.0). A Normal kernel is used and the bandwidth is selected according to the mixed plug-in method, from a numerical sample of size 10^4.

PDF of the kernel smoothed distribution and of the real one with different kernels.
CDF of the kernel smoothed distribution and of the real one with different kernels.
Figure 14
PDF of the kernel smoothed distribution and of the real one with different bandwidth selection rules.
CDF of the kernel smoothing distributions and of the real one with different bandwidth selection rules.
Figure 15
Effect of the boundary treatment on the kernel smoothing PDF of an exponential distribution.
Effect of the boundary treatment on the kernel smoothing CDF of an exponential distribution.
Figure 16

1.2.11 UC : Estimation of a Copula from a sample

The objective of this Use Case is to :

  • fit a copula to a sample : this estimation may be parametric (when using a specific model of copula) or not (when using the Sklar Copula extracted from a non parametric estimation of the multivariate distribution),

  • validate the estimation with a visual test by superposing the iso-curves of the estimated copula and the sample in the rank space.

Details on the Maximum Likelihood Principle may be found in the Reference Guide ( see files Reference Guide - Step B – Maximum Likelihood Principle ).

Details on the Parametric Estimators used to evaluate the parameters of the copula may be found in the Reference Guide ( see files Reference Guide - Step B – Parametric Estimation ).

Requirements  

a 2D numerical sample (data) : sample

type:

NumericalSample

 
Results  

some estimations of the copula : estimatedParamCop,estimatedSklarCop

type:

a Copula

 

Python script for this UseCase :

script_docUC_InputWithData_CopulaEstimation.py

################################
# Case 1 : Parametric estimation
################################

# Estimate a copula from the numerical sample
# for example a Gumbel model
estimatedParamCop = GumbelCopulaFactory().build(sample)
print("Gumbel Copula = ", estimatedParamCop)

################################
# Case 2 : Non parametric estimation
################################

# Build a non parametric estimation of the multivariate distribution
# For example with the kernel smoothing method
myEstimatedDist = KernelSmoothing().build(sample)

# Then extract the copula
estimatedSklarCop = myEstimatedDist.getCopula()
print("Sklar Copula = ", estimatedSklarCop)

################################
# Validation in the rank space
################################

# Map the sample into the rank space

# Creation of the operator which transforms the marginals into uniform ones
# We suppose here that the marginal distributions are both Normal(0,1)
ranksTransf = MarginalTransformationEvaluation(
    DistributionCollection([Normal(), Normal()]),
    MarginalTransformationEvaluation.FROM)

# Transformation of the initial sample into the ranked sample
rankSample = NumericalSample(sample.getSize(), sample.getDimension())
for i in range(sample.getSize()):
    rankSample[i] = ranksTransf(sample[i])

# Cloud in the rank space
rankCloud = Cloud(rankSample, 'blue', 'plus', 'sample')
myGraph = Graph(
    'Parametric estimation of the copula', 'X', 'Y', True, 'topleft')
myGraph.setLegendPosition('bottomright')
myGraph.add(rankCloud)

# Then draw the iso-curves of the estimated copula
minPoint = NumericalPoint([0.0, 0.0])
maxPoint = NumericalPoint([1.0, 1.0])
pointNumber = Indices([201, 201])
graphCop = estimatedParamCop.drawPDF(minPoint, maxPoint, pointNumber)
contour_estCop = graphCop.getDrawable(0)

# Erase the labels of the iso-curves
contour_estCop.setDrawLabels(False)

# Change the levels of the iso-curves
nlev = 31
levels = NumericalPoint(nlev)
for i in range(nlev):
    levels[i] = 0.25 * nlev / (1.0 + i)
contour_estCop.setLevels(levels)

# Change the legend of the curves
contour_estCop.setLegend('Gumbel copula')

# Change the color of the iso-curves
contour_estCop.setColor('red')

# Add the iso-curves graph into the cloud one
myGraph.add(contour_estCop)

# Visualize the graph
# View(myGraph).show()

Figure 17 shows the cloud of the sample in the physical space and the same sample in the rank space. Figure 18 shows the superposition of the sample in the rank space and the iso-curves of the estimated copula.

Initial sample.

Sample in the rank space.

Figure 17
Sample and iso-PDF curves of the estimated copula.
Figure 18

1.2.12 UC : Copula validation through the Kendall Plot Test

The objective of this Use Case is to exhibit the different uses of the Kendall Plot to :

  • test a specific model of bivariate copula with respect to a sample,

  • test whether two bivariate samples share the same copula model.

Details on the Kendall Plot Test may be found in the Reference Guide ( see file Reference Guide - Step B – Graphical goodness-of-fit tests : QQ-plot, Kendall Plot and Henry line ).

This Use Case performs the following analysis :

In the example, we suppose we have a sample of dimension 2. The objective is to estimate the dependence structure of the sample. The analysis has the following steps :

  1. we suppose we have a sample of dimension 2 sample1 and a given model of copula copula , whatever the way it has been determined (parametric estimation or non parametric one). We want to validate this copula model with the Kendall Plot test, thanks to the method VisualTest.DrawKendallPlot(sample1, copula);

  2. we suppose we have two samples of dimension 2, sample1 and sample2. We wonder whether these two samples share the same copula model, without specifying it, using the Kendall Plot test, thanks to the method VisualTest.DrawKendallPlot(sample1, sample2).

In OpenTURNS, the mean of the statistics W i is evaluated with the Monte Carlo sampling method from n simulations. By default, n=100. It is possible to change this value thanks to the ResourceMap class (see the python script).

Requirements  

two 2D numerical samples (data) : sample1, sample2

type:

NumericalSample

 
Results  

a bivariate copula : copula

type:

CopulaImplementation

 

Python script for this UseCase :

script_docUC_InputWithData_CopulaKendallPlotTest.py

##################################
# CASE 1 : Test a specific copula model for a given sample
##################################

# Change the parameter for the evaluation of E(Wi)
myValue = 25
ResourceMap.SetAsUnsignedInteger(
    'VisualTest-KendallPlot-MonteCarloSize', myValue)

# Run the Kendall test
# For example on the GumbelCopula(3)
copula_test = GumbelCopula(3)
kendallPlot1 = VisualTest.DrawKendallPlot(sample1, copula_test)

# Visualize the graph
# View(kendallPlot1).show()

##################################
# CASE 2 : Test whether two samples have the same copula model
##################################

# Run the Kendall test
kendallPlot2 = VisualTest.DrawKendallPlot(sample1, sample2)

# Visualize the graph
# View(kendallPlot2).show()

# Print the graph into a PNG file
kendallPlot2.draw('copula_validation', 640, 480, GraphImplementation.PNG)

In Figures 19 and 20, data 1 and data 2 have been generated from a Frank(1.5) copula, and data 3 from a Gumbel(4.5) copula.

Figure 19 respectively validates and invalidates the Frank copula model for data 1 and data 2.

Figure 20 respectively validates that data 1 and data 2 share the same copula, and shows that data 1 and data 3 do not share the same copula.

The Kendall Plot test validates the use of the Frank copula model for data 1.

The Kendall Plot test invalidates the use of the Frank copula model for data 2.

Figure 19

The Kendall Plot test validates that data 1 and data 2 have the same copula model.

The Kendall Plot test invalidates that data 1 and data 3 have the same copula model.

Figure 20

1.2.13 UC : Estimation and validation of a linear model from two samples

The objective of this Use Case is to build a linear regression model between the scalar variable Y and the n-dimensional one X̲ = (X_i)_{i ≤ n}, as follows :

\tilde{Y} = a_0 + \sum_{i=1}^{n} a_i X_i + \varepsilon

where ε is the residual, supposed to follow the Normal(0.0, 1.0) distribution.

Each coefficient a_i is evaluated from both samples Ysample and Xsample and is accompanied by a confidence interval and a p-value (which tests whether it is significantly different from 0.0).
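
Under the hood, the coefficients a_i are the ordinary least squares estimates. The following minimal sketch recovers them from the two samples; it assumes numpy is available and the function name ols_coefficients is illustrative, it is not the LinearModelFactory implementation :

import numpy as np

def ols_coefficients(X, y):
    # X : array of shape (size, n), y : array of shape (size,)
    # returns [a0, a1, ..., an] minimizing ||y - a0 - X a||^2
    design = np.hstack([np.ones((X.shape[0], 1)), X])   # add the intercept column
    coeffs, _, _, _ = np.linalg.lstsq(design, y, rcond=None)
    return coeffs

# toy example : y close to 1 + 2 x
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.1, 2.9, 5.2, 6.8])
print(ols_coefficients(X, y))

In OpenTURNS the estimation, together with the confidence intervals and p-values, is performed by LinearModelFactory, as in the script below.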

Details on the linear regression model may be found in the Reference Guide ( see file Reference Guide - Step B – Linear regression ).

The linear model may be used to evaluate predictions on a particular sample of the variable X : particularXSample.

The linear model may be validated :

  • graphically if Xsample is of dimension 1, by drawing on the same graph the cloud (Xsample, Ysample) and the regression line, with the OpenTURNS method VisualTest.DrawLinearModel,

  • numerically with the following OpenTURNS tests :

    • LinearModelRSquared Test which tests the quality of the linear regression model. It evaluates the R² indicator (regression variance analysis) and compares it to a level,

    • LinearModelAdjustedRSquared Test which tests the quality of the linear regression model. It evaluates the adjusted R² indicator (regression variance analysis) and compares it to a level,

    • LinearModelFisher Test which tests the nullity of the regression linear model coefficients (Fisher distribution used),

    • LinearModelResidualMean Test which tests, under the hypothesis of a gaussian sample, if the mean of the residual is equal to zero. It is based on the Student test (equality of mean for two gaussian samples).

The hypothesis on the residuals (centered gaussian distribution) may be validated :

  • graphically if Xsample is of dimension 1, by drawing the residual couples (ε_i, ε_{i+1}), where the residual ε_i is evaluated on the samples (Xsample, Ysample) : ε_i = Ysample_i - Ỹ_i with Ỹ_i = a_0 + a_1 Xsample_i. The OpenTURNS method is VisualTest.DrawLinearModelResidual,

  • numerically with the LinearModelResidualMean Test which tests, under the hypothesis of a gaussian sample, if the mean of the residual is equal to zero. It is based on the Student test (equality of mean for two gaussian samples).

Requirements  

a 1D-sample : Ysample

type:

NumericalSample

a nD-sample : Xsample

type:

NumericalSample

a nD-sample : particularXSample

type:

NumericalSample

 
Results  

a linear regression model : linearRegressionModel

type:

LinearModel

the linear coefficients (a_i)_{0 ≤ i ≤ n} : coefValues

type:

scalarCollection

the confidence intervals of each coefficient a i

type:

ConfidenceIntervalCollection

the p-values of each coefficient a i

type:

ConfidenceIntervalCollection

the predicted value on a particular sample : predictedSample

type:

NumericalSample

the sample of residual values: residualSample

type:

NumericalSample

the graph superposing the samples cloud and the regression line (in case of dimension 1 for X) : linearRegressionModel.png, linearRegressionModel.eps

type:

files at format PNG or EPS or FIG

the graph of residual values : residualGraph.png, residualGraph.eps

type:

files at format PNG or EPS or FIG

LinearModelAdjustedRSquared test result : resultLinearModelAdjustedRSquared

type:

TestResult

LinearModelRSquared test result : resultLinearModelRSquared

type:

TestResult

LinearModelFisher test result : resultLinearModelFisher

type:

TestResult

LinearModelResidualMean test result : resultLinearModelResidualMean

type:

TestResult

 

Python script for this UseCase :

script_docUC_InputWithData_LinearModel.py

###############################
# Create the linear model from both samples :
# Ysample as a function of Xsample
###############################

# CARE : Xsample is of dimension n and Ysample of dimension 1
# The confidence level to evaluate the confidence intervals is set to 0.90
linearRegressionModel = LinearModelFactory().build(Xsample, Ysample, 0.9)

# Get the coefficients ai
print("coefficients of the linear regression model = ",
      linearRegressionModel.getRegression())

# Get the confidence intervals of the ai coefficients
print("confidence intervals of the coefficients = ",
      linearRegressionModel.getConfidenceIntervals())

# Get the p-values of the (n+1) coefficients ai
print("p-value of each coefficient = ", linearRegressionModel.getPValues())

# Get the residuals
print("residuals values = ",
      linearRegressionModel.getResidual(Xsample, Ysample))

# Evaluate the predictions on the sample particularXSample
print("predicted values on particularXSample = ",
      linearRegressionModel.getPredicted(particularXSample))

###################################################
# GRAPH 1 : Validate the model with a visual test
###################################################

# superposition of the cloud (Xsample, Ysample)
# and the linear regression model
# ONLY if Xsample is a SCALAR numerical sample
linearRegressionGraph = VisualTest.DrawLinearModel(
    Xsample, Ysample, linearRegressionModel)

# To visualize the graph
# View(linearRegressionGraph).show()

###################################################
# GRAPH 2 : Draw the graph of the residual values
###################################################

# couples (residual i, residual i+1)
# ONLY if Xsample is a SCALAR numerical sample
residualValuesGraph = VisualTest.DrawLinearModelResidual(
    Xsample, Ysample, linearRegressionModel)

# To visualize the graph
# View(residualValuesGraph).show()

#############################
# LinearModelRSquared Test tests
# the quality of the linear regression model.
#############################

# It evaluates the R^2 indicator (regression variance analysis)
# and compares it to a level
# H0 = R^2 > level
# Test = True <=> R^2 > level
# p-value threshold : the level -- CARE : it is NOT a probability here!
# p-value : R^2 -- CARE : it is NOT a probability here!
# Test = True <=> p-value > p-value threshold

# The two following tests must be equal :
# Test 1 : We don't give the linear model, which is evaluated and then tested
resultLinearModelRSquared1 = LinearModelTest.LinearModelRSquared(
    Xsample, Ysample, 0.90)

# Test 2 : We give the regression linear model evaluated previously
resultLinearModelRSquared2 = LinearModelTest.LinearModelRSquared(
    Xsample, Ysample, linearRegressionModel, 0.90)

# Print the result of the LinearModelRSquared Test
print("Test Success ? ", (
    resultLinearModelRSquared1.getBinaryQualityMeasure() == 1))

# Get the p-value of the LinearModelRSquared Test
# CARE : it is NOT a probability here, but the R^2 value
print("p-value of the LinearModelRSquared Test = ",
      resultLinearModelRSquared1.getPValue())

# Get the p-value threshold of the LinearModelRSquared Test
# CARE : it is NOT a probability here, but the level (0.90 here)
print("p-value threshold = ", resultLinearModelRSquared1.getThreshold())

#############################
# LinearModelAdjustedRSquared Test tests
# the quality of the linear regression model.
#############################

# It evaluates the adjusted R^2 indicator (regression variance analysis)
# and compares it to a level
# H0 = adjusted R^2 > level

# The two following tests must be equal :
# Test 1 : We don't give the linear model, which is evaluated and then tested
resultLinearModelAdjustedRSquared1 = LinearModelTest.LinearModelAdjustedRSquared(
    Xsample, Ysample, 0.90)

# Test 2 : We give the regression linear model evaluated previously
resultLinearModelAdjustedRSquared2 = LinearModelTest.LinearModelAdjustedRSquared(
    Xsample, Ysample, linearRegressionModel, 0.90)

#############################
# LinearModelFisher Test tests
# the nullity of the regression linear model coefficients
# (Fisher distribution used).
#############################

# H0 = the linear relation coefficients are those evaluated by the linear
# regression

# The two following tests must be equal :
# Test 1 : We don't give the linear model, which is evaluated and then tested
resultLinearModelFisher1 = LinearModelTest.LinearModelFisher(
    Xsample, Ysample, 0.90)

# Test 2 : We give the regression linear model evaluated previously
resultLinearModelFisher2 = LinearModelTest.LinearModelFisher(
    Xsample, Ysample, linearRegressionModel, 0.90)

#############################
# LinearModelResidualMean Test tests,
# under the hypothesis of a gaussian sample,
# if the mean of the residual is equal to zero.
#############################

# It is based on the Student test (equality of mean for two gaussian samples).
# H0 = the residuals have a mean equal to zero

# The two following tests must be equal :
# Test 1 : We don't give the linear model, which is evaluated and then tested
resultLinearModelResidualMean1 = LinearModelTest.LinearModelResidualMean(
    Xsample, Ysample, 0.90)

# Test 2 : We give the regression linear model evaluated previously
resultLinearModelResidualMean2 = LinearModelTest.LinearModelResidualMean(
    Xsample, Ysample, linearRegressionModel, 0.90)

The following figures show the regression model superposed on the sample cloud (Xsample, Ysample) of size 10^3 and the residuals graph, in the following cases :

  • where the regression model seems validated : Figure 21,

  • where the regression model does not seem to be validated (relation of the kind Y = X²) : Figure 22,

  • where the regression model does not seem to be validated (relation of the kind Y = sin(X)) : Figure 23.

Visual validation of the Linear Regression Model.
Visual validation of the Linear Regression Model : residuals graph.
Figure 21
Visual invalidation of the Linear Regression Model.
Visual invalidation of the Linear Regression Model : residuals graph.
Figure 22
Visual invalidation of the Linear Regression Model.
Visual invalidation of the Linear Regression Model : residuals graph.
Figure 23

1.2.14 UC : Statistical manipulations on data : min, max, covariance, skewness, kurtosis, quantile, empirical CDF, Pearson, Kendall and Spearman correlation matrices and rank/sort functionalities

The objective of this Use Case is to describe the main statistical functionalities that OpenTURNS provides to manipulate data, represented by a NumericalSample.

OpenTURNS enables the following computations per component :

  • min and max per component, with the methods getMin, getMax

  • range per component, with the method computeRangePerComponent

  • mean, variance, standard deviation, skewness and kurtosis per component, with the methods computeMean, computeVariancePerComponent, computeStandardDeviationPerComponent, computeSkewnessPerComponent, computeKurtosisPerComponent

  • empirical median and other quantiles per component, with the methods computeMedianPerComponent, computeQuantilePerComponent

OpenTURNS also enables some global computations :

  • covariance of the sample, with the method computeCovariance,

  • standard deviation of the sample, i.e. the Cholesky factor of the covariance matrix, with the method computeStandardDeviation,

  • Pearson, Kendall and Spearman correlation matrices, with the methods computePearsonCorrelation, computeKendallTau, computeSpearmanCorrelation,

  • empirical CDF evaluated at a point x̲, i.e. the proportion of points X̲ such that X_1 ≤ x_1, …, X_d ≤ x_d, with the method computeEmpiricalCDF,

  • empirical tail CDF evaluated at a point x̲, i.e. the proportion of points X̲ such that X_1 > x_1, …, X_d > x_d, with the method computeEmpiricalCDF (tail argument set to True),

  • empirical quantiles, with the method computeQuantile. For the quantile x_q of order q, OpenTURNS proceeds as follows (a small plain-Python illustration is given after this list) :

    • if q ∈ [1/(2n), 1 - 1/(2n)], OpenTURNS approximates the empirical cumulative distribution function by interpolating the middles of its steps and evaluates x_q from this continuous approximation,

    • if q ≤ 1/(2n), OpenTURNS returns min(X_i),

    • if q > 1 - 1/(2n), OpenTURNS returns max(X_i).
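
The following minimal plain-Python illustration shows what the empirical CDF, the empirical tail CDF and the boundary rules of the quantile mean in practice. The helper names are illustrative assumptions, and the interpolation of the step middles used for intermediate q is only sketched, not the exact OpenTURNS algorithm :

def empirical_cdf(sample, point):
    # proportion of points X such that X_j <= point_j for every component j
    count = sum(1 for x in sample if all(xj <= pj for xj, pj in zip(x, point)))
    return float(count) / len(sample)

def empirical_tail_cdf(sample, point):
    # proportion of points X such that X_j > point_j for every component j
    count = sum(1 for x in sample if all(xj > pj for xj, pj in zip(x, point)))
    return float(count) / len(sample)

def empirical_quantile_1d(sample, q):
    # boundary rules of the quantile computation; the interpolation used for
    # intermediate values of q is replaced here by a rough order statistic
    data = sorted(sample)
    n = len(data)
    if q <= 1.0 / (2 * n):
        return data[0]            # min(X_i)
    if q > 1.0 - 1.0 / (2 * n):
        return data[-1]           # max(X_i)
    return data[int(q * n)]       # rough placeholder for the interpolated value

sample2d = [(1.3, 1.2), (4.1, 1.0), (2.3, 2.7)]
print(empirical_cdf(sample2d, (2.5, 2.0)))   # 1/3 here : only (1.3, 1.2) is dominated

In OpenTURNS these quantities are obtained directly with the NumericalSample methods listed above.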

Finally, it is possible :

  • to copy the sample into a NumericalSample whose components are the respective ranks of the components, with the method rank,

  • to copy the sample into a NumericalSample whose components are all sorted in ascending order, with the method sort,

  • to extract the (i+1)-th component, sorted in ascending order, with the method sort(i),

  • to copy the sample into a NumericalSample whose NumericalPoints are reordered so that the (i+1)-th component is sorted in ascending order, with the method sortAccordingAComponent(i),

  • to keep only the first i points in the NumericalSample (the remaining points are returned in a new NumericalSample), with the method split(i),

  • to translate the points of the NumericalSample, with the method translate,

  • to multiply all the components of the points by a factor, with the method scale,

  • to remove a particular point from the NumericalSample, with the method erase.

Requirements  

a numerical sample : sample

type:

NumericalSample

 
Results  

statistical elements listed previously

type:

NumericalPoint, SquareMatrix or CorrelationMatrix

 

Python script for this UseCase :

script_docUC_InputWithData_StatisticalManipulation.py

# Get the Variance per component
print("Variance =", sample.computeVariance())

# Get the Skewness per component
print("Skewness =", sample.computeSkewness())

# Get the Kurtosis per component
print("Kurtosis =", sample.computeKurtosis())

# Get the median per component
print("Median per component =", sample.computeMedian())

# Get the empirical 0.95 quantile per component
print("0.95 quantile per component =",
      sample.computeQuantilePerComponent(0.95))

# Get the sample covariance
print("Covariance =", sample.computeCovariance())

# Get the sample standard deviation (Cholesky factor of the covariance)
print("Standard deviation =", sample.computeStandardDeviation())

# Get the sample Pearson correlation matrix
print("Pearson correlation =", sample.computePearsonCorrelation())

# Get the sample Kendall correlation matrix
print("Kendall correlation =", sample.computeKendallTau())

# Get the sample Spearman correlation matrix
print("Spearman correlation =", sample.computeSpearmanCorrelation())

# Get the value of the empirical CDF at a point
# For example in dimension 3
myPoint = NumericalPoint([1.1, 2.2, 3.3])
print("Empirical CDF at point myPoint = ", sample.computeEmpiricalCDF(myPoint))

# Get the value of the empirical tail CDF at the same point
print("Empirical tail CDF at point myPoint = ",
      sample.computeEmpiricalCDF(myPoint, True))

To illustrate each manipulation method, we give here an example in dimension 2 : consider the NumericalSample numSample = [(1.3, 1.2); (4.1, 1.0); (2.3, 2.7)]. Then the methods listed above give :

  • new = numSample.rank(): new = [(0,1); (2,0); (1,2)],

  • new = numSample.sort(): new = [(1.3,1.0); (2.3,1.2); (4.1,2.7)],

  • new = numSample.sort(0): new = [(1.3); (2.3); (4.1)],

  • new = numSample.sortAccordingAComponent(1): new = [(4.1, 1.0);(1.3, 1.2);(2.3, 2.7)],

  • new = numSample.split(2): new = [(2.3, 2.7)] and numSample = [(1.3, 1.2); (4.1, 1.0)],

  • new = numSample.translate(NumericalPoint(2,1.0)): new = [(2.3, 2.2); (5.1, 2.0); (3.3, 3.7)],

  • new = numSample.scale(NumericalPoint(2,2.0)): new = [(2.6, 2.4); (8.2, 2.0); (4.6, 5.4)],

  • new = numSample.erase(1): new = [(1.3, 1.2); (2.3, 2.7)].
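The same manipulations can be reproduced with the short sketch below; it follows the call conventions described above, and the constructor from a list of points is assumed to be available in the Python bindings :

from openturns import *

# The example sample used above
numSample = NumericalSample([[1.3, 1.2], [4.1, 1.0], [2.3, 2.7]])

# Ranks per component
print("rank =", numSample.rank())

# Each component sorted in ascending order
print("sort =", numSample.sort())

# Points reordered according to the second component
print("sortAccordingAComponent(1) =", numSample.sortAccordingAComponent(1))

# Points translated by (1, 1), then scaled by (2, 2)
print("translate =", numSample.translate(NumericalPoint(2, 1.0)))
print("scale =", numSample.scale(NumericalPoint(2, 2.0)))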


1.2.15 UC : Draw of clouds

The objective of this Use Case is to draw a cloud of points when the points are bidimensional, and all the possible pairs of components when the points are of dimension n > 2.

Requirements  

one sample of dimension 2 : sample2d

type:

NumericalSample

one sample of dimension n=4 : sample4d

type:

NumericalSample

 
Results  

the files containing the cloud graph

type:

files in all possible formats

 

Python script for this UseCase :

script_docUC_InputWithData_CloudDrawing.py

###################################
# Create a 2d cloud

# Create the cloud Drawable
# cloud : filled squares in blue
myCloud = Cloud(sample2d, "blue", "fsquare", "First Cloud")

# Then, insert it into a Graph structure
myGraph2d = Graph("My Sample 2d", "x1", "x2", True, "topright")
myGraph2d.add(myCloud)

###################################
# Create all the pairs of 2d clouds

myPairs = Pairs(sample4d, 'My Pairs', Description(
    ['Var1', 'Var2', 'Var3', 'Var4']), 'red', 'circle')

# Then, insert it into a Graph structure
myGraph4d = Graph()
myGraph4d.add(myPairs)

Figure 24 shows the superposition of two clouds of dimension 2 and size 1000, realizations of :

  • a Normal distribution with zero mean, unit standard deviation and independent components,

  • a Normal distribution with unit mean, unit standard deviation and independent components.

Figure 25 shows all the possible pairs of components of a sample of dimension 4 and size N = 500, generated from a Normal distribution with correlation matrix $\underline{\underline{R}} = (R_{ij})_{ij}$ such that the components 1 and 2 are correlated with $R_{12} = 0.8$ and the components 3 and 4 are correlated with $R_{34} = -0.7$. The samples can be generated as sketched below.
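A minimal sketch of how such samples could be generated, using the standard (mean, standard deviation, correlation matrix) constructor of the Normal distribution :

from openturns import *

# Two 2-d normal samples of size 1000 : zero mean and unit mean,
# unit standard deviation, independent components
sampleA = Normal(NumericalPoint(2, 0.0), NumericalPoint(2, 1.0), CorrelationMatrix(2)).getSample(1000)
sampleB = Normal(NumericalPoint(2, 1.0), NumericalPoint(2, 1.0), CorrelationMatrix(2)).getSample(1000)

# 4-d normal sample of size 500 with R12 = 0.8 and R34 = -0.7
R = CorrelationMatrix(4)
R[0, 1] = 0.8
R[2, 3] = -0.7
sample4d = Normal(NumericalPoint(4, 0.0), NumericalPoint(4, 1.0), R).getSample(500)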

Superposition of two normal numerical samples of dimension 2.
Figure 24
Pairs graph for a sample of dimension 4.
Figure 25

1.2.16 UC : Maximum likelihood of a given probability density function

The objective of this Use Case is to show how to estimate the parameters of a fitted distribution using the Maximum Likelihood Principle.

Details on the Maximum Likelihood Principle may be found in the Reference Guide ( see files Reference Guide - Step B – Maximum Likelihood Principle ).

Let us denote by $(\underline{x}_1, \ldots, \underline{x}_n)$ the sample, by $p_{\underline{\theta}}$ the probability density function of the distribution we want to fit to the sample, and by $\underline{\theta} \in \Theta \subset \mathbb{R}^p$ its parameter vector.

The likelihood of the sample according to $p_{\underline{\theta}}$ is :

$$\mathrm{likelihood}(\underline{x}_1, \ldots, \underline{x}_n, \underline{\theta}) = \prod_{i=1}^{n} p_{\underline{\theta}}(\underline{x}_i)$$

It may be implemented :

  • either as a python function thanks to the class OpenTURNSPythonFunction, as explained in the UC.2.1.2,

  • or thanks to the NumericalMathFunction class, as explained in the UC.2.1.3.

The maximum likelihood estimate $\underline{\theta}_{MLE}$ is the solution of the optimization problem :

$$\max_{\underline{\theta} \in \Theta} \mathrm{likelihood}(\underline{x}_1, \ldots, \underline{x}_n, \underline{\theta})$$

or

$$\max_{\underline{\theta} \in \Theta} \mathrm{loglikelihood}(\underline{x}_1, \ldots, \underline{x}_n, \underline{\theta})$$

The following UC illustrates the example of a Normal distribution, where $(\mu, \sigma) \in \mathbb{R} \times \mathbb{R}^+$ is estimated through the maximum likelihood principle, and where the log-likelihood function is implemented thanks to the OpenTURNSPythonFunction class.

Note that, to avoid underflow problems with the log of the likelihood function, it is necessary to bound the probability density function from below by a value eps, here equal to 1.0e-16. The example illustrates how to proceed.

Requirements  

a sample of data : sample

type:

a NumericalSample

 
Results  

the log likelihood function in the openturns library : myLogLikelihoodOT

type:

a NumericalMathFunction

the MLE of (μ,σ) : optimizer

type:

a NumericalPoint

 

Python script for this UseCase :

script_docUC_InputWithData_MaxLikelihood.py

import math as m
import openturns as ot

# Compute the log-likelihood instead of the likelihood to avoid underflow
# and truncate it to avoid computing log(0)...

# Lower bound of the PDF
eps = 1.0e-16


class LogLikelihoodFunction(ot.OpenTURNSPythonFunction):

    def __init__(self, sample):
        ot.OpenTURNSPythonFunction.__init__(self, 2, 1)
        self.sample_ = sample

    def _exec(self, X):
        normal = ot.Normal(X[0], X[1])
        logLikelihood = 0.0
        # The PDF is assumed to be constant, equal to eps, for values smaller
        # than eps : the log-PDF is truncated at log(eps)
        for i in range(self.sample_.getSize()):
            logLikelihood += max(m.log(eps), normal.computeLogPDF(self.sample_[i]))
        return [logLikelihood]

# Create the Python function associated to the sample
# We suppose we have a numerical sample of data : sample
myLogLikelihoodPython = LogLikelihoodFunction(sample)

# Create the NumericalMathFunction of the library openturns
myLogLikelihoodOT = ot.NumericalMathFunction(myLogLikelihoodPython)

# Create the research interval of (mu, sigma)
lowerBound = ot.NumericalPoint((-1.0, 1.0e-4))
upperBound = ot.NumericalPoint((3.0, 2.0))
finiteLowerBound = ot.BoolCollection((0, 1))
finiteUpperBound = ot.BoolCollection((0, 0))
bounds = ot.Interval(lowerBound, upperBound, finiteLowerBound, finiteUpperBound)

# Create the starting point of the research
# For mu : the first point
# For sigma : a value evaluated from the two first data
startingPoint = ot.NumericalPoint(2)
startingPoint[0] = sample[0][0]
startingPoint[1] = m.sqrt(
    (sample[1][0] - sample[0][0]) * (sample[1][0] - sample[0][0]))

# Create the optimization problem (maximization of the log-likelihood)
problem = ot.OptimizationProblem(myLogLikelihoodOT, ot.NumericalMathFunction(),
                                 ot.NumericalMathFunction(), bounds)
problem.setMinimization(False)

# Create the TNC algorithm
myAlgoTNC = ot.TNC(problem)
myAlgoTNC.setStartingPoint(startingPoint)

# Run the algorithm and extract results
myAlgoTNC.run()
resMLE = myAlgoTNC.getResult()
MLEparameters = resMLE.getOptimalPoint()
print("MLE of (mu, sigma) = (", MLEparameters[0], ", ", MLEparameters[1], ")")


1.3 Bayesian modeling

The objective of this section is to build a random vector or a distribution whose parameters follow a distribution.

1.3.1 UC : Creation of a distribution with uncertain parameters

The objective of this Use Case is to create the distribution of $\underline{X}$ such that

$$\underline{X}|\underline{\Theta} \sim \mathcal{L}_{\underline{X}|\underline{\Theta}}$$

with

$$\underline{\Theta} = g(\underline{Y})$$

and

$$\underline{Y} \sim \mathcal{L}_{\underline{Y}}.$$

The function g is a given function whose input dimension is the dimension of $\underline{Y}$ and whose output dimension is the dimension of $\underline{\Theta}$.

Such a distribution is called a ConditionalDistribution.

Requirements  

distribution of X ̲|Θ ̲ : conditionedDist

type:

Distribution

distribution of Y ̲ : conditioningDist

type:

Distribution

the function g : linkFunction

type:

NumericalMathFunction

 
Results  

the conditional distribution : finalDist

type:

ConditionalDistribution

a sample: sampleY

type:

NumericalSample

 

Python script for this UseCase :

script_docUC_InputBayesian_CondDist.py

# Create the conditioned distribution
conditionedDist = Uniform()

# Create the conditioning distribution
conditioningDist = Uniform(-1.0, 1.0)

# Create the link function g
g = NumericalMathFunction(['y'], ['y', '1+y^2'])

# Create the resulting conditional distribution
finalDist = ConditionalDistribution(conditionedDist, conditioningDist, g)

# Generate some realizations
sampleY = finalDist.getSample(1000)

The UC illustrates the distribution of X that follows a Uniform(A, B) distribution, with $(A, B) = g(Y)$, $g : \mathbb{R} \rightarrow \mathbb{R}^2$, $g(Y) = (Y, 1+Y^2)$, where Y follows a Uniform(-1, 1) distribution.

Figure 26 shows the probability density function of X.

Uniform(Y, 1+Y^2) distribution with Y ∼ Uniform(-1, 1).
Figure 26
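A graph such as Figure 26 can be obtained, for instance, from the drawPDF method of the distribution built in the script above (a minimal sketch, not the script used to produce the figure) :

# Draw the PDF of the conditional distribution, as in Figure 26
graphPDF = finalDist.drawPDF()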

1.3.2 UC : Creation of a multivariate distribution through conditioning

The objective of this Use Case is to create the distribution of the random vector $(\underline{X}, \underline{Y})$ where

$$\underline{Y} \sim \mathcal{L}_{\underline{Y}}$$

and

$$\underline{X}|\underline{\Theta} \sim \mathcal{L}_{\underline{X}|\underline{\Theta}}$$

with

$$\underline{\Theta} = g(\underline{Y})$$

where g is a given function whose input dimension is the dimension of $\underline{Y}$ and whose output dimension is the dimension of $\underline{\Theta}$.

Requirements  

distribution of X ̲|Θ ̲ : conditionedDist

type:

Distribution

distribution of Y ̲ : conditioningDist

type:

Distribution

the function g : linkFunction

type:

NumericalMathFunction

 
Results  

the conditional distribution : finalDist

type:

ConditionalDistribution

a sample: sampleY

type:

NumericalSample

 

Python script for this UseCase :

script_docUC_InputBayesian_BayesDist.py

# Create the conditioned distribution
conditionedDist = Uniform()

# Create the conditioning distribution
conditioningDist = Uniform(-1.0, 1.0)

# Create the link function g
g = NumericalMathFunction(['y'], ['y', '1+y'])

# Create the resulting Bayes distribution
finalDist = BayesDistribution(conditionedDist, conditioningDist, g)

# Generate some realizations
sampleY = finalDist.getSample(1000)

The UC illustrates the distribution of X that follows a Uniform(A,B) distribution, with (A,B)=(Y,1+Y) where Y follows a Uniform(-1,1) distribution.

Figure 27 shows a sample of (X, Y).

Iso-PDF and sample of (X,Y) where X∼Uniform(Y,1+Y) and Y∼Uniform(-1,1).
Figure 27

1.3.3 UC : Bayesian Calibration of a Computer Code

The objective of this Use Case is to show how to estimate the parameters of a computer model by Bayesian estimation.

Details on the Principle of Bayesian Calibration may be found in the Reference Guide ( see files Reference Guide - Step B – Bayesian Calibration ).

Let us denote by $\underline{y} = (y_1, \ldots, y_n)$ the observation sample, by $\underline{z} = (f(x_1|\underline{\theta}), \ldots, f(x_n|\underline{\theta}))$ the model prediction, by $p(y|z)$ the density function of the observation y conditional on the model prediction z, and by $\underline{\theta} \in \mathbb{R}^p$ the calibration parameters we wish to estimate.

The following objects need to be defined in order to perform Bayesian calibration with OpenTURNS:

  • the conditional density p(y|z) must be defined as a probability distribution, as explained in the UC.

  • The computer model must be implemented thanks to the NumericalMathFunction class, as explained in the section 2.1. This function takes a value of $\underline{\theta}$ as input and outputs the vector of model predictions $\underline{z}$ defined above (the vector of covariates $\underline{x} = (x_1, \ldots, x_n)$ is treated as a known constant). When doing so, we have to keep in mind that z will be used as the vector of parameters of the distribution specified for p(y|z). For instance, if p(y|z) is normal, this means that z must be a vector containing the mean and variance of y (see the example below).

  • the prior density $\pi(\underline{\theta})$ encodes the set of possible values for the calibration parameters, each value being weighted by its a priori probability, reflecting the beliefs about the possible values of $\underline{\theta}$ before consideration of the experimental data. Again, this is implemented as a probability distribution.

  • The Metropolis-Hastings algorithm that samples from the posterior distribution of the calibration parameters requires a vector $\underline{\theta}_0$ of initial values for the calibration parameters, as well as the proposal laws used to update each parameter sequentially.

The posterior distribution is given by Bayes' theorem :

$$\pi(\underline{\theta}|\underline{y}) \propto_{\underline{\theta}} L(\underline{y}|\underline{\theta}) \times \pi(\underline{\theta})$$

(where $\propto_{\underline{\theta}}$ means "proportional to", regarded as a function of $\underline{\theta}$) and is approximated here by the empirical distribution of the sample $\underline{\theta}_1, \ldots, \underline{\theta}_N$ generated by the Metropolis-Hastings algorithm. This means that any quantity characteristic of the posterior distribution (mean, variance, quantile, ...) is approximated by its empirical counterpart.

The following UC illustrates the example of a standard normal linear regression, where

$$y_i = \theta_1 + x_i \theta_2 + x_i^2 \theta_3 + \varepsilon_i,$$

where $\varepsilon_i \overset{i.i.d.}{\sim} \mathcal{N}(0,1)$, and we use a normal prior on $\underline{\theta}$ :

$$\pi(\underline{\theta}) = \mathcal{N}(\underline{0}; 100\, I_3).$$

The UC is divided into two Python scripts:

  • the first one shows how to define the requirements of the second according to the normal linear regression example above (it provides some data);

  • the second illustrates how to carry out a Bayesian calibration in a general context using the RandomWalkMetropolisHastings class.

First Python script for this UseCase:

script_docUC_InputWithData_BayesianCalibration1.py

# Number of covariates
covNum = 1
# Dimension of the observations
obsDim = 1
# Dimension of the vector of parameters to calibrate
paramDim = 3

# The data:
obsSize = 10

# 1) Covariate values
x = NumericalSample(obsSize, covNum)
for i in range(obsSize):
    x[i, 0] = -2 + 5. * i / 9.

# 2) Observations
y_obs = NumericalSample(obsSize, obsDim)
y_obs[0, 0] = -9.50794871493506
y_obs[1, 0] = -3.83296694500105
y_obs[2, 0] = -2.44545713047953
y_obs[3, 0] = 0.0803625289211318
y_obs[4, 0] = 1.01898069723583
y_obs[5, 0] = 0.661725805623086
y_obs[6, 0] = -1.57581204592385
y_obs[7, 0] = -2.95308465670895
y_obs[8, 0] = -8.8878164296758
y_obs[9, 0] = -13.0812290405651

# The likelihood:

# 1) Parametric function which associates, for each observation, some given
#    values of the parameters theta to calibrate to the parameters of the
#    distribution of the corresponding observation
p = NumericalSample(obsSize, paramDim)
for i in range(obsSize):
    for j in range(paramDim):
        p[i, j] = (-2 + 5. * i / 9.) ** j

fullModel = NumericalMathFunction(
    ['p1', 'p2', 'p3', 'x1', 'x2', 'x3'], ['z', 'sigma'],
    ['p1*x1+p2*x2+p3*x3', '1.0'])
model = NumericalMathFunction(fullModel, list(range(paramDim)))

# 2) Conditional distribution of the observations
#    (here: centered unit normal distribution)
conditional = Normal()

# Finally: the prior distribution
sigma0 = NumericalPoint(paramDim, 10.)  # standard deviations
C0 = CorrelationMatrix(paramDim)        # prior covariance matrix
for i in range(paramDim):
    C0[i, i] = sigma0[i] * sigma0[i]

m0 = NumericalPoint(paramDim, 0.)       # prior mean
prior = Normal(m0, C0)


Second Python script for this UseCase :

Requirements  

a prior distribution: prior

type:

a Distribution

a conditional distribution: conditional

type:

a Distribution

a model: model

type:

a NumericalMathFunction

a sample of data: y_obs

type:

a NumericalSample

 
Results  

a Random Walk Metropolis-Hastings (RWMH) sampler: RWMHsampler

type:

a RandomWalkMetropolisHastings

a posterior random vector: myRandomVector

type:

a PosteriorRandomVector

 

script_docUC_InputWithData_BayesianCalibration2.py

# Additional inputs for the random walk Metropolis-Hastings
# sampler constructor:

# 1) Initial state: the prior mean
theta0 = prior.getMean()

# 2) Proposal distribution: uniform
proposal = DistributionCollection()
for i in range(paramDim):
    proposal.add(Uniform(-1.0, 1.0))

# Creation of the Random Walk Metropolis-Hastings (RWMH) sampler
RWMHsampler = RandomWalkMetropolisHastings(
    prior, conditional, model, p, y_obs, theta0, proposal)

# Tuning of the RWMH algorithm:

# 1) Strategy of calibration for the random walk (trivial example: default)
strategy = CalibrationStrategyCollection(paramDim)
RWMHsampler.setCalibrationStrategyPerComponent(strategy)

# 2) Other parameters
RWMHsampler.setVerbose(True)
RWMHsampler.setThinning(1)
RWMHsampler.setBurnIn(2000)

# Ready to generate a sample from the posterior distribution
# of the parameters theta
sampleSize = 10000
sample = RWMHsampler.getSample(sampleSize)

# Look at the acceptance rate
# (basic checking of the efficiency of the tuning;
# a value close to 0.2 is usually recommended)
print('acceptance rate=', end=' ')
print(RWMHsampler.getAcceptanceRate())

# It is possible to create a random vector based on the RWMHsampler
myRandomVector = PosteriorRandomVector(RWMHsampler)
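Since the posterior distribution is approximated by the empirical distribution of the generated sample, its characteristic quantities can be approximated by the statistical methods of NumericalSample described in UC 1.2.14. A minimal sketch, reusing the sample produced by the script above :

# Empirical approximations of posterior quantities
print("posterior mean =", sample.computeMean())
print("posterior 0.95 quantile per component =",
      sample.computeQuantilePerComponent(0.95))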

