MATH6173 Statistical Computing — Coursework 2
Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit
MATH6173 Statistical Computing — Coursework 2
Instructions for Coursework 2
1. This coursework assignment is worth 50% of the overall mark for the module.
2. The deadline is 1500 on Wednesday 11 January 2023 and your completed work must be submitted electronically via Blackboard.
3. Your submission must consist of exactly one written report, which contains the answers (including the required mathematical derivation and graphs) to the relevant questions, and one R script file, which contains all the R code you used to obtain your results all questions. In the report file, you must label clearly to which questions the answers are associated. In the R script, you must use comments to make it clear to which questions the R codes are answering. Penalties will be applied if any R codes required are missing from the R script. Please ensure the submitted files have the correct file format; otherwise it cannot be marked. Note an R markdown file is a different format to an R script file.
4. Your written report should be in PDF format. Please name your report as
5. Your R script file must have the filename
6. Your entire R script file must be reproducible and run without any errors.
7. You must present elegant and informative plots in the written report, such as using appropriate plotting area, points/lines sizes and colours. Also please include meaningful labels, titles and legends.
8. You must informatively comment your R code including indication of where each question and task begins. When writing an R function, the 1) function and 2) argument names must be exactly the same as those specified in the questions; otherwise a heavy penalty will be applied.
● Please note that uppercase and lowercase of the same alphabet are not the same character, so if the question asks to write a function/argument called, apple, but a function/argument called
applE is presented in your answer, than that would be wrong.
9. You must not load any add-on packages, e.g., using function library or require. Otherwise, a penalty will incur.
10. The standard Faculty rules on late submissions apply: for every weekday, or part of a weekday, that the coursework is late, the mark will be reduced by a factor of 10%. No credit will be given for work which is more than one week late.
11. All coursework must be carried out independently. You are reminded of the University’s Academic Integrity Policy, see https://www.southampton.ac.uk/quality/assessment/academic_integrity.page.
12. In the interests of fairness, queries which relate directly to the coursework must be made via the MATH6173: Statistical Computing Discussion Forum on Blackboard. This ensures that every- body has access to the same information.
Total marks available: 100
Question 1: A-R Sampling
[23 marks]
Consider the target distribution π with the probability density function (pdf),
where the parameters a > 0, b > 0 and s > 0.
The goal is to draw samples from π(x) using A-R sampling, that has a proposal density set to the double exponential density c(λ) with the pdf
where the rate parameter λ > 0, and b is the same as that in equation (1).
(a) [4 marks] Assume the parameter values a = 2, b = 3 and s = 1.25. There are two potential values of interest, 1 and 0.5, for the parameter λ in equation (2). In the report file, explain which of those two values for λ produces a proposal density that has a higher acceptance probability?
Note: Rigorous proof is not required for this question, but the reasoning must be clear and logical. Figures may be useful to support your explanation.
(b) [6 marks] Use the parameter values a = 2, b = 3, s = 1.25 and the chosen value of λ in part (a) to design an A-R sampling algorithm to generate random values from π(x). Also, select an appropriate
.
Important:
● In the description of the algorithm, any equations and mathematical expressions must be specific to the context of this question.
● You must present the description of the algorithm in the report file .
(c) [6 marks] Write an R function called q1 .ar with the following features.
Arguments:
● n: a positive integer value representing the number of random values to be generated.
● a: a positive numeric value for parameter a in equation (1).
● b: a positive numeric value for parameter b in equation (1) and (2).
● s: a positive numeric value for parameter s in equation (1).
● lambda: a positive numeric value for parameter λ in equation (2).
● M: a positive numeric value.
Computation:
● This R function implements the algorithm designed in part (b), but allow a, b, s, λ and M to be specified through the arguments.
Returns:
● A numeric vector containing the sequence of n values drawn from π(x) defined equation (1).
(d) [3 marks] Set the seed to 2022, and use the R function q1 .ar created in part (c) to generate 5000 random values with parameter values a = 2, b = 3, s = 1.25 and the chosen value of λ in part (a). In the report file, compare their empirical distribution with equation (1) by presenting the distribution of the simulated samples as a histogram, and on that same figure plot a line representing the true pdf ofπ(x) in equation (1).
Important: This question asks for the same type of figure as the solution to Questions 2 (c) of the
Week 9 Lab.
(e) [4 marks] The kth root of the kth moment of X is given by
(E [X k ]) = ┌.-o(o) xk π(x)dx┐ k . (3)
Write R commands to estimate the statistic specified by equation (3) for k = 2 and k = 3, using the 5000 values already sampled in part (d). Report your answers (to 3 d.p.) as comments below your R commands for this question in the R script.
Question 2: Importance Sampling
[ 19 marks] Measurements of precipitation are usually presented as percentages in weather forecasts, e.g., “The precipitation at 3 PM today is 3%”. Let {X/ , ..., Xd } be a sequence of d + 1 precipitation values (converted from percentage to proportion), where the Xi e [0, 1] is recorded at time ti with til1 2 ti . The logit of the precipitation value at time ti is defined as
Ui = log ╱ 1 乂(X)X(i)i 、 for all i e {0, ..., d} . (4)
We model the sequence U = (U1 , ..., Ud ) by a continuous-time random walk conditioned on U/ = u/ , which
means
Ui ~ Normal(µ = Ui - 1 , σ 2 = ti 乂 ti - 1 ) for all i e {1, ..., d} , (5)
where µ and σ 2 are the mean and variance of a normal distribution.
(a) [4 marks] Let a = (a1 , ..., ad ) with 0 < ai < 1 for all i e {1, ..., d}. We are interested in estimating the probability
ptail (a) = Pr(X1 > a1 , X2 > a2 , ..., Xd > ad IX/ = x/ ) (6)
by the standard importance sampling.
Consider the shifted exponential distribution that has the pdf
f (x; λ, s) = λ exp [乂λ(x 乂 s)] for x 2 s, (7)
where the parameter s e R shifts the exponential distribution. In the report file, explain how should we use f (x; λ, s) in equation (7) to form a proposal distribution q for generating the random vector U , such that q avoids wasting simulations on sampling vectors that will not be used to calculate ptail (a)?
(b) [6 marks] Design an importance sampling algorithm that uses the proposal density q formed in part (a) to find ptail in equation (6), given the initial precipitation value X/ = x/ .
Important:
● In the description of the algorithm, any equations and mathematical expressions must be specific to the context of this question.
● You must present the description of the algorithm in the report file .
(c) [6 marks] Write an R function called q2 .is with the following features.
Arguments:
● n: a positive integer representing the number of samples to be generated to calculate ptail .
● x0: a numeric value representing the observed value of X/ .
● time: a numeric vector representing the times {t/ , ..., td } when the d values of {X/ , ..., Xd } are observed.
● a: a numeric vector containing the d values of {a1 , ..., ad } in equation (6).
● lambda: a numeric value representing the value of the parameter λ in equation (7).
Computation:
● This R function implements the algorithm designed in part (b) to estimate ptail (a).
Returns:
● A numeric value representing the estimate of probability ptail (a) calculated in Computation.
(d) [3 marks] Set the seed to 2022, and use the R function created in part (c) to estimate Pr(X1 > 50, X2 > 70, X3 > 90IX/ = 30) with t/ = 0, t1 = 0.7, t2 = 1.2, t3 = 2.5 by generating 100000 samples and set λ = 0.5. Report your answers (to 3 significant figures) as comments below your R commands for this question in the R script.
Question 3: Markov Chain Monte Carlo (MCMC)
[20 marks]
Recall the Markov model described in Question 4 of Coursework 1. Assume that data has been collected on
the weather states for n days, X = {X1 , ..., Xn } with Xi e X = {s, c, r} for all i e {1, ..., n}, where s = sunny, c = cloudy and r = rainy. However, the transition probabilities PT are unknown and are to be estimated in a Bayesian framework. Based on the Markov model, the likelihood of PT is given by
n
p(XIPT , π) = p(X1 ) ) pT (Xi IXi - 1 ), (8)
i=2
where pT (Xi = bIXi - 1 = a) with a, b e X is the entry pab in the transition probability matrix
PT = ╱、
(prs prc prr ) , (9)
and the initial probabilities for the weather state is given by π = (πs , πc , πr ), i.e., Pr(X1 = x) = πx for x e X.
The Dirichlet distribution is the multivariate extension of the Beta distribution and provides a distribution
over random vectors V = (V1 , ..., Vd ) with the constraint lj(d)=1 vj = 1. The pdf of a d-dimensional Dirichlet (a)
distribution is defined as
which has the parameters a = (a1 , ..., ad ) with a1 , ..., ad > 0.
Let px = (pxs , pxc , pxr ) be a random vector with values in the row of PT conditioned on weather state x. For all x e X, we apply a Dirichlet(1, 1, 1) prior on px , i.e.,
px ~ Dirichlet(1, 1, 1) for all x e X, (11)
where Dirichlet(1, 1, 1) is a uniform distribution over all vectors V , and px is a priori independent of the values of π . Thus, given that π is known, the posterior distribution of PT follows
p(PT IX, π) x p(XIPT , π). (12)
(a) [5 marks] Design an algorithm that uses MCMC to estimate the posterior distribution of PT . At each step j of the MCMC, use Dirichlet ╱20 x px(/j) - 1户 、as the proposal density q(pIpx(/j) - 1户 ), where p is the proposed vector for step j, while px(/j) - 1户 is the sampled vector for step j 乂 1.
Important:
● In the description of the algorithm, any equations and mathematical expressions must be specific to the context of this question.
● You must present the description of the algorithm in the report file .
(b) [3 marks] Write an R function called calcDirichletPDF with the following features.
Arguments:
● x: a non-negative numeric vector wherein the elements sum up to 1.
● a: a positive numeric vector that contains the values of a in equation (10) and has the same length as x.
Computation:
● Use equation (10) to evaluate the natural logarithm of the density for x under the Dirichlet distribution with parameters specified by a.
Hint: The lgamma function may be useful here.
Returns:
● A numeric vector representing the log pdf calculated as in Computation.
(c) [3 marks] Write an R function called simDirichletRV with the following features.
Arguments:
● a: a positive numeric vector containing the values of a in equation (10).
Computation:
● For each j in {1, ..., d}, where d is the number of elements in a, sample Uj
Uj ~ Gamma(aj , 1), (13)
where aj is the jth element of a.
● Compute the vector
where V is a vector drawn from the Dirichlet distribution with parameters specified by a.
Returns:
● A numeric vector representing V (in equation 14) as generated in Computation.
(d) [6 marks] Write an R function called q3 .mcmc with the following features.
Arguments:
● X: a character vector, where each element must be one of the three character ,
● initStateProbs: a numeric vector containing three elements, where the first, second and third elements respectively correspond to the initial probabilities πs , πc and πr .
● initPT: a numeric 3 x 3 matrix representing the initial values of PT .
● burnin: a numeric value represents the length of chain for the burnin period of the MCMC. Note: Here, the burnin period of the chain does not include the initial value of PT .
● n: a numeric value indicating the length of chain after the burnin period of the MCMC.
Computation:
● Using the calcDirichletPDF and simDirichletRV functions created in part (b) and (c), respec- tively, this R function implements the MCMC algorithm designed in part (a) to produce the posterior distribution of PT .
Returns:
● An n x 3 x 3 array object, which contains n posterior samples of PT generated after the burnin period in Computation.
(e) [3 marks] Set the seed to 2022, and use the q3 .mcmc function created in part (d) to produce 100000 samples from the posterior distribution of PT after removing the first 10000 samples as burnin. Assume that π = (1/3, 1/3, 1/3) and that every element in the initial value of PT is 1/3. The value of X is recorded in the text file q3Weather .txt. Calculate the posterior mean of PT . Report the estimated posterior mean (to 3 d.p.) as comments below your R commands for this question in the R script.
Question 4: Mixture of Linear Regressions
[30 marks]
Let Y = (Y1 , ..., Yn ) be a vector of outcomes and x = (x1 , ..., xn ) be the values of the covariate X for n observations. Assume that Y is generated from a mixture model of K linear regressions, where K is a positive integer. That is the ith outcome Yi is generated from its covariate value xi and one of K linear regressions in the mixture model, such that
Yi IXi , Zi = k ~ Normal(ak + bk xi , σk(2)), (15)
where ak , bk and σk are the intercept, slope and standard deviation of the error terms for the kth linear regression. The random variable Zi indicates from which linear regression Yi is generated, we assume
Zi ~ Multinimial(π1 , ..., πK ), (16)
which means p(Zi = k) = πk .
Consider the incomplete data scenario, where the variables Zi ’s are hidden. Let θ be the parameters of a mixture of linear regression. Specifically, θ = {π, a, b, σ}, where π = (π1 , ..., πK ), a = (a1 , ..., aK ), b = (b1 , ..., bK ) and σ = (σ1 , ..., σK ).
Follow parts (a)– (d) outlined below to write a program in R that uses the expectation-maximisation (EM) algorithm to find the maximum likelihood estimates (MLEs) of the parameters θ .
(a) [ 6 marks] Recall that
p(Zi = kIXi , θ cur ) = p(Zi = k)p(Xi IZi = k, θ cur )
where θ cur = {πcur , acur , bcur , σ cur } with the vectors π cur , acur , bcur and σ cur respectively representing the current values of π , a, b and σ .
Write an R function, called linRegMixEstep, which has the following features.
Arguments :
(1) piCur: a numeric vector that contains K elements representing values in π cur .
(2) aCur: a numeric vector that contains K elements representing values in acur .
(3) bCur: a numeric vector that contains K elements representing values in bcur .
(4) sCur: a numeric vector that contains K elements representing values in σ cur .
(5) x: a numeric vector of length n, containing the values of x.
(6) Y: a numeric vector of length n, containing the values of Y .
Computation :
● This R function calculates the conditional distribution p(Zi = kIxi , Yi , θ cur ) for all k e {1, ..., K} and all i e {1, ..., n}.
Returns :
● A numeric matrix with n rows and K columns, where the entry in the ith row and kth column is the value for
p(Zi = kIxi , Yi , θ cur ).
(b) [ 9 marks] Let θ new = (πnew , anew , bnew , σ new ) with π new = (π1(n)ew , ..., πK(new)), (a1(n)ew , ..., aK(ne)w ), (b1(n)ew , ..., bK(ne)w ) and (σ1(n)ew , ..., σK(new)). The expressions for elements of θ new that optimize the function Q(θ, θcur ) is given by
π k(new) = , (18)
╱b(a)nknke(e)w(w)、 = (xT wk x)- 1 xT wk Y, (19)
σ k(new) = 1 l p(Zi l(=)xi(乂),θ(a)nkc乂 bk(n)ew xi )2 , (20)
for all k e {1, ..., K}. The design matrix x has n rows and two columns. In x, every element in the first column has the value 1, while the second column is the vector x. wk is an n x n diagonal matrix with ith diagonal entry being p(Zi = kIxi , θ cur ).
(i) Write an R function, called calcNewCoefs, which has the following features. Arguments :
(1) W: the n x K numeric matrix produced by the linRegMixEstep function in part (a).
(2) x: same as in part (a).
(3) Y: same as in part (a).
Computation :
● This R function calculates anew = (a1(n)ew , ..., aK(ne)w ) and bnew = (b1(n)ew , ..., bK(ne)w ) defined by equation (19).
Returns :
● A list containing two objects:
(1) The first object is the numeric vector representing anew , and
(2) the second object is the numeric vector representing bnew .
(ii) Write an R function, called calcNewSd, which has the following features.
Arguments :
(1) W: the n x K numeric matrix produced by the linRegMixEstep function in part (a).
(2) x: the same as in part (a).
(3) Y: the same as in part (a).
(4) a: a numeric vector containing K elements, which represent the appropriate intercept estimates for calculating σ new .
(5) b a numeric vector containing K elements, which represent the appropriate slope estimates for calculating σ new .
Computation :
● Calculate σ new , σK(new)) defined by equation (20).
Returns :
● A vector of length K representing σ new .
(iii) Write an R function, called linRegMixMstep, which has the following features.
Arguments :
(1) W: the n x K numeric matrix produced by the linRegMixEstep function in part (a).
(2) x: same as in part (a).
(3) Y: same as in part (a).
Computation :
● This R function finds values of θ that optimise the function Q(θ, θcur ) by using equation (18) to calculate π new = (π1(n)ew , ..., πK(new)), while using the functions created in parts (i) and (ii) to
calculate anew bnew and σ new .
Returns :
● A list containing four objects, where
– the first object is a numeric vector containing π new ,
– the second object is a numeric vector containing anew ,
– the third object is a numeric vector containing bnew , and
– the forth object is a numeric vector containing σ new .
(c) [4 marks] Write an R function, called linRegMixCalcLogLik, which has the following features. Arguments :
(1) x: same as in part (a).
(2) Y: same as in part (a).
(3) piCur: same as in part (a).
(4) piNew: the numeric vector π new computed by the linRegMixMstep function in part (b).
(5) aCur is the same as in part (a).
(6) aNew: the numeric vector anew computed by the linRegMixMstep function in part (b).
(5) bCur is the same as in part (a).
(6) bNew: the numeric vector bnew computed by the linRegMixMstep function in part (b).
(7) sCur is the same as in part (a).
(8) sNew: the numeric vector σ new computed by the linRegMixMstep function in part (b).
Computation :
● This R function calculates the log likelihoods of a mixture linear regressions for θ cur and θ new respectively.
Returns :
● A numeric vector, where
– the first element is the log likelihood of a mixture of linear regressions given the parameter values θ cur , and
– the second element is the log likelihood of a mixture of linear regressions given the parameter values θ new .
Note: Here, ‘log’ refers to the natural logarithm.
(d) [ 6 marks] Write an R function, called linRegMixEM, which has the following features. Arguments :
(1) piInit: numeric vector that contains the initial values for the vector T .
(2) aInit: numeric vector that contains the initial values for the parameter vector a.
(3) bInit: numeric vector that contains the initial values for the parameter vector b.
2023-01-12