
Homework 2

STAT 5511 (Fall 2021)

The usual formatting rules:

Your homework (HW) should be formatted to be easily readable by the grader.

•   You may use knitr or Sweave in general to produce the code portions of the HW. However, the output from knitr/Sweave that you include should be only what is necessary to answer the question, rather than just any automatic output that R produces. (You may thus need to avoid using default R functions if they output too much unnecessary material, and/or should make use of invisible() or capture.output().)

–   For example: for output from regression, the main things we would want to see are the estimates for each coefficient (with appropriate labels of course) together with the computed OLS/linear regression standard errors and p-values. If other output is not needed to answer the question, it should be suppressed!

•   Code snippets that directly answer the questions can be included in your main homework document; ideally these should be preceded by comments or text at least explaining what question they are answering. Extra code can be placed in an appendix.

•   All plots produced in R should have appropriate labels on the axes as well as titles. Any plot should have an explanation of what is being plotted given clearly in the accompanying text.

•   Plots and figures should be appropriately sized, meaning they should not be too large, so that the page length is not too long. (The arguments fig.height and fig.width to knitr chunks can achieve this.)

•   Directions for “by-hand” problems: In general, credit is given for (correct) shown work, not for final answers; so show all work for each problem and explain your answer fully.
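As an illustration of the rule about regression output, here is a minimal sketch of one way to show only the estimates, standard errors and p-values instead of the full summary() printout. The data are simulated purely for illustration, and the names fit and coef_table are hypothetical.

```r
# Simulated data (illustrative only; not part of the homework)
set.seed(1)
x <- rnorm(50)
y <- 1 + 2 * x + rnorm(50)

fit <- lm(y ~ x)

# Keep only the labeled estimates, OLS standard errors and p-values
coef_table <- coef(summary(fit))[, c("Estimate", "Std. Error", "Pr(>|t|)")]
print(round(coef_table, 4))
```

Printing a small matrix like this, rather than summary(fit) itself, keeps residual quantiles, significance stars and other unneeded output out of the document.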

Questions:

1.  (Prediction using the cross-correlation function) Assume that $Y_t = aX_{t-\ell} + W_t$ for some number $a$. The series $X_t$ leads $Y_t$ if $\ell > 0$ and is said to lag $Y_t$ if $\ell < 0$. Assume that $E(X_t) = E(Y_t) = 0$, that $\{X_t\}$ is stationary, and that $W_t \sim WN(0, \sigma_w^2)$ is uncorrelated with the whole series $\{X_t\}$. Let $\gamma_x$ denote the autocovariance function of $\{X_t\}$.

(a)  Is $Y_t$ stationary?

(b)  Compute the cross covariance function between $Y_t$ and $X_s$, for any $s$ and $t$. (Your answer will depend on $\gamma_x$, the autocovariance function of $X_t$.)

(c)  Compute the cross correlation function between $Y_t$ and $X_s$, for any $s$ and $t$. (Your answer will depend on $\gamma_x$, the autocovariance function of $X_t$.)

Solution:

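(a)  We have $E Y_t = aE X_{t-\ell} + E W_t = 0$. Also $\mathrm{Var}(Y_t) = a^2 \mathrm{Var}(X_{t-\ell}) + \sigma_w^2 = a^2 \gamma_x(0) + \sigma_w^2$, because $X_{t-\ell}$ and $W_t$ are uncorrelated and $\{X_t\}$ is stationary. For $h > 0$, $\mathrm{Cov}(Y_t, Y_{t-h}) = \mathrm{Cov}(aX_{t-\ell} + W_t, aX_{t-\ell-h} + W_{t-h}) = a^2 \gamma_x(h)$, since the cross terms between $X$ and $W$ vanish and $\mathrm{Cov}(W_t, W_{t-h}) = 0$ for white noise.

Thus, since $X_t$ is stationary, $Y_t$ has constant mean, constant variance, and an autocovariance that is a function of the time difference only. We conclude that $Y_t$ is indeed (weakly) stationary.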

(b)  $\mathrm{Cov}(Y_t, X_s) = \mathrm{Cov}(aX_{t-\ell} + W_t, X_s) = a\,\mathrm{Cov}(X_{t-\ell}, X_s) + \mathrm{Cov}(W_t, X_s) = a\gamma_x(|t - \ell - s|)$, where we have used that $W_t$ is uncorrelated with $X_s$ for all $(t, s)$ and that $X_t$ is stationary by assumption.

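(c)  Using the calculation of $\mathrm{Var}(Y_t)$ from above, the cross-correlation is

$$\frac{\mathrm{Cov}(Y_t, X_s)}{\sqrt{\mathrm{Var}(Y_t)\,\mathrm{Var}(X_s)}} = \frac{a\gamma_x(|t - \ell - s|)}{\sqrt{(\sigma_w^2 + a^2 \gamma_x(0))\,\gamma_x(0)}}.$$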
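The cross-correlation formula in part (c) can be checked numerically. The sketch below is not part of the original solution: it assumes, purely for illustration, that $X_t$ is an AR(1) process with coefficient phi = 0.6 and unit innovation variance (so that $\gamma_x(h) = \phi^{|h|}/(1-\phi^2)$), and takes a = 2, ℓ = 3 and σ_w = 1. All object names are hypothetical.

```r
# Simulation sketch: compare sample and theoretical Corr(Y_t, X_s), with k = t - s.
# Illustrative assumptions: X_t is AR(1) with phi = 0.6; a = 2; ell = 3; sigma_w = 1.
set.seed(1)
n <- 20000; a <- 2; ell <- 3; phi <- 0.6; sigma_w <- 1

x <- as.numeric(arima.sim(model = list(ar = phi), n = n))  # stationary AR(1)
w <- rnorm(n, mean = 0, sd = sigma_w)                      # white noise, generated separately from x

t_idx <- (ell + 1):n                  # times t for which x[t - ell] exists
y <- a * x[t_idx - ell] + w[t_idx]    # y[i] holds Y_t for t = t_idx[i]

gamma_x <- function(h) phi^abs(h) / (1 - phi^2)   # AR(1) autocovariance

# Theoretical cross-correlation as a function of k = t - s
rho_theory <- function(k) {
  a * gamma_x(k - ell) / sqrt((sigma_w^2 + a^2 * gamma_x(0)) * gamma_x(0))
}

# Sample Corr(Y_t, X_{t-k}), using only pairs where both indices are valid
rho_sample <- function(k) {
  s_idx <- t_idx - k
  ok <- s_idx >= 1 & s_idx <= n
  cor(y[ok], x[s_idx[ok]])
}

k_vals <- 0:6
comparison <- data.frame(k = k_vals,
                         theory = sapply(k_vals, rho_theory),
                         sample = sapply(k_vals, rho_sample))
print(round(comparison, 3))
```

Under these assumptions both columns should peak at k = ℓ, which is the property that makes the cross-correlation function useful for identifying the lead of $X_t$ over $Y_t$.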

2.  Question 2.3, Shumway and Stoffer, 4th edition (Note:  The question is somewhat different than in previous editions).

Solution:

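(a)
par(mfrow=c(2,2), mar=c(2.5,2.5,0,0)+0.5, mgp=c(1.6,0.6,0))
for (i in 1:4) {
  # random walk with drift 0.01 and unit-variance steps
  x <- ts(cumsum(rnorm(100, 0.01, 1)))
  # regression on time through the origin
  model <- lm(x ~ time(x) + 0, na.action = NULL)
  plot(x, ylab='random walk drift', las=1)
  abline(a=0, b=0.01, col=2, lty=2)  # true mean function 0.01*t (dashed)
  abline(model, col=4)               # fitted line (solid)
}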

[Figure: four panels, each showing one simulated random walk with drift plotted against Time (0 to 100).]

The dashed line is the true mean function and the solid line is the fitted one.

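(b)
par(mfrow=c(2,2), mar=c(2.5,2.5,0,0)+0.5, mgp=c(1.6,0.6,0))
for (i in 1:4) {
  x <- ts(rnorm(100))       # white noise
  y <- 0.01*time(x) + x     # linear trend plus noise
  # regression on time through the origin
  model <- lm(y ~ time(y) + 0, na.action = NULL)
  plot(y, ylab='linear trend plus noise', las=1)  # plot y (the trend series), not the noise x
  abline(a=0, b=0.01, col=2, lty=2)  # true mean function 0.01*t (dashed)
  abline(model, col=4)               # fitted line (solid)
}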


[Figure: four panels, each showing one simulated linear-trend-plus-noise series plotted against Time (0 to 100).]

The dashed line is the true mean function and the solid one is the fitted one.

(c)  This question explores two very different models or data generating mechanisms. The estimated line based on the linear trend model does quite well (“is consistent”, we say) whereas based on the random walk it does poorly.

We saw in class the theoretical property that random walks are nonstationary because the variance of a random walk accumulates over time. This simulation shows what it means that the “trend” that we (think we) see in a random walk is actually variance rather than a true trend. (The four different instantiations of the random walk had four different “trends”, whereas the four different simulations in (b) had very similar trends.)

One way to think about this is to think about prediction: predicting future values based on the estimates in part (b) will tend to do well, but in part (a) the estimated line will be useless for prediction.

Another thing to notice is that the series as a whole is much more variable in (a) than in (b). For instance, the last observation (X100) goes from around -10 to +4 in (a) whereas in (b) it is always between −2 and 2.
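The contrast described in (c) can be made quantitative by repeating each simulation many times and comparing the spread of the fitted slopes. This is a minimal sketch rather than part of the original solution; the number of replications (500), the seed, and the names slope, slopes_rw and slopes_lt are arbitrary choices made for illustration.

```r
# Spread of the through-the-origin slope estimate under the two models
set.seed(5511)
n <- 100
tt <- 1:n
reps <- 500   # illustrative number of replications

# Slope from regressing a series on time with no intercept
slope <- function(series) unname(coef(lm(series ~ tt + 0)))

slopes_rw <- replicate(reps, slope(cumsum(rnorm(n, 0.01, 1))))  # random walk with drift
slopes_lt <- replicate(reps, slope(0.01 * tt + rnorm(n)))       # linear trend plus noise

print(c(sd_random_walk = sd(slopes_rw), sd_linear_trend = sd(slopes_lt)))
```

Both models have true mean function 0.01 t, so a much larger standard deviation for the random-walk slopes reflects the accumulated variance of the random walk rather than any difference in trend.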

3.  Question 2.10, Shumway and Stoffer. For (f)(iii), you can do both an analysis of the residuals as you would in a non-time-series context (e.g., a QQ-plot) and an analysis of the correlation of the residuals (using the ACF).

Solution:

(a)
library(lattice)
library(astsa)
par(mfrow=c(1,1))
plot.ts(gas, ylab="price", main="gas and oil", ylim=c(25,325), col=1, las=1)
lines(oil, col=2)
legend("topright", legend=c("gas","oil"), lty=1, col=1:2, bty='n', cex=0.6)

[Figure: "gas and oil". The gas and oil price series plotted against Time (2000 to 2010).]

Both series look like random walks, perhaps with drift, so they are not stationary. There is one very large visible jump. Even excluding that one, several other quite large jumps remain, which suggests periods of heavy volatility or heavy-tailed behavior. Ignoring those, the random walk (with drift) model seems reasonable.

(b)  If $x_{t+1} = (1 + r)x_t$, then $\log(x_{t+1}/x_t) = \log(1 + r)$, which is approximately $r$ when $r$ is close to 0. So $\nabla \log(x_{t+1}) = \log(x_{t+1}) - \log(x_t)$ is a good approximation to the growth rate $r$.
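A quick numerical check of the approximation log(1 + r) ≈ r may help. The growth rates below, and the constant 2% growth series, are arbitrary illustrative values, not data from the problem.

```r
# How close is log(1 + r) to r for small growth rates?
r <- c(0.001, 0.01, 0.05, 0.10, 0.25)
approx_tab <- data.frame(r = r,
                         log_growth = log(1 + r),
                         abs_error = abs(log(1 + r) - r))
print(approx_tab)

# Same idea on a series with constant 2% growth per period
x <- 100 * 1.02^(0:10)
growth_log <- diff(log(x))           # differenced logs: log(1.02) each period
growth_pct <- diff(x) / head(x, -1)  # exact growth rate: 0.02 each period
print(cbind(growth_log, growth_pct))
```

The error is bounded by r^2/2 for r between 0 and 1, so the approximation is good for growth rates of a few percent and degrades as r grows.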

(c)
gas_gr <- diff(log(gas))
oil_gr <- diff(log(oil))
plot.ts(gas_gr, main="oil and gas growth rate", ylab='growth rate', col=1, las=1)
lines(oil_gr, col=2)
legend("topright", legend=c("gas growth rate","oil growth rate"),
       lty=1, col=1:2, bty='n', cex=0.6)

[Figure: "oil and gas growth rate". The two growth-rate series plotted against Time (2000 to 2010).]

par(mfrow=c(2,1))
acf(gas_gr, main='gas growth rate')
acf(oil_gr, main='oil growth rate')


[Figure: sample ACF of the gas growth rate (top) and of the oil growth rate (bottom), each plotted against Lag.]

The transformed series look fairly stationary: most of the sample ACF values at nonzero lags lie within the 95% bounds that acf() draws, which are the limits expected for white noise.
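The bounds drawn by acf() are ±qnorm(0.975)/√n, so the statement "most of the ACF lies within the bounds" can be turned into a number. The sketch below uses simulated white noise as a stand-in so that it runs without the astsa data; in the homework one would substitute gas_gr or oil_gr for z. The series length, seed, number of lags and object names are illustrative assumptions.

```r
# Fraction of sample ACF values (lags 1..25) inside the 95% white-noise bounds
set.seed(2021)
n <- 500
z <- rnorm(n)   # stand-in series; replace with gas_gr or oil_gr

r <- acf(z, lag.max = 25, plot = FALSE)$acf[-1]  # drop lag 0, which is always 1
bound <- qnorm(0.975) / sqrt(n)                  # the dashed lines drawn by acf()
frac_inside <- mean(abs(r) <= bound)

print(c(bound = bound, fraction_inside = frac_inside))
```

For a white noise series roughly 95% of the lags should fall inside the bounds; a much smaller fraction would point to remaining serial correlation.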


(d)
par(mfrow=c(1,1))
ccf(oil_gr, gas_gr, main='gas growth rate & oil growth rate',
    ylab='CCF', las=1)

[Figure: "gas growth rate & oil growth rate". Sample CCF (vertical axis from 0.0 to 0.6) plotted against Lag (horizontal axis from −0.4 to 0.4).]

The plot shows the sample cross-correlation function of the two growth rates, which estimates $\rho_{oil,gas}(h) = \mathrm{Corr}(O_{t+h}, G_t)$. (The lag axis is in years because the series are weekly, so a lag of one week corresponds to 1/52 ≈ 0.02 on the axis; below, h is measured in weeks.) The strongest correlation on the plot is at h = 0; the two series are strongly contemporaneously correlated. Significant CCF values at lags h > 0 indicate that gas leads oil; significant values at lags h < 0 indicate that oil leads gas. We know that oil is used to create gas, so we would expect, a priori, that oil leads gas, that is, that we would see significant values at lags h ≤ 0. We do indeed see that oil significantly leads gas at a one-week lag (h = −1). (It is debatable whether oil leads gas at three weeks, h = −3.) From the value at h = 3 we also see that gas seems to lead oil by three weeks, and maybe also at h = 1, 2. As the textbook mentions, this might be considered a feedback loop (e.g., where the price of gas is high and so oil sellers decide/realize they could increase the price of oil and gas sellers would still pay for it).
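The sign convention used above (negative lags in ccf(oil_gr, gas_gr) mean that the first series leads) can be confirmed on simulated data where the lead is known. This is a hedged sketch: the series, the lead of 2 steps, the noise level and all object names are made up for illustration.

```r
# ccf(x, y) at lag k estimates Corr(x[t+k], y[t]); check where the peak falls
set.seed(42)
n <- 500
lead <- 2                      # x leads y by 2 time steps (illustrative)

x_full <- rnorm(n + lead)
x <- x_full[(1 + lead):(n + lead)]     # x_t
y <- x_full[1:n] + rnorm(n, sd = 0.5)  # y_t = x_{t-lead} + noise

cc <- ccf(x, y, lag.max = 10, plot = FALSE)
peak_lag <- cc$lag[which.max(cc$acf)]
print(peak_lag)
```

Because y depends on an earlier value of x, the peak falls at a negative lag, which matches the reading that significant values at h < 0 in ccf(oil_gr, gas_gr) mean oil leads gas.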



(e)
lag2.plot(oil_gr, gas_gr, 3, corr=TRUE, smooth=TRUE)



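[Figure: output of lag2.plot. The recoverable panel has oil_gr(t−0) on the horizontal axis (from −0.2 to 0.2) and reports a sample correlation of 0.66.]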