MATH2697 Statistical Modelling II 2022
SECTION A
1. An air pollution monitoring station in the city of Munich has recorded daily average SO2 concentrations over a period of 14 consecutive days. We denote the logarithms of these daily averages by y1, . . . , y14; these form our response variable, referred to as "pollution" in what follows. We are interested in modelling the responses yi as a function of the daily average temperatures x1, . . . , x14 (recorded in degrees Celsius on the same 14 days), as well as an indicator zi which takes the value 0 if day i is a weekday, and 1 if day i is a Saturday or Sunday. The full data set is provided below.
 i  |      1 |      2 |      3 |      4 |      5 |      6 |      7
 yi | -3.147 | -2.830 | -3.016 | -3.079 | -3.541 | -2.976 | -2.781
 xi |  16.47 |  16.02 |  16.81 |  22.87 |  21.68 |  21.23 |  20.55
 zi |      0 |      0 |      0 |      1 |      1 |      0 |      0

 i  |      8 |      9 |     10 |     11 |     12 |     13 |     14
 yi | -3.352 | -2.765 | -1.897 | -2.120 | -2.453 | -1.973 | -2.235
 xi |  18.32 |  15.96 |  15.36 |  12.47 |  12.46 |  11.77 |  11.72
 zi |      0 |      0 |      0 |      1 |      1 |      0 |      0
(a) We are fitting the linear model yi = β1 + β2 xi + β3 zi + εi. Write down the first four rows of the design matrix, X.
(b) Denote C = (X^T X)^{-1} and s^2 the usual unbiased estimator of the error variance. You can use in what follows that

    C = (     ·          ·          ·
              ·          ·          ·
          -0.01627   -0.00510    0.35484 ),

s^2 = 0.1338985, and X^T (y1, . . . , y14)^T = (-38.1650, -656.4754, -11.1930)^T. Find β̂j, j = 1, 2, 3, and their standard errors SE(β̂j), j = 1, 2, 3.
(c) Assume that on a particular Tuesday one observes an average temperature x0 = 16.5 °C. We would like to predict the true, unknown pollution y0 on this day, using the fitted model. Hence, find
(i) the predicted pollution, yˆ0 , on that day;
(ii) a 95% confidence interval for the expected pollution E(y0 | x0, z0 = 0) on that day;
(iii) a 95% prediction interval for the actual pollution y0 on that day.
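For checking one's answers, the model in this question can be fitted numerically. The following sketch (not part of the exam; it assumes NumPy is available, and takes t_{0.975, 11} = 2.201 from standard t tables) rebuilds the design matrix from the data above and computes the quantities asked for in (b) and (c):

```python
import numpy as np

# Data from the table above (log SO2, temperature, weekend indicator)
y = np.array([-3.147, -2.830, -3.016, -3.079, -3.541, -2.976, -2.781,
              -3.352, -2.765, -1.897, -2.120, -2.453, -1.973, -2.235])
x = np.array([16.47, 16.02, 16.81, 22.87, 21.68, 21.23, 20.55,
              18.32, 15.96, 15.36, 12.47, 12.46, 11.77, 11.72])
z = np.array([0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0], dtype=float)

n, p = 14, 3
X = np.column_stack([np.ones(n), x, z])       # design matrix; its first four rows answer (a)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)  # least squares estimates
resid = y - X @ beta_hat
s2 = resid @ resid / (n - p)                  # unbiased estimate of the error variance
C = np.linalg.inv(X.T @ X)
se = np.sqrt(s2 * np.diag(C))                 # standard errors SE(beta_hat_j)

# Part (c): prediction for a weekday with x0 = 16.5
x0 = np.array([1.0, 16.5, 0.0])
y0_hat = x0 @ beta_hat
t975 = 2.201                                  # t_{0.975, 11} from tables (assumed)
half_ci = t975 * np.sqrt(s2 * x0 @ C @ x0)        # half-width of CI for E(y0 | x0, z0 = 0)
half_pi = t975 * np.sqrt(s2 * (1 + x0 @ C @ x0))  # half-width of PI for y0
```

As a sanity check, X^T y from this code reproduces the vector (-38.1650, -656.4754, -11.1930)^T quoted in part (b).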
2. For a linear model of type Y = Xβ + e, with β ∈ R^p, the hat matrix is given by H = X(X^T X)^{-1} X^T.
(a) Show HH^T = H and Tr(H) = p.
(b) Of particular interest are the diagonal values of H, the so-called leverage values hi, i = 1, . . . , n. Show 0 ≤ hi ≤ 1.
(c) The figure below shows a plot of hi versus i for a linear model fitted to a particular data set with n = 46.
i. Give an interpretation of this plot.
ii. Is this plot useful to judge whether the linear model fit would change considerably if the observation labelled “8” were removed from the data set? If so, provide this judgement. Otherwise, suggest an alternative measure to deal with this question (no formulae necessary).
iii. The mean of the plotted leverage values is 0.06521739. Find p.
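Since the mean of the leverage values equals Tr(H)/n, part iii reduces to p = n · mean(hi). A small numerical sketch (assuming NumPy; the design matrix below is an arbitrary stand-in, not the data behind the plot):

```python
import numpy as np

# Part iii: mean leverage = Tr(H)/n = p/n, so p = n * mean(h_i)
n, mean_h = 46, 0.06521739
p = round(n * mean_h)          # 46 * 0.06521739 = 3.0, hence p = 3

# Sanity check of Tr(H) = p on an arbitrary full-rank design matrix
rng = np.random.default_rng(0)
X = rng.normal(size=(46, 3))
H = X @ np.linalg.solve(X.T @ X, X.T)   # hat matrix X (X'X)^{-1} X'
h = np.diag(H)                          # leverage values h_i
```

The trace identity holds for any full-rank X, which is why the mean leverage pins down p regardless of the particular data set.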
[Figure: leverage values hi plotted against the index i = 1, . . . , 46; the point labelled "8" lies well above the others.]
SECTION B
3. We consider a linear model Y = Xβ + e, with β ∈ R^p and e ~ N_n(0, σ^2 I_n), where 0 denotes a vector of appropriate length consisting only of zeros, and I_n is the n × n identity matrix. Denote by β̂ = (X^T X)^{-1} X^T Y the least squares estimator of β, and s^2 = (Y - Xβ̂)^T (Y - Xβ̂)/(n - p) the unbiased estimator of σ^2.
(a) Derive the expectation and variance of β̂. Hence, give the sampling distribution of β̂.
(b) Write down the expression for the (squared) Mahalanobis distance between Y
and Xβ , and give its distribution.
(c) Write down the expression for the (squared) Mahalanobis distance between β̂ and β, and give its distribution.
(d) Prove the decomposition

    (Y - Xβ)^T (Y - Xβ) = (Y - Xβ̂)^T (Y - Xβ̂) + (β - β̂)^T X^T X (β - β̂).
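The decomposition in (d) is an exact algebraic identity (the cross term vanishes because X^T (Y - Xβ̂) = 0), which can be confirmed numerically. A small sketch, assuming NumPy, with an arbitrary simulated design matrix and an arbitrary "true" β:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 14, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])  # arbitrary design
beta = np.array([1.0, -2.0, 0.5])                               # arbitrary true beta
Y = X @ beta + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Both sides of the decomposition from part (d)
lhs = (Y - X @ beta) @ (Y - X @ beta)
rhs = (Y - X @ beta_hat) @ (Y - X @ beta_hat) \
      + (beta - beta_hat) @ X.T @ X @ (beta - beta_hat)
```

The two sides agree to machine precision for any choice of β, since the identity does not depend on the error distribution.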
(e) Using (b), (c), and (d), justify that, after appropriate standardization, the sampling distribution of s^2 is given by a χ^2 distribution, that is

    c s^2 ~ χ^2_k ,    (1)

and give the constant c, as well as the degrees of freedom k. [Note: Please explain your line of reasoning, but no formal proof is required. In particular, you do not need to show that β̂ and s^2 are independent.]
(f) Give E(s^2), and develop a formula for Var(s^2). [Hint: You can use that Var(χ^2_k) = 2k, for k ∈ Z+. If you could not solve part (e), please work with equation (1) as displayed.]
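The claims in (e) and (f) can be illustrated by simulation: with s^2 as defined above, (n - p) s^2 / σ^2 should follow a χ^2 distribution with n - p degrees of freedom, so that E(s^2) = σ^2 and Var(s^2) = 2σ^4/(n - p). A Monte Carlo sketch (assuming NumPy; the design matrix is an arbitrary stand-in):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, sigma2 = 14, 3, 1.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
H = X @ np.linalg.solve(X.T @ X, X.T)
M = np.eye(n) - H                       # residual-maker matrix, rank n - p

# Simulate s^2 = e' M e / (n - p) over many independent error draws
e = rng.normal(scale=np.sqrt(sigma2), size=(200_000, n))
s2 = ((e @ M) * e).sum(axis=1) / (n - p)

# Theory: E(s2) = sigma2 = 1, Var(s2) = 2 * sigma2**2 / (n - p) = 2/11
```

With n - p = 11, the simulated mean and variance of s^2 should be close to 1 and 2/11 ≈ 0.182, respectively.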
4. We are given a multiple linear regression model in the form yi = xi^T β + εi, i = 1, . . . , n, where β ∈ R^p, X ∈ R^{n×p}, and Y = (y1, . . . , yn)^T.
(a) Show that, for models involving an intercept, one has X^T ê = 0 and Ŷ^T ê = 0, where Ŷ and ê are the vectors of fitted values and residuals, respectively, after the usual least squares fit;
(b) Hence, for models involving an intercept, show that

    SST = SSR + SSE    (2)

where SST = Σi (yi - ȳ)^2, SSR = Σi (ŷi - ȳ)^2, and SSE = Σi (yi - ŷi)^2. Also explain why equation (2) is generally not correct if there is no intercept in the model.
(c) The statistic F for the overall F-test is defined by

    F = (SSR/(p - 1)) / (SSE/(n - p)).
Define the coefficient of determination (R2 ) in terms of the quantities introduced in part (b), and find an expression for R2 which only depends on F , n and p.
(d) We are given a real data set with n = 14, which after fitting the linear model with p = 3 (including the intercept) yields the value F = 7.539.
i. Carry out the overall F–test at the 0.01 level of significance.
ii. Compute R2 , and interpret the result. [Note: If you could not solve part
(c), you can make use of the information SSR = 1.999.]
iii. Assume that, for subject–matter considerations, the data analyst decides to remove the intercept. They refit the model using some statistical software, which reports a value R2 = 0.9803 for the fitted model. Does this give evidence that the model without intercept is preferable to the model with intercept? Explain your answer carefully.
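For reference, the arithmetic in part (d) can be checked with a few lines of Python, using the relation R^2 = (p - 1)F / ((p - 1)F + (n - p)) that follows from part (c). The 1% critical value F_{0.01; 2, 11} ≈ 7.21 is taken from standard F tables (an assumption, since no tables are attached here):

```python
# Part (d): n = 14, p = 3 (with intercept), observed F = 7.539
n, p, F = 14, 3, 7.539

# i. Overall F-test at the 1% level; critical value F_{0.01; 2, 11} ~ 7.21
#    (value taken from standard F tables, not computed here)
F_crit = 7.21
reject_H0 = F > F_crit          # reject beta_2 = beta_3 = 0 if True

# ii. From R^2 = SSR/SST and F = (SSR/(p-1)) / (SSE/(n-p)):
#     R^2 = (p-1)*F / ((p-1)*F + (n-p))
R2 = (p - 1) * F / ((p - 1) * F + (n - p))   # roughly 0.578
```

Note for part iii that software typically computes R^2 for a no-intercept model from uncentred sums of squares, so the two R^2 values are not directly comparable; the sketch above deliberately does not treat them as such.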
2022-05-16