MATH2931 Assignment/Quiz 2 Questions
1. Assume a Gaussian linear model:
Y ∼ N(Xβ, σ²I_n),
where X ∈ R^{n×p}, β ∈ R^p, and σ² are a fixed/given matrix, vector, and scalar, respectively.
(a) Write down the joint pdf g(y | β, σ², X).
(b) Compute the maximum likelihood estimate of β:
β̂ = argmax_β ln g(y | β, σ², X).
(c) Compute the maximum likelihood estimate of σ²:
σ̂² = argmax_{σ²} ln g(y | β̂, σ², X).
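For reference, here is a sketch of the standard Gaussian-likelihood answers (classical results, not reproduced from the assignment sheet):
g(y | β, σ², X) = (2πσ²)^{−n/2} exp(−∥y − Xβ∥²/(2σ²)),
so that ln g = −(n/2) ln(2πσ²) − ∥y − Xβ∥²/(2σ²). Maximizing over β amounts to minimizing ∥y − Xβ∥², giving β̂ = X⁺y; setting the derivative with respect to σ² to zero then gives σ̂²_ML = ∥y − Xβ̂∥²/n (divisor n, in contrast with the unbiased n − r divisor in Question 2).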
2. Suppose that Y ∼ N(Xβ, σ²I_n), where X ∈ R^{n×p}, β ∈ R^p, and σ² are a fixed/given matrix, vector, and scalar, respectively. Let
β̂ = X⁺Y,
σ̂² = ∥Y − Xβ̂∥² / (n − r),
where r = rank(X). Show that:
(a)
E[β̂] = β,
when rank(X) = r = p.
solution: Since X has full column rank, we have
X⁺ = (X⊤X)⁺X⊤ = (X⊤X)⁻¹X⊤.
Hence, E[β̂] = X⁺E[Y] = X⁺Xβ = (X⊤X)⁻¹X⊤Xβ = β.
(b)
E[σ̂²] = σ²,
using the identity E[X⊤AX] = µ⊤Aµ + tr(A Var(X)), where µ = E[X].
solution: For this part, using the projection property (I − XX⁺)² = I − XX⁺, we have
(n − r)E[σ̂²] = E∥Y − Xβ̂∥²
= E∥Y − XX⁺Y∥²
= E∥(I − XX⁺)Y∥²
= E[Y⊤(I − XX⁺)²Y]
= E[Y⊤(I − XX⁺)Y]
= E[Y]⊤(I − XX⁺)E[Y] + tr((I − XX⁺)Var(Y))
= (Xβ)⊤(I − XX⁺)Xβ + σ² tr(I_n − XX⁺).
Therefore, using
(I − XX⁺)X = X − XX⁺X = X − X = 0,
we obtain
(n − r)E[σ̂²] = σ² tr(I_n − XX⁺) = σ²(n − r),
because, writing the SVD X = USV⊤, we have
X⁺X = (USV⊤)⁺USV⊤
= (VS⁺U⊤)USV⊤
= VS⁺SV⊤
= V [I_r, 0; 0, 0] V⊤,
and hence
tr(XX⁺) = tr(X⁺X) = tr(V [I_r, 0; 0, 0] V⊤) = r.
If X is full rank, then this is simpler to write, because
tr(XX⁺) = tr(X⁺X) = tr((X⊤X)⁻¹X⊤X) = tr(I_p) = p = r.
In summary, an unbiased estimator of σ² is
σ̂² = ∥Y − XX⁺Y∥² / (n − p).
Recall that in first year you are taught that Σ_{i=1}^n (Y_i − Ȳ)²/(n − 1) is an unbiased estimator of σ²; this is the special case with X = 1_n (a single constant column, so p = 1).
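As a quick numerical sanity check (not part of the original solution), the unbiasedness of σ̂² can be verified by Monte Carlo; the dimensions, noise level, and seed below are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, sigma2 = 50, 4, 2.0                 # arbitrary problem size and noise level
X = rng.standard_normal((n, p))           # arbitrary full-rank design
beta = rng.standard_normal(p)
Xplus = np.linalg.pinv(X)                 # Moore-Penrose pseudoinverse X^+
r = np.linalg.matrix_rank(X)

# Average sigma^2-hat = ||Y - X X^+ Y||^2 / (n - r) over many replications
reps = 20_000
est = np.empty(reps)
for k in range(reps):
    Y = X @ beta + np.sqrt(sigma2) * rng.standard_normal(n)
    resid = Y - X @ (Xplus @ Y)
    est[k] = resid @ resid / (n - r)

print(est.mean())                          # close to sigma2 = 2.0
```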
(c)
X⁺(X⁺)⊤ = (X⊤X)⁺.
solution: We prove this by simply substituting and using X⁺ = (X⊤X)⁺X⊤:
X⁺(X⁺)⊤ = (X⊤X)⁺X⊤((X⊤X)⁺X⊤)⊤
= (X⊤X)⁺X⊤X[(X⊤X)⁺]⊤
= (X⊤X)⁺X⊤X(X⊤X)⁺    (since (X⊤X)⁺ is symmetric)
= A⁺AA⁺ with A = X⊤X
= A⁺ = (X⊤X)⁺.
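A short numerical check of this pseudoinverse identity (a sketch; the rank-deficient design is arbitrary, chosen to exercise the general case where (X⊤X)⁻¹ does not exist):

```python
import numpy as np

rng = np.random.default_rng(1)
# Rank-deficient on purpose: the last column is zeroed out
X = rng.standard_normal((8, 4)) @ np.diag([1.0, 1.0, 1.0, 0.0])
Xplus = np.linalg.pinv(X)

lhs = Xplus @ Xplus.T                      # X^+ (X^+)^T
rhs = np.linalg.pinv(X.T @ X)              # (X^T X)^+
print(np.allclose(lhs, rhs))               # True
```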
(d)
[β̂ ; Y − Xβ̂] ∼ N( [X⁺Xβ ; 0] , σ² [ (X⊤X)⁺, 0 ; 0, I_n − XX⁺ ] ),
where the semicolons separate block rows. Hence, deduce that β̂ is independent of ∥Y − Xβ̂∥².
solution: We know that Y is multivariate Gaussian, and any linear transformation of a Gaussian vector yields another multivariate Gaussian. Thus, from
[β̂ ; Y − Xβ̂] = [X⁺ ; I_n − XX⁺] Y = AY,
we can conclude that [β̂ ; Y − Xβ̂] is multivariate Gaussian with mean
A E[Y] = [X⁺ ; I_n − XX⁺] Xβ = [X⁺Xβ ; 0].
The covariance is:
Var(AY) = σ²AA⊤
= σ² [X⁺ ; I_n − XX⁺] [(X⁺)⊤ , I_n − XX⁺]
= σ² [ X⁺(X⁺)⊤, X⁺(I_n − XX⁺) ; (I_n − XX⁺)(X⁺)⊤, (I_n − XX⁺)² ]
= σ² [ (X⊤X)⁺, 0 ; 0, I_n − XX⁺ ],
where the off-diagonal blocks vanish because X⁺(I_n − XX⁺) = X⁺ − X⁺XX⁺ = X⁺ − X⁺ = 0, the bottom-right block uses (I_n − XX⁺)² = I_n − XX⁺, and part (c) gives X⁺(X⁺)⊤ = (X⊤X)⁺.
Therefore,
Var(β̂) = σ²(X⊤X)⁺,
Var(Y − Xβ̂) = σ²(I_n − XX⁺),
Cov(β̂, Y − Xβ̂) = 0.
Since β̂ and Y − Xβ̂ are jointly normal with zero covariance, they are independent; in other words, β̂ and σ̂² are independent. This is going to be used in the next part.
(e) If r = p, then
(n − r)σ̂²/σ² ∼ χ²_{n−r}.
solution: We know that
Z⊤Σ⁺Z ∼ χ²_{rank(Σ)},
where Z ∼ N(0, Σ).
In particular, we have the quadratic form
Y⊤(I_n − XX⁺)Y/σ² = Y⊤(I_n − XX⁺)(I_n − XX⁺)⁺(I_n − XX⁺)Y/σ²
= Z⊤(I_n − XX⁺)⁺Z,
where (using Y ∼ N(Xβ, σ²I_n))
Z = (I_n − XX⁺)Y/σ ∼ N(0, I_n − XX⁺).
Indeed,
E[Z] = (I_n − XX⁺)E[Y]/σ = (I_n − XX⁺)Xβ/σ = 0
and
Var(Z) = (I_n − XX⁺)Var(Y)(I_n − XX⁺)/σ² = (I_n − XX⁺)σ²I_n(I_n − XX⁺)/σ² = I_n − XX⁺.
Therefore, since rank(I_n − XX⁺) = n − r,
Z⊤(I_n − XX⁺)⁺Z ∼ χ²_{n−r},
and this quadratic form equals ∥Y − Xβ̂∥²/σ² = (n − r)σ̂²/σ².
(f) If r = p, then
(β̂_j − E[β̂_j]) / (σ̂ ∥e_j⊤X⁺∥) ∼ t_{n−r}.
solution: From part (e) we know that
(n − r)σ̂²/σ² ∼ χ²_{n−r}.
From part (d), we know that σ̂² is independent of β̂ ∼ N(E[β̂], σ²(X⊤X)⁺), so that
(β̂_j − E[β̂_j]) / (σ √([(X⊤X)⁺]_{jj})) ∼ N(0, 1).
From the Quiz 1 sheet, we have that the ratio of a standard normal and an independent √(χ²_{n−r}/(n − r)) is t_{n−r}-distributed. In other words,
(β̂_j − E[β̂_j]) / (σ̂ √([(X⊤X)⁺]_{jj})) ∼ t_{n−p}.
From part (c), we know that
[(X⊤X)⁺]_{jj} = e_j⊤(X⊤X)⁺e_j = e_j⊤X⁺(X⁺)⊤e_j = ∥e_j⊤X⁺∥².
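A quick numerical check of this final identity (a sketch with an arbitrary design): the j-th row of X⁺ is e_j⊤X⁺, so its squared norm should match the j-th diagonal entry of (X⊤X)⁺.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 30, 3                               # arbitrary dimensions
X = rng.standard_normal((n, p))
Xplus = np.linalg.pinv(X)

G = np.linalg.pinv(X.T @ X)                # (X^T X)^+
for j in range(p):
    # [(X^T X)^+]_jj  versus  ||e_j^T X^+||^2 (row j of X^+)
    print(np.isclose(G[j, j], np.sum(Xplus[j, :] ** 2)))
```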
3. For the simple linear regression Y = β0 + β1 x + ϵ, show that R² is the same as the squared sample correlation between the response and the explanatory variable:
R² = (Σ_i (y_i − ȳ)(x_i − x̄))² / (Σ_i (y_i − ȳ)² Σ_i (x_i − x̄)²).
solution: First, from the definition in the notes on page 39, Section 2.3.7, we know that
R² = ∥ŷ − ȳ1∥² / ∥y − ȳ1∥² = Σ_i (ŷ_i − ȳ)² / Σ_i (y_i − ȳ)².
For a simple linear regression, we know that
b0 = ȳ − b1 x̄,
b1 = Σ_i (x_i − x̄)(y_i − ȳ) / Σ_i (x_i − x̄)².
Substituting ŷ_i = b0 + b1 x_i gives:
Σ_i (b1 x_i + ȳ − b1 x̄ − ȳ)² / Σ_i (y_i − ȳ)² = Σ_i (b1 [x_i − x̄])² / Σ_i (y_i − ȳ)²
= b1² Σ_i [x_i − x̄]² / Σ_i (y_i − ȳ)²
= (Σ_i (x_i − x̄)(y_i − ȳ))² / (Σ_i (y_i − ȳ)² Σ_i (x_i − x̄)²),
which completes the proof.
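The identity can also be confirmed numerically on synthetic data (a sketch; slope, intercept, and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(40)
y = 1.5 + 2.0 * x + rng.standard_normal(40)

# Least-squares coefficients of the simple linear regression
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x

R2 = np.sum((yhat - y.mean()) ** 2) / np.sum((y - y.mean()) ** 2)
corr = np.corrcoef(x, y)[0, 1]             # sample correlation of x and y
print(np.isclose(R2, corr ** 2))           # True
```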
4. Show that R²_adjusted ≤ R².
solution: We recall that
R²_adjusted = 1 − (1 − R²)(n − 1)/(n − p).
When n > p ≥ 1 we have that
(n − 1)/(n − p) ≥ 1.
Therefore,
1 − R²_adjusted = (1 − R²)(n − 1)/(n − p) ≥ 1 − R².
We conclude
1 − R² ≤ 1 − R²_adjusted.
Hence,
R²_adjusted ≤ R²,
which makes sense, because R²_adjusted is less optimistic about the model than R² (a higher R² means a better fit to the training data, possibly overfitting).
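A two-line numeric illustration (the values of n, p, and R² are arbitrary):

```python
n, p, R2 = 44, 11, 0.51                    # arbitrary sample size, columns, R^2
R2_adj = 1 - (1 - R2) * (n - 1) / (n - p)
print(R2_adj, R2_adj <= R2)                # approx. 0.3615, True
```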
5. For the diabetes dataset, compute the 2-fold cross-validation loss as an estimate of the expected generalization risk of the linear learner. Report the numerical value.
solution: Without reordering the data we get: 3250.9
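A sketch of this computation in Python, assuming the diabetes data is the version shipped with scikit-learn; the course's data file may be scaled or ordered differently, so the value need not reproduce 3250.9 exactly. The two folds are the first and second halves of the data, without reordering:

```python
import numpy as np
from sklearn.datasets import load_diabetes

X, y = load_diabetes(return_X_y=True)
X = np.hstack([np.ones((len(y), 1)), X])   # prepend a constant feature
n = len(y)
half = n // 2
folds = [np.arange(half), np.arange(half, n)]

# 2-fold CV: train on one half, accumulate squared-error loss on the other
total = 0.0
for test in folds:
    train = np.setdiff1d(np.arange(n), test)
    beta = np.linalg.pinv(X[train]) @ y[train]
    total += np.sum((y[test] - X[test] @ beta) ** 2)
print(total / n)                            # average loss per observation
```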
6. For the diabetes dataset, compute the leave-one-out cross-validation loss (the PRESS statistic divided by n) as an estimate of the expected generalization risk of the linear learner, and report the numerical value.
Perform the computation of the leave-one-out cross-validation in two different ways: 1) one using the fast PRESS statistic formula; 2) another using a brute-force retraining of the linear learner.
solution: The value for n-fold CV is: 3147
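A sketch of both computations, under the same scikit-learn data assumption as above. The fast route uses the leverages h_ii (the diagonal of the projection matrix XX⁺) via PRESS = Σ_i (e_i/(1 − h_ii))²; the brute-force route refits the model n times:

```python
import numpy as np
from sklearn.datasets import load_diabetes

X, y = load_diabetes(return_X_y=True)
X = np.hstack([np.ones((len(y), 1)), X])
n = len(y)

# 1) Fast PRESS formula: PRESS = sum_i (e_i / (1 - h_ii))^2
H = X @ np.linalg.pinv(X)                  # projection matrix X X^+
e = y - H @ y                              # ordinary residuals
press = np.sum((e / (1 - np.diag(H))) ** 2)
print(press / n)

# 2) Brute-force leave-one-out: retrain with observation i held out
total = 0.0
for i in range(n):
    keep = np.delete(np.arange(n), i)
    beta = np.linalg.pinv(X[keep]) @ y[keep]
    total += (y[i] - X[i] @ beta) ** 2
print(total / n)                            # matches the PRESS value
```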
7. For the diabetes dataset, use the estimate
∥(I_n − XX⁺)Y∥²/n + 2σ²p/n,
where σ² ≈ 3000, of the in-sample risk to decide if the following predictors should be jointly included/excluded in the linear model: age, glu, tch, ldl?
After making your decision about which features to include in the model matrix X, then estimating the corresponding coefficients β̂, create a qq-plot of the residuals y − Xβ̂.
solution: The in-sample risk estimate using all of the predictors is 3137.9 (here p includes the constant feature); the in-sample risk estimate after removing the predictors is 3088.5. Thus, we prefer dropping these predictors.
The coefficients estimated after dropping the predictors are:
152.43, −233.31, 576.45, 287.26, −171.16, −197.03, 620.38
[Figures: residual plot and qq-plot of the residuals of the chosen model.]
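A sketch of the computation and the qq-plot, again assuming scikit-learn's copy of the data; there the columns are ordered [age, sex, bmi, bp, s1(tc), s2(ldl), s3(hdl), s4(tch), s5(ltg), s6(glu)], so the indices below are an assumption to adjust for the course's file:

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes

X0, y = load_diabetes(return_X_y=True)
n = len(y)
sigma2 = 3000.0                            # noise level given in the question

def in_sample_risk(X):
    """Training loss plus the 2*sigma^2*p/n complexity correction."""
    p = X.shape[1]
    resid = y - X @ (np.linalg.pinv(X) @ y)
    return np.sum(resid ** 2) / n + 2 * sigma2 * p / n

ones = np.ones((n, 1))
full = np.hstack([ones, X0])
drop = [0, 5, 7, 9]                        # assumed positions of age, ldl, tch, glu
reduced = np.hstack([ones, np.delete(X0, drop, axis=1)])
print(in_sample_risk(full), in_sample_risk(reduced))

# qq-plot of the residuals of the chosen (reduced) model
resid = y - reduced @ (np.linalg.pinv(reduced) @ y)
stats.probplot(resid, dist="norm", plot=plt)
plt.show()
```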
8. For the diabetes dataset, compute a 95% numerical confidence interval for the β_j that corresponds to the predictor "age".
answer: approximately [−381, −86].
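A sketch of the interval computation, combining parts (c) and (f) of Question 2; the column index of "age" is an assumption that depends on the data file's ordering:

```python
import numpy as np
from scipy import stats
from sklearn.datasets import load_diabetes

X0, y = load_diabetes(return_X_y=True)
n = len(y)
X = np.hstack([np.ones((n, 1)), X0])
p = X.shape[1]

Xplus = np.linalg.pinv(X)
beta = Xplus @ y
resid = y - X @ beta
sigma2_hat = resid @ resid / (n - p)       # unbiased estimate of sigma^2

j = 1                                      # assumed position of 'age' after the constant
se = np.sqrt(sigma2_hat * np.sum(Xplus[j, :] ** 2))   # sigma-hat * ||e_j^T X^+||
t = stats.t.ppf(0.975, df=n - p)           # 97.5% quantile of t_{n-p}
print(beta[j] - t * se, beta[j] + t * se)
```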
9. Download the file risk.csv. The goal is to predict risk from the other variables. Do an F-test to check if the explanatory variables are all jointly relevant, and report the R².
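No numerical answer is recorded above; the following is a sketch of the F-test, assuming risk.csv sits in the working directory with the response in a column literally named risk (both assumptions, adjust as needed):

```python
import numpy as np
import pandas as pd
from scipy import stats

df = pd.read_csv("risk.csv")               # assumed file location
y = df["risk"].to_numpy()                  # assumed response column name
X0 = df.drop(columns="risk").to_numpy()
n, k = X0.shape                            # k explanatory variables
X = np.hstack([np.ones((n, 1)), X0])

yhat = X @ (np.linalg.pinv(X) @ y)
R2 = 1 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)

# F-test of H0: all k slope coefficients are jointly zero
F = (R2 / k) / ((1 - R2) / (n - k - 1))
pval = stats.f.sf(F, k, n - k - 1)
print(R2, F, pval)
```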
10. Here we use the fish.csv dataset. The variables weight (in grams) and length (in millimetres) in this data set are the weights and lengths of 23 different catfish captured in the Kanawha River in Charleston, West Virginia. It was desired to estimate the angler harvest of channel catfish, and for live fish, length is much easier to measure than weight. Hence it was of interest to study the length/weight relationship for channel catfish.
(a) Train a simple linear regression model with weight as response and length as predictor.
(b) It is conjectured that the weight of a fish varies with length by the following relationship:
log10(y) ≈ β0 + β1 log10(x) + ϵ,
where y is the weight and x is the length.
Train a simple linear regression model with log-weight as response and log-length as predictor.
(c) Plot a scatterplot and estimate the generalization risk for both models. Explain which model is preferable based on the scatterplot and the generalization risk. Assume interest is in prediction of y in natural units (not in log units).
answer: The coefficient vector on the raw data is b = [−884.31, 3.8444]⊤.
The following gives the visual diagnostic. The left plots correspond to the linear fit
b0 + b1 x_i
and the right plots correspond to the fit
10^(b0 + b1 log10(x_i)).
From top to bottom we have the lines of best fit, residuals, and qq-plots.
[Figures: for each of the two fits, the line of best fit over the data (length 200-500 mm, weight 0-1200 g), the residuals, and a 'QQ Plot of Sample Data versus Standard Normal' of the residuals.]
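A sketch comparing the two models' generalization risk in natural units via leave-one-out cross-validation; the column names weight and length in fish.csv are assumptions:

```python
import numpy as np
import pandas as pd

df = pd.read_csv("fish.csv")               # assumed file location
x = df["length"].to_numpy()                # assumed column names
y = df["weight"].to_numpy()
n = len(y)

X1 = np.column_stack([np.ones(n), x])            # model 1: weight ~ length
X2 = np.column_stack([np.ones(n), np.log10(x)])  # model 2: log10(weight) ~ log10(length)

# Leave-one-out squared-error loss, measured in grams for both models
loss1 = loss2 = 0.0
for i in range(n):
    keep = np.delete(np.arange(n), i)
    b1 = np.linalg.pinv(X1[keep]) @ y[keep]
    b2 = np.linalg.pinv(X2[keep]) @ np.log10(y[keep])
    loss1 += (y[i] - X1[i] @ b1) ** 2
    loss2 += (y[i] - 10 ** (X2[i] @ b2)) ** 2
print(loss1 / n, loss2 / n)
```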