STA302H1F/STA1001HF: Mini Project 1 2022
The mini project is to be completed independently. It is designed to develop your understanding of the properties of linear regression, as well as the data analysis skills you will need for the final project. For the mini project you will be asked to do the following:
- Submit the R Markdown file containing your code on Quercus. This ensures that your script can be run.
- Submit the project on time (i.e. by the deadline). Late submissions receive a 10% penalty for each day the project is late.
- In general, extensions will not be granted. If you miss the deadline for a valid reason, the weight will be shifted to another component of the course.
Suppose we want to simulate the following linear model:
$$Y_i = \beta_0 + \beta_1 X_i + e_i,$$
where $e_i \sim N(0, 2^2)$. Assume $X_i \sim N(0, 1^2)$, $\beta_0 = 0.5$, and $\beta_1 = 3$.
1. (5 points) Fix the sample size at 100 and the number of simulations at 100. For each simulation, obtain the least squares estimate of $\beta_1$ from the linear regression model (using the lm function). You may find the following lines of code useful for your simulation:
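n <- 100                  # sample size (question 1 fixes it at 100)
n.sim <- 100              # number of simulations
set.seed(1002224444)      # for reproducibility
e <- rnorm(n, 0, 2)       # errors: mean 0, SD 2 (variance 2^2)
x <- rnorm(n, 0, 1)       # predictor: mean 0, SD 1
y <- 0.5 + 3 * x + e      # beta0 = 0.5, beta1 = 3
model <- lm(y ~ x)        # least squares fit of y on x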
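The lines above produce a single fitted model. A minimal sketch of how they might be wrapped in a loop to collect all 100 estimates of $\beta_1$; the vector name beta1.hat is a hypothetical choice, and the seed is assumed to be set once, before the loop:

```r
n <- 100                     # sample size
n.sim <- 100                 # number of simulated data sets
set.seed(1002224444)         # seed from the handout; set once, before the loop

beta1.hat <- numeric(n.sim)  # hypothetical name: one slope estimate per simulation
for (i in seq_len(n.sim)) {
  e <- rnorm(n, 0, 2)        # errors with SD 2, i.e. variance 2^2
  x <- rnorm(n, 0, 1)        # predictor with SD 1
  y <- 0.5 + 3 * x + e       # true beta0 = 0.5, beta1 = 3
  model <- lm(y ~ x)
  beta1.hat[i] <- coef(model)[["x"]]   # least squares estimate of beta1
}

summary(beta1.hat)           # estimates should cluster around the true value 3
```

Setting the seed once, outside the loop, keeps the whole run reproducible while still drawing a fresh data set in each iteration; calling set.seed() with the same value inside the loop would make all 100 data sets identical.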
2. (10 points) Recall that $SST = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 + \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$. Use this relation to compute the correlation coefficient $r$ for each least squares estimate $\hat{\beta}_1$.
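For simple linear regression with an intercept, the decomposition gives $r^2 = SSR/SST$, and $r$ carries the sign of $\hat{\beta}_1$. A sketch of that computation inside the simulation loop, assuming the same settings as question 1; the variable names (SSE, SSR, y.hat) are illustrative choices:

```r
n <- 100                          # sample size
n.sim <- 100                      # number of simulations
set.seed(1002224444)

r <- numeric(n.sim)               # one correlation coefficient per simulation
for (i in seq_len(n.sim)) {
  e <- rnorm(n, 0, 2)
  x <- rnorm(n, 0, 1)
  y <- 0.5 + 3 * x + e
  model <- lm(y ~ x)
  y.hat <- fitted(model)
  SSE <- sum((y - y.hat)^2)       # error (residual) sum of squares
  SSR <- sum((y.hat - mean(y))^2) # regression sum of squares
  SST <- SSE + SSR                # total sum of squares, via the decomposition
  # r^2 = SSR / SST; r takes the sign of the slope estimate
  r[i] <- sign(coef(model)[["x"]]) * sqrt(SSR / SST)
}

summary(r)
```

As a check, the value from the last iteration can be compared with cor(x, y), which should agree up to rounding.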
3. (5 points) Plot a histogram of the values of $r$. Does the shape of the distribution make sense? Explain why or why not.
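A sketch of the histogram, assuming the same simulation settings. Here $r$ is computed with cor(), which for simple linear regression gives the same value as the SST route. The dashed reference line at the population correlation implied by the model, $\rho = \beta_1 \sigma_X / \sqrt{\beta_1^2 \sigma_X^2 + \sigma^2} = 3/\sqrt{13} \approx 0.83$, is an addition for comparison and not part of the question:

```r
n <- 100
n.sim <- 100
set.seed(1002224444)

# one sample correlation per simulated data set
r <- replicate(n.sim, {
  e <- rnorm(n, 0, 2)
  x <- rnorm(n, 0, 1)
  y <- 0.5 + 3 * x + e
  cor(x, y)            # equals the r obtained from the SST decomposition
})

hist(r, breaks = 15, main = "Histogram of r over 100 simulations", xlab = "r")

# population correlation implied by the model (added for comparison)
rho <- 3 / sqrt(3^2 + 2^2)
abline(v = rho, lty = 2)
```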
4. (5 points) To determine whether $\beta_1$ is significantly different from 0, we consider the hypotheses
$H_0: \beta_1 = 0$ versus $H_a: \beta_1 \neq 0$,
and the test statistic $t^* = \frac{\hat{\beta}_1 - 0}{s(\hat{\beta}_1)}$.
Show that $t^* \sim t_{n-2}$ under $H_0$. Begin by standardizing each estimate $\hat{\beta}_1 - 0$ by $s(\hat{\beta}_1)$; then plot the values in a histogram.
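A sketch of the standardization step, assuming the same settings as question 1. It reads $\hat{\beta}_1$ and $s(\hat{\beta}_1)$ from the coefficient table returned by summary(lm(...)), whose rows are named after the model terms ("x" here); the names b1 and s.b1 are illustrative:

```r
n <- 100
n.sim <- 100
set.seed(1002224444)

t.star <- numeric(n.sim)
for (i in seq_len(n.sim)) {
  e <- rnorm(n, 0, 2)
  x <- rnorm(n, 0, 1)
  y <- 0.5 + 3 * x + e
  # columns: Estimate, Std. Error, t value, Pr(>|t|)
  est <- coef(summary(lm(y ~ x)))
  b1 <- est["x", "Estimate"]           # beta1.hat
  s.b1 <- est["x", "Std. Error"]       # s(beta1.hat)
  t.star[i] <- (b1 - 0) / s.b1         # standardize beta1.hat - 0 by s(beta1.hat)
}

hist(t.star, breaks = 15, main = "Histogram of t* over 100 simulations", xlab = "t*")
```

Because the data are generated with $\beta_1 = 3$, the simulated values of $t^*$ fall far from 0; the $t_{n-2}$ distribution describes $t^*$ when $H_0$ is true.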
5. (5 points) After computing $t^*$ for each estimate of $\beta_1$, add a line marking the mean of the $t^*$ values to the histogram from question 4. What does this line tell you about the significance of $\beta_1$?
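A sketch of adding the mean line with abline(), assuming $t^*$ is taken directly from the "t value" column of the lm coefficient table (which equals $\hat{\beta}_1 / s(\hat{\beta}_1)$). The dashed line at the two-sided 5% critical value qt(0.975, n - 2) is an optional reference and not required by the question; the x-axis is widened so that it stays visible:

```r
n <- 100
n.sim <- 100
set.seed(1002224444)

# t* for each simulated data set, read from the "t value" column
t.star <- replicate(n.sim, {
  e <- rnorm(n, 0, 2)
  x <- rnorm(n, 0, 1)
  y <- 0.5 + 3 * x + e
  coef(summary(lm(y ~ x)))["x", "t value"]
})

hist(t.star, breaks = 15, xlim = c(0, max(t.star) + 1),
     main = "Histogram of t* with its mean", xlab = "t*")
abline(v = mean(t.star), col = "red", lwd = 2)   # mean of the simulated t*
abline(v = qt(0.975, df = n - 2), lty = 2)       # optional: 5% two-sided critical value
```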
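6. (5 points) Consider the coefficient of correlation between $X$ and $Y$, which we write $\rho$. The maximum likelihood estimator of $\rho$ is
$$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2}}.$$
Consider the hypotheses
$H_0: \rho = 0$ versus $H_a: \rho \neq 0$,
and the test statistic $t_2^* = \frac{r\sqrt{n-2}}{\sqrt{1 - r^2}}$.
Compute $t_2^*$ for each simulation (i.e. for each of the estimates of $\beta_1$). Add a line marking the mean of the $t_2^*$ values to the histogram from question 4. What does this line tell you about the significance of $r$?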
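A sketch that computes both statistics in one loop so that the mean of $t_2^*$ can be drawn on the question 4 histogram of $t^*$. It assumes the usual form $t_2^* = r\sqrt{n-2}/\sqrt{1-r^2}$ and computes $r$ from its defining sums; names such as t2.star, dx and dy are illustrative:

```r
n <- 100
n.sim <- 100
set.seed(1002224444)

t.star <- numeric(n.sim)
t2.star <- numeric(n.sim)
for (i in seq_len(n.sim)) {
  e <- rnorm(n, 0, 2)
  x <- rnorm(n, 0, 1)
  y <- 0.5 + 3 * x + e
  t.star[i] <- coef(summary(lm(y ~ x)))["x", "t value"]   # question 4 statistic
  # r from the sums of cross-products and squares
  dx <- x - mean(x)
  dy <- y - mean(y)
  r <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
  t2.star[i] <- r * sqrt(n - 2) / sqrt(1 - r^2)          # question 6 statistic
}

hist(t.star, breaks = 15, main = "Histogram of t* with the mean of t2*", xlab = "t*")
abline(v = mean(t2.star), col = "blue", lwd = 2)         # mean of the simulated t2*
```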
7. (5 points) Explain in words, without doing any computations, why rejecting the null hypothesis in question 4 is equivalent to rejecting the null hypothesis in question 6.
8. (10 points) Show that $t^* = t_2^*$. Show all the details of your proof. You may use any formulas studied in class.
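A numerical check is not a proof, but it can confirm the identity on one simulated data set before the algebra is written out. A minimal sketch, assuming the settings of question 1 and the usual form $t_2^* = r\sqrt{n-2}/\sqrt{1-r^2}$:

```r
n <- 100
set.seed(1002224444)
e <- rnorm(n, 0, 2)
x <- rnorm(n, 0, 1)
y <- 0.5 + 3 * x + e

# t* from lm: beta1.hat / s(beta1.hat)
est <- coef(summary(lm(y ~ x)))
t.star <- est["x", "Estimate"] / est["x", "Std. Error"]

# t2* from the sample correlation (assumed standard form)
r <- cor(x, y)
t2.star <- r * sqrt(n - 2) / sqrt(1 - r^2)

all.equal(t.star, t2.star)   # TRUE, up to floating-point tolerance
```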
2022-07-27