
STAT 3032

Homework 3

(Due: February 15, 2024)

Spring 2024

Show all work for full credit. Assignment is worth 10 points.

Instructions:

Please show your work on each problem for full credit. A correct answer unsupported by the necessary explanation, R code, and/or output will receive very little, if any, credit. Your work needs to be organized in a reasonably neat and coherent fashion and submitted as a PDF file on Canvas.

Problem I. Heights from Pearson and Lee (1903)

For this problem we will work with the famous data set of mothers’ and daughters’ heights collected in England from 1893 to 1898 by Karl Pearson and Alice Lee. The dataset is available through the alr4 package in R. Install the package if you haven’t already, and load the dataset Heights with the data command, as was done in HW2.

a. Construct a plot with daughter’s height as the response and mother’s height as the predictor. Include the fitted least squares regression line on the plot.

b. Output the summary of the linear model constructed in the previous part.

c. Construct the Residual and QQ diagnostic plots for the linear regression. Comment on any concerns about our assumptions for linear regression.

d. George claims that mothers’ heights will contribute more than fathers’ heights, and therefore that the slope β1 will be bigger than 0.5. Conduct a test at the 0.05 significance level to determine whether there is significant evidence for George’s claim. Include any necessary R code.

e. Create a 90% confidence interval for the intercept β0 using the function confint.

f. Now show the work of creating the interval in the previous part without confint.
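As a rough sketch of the mechanics behind parts d through f, the R code below runs a one-sided slope test and builds an intercept interval by hand, then compares the interval against confint. It uses simulated data rather than the Heights dataset, and the names x, y, fit, and the other variables are illustrative assumptions only. Both calculations rest on the t distribution with n - 2 degrees of freedom.

```r
# Simulated data for illustration only; this is NOT the Heights dataset.
set.seed(1)
x <- rnorm(50, mean = 62, sd = 2)
y <- 30 + 0.55 * x + rnorm(50, sd = 2)
fit <- lm(y ~ x)

# Estimates and standard errors from the summary table.
coefs <- coef(summary(fit))
resid_df <- df.residual(fit)  # n - 2

# One-sided test of H0: beta1 = 0.5 against Ha: beta1 > 0.5.
t_stat <- (coefs["x", "Estimate"] - 0.5) / coefs["x", "Std. Error"]
p_value <- pt(t_stat, df = resid_df, lower.tail = FALSE)

# 90% interval for the intercept: estimate +/- t quantile * standard error,
# with 5% in each tail.
b0 <- coefs["(Intercept)", "Estimate"]
se_b0 <- coefs["(Intercept)", "Std. Error"]
t_crit <- qt(0.95, df = resid_df)
manual_ci <- c(b0 - t_crit * se_b0, b0 + t_crit * se_b0)

# confint() should reproduce the hand-built interval.
builtin_ci <- confint(fit, "(Intercept)", level = 0.90)
```

Note that the t value printed by summary tests a slope of 0, so a null value of 0.5 has to be subtracted by hand as above.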

Problem II. Windspeeds at nearby sites

These data were collected from windspeed measurement sites in northern South Dakota in 2002. We will use the windspeed at a reference site to model the windspeed at a candidate site for a windmill. The data are also available through the alr4 package under the name wm1.

a. Fit a linear model using the reference site speeds to predict the candidate site. Check the diagnostic plots and comment on any concerns.

b. Suppose the current windspeed at the reference site is 15 mph. Build an interval that contains the current windspeed at the candidate site with 99% confidence. Do this using the predict function in R.

c. Build an interval that contains, with 99% confidence, the average windspeed at the candidate site when the windspeed at the reference site is 15 mph. This time do not use the predict function; instead, calculate the interval using the formulas from lecture. Make sure to fully show your work. [Note: Though there are faster ways to get pieces like X̄ and SXX with R, everything in this problem can be done with what is in the model summary.]

d. Comment on which interval from the previous two parts is wider and why.
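One possible way to organize the calculations in parts b and c is sketched below in R. It uses simulated data, not wm1, and the names ref, cand, fit, and new_pt are hypothetical. The sketch assumes the standard simple linear regression formulas se(b1) = sigma_hat / sqrt(SXX) and se(b0)^2 = sigma_hat^2 * (1/n + X̄^2 / SXX), which allow SXX and X̄ to be recovered from the summary output alone; check these against the versions given in lecture.

```r
# Simulated reference/candidate speeds for illustration only; NOT the wm1 data.
set.seed(2)
ref <- runif(100, min = 2, max = 20)
cand <- 3 + 0.75 * ref + rnorm(100, sd = 2.5)
fit <- lm(cand ~ ref)
new_pt <- data.frame(ref = 15)

# Interval for a single new observation at ref = 15, using predict().
pred_int <- predict(fit, newdata = new_pt, interval = "prediction", level = 0.99)

# Quantities that appear in the printed model summary.
s <- summary(fit)
coefs <- coef(s)
n <- s$df[2] + 2                 # residual degrees of freedom are n - 2
sigma_hat <- s$sigma             # residual standard error
b0 <- coefs["(Intercept)", "Estimate"]
b1 <- coefs["ref", "Estimate"]
se_b0 <- coefs["(Intercept)", "Std. Error"]
se_b1 <- coefs["ref", "Std. Error"]

# Back out SXX and the mean of the predictor from the standard errors.
sxx <- (sigma_hat / se_b1)^2
# Positive root assumed, since wind speeds are nonnegative.
xbar <- sqrt(sxx * (se_b0^2 / sigma_hat^2 - 1 / n))

# Interval for the mean response at x0 = 15, built by hand.
x0 <- 15
fit0 <- b0 + b1 * x0
se_fit <- sigma_hat * sqrt(1 / n + (x0 - xbar)^2 / sxx)
t_crit <- qt(0.995, df = n - 2)  # 0.5% in each tail for 99% confidence
mean_ci <- c(fit0 - t_crit * se_fit, fit0 + t_crit * se_fit)

# predict() with interval = "confidence" serves as a check on the hand calculation.
conf_int <- predict(fit, newdata = new_pt, interval = "confidence", level = 0.99)
```

Recovering X̄ through a square root only gives its magnitude, so the sign has to come from context.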

Bonus/Discussion Questions (ungraded)

HW Bonus questions are designed to get you to think deeper about the topics in the course. You are encouraged to discuss them with the TA and me during office hours.

a. Explain how it is possible to use prediction intervals to test whether additional data points (points not in the original sample) came from the model. How would you do this with a single additional data point? How about with multiple points?

b. What is the covariance between the fitted values predicted at different X values?