HW 8
1) The file Diamond.xls contains data on the pricing of ladies’ diamond rings, based on the weights of the diamonds. The data were originally given in a full-page advertisement placed in the Straits Times newspaper issue of February 29, 1992, by a Singapore-based retailer of diamond jewelry. The 48 rings considered in this data set were similar in terms of design, gold weight, and the diamond qualities of cut, color and clarity. Therefore, the carat size of the diamond stones becomes the obvious factor to use in pricing the rings. The variables in Diamond.xls are Weight (in carats) and Price (of the ring, in Singapore dollars).
A) Make a scatterplot of Price versus Weight, and comment on the reasonableness of fitting a linear regression model to this data.
B) Run the regression of Price on Weight. Copy and paste the Minitab regression output.
C) What is the equation of the fitted line? Use this equation to predict the price of a diamond ring which weighs 0.23 carats.
D) Is there evidence of a significant linear relationship between the price and the weight of the diamond? Justify your answer.
E) Interpret the estimated slope of the fitted model, and construct a 95% CI for the true slope coefficient. What is the practical meaning of the true slope coefficient?
F) Discuss and give a practical interpretation of the coefficient of determination (R-Squared).
G) What is the estimate of the typical fluctuation of data points from the true regression line, measured in the vertical direction?
H) At the 1% level of significance, can we reject the null hypothesis that the true slope is 3500 in favor of the alternative that it is not 3500?
I) Using Minitab, construct a 95% confidence interval for the expected price of rings which weigh 0.23 carats.
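The quantities that parts C through H read off the Minitab output (fitted slope and intercept, standard error of the slope, R-Squared, and the residual standard deviation s) can also be checked by hand. The following is a minimal Python sketch of those textbook formulas, not a replacement for the Minitab output the problem asks for. The function names and the toy numbers are hypothetical and are not taken from Diamond.xls.

```python
import math

def fit_simple_regression(x, y):
    """Least-squares fit of y = b0 + b1*x, returning summary quantities."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Sum of squared residuals and total sum of squares
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_mean) ** 2 for yi in y)
    s = math.sqrt(sse / (n - 2))  # residual standard deviation, n - 2 df
    return {
        "slope": slope,
        "intercept": intercept,
        "s": s,
        "se_slope": s / math.sqrt(sxx),
        "r_squared": 1 - sse / sst,
    }

def slope_t_statistic(fit, null_slope=0.0):
    """t statistic for H0: true slope = null_slope."""
    return (fit["slope"] - null_slope) / fit["se_slope"]

# Toy data, made up for illustration only (NOT the Diamond.xls data).
toy_x = [1.0, 2.0, 3.0, 4.0]
toy_y = [2.0, 4.0, 5.0, 8.0]
fit = fit_simple_regression(toy_x, toy_y)
print(fit)
print(slope_t_statistic(fit, null_slope=1.5))
```

With the 48 rings, the t statistic is compared with a t critical value on n - 2 = 46 degrees of freedom, taken from a table or from Minitab. The same critical value times `se_slope` gives the half-width of a confidence interval for the slope.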
2) The file EPSReturn.xls (considered in HW1) contains data on the stock returns and earnings per share (EPS) for 52 major companies. The EPS values for the companies are those announced in December 1997, while the stock returns are calculated for January 1998. It is of interest to try to use EPS to predict the stock return.
A) Construct a Minitab Fitted Line plot of Return versus EPS. Does the plot indicate that the linear regression model fits well? Does it suggest any possible violation of the assumptions which underlie the linear regression model?
B) Run the regression, using Minitab. Is there evidence to suggest that EPS is useful for predicting the stock returns? (Use a 5% level of significance). Compute the p-value for the estimated slope based on a left-tailed (one-sided) alternative hypothesis.
C) What does the R-Square suggest about the strength of the linear relationship?
D) Get a point forecast and a 95% prediction interval for the return of a stock with an EPS of 6. Is this a useful interval? (Refer also to the plot from A).
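Two calculations in this problem are easy to get wrong by hand: converting a reported two-sided p-value into a left-tailed one (part B), and telling a prediction interval for a single stock apart from a confidence interval for the mean return (part D). The sketch below illustrates both with the standard simple-regression formulas. It assumes the reported p-value is two-sided, and it takes the t critical value as an input to be read from a table or from Minitab. All function names, parameter names, and numbers are hypothetical.

```python
import math

def left_tailed_p_value(t_stat, two_sided_p):
    """p-value for H1: slope < 0, given the two-sided p-value for H0: slope = 0."""
    half = two_sided_p / 2.0
    # A negative t statistic falls on the side favored by the alternative.
    return half if t_stat < 0 else 1.0 - half

def interval_half_widths(x0, n, x_mean, sxx, s, t_crit):
    """Half-widths of the CI for the mean response and the PI for one new response at x0."""
    core = 1.0 / n + (x0 - x_mean) ** 2 / sxx
    ci_half = t_crit * s * math.sqrt(core)
    pi_half = t_crit * s * math.sqrt(1.0 + core)  # extra 1 for the new observation's own noise
    return ci_half, pi_half

# Made-up inputs for illustration only (NOT values from EPSReturn.xls).
print(left_tailed_p_value(t_stat=-2.0, two_sided_p=0.05))
print(interval_half_widths(x0=2.5, n=4, x_mean=2.5, sxx=5.0, s=1.0, t_crit=2.0))
```

The prediction interval is always wider than the confidence interval at the same x0. Both widen as x0 moves away from the sample mean of EPS, which is worth keeping in mind when judging whether the interval at EPS = 6 is useful.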
3) The file Gesell.xls concerns a study of whether intelligence can be predicted based on the age at which a child starts to speak. For each of 21 participants in the study, the variable Age represents the age (in months) at which they spoke their first word, and the variable Score represents the Gesell Adaptive Score. (The Gesell test is an adult intelligence test).
A) Without looking at the data, how would you expect Score to be related to Age? (Positively or negatively?)
B) Make a scatterplot of Score versus Age. Does the plot show the relationship you predicted in A)?
C) Run the simple regression of Score on Age. Get the leverage and Cook’s D values by clicking on Storage in the regression dialog box, and checking the boxes for leverage (Hi) and Cook’s Distance.
D) Use the regression output to compute the p-value for the coefficient of Age in the regression. Does this suggest that Score is related to Age?
E) Is there evidence that Score is related to Age in the direction that was suggested in part A?
F) What proportion of the variance in Score is explained by Age, based on the regression output?
G) Are there any data points with high leverage? Is the Cook’s D corresponding to these points high enough to cause concern?
H) Delete the data point with the largest value of Cook’s D. Now, re-run the regression. Describe the effects on the p-value for the slope, and on R-Squared. Is there now strong evidence of a linear relationship between Score and Age?
I) Compare your answers from parts E and F to those from part H. If you had to choose between the regression model fitted to the entire dataset and the model fitted after the observation was deleted, which model would you choose?
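The leverage (Hi) and Cook's Distance values stored in part C can be reproduced from the usual simple-regression formulas, which may help when interpreting parts G and H. The sketch below assumes one predictor plus an intercept, so p = 2 coefficients. The function name and the toy data are hypothetical and are not the Gesell.xls data.

```python
def leverage_and_cooks_d(x, y):
    """Leverage h_i and Cook's D_i for the simple regression of y on x."""
    n = len(x)
    p = 2  # number of coefficients: intercept and slope
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e * e for e in residuals) / (n - p)
    # Leverage depends only on how far x_i lies from the mean of x.
    leverage = [1.0 / n + (xi - x_mean) ** 2 / sxx for xi in x]
    # Cook's D combines the residual size with the leverage.
    cooks_d = [
        (e * e / (p * mse)) * h / (1.0 - h) ** 2
        for e, h in zip(residuals, leverage)
    ]
    return leverage, cooks_d

# Toy data, made up for illustration only; the last x value is far from the rest.
toy_x = [1.0, 2.0, 3.0, 4.0, 10.0]
toy_y = [2.0, 4.0, 5.0, 8.0, 9.0]
lev, cook = leverage_and_cooks_d(toy_x, toy_y)
for xi, h, d in zip(toy_x, lev, cook):
    print(xi, round(h, 3), round(d, 3))
```

The leverages always sum to p, so their average is p/n. Points with leverage well above that average have unusual predictor values, and a large Cook's D indicates that removing the point would noticeably change the fitted line.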
2025-01-18