
STAT 3032

Fall 2022

Practice Exam 3

Problem I: The Study of Iris Flowers [18 pts]

The irisDat dataset includes different species of iris flowers. We want to use two variables (Sepal.Length and Color) to predict whether an iris flower belongs to a species called “setosa”.

Variable list:

Species: 1 (the flower is setosa) or 0 (the flower is not setosa)

Sepal.Length: The length of the flower sepal in centimetres (cm)

Color: The color category of the flower. The values are “dark”, “medium”, and “light”.

Two logistic regression models were fitted:

mod1: Species ~ 1 + Sepal.Length

mod2: Species ~ 1 + Sepal.Length + Color
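As a rough illustration of how models of this form are typically fitted in R, the sketch below uses glm() with family = binomial. The exam's irisDat data is not available here, so the data frame is a stand-in built from R's built-in iris data, and its Color column is purely hypothetical (colors are assigned in a fixed cycle). The fitted values are therefore not expected to match the mod2 output shown in this exam.

```r
# Stand-in data: built from the built-in iris data set.
# The Color column is HYPOTHETICAL (assigned in a fixed cycle), not real data.
irisDat <- data.frame(
  Species      = as.integer(iris$Species == "setosa"),  # 1 = setosa, 0 = not setosa
  Sepal.Length = iris$Sepal.Length,
  Color        = factor(rep_len(c("dark", "medium", "light"), nrow(iris)))
)

# Logistic regression models with the same formulas as mod1 and mod2
mod1 <- glm(Species ~ 1 + Sepal.Length, data = irisDat, family = binomial)
mod2 <- glm(Species ~ 1 + Sepal.Length + Color, data = irisDat, family = binomial)

# Residual deviance and residual degrees of freedom of each fit
c(deviance(mod1), df.residual(mod1))
c(deviance(mod2), df.residual(mod2))
```

Because Color is a factor with three levels, R codes it with two indicator variables and uses the alphabetically first level ("dark") as the baseline, which is why the coefficient names are Colorlight and Colormedium.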

mod1 summary output from R:

Coefficients:

Estimate Std. Error z value Pr(>|z|)

(Intercept)   27.8285     4.8276   5.765 8.19e-09 ***

Sepal.Length  -5.1757     0.8934  -5.793 6.90e-09 ***

---

Residual deviance:  71.836  on 148  degrees of freedom

mod2 summary output from R:

Coefficients:

             Estimate Std. Error

(Intercept)    29.642      6.302

Sepal.Length   -6.115      1.217

Colorlight      5.833      1.594

Colormedium     3.060      1.269

---

Residual deviance:  49.247

Q1 [2 pts]: What method does logistic regression use to estimate the model coefficients? Please select the best answer.

A.  The maximum likelihood estimation method

B.  The least squares estimation method

Q2 [3 pts]: Please write down the fitted model according to the summary output of mod2. In your final answer, the left side of the equation should be the estimated probability that the flower belongs to the species of setosa. Note: If you use notation such as Y, make sure to define it. Otherwise points will be deducted.

Q3 [3 pts]: Interpret the estimated coefficient of Sepal.Length in mod2 in the context, with respect to the odds that the flower belongs to the species of setosa.

Q4 [2 pts]: Based on mod2, for flowers with sepal length equal to 5 cm, which color has the highest probability of the flower belonging to the species of setosa? Which color has the lowest probability of the flower belonging to the species of setosa? Please choose the best answer.

A.  Dark color has the highest probability; medium color has the lowest probability.

B.  Medium color has the highest probability; light color has the lowest probability.

C.  Light color has the highest probability; dark color has the lowest probability.

D.  Medium color has the highest probability; dark color has the lowest probability.

Q5 [6 pts]: Let’s now use the χ² (chi-squared) test to compare mod1 and mod2.

(i) [2 pts]: What is the value of the test statistic? Please show your work.

(ii) [2 pts]: What is the degree(s) of freedom of the test statistic under H0? Please explain your answer.

(iii) [2 pts]: If the p-value is 0.006, which of the following is the conclusion of the hypothesis test? Please select the best answer. The significance level is 0.01.

A.  We reject the null hypothesis; we choose mod1.

B.  We reject the null hypothesis; we choose mod2.

C.  We don’t reject the null hypothesis; we choose mod1.

D.  We don’t reject the null hypothesis; we choose mod2.

Q6 [2 pts]: Please complete the R code that uses mod2 to predict the odds that the flower is setosa, if the sepal length is 5 cm and the color is light.

pred = predict(mod2, newdata = data.frame(Sepal.Length = 5, Color = "light"), type = ________ )

exp(______ )

Problem II: The Study of Time Series [12 pts]

We apply the difference at lag 2 to {s_t} and obtain the time series {∇_2 s_t}. {∇_2 s_t} has the following theoretical model: ∇_2 s_t + 2 = −0.6(∇_2 s_{t−1} + 2) + w_t, where w_t ∼ iid N(0, 0.01). Note that 0.01 is the variance of w_t.
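For readers who want to see what differencing at lag 2 does to a series, here is a minimal sketch using base R's diff() with lag = 2. The short numeric vector is made up purely for illustration and has nothing to do with the series in this problem.

```r
# Made-up illustrative series (not the exam's series)
s <- c(1, 4, 9, 16, 25, 36)

# Difference at lag 2: each value minus the value two time points earlier
d2 <- diff(s, lag = 2)
print(d2)  # 8 12 16 20
```

Note that the differenced series is two observations shorter than the original, because the first two time points have no value two steps earlier to subtract.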

Q7 [2 pts]: Is the time series {∇_2 s_t} stationary? Please explain your answer.

Q8 [2 pts]: What is the intercept of the theoretical model of {∇_2 s_t}?

Q9 [2 pts]: Complete the following R code to simulate a time series of size 500 for {∇_2 s_t} based on its theoretical model.

ds2 = ________  (n = 500, list(_______ = c(-0.6)), sd = ______) + (-2)

Q10 [4 pts]: There are 4 graphs on the next page. Please select the best answer for each of the questions below.

(i) [2 pts]: Which of the following plots is the ACF plot of the time series data generated in Q9? Please select the best answer.

(ii) [2 pts]: Which of the following plots is the PACF plot of the time series data generated in Q9? Please select the best answer.

 

 

Q11 [2 pts]: What is the theoretical model of {s_t}? Please show your work. Your final answer should only have s_t on the left side of the equation.


Problem III: The Study of Humidity [10 pts]

Humidity is an index between 0 and 100 that measures the level of moisture in the atmosphere. A higher humidity value means the air is more humid.  Scientists recorded the hourly humidity in the last 10 days at the West Fordine beach for a total of 240 hours.

The highest humidity (53) took place at the 177th hour. The lowest humidity (46) took place at the 158th hour. The sample mean of this humidity time series data is 50.

We use {y_t} to denote the time series of the hourly humidity. For the last 5 hours, the humidity levels are as follows: y_236 = 50, y_237 = 50, y_238 = 51, y_239 = 49, y_240 = 46. Assume that {y_t} is stationary.

We generated the ACF plot and the PACF plot of {y_t} (the plots are not presented here) and observed that both the MA(3) model and the AR(2) model could be appropriate for {y_t}.
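As a hedged illustration of how ACF and PACF values like those mentioned above can be obtained in R, the sketch below calls acf() and pacf() from base R. The humidity data is not available here, so the built-in lh series is used as a stand-in. The output says nothing about the humidity series itself.

```r
# Stand-in series: the built-in 'lh' data set, NOT the humidity data
y_demo <- lh

# Sample autocorrelations at lags 0..10 (plot = FALSE returns the values only)
a <- acf(y_demo, lag.max = 10, plot = FALSE)

# Sample partial autocorrelations at lags 1..10
p <- pacf(y_demo, lag.max = 10, plot = FALSE)

a$acf[1]       # autocorrelation at lag 0 is always 1
length(p$acf)  # one partial autocorrelation per lag from 1 to lag.max
```

Setting plot = TRUE (the default) draws the corresponding plot with approximate significance bounds, which is the usual way these diagnostics are inspected.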

The summary output of mod_ar2 from R is shown below. Please answer Q12.

Coefficients:

       ar1   ar2  intercept

      0.58  0.02      50.04

s.e.  0.05  0.01       0.11

Q12 [3 pts]: Predict the humidity level in 1 hour (when t = 241) based on the AR(2) model.

Q13 [3 pts]: After fitting the AR(2) model, we also fitted the MA(3) model, mod_ma3. Please complete the following R code to fit the MA(3) model and to produce the last 5 residuals (from t = 236 to t = 240). y represents the time series data of the hourly humidity.

mod_ma3 = ___________ (y, order = c(_____ , _____ , _____))

___________$residuals[236:240]


The summary output of mod_ma3 from R and the last 5 residuals are shown below. Please answer Q14.

Coefficients:

       ma1   ma2   ma3  intercept

      0.59   0.5  0.18      50.04

s.e.  0.05  0.02  0.05       0.10

------

# last 5 residuals (from the earliest in time to the latest)

# 1.13,  -0.01, -0.38, -1.15, -0.57

Q14 [2 pts]: Predict the humidity level in the next hour (when t = 241) based on the MA(3) model.

Q15 [2 pts]: Have you written the answers to the multiple-choice questions in the designated boxes?

A: Yes, I have.

B: No, I haven’t, but I will.