STATS 786 Time Series Forecasting for Data Science SEMESTER 1, 2021
STATS 786
SEMESTER 1, 2021
STATISTICS
Time Series Forecasting for Data Science
Midterm - Test
Instructions
• The length of the test includes an additional 30 minutes (to allow for reading time, the additional complexity of the online mode, and submission). You get one hour for answering the questions and an extra 30 minutes for uploading your files.
• You must submit your final answers before due time so do not leave submitting until just before the due time - make sure you allow time for submission.
• Test answers will not be accepted after the end of this extra 30 minute period.
• If you encounter computer/internet/other issues during the test that affect your ability to work on or submit your test answers please contact the lecturer via email ([email protected]).
• We STRONGLY recommend you download your submitted document from Canvas, after submitting it, to verify you have uploaded the correct document. It is your responsibility to check you have submitted the correct document.
• It is your responsibility to ensure your test is successfully submitted on time. Please don’t leave it until the last minute to submit your test.
Academic Honesty Declaration
By completing this assessment, I agree to the following declaration: I understand the University expects all students to complete coursework with integrity and honesty. I promise to complete all online assessment with the same academic integrity standards and values. Any identified form of poor academic practice or academic misconduct will be followed up and may result in disciplinary action. As a member of the University’s student body, I will complete this assessment in a fair, honest, responsible and trustworthy manner.
This means that:
• I declare that this assessment is my own work.
• I will not seek out any unauthorised help in completing this assessment.
• I am aware the University of Auckland may use plagiarism detection tools to check my content.
• I will not discuss the content of the assessment with anyone else in any form, including Canvas, Piazza, Facebook, Twitter or any other social media or online platform, within the assessment period.
• I will not reproduce the content of this assessment anywhere in any form at any time.
• I declare that I generated the calculations and data in this assessment independently, using only the tools and resources defined for use in this assessment.
• I will not share or distribute any tools or resources I developed for completing this assessment.
1 Run the following code in R.
# Use your student ID as the seed
set.seed(2021)
sample(letters[1:6], 3, replace = FALSE)
Use the output from the above R code to select the statements that you need to answer from the list given below. For example, suppose the output for the above code is “f”, “b”, and “c”, then you should select statements “b”, “c”, and “f” from the list below to answer this question.
Note: Please make sure to replace the seed used in the above R code by your student ID to select the statements that you need to answer in this question.
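As a minimal sketch of the selection step, assuming a hypothetical student ID of 123456789 (a placeholder, not a real ID), the draw can be sorted so that the selected letters appear in the same order as the list of statements:

```r
# Hypothetical student ID used as the seed; replace it with your own.
student_id <- 123456789
set.seed(student_id)

# Draw three distinct letters from "a" to "f", then sort them alphabetically
# so they match the order of the statements in the list.
selected <- sort(sample(letters[1:6], 3, replace = FALSE))
selected
```

Sorting does not change which statements are selected; it only makes them easier to match against the list.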
State whether the selected statements are true or false. You MUST provide reasoning for your answer.
a There is something wrong with my forecasts because they take the same value for all forecast horizons.
b I should always choose the regression model with the smallest sum of squared errors for obtaining predictions.
c Prediction intervals are not very important because most people want the point forecasts.
d A time series cross-validation based on a rolling forecast origin is better than a simple test set for comparing forecast methods.
e A white noise series has zero mean and constant autocovariance.
f Linear regression models are simplistic because the real world is nonlinear.
[Total: 15 marks]
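For readers unfamiliar with the term in statement d, the following base R sketch illustrates what a rolling forecast origin looks like mechanically. It is only an illustration under stated assumptions: the series is simulated, the minimum training length of 40 is arbitrary, and the two methods compared (naive and mean) are chosen purely for simplicity.

```r
# Simulated series for illustration only (a random walk); not exam data.
set.seed(1)
y <- cumsum(rnorm(60))

min_train <- 40                       # smallest training window (an assumption)
origins <- min_train:(length(y) - 1)  # each origin forecasts the next observation

# One-step-ahead forecast errors for two simple methods at every origin.
errors <- t(sapply(origins, function(k) {
  train <- y[1:k]
  actual <- y[k + 1]
  c(naive = actual - train[k],        # naive: last observed value
    mean  = actual - mean(train))     # mean: average of the training window
}))

# Root mean squared error of each method, averaged over all origins.
rmse <- sqrt(colMeans(errors^2))
rmse
```

Each row of `errors` corresponds to one forecast origin, so the accuracy measure is averaged over many small test sets rather than a single one.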
2 This question attempts to analyze the effect of temperature and pollution level on weekly cardiovascular mortality in one of the states in the US.
Note: Please refer to the appendix on pages 7–9 for the necessary figures.
Figure 1 shows the time plots for average weekly cardiovascular mortality, temperature, and particulate pollution level over ten years. Figure 2 shows a scatter plot matrix of mortality and the two predictor variables.
a Briefly describe the main features that you can observe from Figures 1 and 2. [5 marks]
b Let $M_t$ denote cardiovascular mortality, $T_t$ the temperature, and $P_t$ the particulate level at time $t$. One of the students in the class suggested fitting the following four models:
$$M_t = \beta_0 + \beta_1 t + e_t, \tag{M1}$$
$$M_t = \beta_0 + \beta_1 t + \beta_2 (T_t - \bar{T}) + e_t, \tag{M2}$$
$$M_t = \beta_0 + \beta_1 t + \beta_2 (T_t - \bar{T}) + \beta_3 (T_t - \bar{T})^2 + e_t, \tag{M3}$$
$$M_t = \beta_0 + \beta_1 t + \beta_2 (T_t - \bar{T}) + \beta_3 (T_t - \bar{T})^2 + \beta_4 P_t + e_t, \tag{M4}$$
where $\bar{T}$ denotes the mean temperature. Explain briefly why the student has suggested fitting these four models. [4 marks]
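To make the structure of models M1–M4 concrete, here is a minimal base R sketch that fits the four nested regressions with `lm()`. Everything about the data is an assumption: the data frame `dat`, its column names (`trend`, `temp`, `part`, `mort`), and the simulated values are hypothetical stand-ins, not the data behind the figures.

```r
# Simulated stand-in data (hypothetical names and values): 10 years of weekly observations.
set.seed(1)
n <- 520
dat <- data.frame(trend = 1:n)
dat$temp <- 70 + 10 * sin(2 * pi * dat$trend / 52) + rnorm(n, sd = 3)
dat$part <- 50 + 15 * cos(2 * pi * dat$trend / 52) + rnorm(n, sd = 8)
dat$mort <- 90 - 0.02 * dat$trend - 0.4 * (dat$temp - 70) +
  0.2 * dat$part + rnorm(n, sd = 5)

# Centre temperature at its sample mean, as in the (T_t - mean temperature) terms.
dat$temp_c <- dat$temp - mean(dat$temp)

# Four nested models: trend; + centred temperature; + its square; + particulates.
m1 <- lm(mort ~ trend, data = dat)
m2 <- lm(mort ~ trend + temp_c, data = dat)
m3 <- lm(mort ~ trend + temp_c + I(temp_c^2), data = dat)
m4 <- lm(mort ~ trend + temp_c + I(temp_c^2) + part, data = dat)

# Collect common comparison statistics for each fit.
fits <- list(M1 = m1, M2 = m2, M3 = m3, M4 = m4)
comparison <- data.frame(
  sigma2 = sapply(fits, function(m) summary(m)$sigma^2),
  adj_r2 = sapply(fits, function(m) summary(m)$adj.r.squared),
  AIC    = sapply(fits, AIC),
  BIC    = sapply(fits, BIC)
)
comparison
```

Because each model adds one term to the previous one, the residual sum of squares cannot increase from M1 to M4. Note also that `AIC()` and `BIC()` applied to `lm` fits may be scaled differently from values reported by other packages, so only differences between models fitted the same way are comparable.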
c Summary statistics for M1–M4 are given in Table 1. Among these models, which one do you select as the best model? Briefly give reasons for your selection. Interpret the value of 2 .
Table 1: Summary statistics for models M1–M4.
Model   $\hat{\sigma}^2$   $\bar{R}^2$   AIC    BIC
M1      79.1               0.209         2224   2237
M2      62.2               0.378         2103   2120
M3      55.5               0.445         2047   2068
M4      40.8               0.592         1891   1916
[5 marks]
d Figure 3 shows the residual diagnostics for the best model chosen from M1–M4. What conclusions can you draw from these plots? [3 marks]
[Total: 17 marks]
3 The revenue-domestic-flights.csv file contains information about monthly revenue from domestic flights in the US from 1979–2000.
a Read the file into R and convert it to a tsibble object. [3 marks]
b Plot the revenue series and comment briefly on the main features of the data. [2 marks]
c Do you think a Box-Cox transformation is useful for this time series? Briefly give reasons for your answer. [3 marks]
d Mention at least four forecasting methods that are most appropriate for this series. [4 marks]
e Using the last 2 years of data as the test set, fit the methods that you suggested in part d (you may transform the original series based on your answer to part c). [11 marks]
f Obtain the forecasts for 2 years. [2 marks]
g Compare the accuracy of your forecasts from different methods against the test set. [2 marks]
h Which method does best? Justify your selection. [3 marks]
i Plot the point forecasts from the best method along with the 95% prediction interval. [3 marks]
[Total: 33 marks]
Appendix
Figure 1: Time plots for average weekly cardiovascular mortality, temperature, and particulate pollution level.
[Plot not reproduced. Surviving panel labels: Mortality, Temperature.]
Figure 2: A scatter plot matrix of mortality and the two predictor variables.