STATS 326 Applied Time Series Analysis SEMESTER 1, 2022
STATISTICS
Mid-Semester Test
1 Run the following code in R.
# Use your student ID as the seed
set.seed(2022)
sample(letters[1:5], 2, replace = FALSE)
Use the output of the R code above to select the sub-questions you need to answer from the list below. For example, if the output is "d" and "c", answer sub-questions "d" and "c".
Note: Replace the seed in the R code above with your student ID before selecting your sub-questions. No marks will be given if you do not answer the sub-questions allocated to you by your seed.
a Explain what each line of the following R code does and why adding dyears(1) and years(1) gives two different outputs.
date <- ymd("2020-01-15")
date + dyears(1)
date + years(1)
b Explain what each line of the following R code does and how you can construct out1 from out3.
out1 <- ymd_hms("2022-06-12 11:30:15", tz = "Pacific/Auckland")
out2 <- as_date(out1)
out3 <- as_datetime(out2)
out1 == out3
c Suppose z is a tsibble containing data from 2000-01-01 12 AM to 2002-12-31 11 PM. The first few rows of z are shown below.
z %>% slice_head(n = 5)
## # A tsibble: 5 x 2 [1h] <UTC>
##   Time                Values
##   <dttm>               <dbl>
## 1 2000-01-01 00:00:00  1.02
## 2 2000-01-01 01:00:00  0.198
## 3 2000-01-01 02:00:00  0.910
## 4 2000-01-01 03:00:00  1.66
## 5 2000-01-01 04:00:00 -0.249
Explain what the following R code does and the changes you may expect in the output.
z %>% force_tz(Time, tzone = "Pacific/Auckland")
d Suppose enroll contains student enrollment and staff recruitment details from 2000–2020 for three departments. The Student's Gender is recorded as a binary variable. The first few rows of the data frame are given below.
enroll %>% head(n = 5)
##   Year Department Student's Gender Enrolled Staff
## 1 2000 Statistics           Female     1958    32
## 2 2000 Statistics             Male     1964    36
## 3 2001 Statistics           Female     1985    27
## 4 2001 Statistics             Male     1822    35
## 5 2002 Statistics           Female     1792    37
Explain what the following R code does and what additional information appears in the output.
enroll %>% as_tsibble(key = c(Department, `Student's Gender`),
index = Year)
e Suppose temp contains half-hourly data for three years. The first few rows are shown below.
temp %>% slice_head(n = 5)
## # A tsibble: 5 x 2 [30m] <UTC>
##   Time                Temperature
##   <dttm>                    <dbl>
## 1 2000-01-01 00:00:00        15.6
## 2 2000-01-01 00:30:00        18.5
## 3 2000-01-01 01:00:00        23.3
## 4 2000-01-01 01:30:00        20.4
## 5 2000-01-01 02:00:00        22.0
Explain what the following R code does.
temp %>%
  mutate(x = as_date(floor_date(Time, "bimonth"))) %>%
  index_by(x) %>%
  summarise(y = mean(Temperature))
[Total: 15 marks]
2 Figure 1 shows two graphs produced for monthly turnover (in millions of AUD) from food retailing in Tasmania over 1982 April–2018 December.
Figure 1: Two plots produced for turnover from food retailing in Tasmania
a Describe what is plotted in each panel of Figure 1 and the features you can observe for this time series. [5 marks]
b The turnover from food retailing is decomposed into its components using the following R code.
stl_dcmp <- turnover %>%
  model(STL(log(Turnover) ~ season(window = 11)))
i Write down an equation to describe the form of the decomposition performed and explain why the above setting has been used. [9 marks]
ii Comment on what is plotted in all four panels of Figure 2 and the behaviour of each component over time.
[Figure 2: Decomposition plot for STL(log(Turnover) ~ season(window = 11)); time axis from 1980 Jan to 2020 Jan]
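For part b i, a sketch of the form of the decomposition, assuming the usual notation (the symbols y_t, T_t, S_t and R_t are introduced here and do not appear in the test). STL is an additive decomposition, and here it is applied to log(Turnover), so the components add on the log scale and multiply on the original scale.

```latex
\log(y_t) = T_t + S_t + R_t
\qquad\Longleftrightarrow\qquad
y_t = e^{T_t} \times e^{S_t} \times e^{R_t}
```

Here y_t is Turnover, T_t the trend, S_t the seasonal component (season_year) and R_t the remainder. The component estimates listed in part c are consistent with this form: for 2017 Jul, 5.42 + (-0.0421) + (-0.00780) is approximately 5.37. The season(window = 11) argument sets the number of consecutive years used to estimate each seasonal value, which lets the seasonal pattern change slowly over time.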
c The estimates of the decomposed components for the last 18 months are given below.
stl_dcmp %>%
  components() %>%
  select(-State, -Industry, -.model) %>%
  slice_tail(n = 18)
## # A tsibble: 18 x 6 [1M]
##    Month    `log(Turnover)` trend season_year remainder
##    <mth>              <dbl> <dbl>       <dbl>     <dbl>
##  1 2017 Jul            5.37  5.42     -0.0421  -0.00780
##  2 2017 Aug            5.38  5.43     -0.0385  -0.00951
##  3 2017 Sep            5.38  5.43     -0.0501  -0.00386
##  4 2017 Oct            5.45  5.44     0.00486   0.00718
##  5 2017 Nov            5.48  5.45      0.0170    0.0193
##  6 2017 Dec            5.64  5.45       0.163    0.0181
##  7 2018 Jan            5.51  5.46      0.0579  -0.00730
##  8 2018 Feb            5.41  5.47     -0.0332   -0.0194
##  9 2018 Mar            5.53  5.47      0.0487   0.00506
## 10 2018 Apr            5.44  5.48     -0.0218   -0.0170
## 11 2018 May            5.46  5.49     -0.0290   0.00680
## 12 2018 Jun            5.41  5.49     -0.0776  -0.00407
## 13 2018 Jul            5.47  5.50     -0.0421   0.00993
## 14 2018 Aug            5.48  5.51     -0.0384    0.0165
## 15 2018 Sep            5.48  5.51     -0.0498    0.0159
## 16 2018 Oct            5.51  5.52     0.00491   -0.0108
## 17 2018 Nov            5.53  5.52      0.0172   -0.0116
## 18 2018 Dec            5.69  5.53       0.164  -0.00368
## # ... with 1 more variable: season_adjust <dbl>
A random walk with drift model is fitted to the seasonally adjusted data. The fitted model and its forecasts are given below.
stl_dcmp %>%
  components() %>%
  model(drift = RW(season_adjust ~ drift())) %>%
  report()
## Series: season_adjust
## Model: RW w/ drift
##
## Drift: 0.0049 (se: 0.0016)
## sigma^2: 0.0012
stl_dcmp %>%
  components() %>%
  model(drift = RW(season_adjust ~ drift())) %>%
  select(-.model) %>%
  forecast(h = 6)
## # A fable: 6 x 6 [1M]
## # Key:     State, Industry, .model [1]
##   State    Industry     .model    Month  season_adjust .mean
##   <chr>    <chr>        <chr>     <mth>         <dist> <dbl>
## 1 Tasmania Food retail~ drift  2019 Jan N(5.5, 0.0012)  5.53
## 2 Tasmania Food retail~ drift  2019 Feb N(5.5, 0.0023)  5.54
## 3 Tasmania Food retail~ drift  2019 Mar N(5.5, 0.0035)  5.54
## 4 Tasmania Food retail~ drift  2019 Apr N(5.5, 0.0047)  5.55
## 5 Tasmania Food retail~ drift  2019 May N(5.6, 0.0059)  5.55
## 6 Tasmania Food retail~ drift  2019 Jun N(5.6, 0.0071)  5.56
Calculate the 1-step-ahead
i forecast median for turnover. [3 marks]
ii 95% prediction interval for turnover. [5 marks]
Give your answers to 1 decimal place.
[Total: 28 marks]
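For part c of Question 2, one possible route to the 1-step-ahead answers, sketched under two assumptions that the test does not state: the seasonal component for 2019 Jan is forecast by repeating the 2018 Jan estimate (0.0579), and its uncertainty is ignored. The symbols A (season_adjust), S (season_year) and n (the last observed month, 2018 Dec) are notation introduced here. Because the model is fitted to log(Turnover), exponentiating the forecast mean on the log scale gives the forecast median for turnover.

```latex
\begin{aligned}
\hat{A}_{n+1|n} + \hat{S}_{n+1-12} &= 5.53 + 0.0579 = 5.5879 \\
\text{median: } \exp(5.5879) &\approx 267.2 \\
\text{95\% PI: } \exp\!\left(5.5879 \pm 1.96\sqrt{0.0012}\right)
  &= \exp(5.5879 \pm 0.0679) \approx [249.6,\ 285.9]
\end{aligned}
```

The values are in millions of AUD and inherit the rounding of the displayed inputs, so the last decimal place may differ slightly if unrounded estimates are used.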
3 Figure 3 shows the monthly total cost for anti-diabetic drugs in Australia from 1991 July–2008 June.
Figure 3: Monthly total cost for anti-diabetic drugs in Australia
The structure of the tsibble object used to plot Figure 3 is given below.
a10 %>% glimpse()
## Rows: 204
## Columns: 2
## $ Month <mth> 1991 Jul, 1991 Aug, 1991 Sep, 1991 Oct, 1991~
## $ Cost <dbl> 3.53, 3.18, 3.25, 3.61, 3.57, 4.31, 5.09, 2.~
a Write R code to create a tsibble containing 68 rows of quarterly total cost for anti-diabetic drugs in Australia from 1991 Q3–2008 Q2. Label this new tsibble object as a10q. [4 marks]
b Suppose we have fitted a time series regression model to the information contained in a10q as given below.
fit <- a10q %>%
model(lm = TSLM(log(Cost) ~ trend() + season()))
fit %>% report()
## Series: Cost
## Model: TSLM
## Transformation: log(Cost)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.1045 -0.0202 0.0061 0.0280 0.0842
##
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)
## (Intercept)    2.347470   0.013830  169.74  < 2e-16 ***
## trend()        0.028065   0.000263  106.63  < 2e-16 ***
## season()year2 -0.093151   0.014590   -6.38  2.3e-08 ***
## season()year3 -0.007184   0.014597   -0.49     0.62
## season()year4  0.117485   0.014590    8.05  2.9e-11 ***
## ---
## Signif. codes:
## 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0425 on 63 degrees of freedom
## Multiple R-squared: 0.995, Adjusted R-squared: 0.994
## F-statistic: 2.88e+03 on 4 and 63 DF, p-value: <2e-16
i Write down the fitted regression model and interpret the trend coefficient. [7 marks]
ii Write R code to perform the Ljung-Box test to assess the adequacy of the fitted model. [7 marks]
iii State the null and alternative hypotheses for the Ljung-Box test and describe how you would use the test output to reach a conclusion. [4 marks]
iv Calculate the 1-step-ahead forecast median for the total cost from this fitted model. Give your answer to 1 decimal place. [5 marks]
[Total: 27 marks]
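For part b i of Question 3, a sketch of the fitted model written from the coefficient table, with estimates rounded to 4 decimal places. The notation is assumed here: t is the trend() index in quarters, and d_{k,t} is the indicator for the level labelled season()yeark in the output. Which calendar quarter each level corresponds to is not stated in the test, so it is left unspecified.

```latex
\widehat{\log(\text{Cost}_t)} = 2.3475 + 0.0281\,t - 0.0932\,d_{2,t} - 0.0072\,d_{3,t} + 0.1175\,d_{4,t}

e^{0.028065} - 1 \approx 0.0285
```

The second line gives one reading of the trend coefficient: holding the season fixed, the fitted cost on the original scale grows by roughly 2.85% per quarter.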
2023-04-24