
Statistics 5350/7110

Assignment 1—due 8 February at 11:59 pm

You may discuss this assignment with classmates.  However, perform the calculations and prepare the writeup yourself.  File sharing is not permitted.  Submit via Canvas.

This assignment is tightly scripted, so your submission should focus on clear and thorough discussion and interpretation of the statistical results.

The file RestaurantSales.txt contains monthly U.S. retail sales for restaurants and other eating places for the period 1992(1) to 2022(10), in millions of dollars.

To do this assignment, you need to convert the class of two variables in the data frame.  To do so, carry out the following steps at the outset.  Read in the data frame using a command of this form:

rsales<-read.csv("F:/Stat711023Spring/RestaurantSales.txt")

This command gives the name rsales to the data frame.  Next, give these commands:

attach(rsales)

Time<-as.numeric(Time)

fMonth<-as.factor(Month)

The last two lines convert the variable Time to numeric class and the variable Month to factor class.  In addition, augment the data frame using the following command:

rsales<-data.frame(rsales,fMonth)

1.  (a)  Make separate time series plots for (i) Sales and (ii) logSales.  List the periods of economic downturn as determined by the Business Cycle Dating Committee of the National Bureau of Economic Research (NBER) and mark them on each plot.  Discuss and compare the plots in considerable detail.  Include comments on trend structure and volatility.

(b)  Give a detailed explanation of the time series fluctuations from 2020(1) to 2022(10).  What specific factors were responsible for the movements observed?

(c)  Do the plots indicate whether an additive decomposition model or a multiplicative decomposition model should be fit to model sales?  Explain your answer.
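As a rough illustration of what to look for in part (c), the following sketch builds a synthetic monthly series with a multiplicative seasonal pattern and compares the within-year spread on the raw and log scales.  The series, the seasonal factors, and all variable names are made up for illustration and are not taken from RestaurantSales.txt.

```r
# Synthetic monthly series: exponential trend times a fixed seasonal pattern.
# All names and numbers are illustrative assumptions, not the assignment data.
t <- 1:120
seasonal <- rep(c(0.90, 0.88, 1.00, 1.00, 1.05, 1.04,
                  1.06, 1.07, 1.00, 1.02, 0.97, 1.01), times = 10)
y <- 1000 * exp(0.005 * t) * seasonal
year <- rep(1:10, each = 12)

# Within-year spread (max minus min) on each scale
spread_raw <- tapply(y, year, function(v) diff(range(v)))
spread_log <- tapply(log(y), year, function(v) diff(range(v)))

# Raw-scale spread grows with the level; log-scale spread stays constant
round(cbind(spread_raw, spread_log), 3)
```

When seasonal swings grow with the level of the series but look stable after taking logs, that pattern is usually read as favoring a multiplicative formulation.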

In the parts which follow, fit models with data excluded for the years 2020 to 2022.  To fit a model with these years excluded, use a command of the type

model<-lm(y~x1+x2,data=rsales[1:336,])

In this command, rsales is the name of the data frame.

2.  Fit a multiplicative decomposition model to the variable Sales.  Include just a polynomial trend and a seasonal component using the fMonth variable.  Give a brief description of the fitted model.

[R hint:  To fit a fourth-degree polynomial trend, for example, include as explanatory variables in the lm command

Time + I(Time^2) + I(Time^3) + I(Time^4)

As an alternative, you can use poly(Time,4)

These two approaches to fitting the trend give identical overall fits but produce different coefficient estimates, because the latter employs orthogonal polynomials and the former does not.  Either form can be used for this assignment; the overall results will be the same.]
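The equivalence of the two trend specifications can be checked on a small synthetic example.  This is only a sketch: the data are simulated, and the names fit_raw and fit_orth are hypothetical.

```r
# Simulated data; not the restaurant sales series
set.seed(1)
Time <- 1:60
y <- 5 + 0.1 * Time - 0.002 * Time^2 + rnorm(60, sd = 0.2)

fit_raw  <- lm(y ~ Time + I(Time^2) + I(Time^3) + I(Time^4))
fit_orth <- lm(y ~ poly(Time, 4))

# Largest difference between the two sets of fitted values (should be near zero)
max(abs(fitted(fit_raw) - fitted(fit_orth)))

# The coefficient estimates, by contrast, are not comparable term by term
cbind(raw = coef(fit_raw), orthogonal = coef(fit_orth))
```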

(a)  Tabulate and plot the estimated static seasonal indices and give a detailed interpretation of them in the context of the data collection.

(b)  Save the residuals from the fit.  Form a normal quantile plot of these residuals, test the residuals for normality, plot the residuals vs. time, and plot their autocorrelations.  Describe each of these results carefully.  Note that the model fails to capture trend structure fully.  Where does it fail and what are the causes?  What conclusions do you draw from the residual analysis?  What structure in the time series has the model failed to capture?
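The residual checks requested in part (b) might be sketched as follows.  The example uses simulated data with autocorrelated errors so that the plots have something to show.  The Shapiro-Wilk test is one common choice of normality test and is an assumption here, since the assignment does not name a specific test.

```r
# Simulated series with AR(1) noise; not the assignment data
set.seed(3)
n <- 120
Time <- 1:n
e <- as.numeric(arima.sim(list(ar = 0.6), n = n, sd = 0.01))
logy <- 8 + 0.004 * Time + e

fit <- lm(logy ~ Time)
res <- resid(fit)

qqnorm(res); qqline(res)        # normal quantile plot
shapiro.test(res)               # one possible normality test (assumed choice)
plot(Time, res, type = "l", xlab = "Time", ylab = "Residual")
abline(h = 0, lty = 2)          # residuals vs. time
acf(res, lag.max = 36)          # residual autocorrelations
```

For the assignment, res would instead hold the residuals of the model fit to rsales[1:336,], and Time would be restricted to the same 336 rows.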

3.  To attempt to improve the model, refit, adding the calendar trigonometric pairs.  Test each of the pairs for significance.  If a pair is marginally significant (p-value between 0.05 and 0.10), retain it.  Discard a pair if its p-value is greater than 0.10, and refit.  Give a full residual analysis for this model.  Has addition of calendar structure led to improvement?  Is there remaining structure that the model has still not captured?  Explain in detail.
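One way to test a trigonometric pair jointly is a partial F test comparing the models with and without the pair.  The sketch below assumes the calendar pairs are cosines and sines at 0.348 and 0.432 cycles per month, the trading-day frequencies commonly used for monthly series.  That choice, and the variable names c348, s348, c432, and s432, are assumptions; use whatever calendar variables the data frame actually supplies.

```r
# Simulated data with a calendar effect at 0.348 cycles per month only
set.seed(4)
n <- 240
Time <- 1:n
# Assumed calendar frequencies; variable names are hypothetical
c348 <- cos(2 * pi * 0.348 * Time); s348 <- sin(2 * pi * 0.348 * Time)
c432 <- cos(2 * pi * 0.432 * Time); s432 <- sin(2 * pi * 0.432 * Time)
logy <- 8 + 0.004 * Time + 0.01 * c348 - 0.008 * s348 + rnorm(n, sd = 0.01)

full     <- lm(logy ~ Time + c348 + s348 + c432 + s432)
drop_348 <- lm(logy ~ Time + c432 + s432)
drop_432 <- lm(logy ~ Time + c348 + s348)

anova(drop_348, full)   # joint test of the 0.348 pair
anova(drop_432, full)   # joint test of the 0.432 pair
```

Each anova table reports one F statistic and p-value for the two coefficients of the pair taken together, which is the quantity the retain-or-discard rule above refers to.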

4.  Form the lag 1 variable of the residuals from the model in part 3.  Refit the model in part 3 with this lag 1 variable added.  The lag 1 residual variable helps to capture added structure (in the irregular part) which the part 3 model fails to account for.  Try including both calendar trigonometric pairs for this refit.

[Consider the following example of a time series:

> x<-1:10

> x

 [1]  1  2  3  4  5  6  7  8  9 10

This series lagged once may be calculated with the following simple code:

> lag1x<-c(NA,x[1:(length(x)-1)])

> lag1x

 [1] NA  1  2  3  4  5  6  7  8  9

Thus, the lag 1 series is the original series delayed by one time interval.  One data point is lost.

The lag 1 residual variable needs to be added to the data frame, and therefore it needs to have length 370.  Thus, you can use the command

lag1resid<-c(NA,resid(model)[1:335],rep(0,34))

Be sure to add this variable to the data frame.]
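The padding logic can be tried on a small example first.  In this sketch the lengths 40 and 30 stand in for 370 and 336, and the data frame dat and its columns are hypothetical.

```r
# Small simulated stand-in: 40 rows in total, model fit to the first 30
set.seed(5)
n_all <- 40; n_fit <- 30
dat <- data.frame(Time = 1:n_all)
dat$y <- 2 + 0.1 * dat$Time +
  as.numeric(arima.sim(list(ar = 0.7), n = n_all, sd = 0.2))

model <- lm(y ~ Time, data = dat[1:n_fit, ])

# NA for the first row, lagged residuals next, zeros for rows outside the fit
dat$lag1resid <- c(NA, resid(model)[1:(n_fit - 1)], rep(0, n_all - n_fit))

model2 <- lm(y ~ Time + lag1resid, data = dat[1:n_fit, ])
nobs(model2)                 # 29: the row with NA is dropped by lm
coef(model2)["lag1resid"]
```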

(a)  For this new model, perform a thorough residual analysis, with a normal quantile plot and test for normality, a plot of the residuals vs. time, and a residual acf plot.  What do these results indicate?  

(b)  Calculate the estimated static seasonal indices from this model.  Compare them in a table to the estimates obtained in part 2(a).  Discuss the result obtained.
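A minimal sketch of one way to obtain static seasonal indices from the fMonth coefficients of a log-scale fit follows.  It assumes R's default treatment contrasts (January as baseline) and normalizes the indices to have geometric mean 1.  The data are simulated, and the names true_index and seas are hypothetical.

```r
# Simulated log series with a known seasonal pattern; not the assignment data
set.seed(2)
n <- 120
Time <- 1:n
fMonth <- factor(rep(1:12, length.out = n))
true_index <- c(0.92, 0.90, 1.01, 1.00, 1.04, 1.03,
                1.05, 1.06, 0.99, 1.02, 0.97, 1.02)
logy <- 8 + 0.004 * Time + log(true_index)[as.integer(fMonth)] +
  rnorm(n, sd = 0.005)

fit <- lm(logy ~ Time + fMonth)

# Treatment contrasts: the January coefficient is implicitly 0
b <- c(0, coef(fit)[paste0("fMonth", 2:12)])
seas <- exp(b - mean(b))        # indices with geometric mean 1
names(seas) <- month.abb

round(seas, 3)
plot(seas, type = "b", xaxt = "n", xlab = "Month", ylab = "Seasonal index")
axis(1, at = 1:12, labels = month.abb)
abline(h = 1, lty = 2)
```

An index of 1.05, for example, would be read as sales about 5 percent above the trend level in that month.  Vectors computed this way from two different models can be placed side by side with cbind for the comparison table.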

5.  Redo the fit in part 4, but now with cosine and sine seasonal variables, instead of the fMonth variable, for estimation of the static seasonal component.  Perform the amplitude, phase, and peak calculations and tabulate and interpret the results.  [R hint:  After you form the cosine and sine variables for this part, add them to the data frame.  Then fit the model, and in doing so remember to exclude data for the years 2020 to 2022.  Code to form the cosines and sines and add them to the data frame follows:

cosm<-matrix(nrow=length(Time),ncol=6)

sinm<-matrix(nrow=length(Time),ncol=5)

for(i in 1:5){

cosm[,i]<-cos(2*pi*i*Time/12)

sinm[,i]<-sin(2*pi*i*Time/12)

}

cosm[,6]<-cos(pi*Time)

c1<-cosm[,1];c2<-cosm[,2];c3<-cosm[,3];c4<-cosm[,4];c5<-cosm[,5];c6<-cosm[,6]

s1<-sinm[,1];s2<-sinm[,2];s3<-sinm[,3];s4<-sinm[,4];s5<-sinm[,5]

rsales<-data.frame(rsales,c1,s1,c2,s2,c3,s3,c4,s4,c5,s5,c6)
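The amplitude, phase, and peak calculations can be sketched as below.  The sketch assumes the convention a*cos(wt) + b*sin(wt) = A*cos(wt + phase), so that A = sqrt(a^2 + b^2) and phase = atan2(-b, a).  Other sign conventions are in use, so check which one the course follows.  Only the first two harmonics are shown, on simulated data.

```r
# Simulated log series with two seasonal harmonics; not the assignment data
set.seed(6)
Time <- 1:120
c1 <- cos(2 * pi * Time / 12);     s1 <- sin(2 * pi * Time / 12)
c2 <- cos(2 * pi * 2 * Time / 12); s2 <- sin(2 * pi * 2 * Time / 12)
logy <- 8 + 0.004 * Time + 0.03 * c1 - 0.04 * s1 + 0.01 * c2 + 0.02 * s2 +
  rnorm(120, sd = 0.002)

fit <- lm(logy ~ Time + c1 + s1 + c2 + s2)

j <- 1:2                                        # harmonic numbers
a <- coef(fit)[paste0("c", j)]                  # cosine coefficients
b <- coef(fit)[paste0("s", j)]                  # sine coefficients
amplitude <- unname(sqrt(a^2 + b^2))
phase <- unname(atan2(-b, a))                   # radians, assumed convention
period <- 12 / j                                # months per cycle
peak <- (-phase * period / (2 * pi)) %% period  # first peak, in units of Time

round(data.frame(harmonic = j, amplitude, phase, peak), 4)
```

The peak is expressed in the units of Time, so its translation into a calendar month depends on which month corresponds to Time = 1.  For the sixth harmonic there is no sine term, so the amplitude reduces to the absolute value of the cosine coefficient.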

6.  Use the decompose command in R with a multiplicative formulation to estimate the static seasonal indices.  Compare the estimates obtained with those calculated in parts 2 and 4 with a table.  Comment.

[Care is required because data for 2020(1) to 2022(10) need to be excluded.  The first set of lines of code which follows applies decompose to the logged series and then exponentiates at the end.  Alternatively, one can apply decompose to the unlogged series with the multiplicative option, and this is given in the second set of lines of code following.  Results for the two options will differ very slightly.  In your presentation, include both of these options.

Here is code for the first option:

logSales.ts<-ts(logSales[1:336],freq=12)

logSales.decmps<-decompose(logSales.ts)

seasd<-logSales.decmps$seasonal

For the second option:

Sales.ts<-ts(Sales[1:336],freq=12)

Sales.decmpsm<-decompose(Sales.ts,type="mult")

seasdmult<-Sales.decmpsm$seasonal

seasdmult<-seasdmult[1:12]/prod(seasdmult[1:12])^(1/12)

To provide the tabulation:

cbind(seas2,seas4,exp(seasd)[1:12],seasdmult)

7.  What does the analysis in this assignment indicate about sales for restaurants and other eating places?