
DATA2001 Assignment 3 (Weight: 25%)

Due: 27 October 2023 @ 4:00pm

The aim of this assignment is to gain practical experience in analysing timeseries data and using it for forecasting. You must complete this task in Python using a Jupyter notebook. You will need to submit a single Jupyter notebook (.ipynb file) via Blackboard.

Context:

When evaluating public transport and infrastructure options, such as bikeways, pedestrian bridges and dedicated public transport corridors, a major task for government agencies is predicting future usage. By exploring ways to predict transport mode usage, data analysts seek to improve the evaluations of different transport options. Better predictions of usage rates help to improve the allocation of funding and ensure that tax revenues are spent efficiently.

Dataset: In this assignment, you will examine bikeway data collected by the Brisbane City Council, and use it to make predictions of future cycle usage, using timeseries forecasting techniques.

The data is from: https://www.data.brisbane.qld.gov.au/data/dataset/bikeway-counts. It comprises five .csv files, for the years 2016-2020. However, the internal format of the data varies across the different years. Some key considerations when looking at the raw data:

-      For your assignment, the point of interest on the Brisbane Bikeway Network is the Bicentennial Bikeway at Milton. The earlier years’ data refer only to the Bicentennial Bikeway (e.g. ‘Bicentennial Bikeway Cyclists Inbound’ in the 2016 data); this is the same location as the Bicentennial Bikeway at Milton, which later years mention explicitly.

-      You should measure usage as the total of inbound and outbound cycle traffic. Some years have inbound and outbound data stored separately (e.g. 2016 data), while later years have them recorded as a daily total (e.g. ‘A019, Bicentennial Bikeway, Milton Cyclist’ in 2020).

-      More generally, be aware that the number of columns varies across the files, as do the column names, and the series themselves may be interrupted or contain mis-measurements.
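One way to reconcile the two layouts described above can be sketched as follows. This is a minimal illustration, not a prescribed solution: the helper name `milton_daily_total`, the `Date` column name, the day-first date format and the toy values are all assumptions, and only the two counter column names are taken from the examples above. Check the real files before relying on any of it.

```python
import pandas as pd

def milton_daily_total(df, date_col="Date", date_format="%d/%m/%Y"):
    """Return the daily cyclist total for the Bicentennial Bikeway.

    Hypothetical helper: `date_col` and `date_format` are assumptions
    about the raw files and must be checked against each year's CSV.
    """
    df = df.copy()
    df[date_col] = pd.to_datetime(df[date_col], format=date_format)
    df = df.set_index(date_col)
    # Keep cyclist columns for this bikeway only (this drops pedestrian counts).
    cols = [c for c in df.columns
            if "bicentennial" in c.lower() and "cyclist" in c.lower()]
    if not cols:
        raise ValueError("no Bicentennial Bikeway cyclist column found")
    # Caution: if a file held inbound, outbound AND total columns,
    # this sum would double count, so inspect `cols` for every year.
    numeric = df[cols].apply(pd.to_numeric, errors="coerce")
    # min_count=1 keeps a day as NaN when every counter is missing,
    # instead of silently reporting zero cyclists.
    return numeric.sum(axis=1, min_count=1).rename("cyclists")

# Toy frames imitating the two layouts (values are made up).
raw_2016 = pd.DataFrame({
    "Date": ["01/01/2016", "02/01/2016"],
    "Bicentennial Bikeway Cyclists Inbound": [100, 120],
    "Bicentennial Bikeway Cyclists Outbound": [90, 110],
    "Bicentennial Bikeway Pedestrians Inbound": [5, 6],
})
raw_2020 = pd.DataFrame({
    "Date": ["01/01/2020", "02/01/2020"],
    "A019, Bicentennial Bikeway, Milton Cyclist": [400, None],
})

series = pd.concat([milton_daily_total(raw_2016),
                    milton_daily_total(raw_2020)]).sort_index()
# A regular daily index makes gaps in the record explicit as NaN.
series = series.asfreq("D")
```

Keeping missing days as NaN rather than zero leaves the decision about interruptions and mis-measurements (interpolate, drop or flag) visible for the data preparation task.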

Task description:

The submitted notebook should address the following 6 tasks (see marking grid for mark allocation):

1.    Data Preparation: Compile all the data for the period 2016-2020 into one series using the “pandas” library and set up an index for the entire dataset in an appropriate way for timeseries analysis. Can you identify any useful side data or exogenous variables? If so, include them in your dataframe and handle/merge them in an appropriate way. Explain how you did this and justify your choices.

2.    Exploratory Data Analysis (EDA): Visualise the entire data set, and comment on the patterns you can observe with respect to the features discussed in the lectures. Where appropriate, include visualisations of uncertainty and correlation.

3.    Focus now on the cyclist traffic on the Bicentennial Bikeway, Milton time series.

a.    Split the data into training and testing series, selecting the testing series to be the last three months of the data.

b.    Manually step through the STR decomposition process on the training data, as described in the course material. Visualise and interpret each of the components of the STR decomposition for cycle traffic. (Hint: You may wish to validate the output of your manual process against an automated modelling approach.)

4.    Timeseries models:

a.    Fit an ARIMA model for the trend-cycle component of your STR decomposition of the training data and interpret the estimated model parameters.

b.    Using the STR components that you estimated in tasks 3 and 4, produce forecasts of Bicentennial Bikeway cyclist traffic at Milton for the test data series. Include the uncertainty in the forecasts and visualise the predictions.

5.    Pure forecasters - now consider your choice of ML techniques:

a.    Select an appropriate pure forecasting method to predict the trend component of the cycle traffic training data.

b.    Using the seasonal component that you estimated in task 3 and the pure forecaster from 5.a, produce forecasts of Bicentennial Bikeway cycle traffic at Milton for the test data series. Include the uncertainty in the forecasts, and visualise the predictions.

6.    Evaluate the forecast performance of your model-based and pure forecasters using the test data and compare the two forecasters. Use appropriate evaluation metrics and methods.

Discuss the similarities and differences between their performance and suggest possible avenues for improving cycle traffic forecasting.
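As a rough illustration of the hold-out split in task 3.a and the comparison in task 6, the sketch below holds out the last three months of a daily series and scores two forecasts with MAE and RMSE. The function names, the synthetic series and both "forecasts" are hypothetical placeholders, and MAE and RMSE are only two of several metrics that could be considered appropriate.

```python
import numpy as np
import pandas as pd

def split_last_months(y, months=3):
    """Split a date-indexed series so the final `months` form the test set."""
    cutoff = y.index.max() - pd.DateOffset(months=months)
    return y[y.index <= cutoff], y[y.index > cutoff]

def mae(actual, forecast):
    """Mean absolute error."""
    return float((actual - forecast).abs().mean())

def rmse(actual, forecast):
    """Root mean squared error; penalises large misses more than MAE."""
    return float(np.sqrt(((actual - forecast) ** 2).mean()))

# Synthetic daily series standing in for the real cyclist counts.
index = pd.date_range("2020-01-01", "2020-12-31", freq="D")
y = pd.Series(np.arange(len(index), dtype=float), index=index)

train, test = split_last_months(y, months=3)  # test covers Oct-Dec 2020

# Two made-up forecasts with the same average miss but different spread:
# A is always 2 too high; B alternates between +1 and -3.
forecast_a = test + 2.0
forecast_b = test + np.tile([1.0, -3.0], len(test) // 2)

scores = pd.DataFrame({
    "MAE": [mae(test, forecast_a), mae(test, forecast_b)],
    "RMSE": [rmse(test, forecast_a), rmse(test, forecast_b)],
}, index=["forecast_a", "forecast_b"])
```

In this toy case both forecasts have the same MAE, while forecast_b has the larger RMSE because of its occasional bigger misses. This is one reason to report more than one metric when comparing the model-based and pure forecasters.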