COMP809 – Data Mining and Machine Learning Lab 10 – Linear Regression and LSTM
1. Linear Regression
1.1. The Boston Housing dataset
Aim: predict ‘house values’ using the available independent variables. In this lab, we will use the popular library ‘statsmodels’.
For Part I, build a simple linear regression to predict MEDV (median house value) from RM (average number of rooms per dwelling).
For Part II, build a multiple linear regression using the first 13 columns as independent variables (X) and the last column, ‘MEDV’, as the dependent variable (y).
Split the dataset into input (X) and output (y) variables, then into 70/30 train and test sets. Fit the model and make predictions: use the fit() method to fit the regression model to the training data, then use the fitted model to predict.
Evaluate the model’s performance and report error metrics such as mean squared error (MSE) for both the train and test sets. Explain your findings.
To use the linear regression model of the ‘statsmodels’ library, you need to add a column of ones to X to serve as the intercept, because the model does not add one by default.
Generate the model summary and explain your findings.
1.2. Tesla stock data
Create temporal features such as ‘Year’, ‘Month’, ‘Week’, ‘Day’, ‘Dayofweek’, ‘Dayofyear’, etc., from the available ‘Date’ column.
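One way to derive these features is through the pandas `.dt` accessor. The sketch below uses three hand-picked dates as a stand-in for the Tesla file; only the ‘Date’ column name and the feature names come from the lab.

```python
import pandas as pd

# Three illustrative dates (assumption: the real data is read from the Tesla CSV).
df = pd.DataFrame({"Date": ["2020-01-02", "2020-01-03", "2020-01-06"]})
df["Date"] = pd.to_datetime(df["Date"])

df["Year"] = df["Date"].dt.year
df["Month"] = df["Date"].dt.month
df["Week"] = df["Date"].dt.isocalendar().week.astype(int)  # ISO week number
df["Day"] = df["Date"].dt.day
df["Dayofweek"] = df["Date"].dt.dayofweek  # Monday = 0, Sunday = 6
df["Dayofyear"] = df["Date"].dt.dayofyear

print(df)
```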
Apart from the above additional features, add your own set of features that you believe would be relevant for the predictions. For instance, one hypothesis could be that the first and last days of the week affect the closing price of the stock far more than the other days. Create an additional feature that identifies whether a given day is Monday/Friday or Tuesday/Wednesday/Thursday.
Split your dataset into train and validation sets and create a regression model to predict the ‘close’ feature. Evaluate the performance of your model.
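The Monday/Friday feature and the split might look like the sketch below. Several things here are assumptions rather than lab requirements: the data is a synthetic random walk, the column name `mon_fri` is made up, the split is chronological with the last 20% held out, and the fit uses NumPy least squares as a stand-in for whichever regression library you choose.

```python
import numpy as np
import pandas as pd

# Synthetic business-day series standing in for the Tesla data.
rng = np.random.default_rng(1)
dates = pd.bdate_range("2021-01-04", periods=120)
df = pd.DataFrame({"Date": dates,
                   "close": 100 + np.cumsum(rng.normal(0, 1, size=120))})

# 1 for Monday (0) or Friday (4), 0 for Tuesday to Thursday.
df["mon_fri"] = df["Date"].dt.dayofweek.isin([0, 4]).astype(int)
df["Dayofyear"] = df["Date"].dt.dayofyear

# Chronological split: the most recent 20% of rows form the validation set.
split = len(df) * 4 // 5
train, valid = df.iloc[:split], df.iloc[split:]

def design(frame):
    # Column of ones for the intercept, followed by the two features.
    return np.column_stack([np.ones(len(frame)), frame["Dayofyear"], frame["mon_fri"]])

coef, *_ = np.linalg.lstsq(design(train), train["close"].to_numpy(), rcond=None)
pred = design(valid) @ coef
rmse = float(np.sqrt(np.mean((valid["close"].to_numpy() - pred) ** 2)))
print(f"validation RMSE = {rmse:.3f}")
```

Splitting by time rather than at random keeps future prices out of the training set, which is usually the safer choice for stock data.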
2. Long Short-Term Memory (LSTM)
Long Short-Term Memory (LSTM) is a class of recurrent neural network (RNN) that has found practical applications because it is robust against the long-term dependency problem. To use LSTM in this lab, you must install TensorFlow.
Scale the data before fitting your model.
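The lab does not name a scaler, so the sketch below assumes min-max scaling to the range [0, 1], written out by hand in NumPy. The prices are placeholders. The range is learned from the training portion only, so no information from the test portion leaks into the model.

```python
import numpy as np

# Placeholder closing prices, purely for illustration.
close = np.array([10.0, 12.0, 11.0, 15.0, 14.0, 18.0, 20.0, 19.0])
train, test = close[:6], close[6:]

# Learn the range from the training portion only.
lo, hi = train.min(), train.max()

def scale(x):
    return (x - lo) / (hi - lo)

def unscale(x):
    # Inverse transform, needed to report predictions in price units.
    return x * (hi - lo) + lo

train_scaled, test_scaled = scale(train), scale(test)
print(train_scaled)
print(test_scaled)  # can exceed 1 when test prices are above the training range
```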
Using a four-layer LSTM model, predict the ‘close’ price from the past 100 time steps of the train data.
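Turning a series into "past n steps, next value" pairs could be done as in the sketch below. The helper name `make_windows` is hypothetical, and the demo uses 3 steps on a short toy series so the result is easy to check; the lab itself asks for 100 steps.

```python
import numpy as np

def make_windows(series, n_steps):
    """Return X of shape (samples, n_steps, 1) and y of shape (samples,)."""
    X, y = [], []
    for i in range(n_steps, len(series)):
        X.append(series[i - n_steps:i])  # the n_steps values before position i
        y.append(series[i])              # the value to predict
    X = np.array(X).reshape(-1, n_steps, 1)
    return X, np.array(y)

series = np.arange(10, dtype=float)
X, y = make_windows(series, n_steps=3)
print(X.shape, y.shape)
print(X[0].ravel(), y[0])
```

The trailing dimension of 1 is the number of features per time step, which is the input layout Keras LSTM layers expect.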
Before fitting the model, set the optimizer to 'adam' and the loss to ‘mean_squared_error’. Fit the model using 10 epochs and a small batch_size.
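A model along these lines might be built with the Keras API bundled in TensorFlow. The sketch reads "four-layer" as four stacked LSTM layers followed by a single-unit Dense output, which is one possible interpretation. The 50 units per layer, the batch size of 8, and the random training arrays are all assumptions made for illustration.

```python
import numpy as np
from tensorflow import keras

n_steps = 100

# Random stand-in for the scaled, windowed training data.
rng = np.random.default_rng(0)
X_train = rng.random((32, n_steps, 1)).astype("float32")
y_train = rng.random(32).astype("float32")

model = keras.Sequential([
    keras.Input(shape=(n_steps, 1)),
    # Every LSTM layer except the last returns the full sequence,
    # so that the next LSTM layer receives one vector per time step.
    keras.layers.LSTM(50, return_sequences=True),
    keras.layers.LSTM(50, return_sequences=True),
    keras.layers.LSTM(50, return_sequences=True),
    keras.layers.LSTM(50),
    keras.layers.Dense(1),  # one predicted (scaled) closing price
])

model.compile(optimizer="adam", loss="mean_squared_error")
history = model.fit(X_train, y_train, epochs=10, batch_size=8, verbose=0)

pred = model.predict(X_train[:2], verbose=0)
print(pred.shape)
```

Predictions come out in scaled units, so they need the inverse of the scaling step before being compared with actual prices.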
Expected output:
2023-07-03