MSCA31010: Linear & Non-Linear Models Winter 2022 Assignment 4
The Homeowner_Claim_History.xlsx contains the claim history of 27,513 homeowner policies. The following table describes the eleven columns in the HOCLAIMDATA sheet.
Name | Description | Categories
policy | Policy Identifier |
exposure | Duration a Policy is Exposed to Risk Measured in Portion of a Year |
num_claims | Number of Claims in a Year |
amt_claims | Total Claim Amount in a Year |
f_primary_age_tier | Age Tier of Primary Insured | < 21, 21 - 27, 28 - 37, 38 - 60, > 60
f_primary_gender | Gender of Primary Insured | Female, Male
f_marital | Marital Status of Primary Insured | Not Married, Married, Un-Married
f_residence_location | Location of Residence Property | Urban, Suburban, Rural
f_fire_alarm_type | Fire Alarm Type | None, Standalone, Alarm Service
f_mile_fire_station | Distance to Nearest Fire Station | < 1 mile, 1 - 5 miles, 6 - 10 miles,
f_aoi_tier | Amount of Insurance Tier | < 100K, 100K - 350K, 351K - 600K, 601K - 1M, > 1M
In insurance ratemaking, the ratio of Total Claim Amount in a Year divided by the Number of Claims in a Year is called the Severity. In other words, Severity is the average dollar amount per claim. If a policy does not file any claims in a year, then its Severity is missing.
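The Severity definition above can be sketched in a few lines of Python. This is a minimal illustration only: the function name `severity` and the sample rows are hypothetical and are not taken from the workbook.

```python
import math

def severity(amt_claims, num_claims):
    """Average dollar amount per claim; NaN when no claims were filed."""
    if num_claims is None or num_claims <= 0:
        return math.nan  # Severity is missing for a policy with no claims
    return amt_claims / num_claims

# Hypothetical rows, NOT data from the workbook: (amt_claims, num_claims)
rows = [(1200.0, 2), (0.0, 0), (350.0, 1)]
severities = [severity(amount, count) for amount, count in rows]

# Keep only positive, non-missing values, as the questions below require
usable = [s for s in severities if not math.isnan(s) and s > 0.0]
```

With a data frame library, the same idea is a column division followed by a filter on positive, non-missing values.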
Unless otherwise stated, please provide all numeric answers rounded to the seventh decimal place.
Question 1 (50 points)
(a) (10 points) Generate horizontal boxplots of Total Claim Amount in a Year grouped by each of the seven categorical predictors f_primary_age_tier, f_primary_gender, f_marital, f_residence_location, f_fire_alarm_type, f_mile_fire_station, and f_aoi_tier.
(b) (10 points) Assume that Severity follows a Gamma distribution. Train a Gamma model with the logarithm link function. The target variable is Severity (use only positive and non-missing values for analyses). The predictors are the seven categorical predictors. The model will include the Intercept term. Enter predictors into the model using the Forward Selection method. The entry threshold is 0.05. What is the estimate for the Shape parameter?
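As a reference point for the Gamma model with a logarithm link, the sketch below evaluates the Gamma log-likelihood for given means and a given Shape parameter. It assumes the mean/shape parameterization (mean mu, variance mu squared divided by the shape), which is one common choice; the software used in the course may parameterize the distribution differently. The function names are hypothetical.

```python
import math

def log_link_mean(linear_predictor):
    """Invert the logarithm link: log(mu) = eta, so mu = exp(eta)."""
    return math.exp(linear_predictor)

def gamma_log_likelihood(y, mu, shape):
    """Sum of Gamma log-densities with mean mu[i] and a common shape.

    Assumed parameterization: E[y] = mu and Var[y] = mu**2 / shape.
    """
    total = 0.0
    for y_i, mu_i in zip(y, mu):
        total += (shape * math.log(shape / mu_i)
                  + (shape - 1.0) * math.log(y_i)
                  - shape * y_i / mu_i
                  - math.lgamma(shape))
    return total
```

When the shape equals 1, the density reduces to the exponential distribution, which gives a quick sanity check on the formula.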
(c) (10 points) Provide the Step Summary table. The table should contain (1) Step Number, (2) Model Degrees of Freedom, (3) Model Log-Likelihood, (4) Deviance Chi-Squares, (5) Deviance Degrees of Freedom, and (6) Deviance Significance. Show the Significance in .E7 scientific notation.
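One possible way to fill a row of the Step Summary table is sketched below. It assumes that the Deviance Chi-Square for a step is twice the gain in log-likelihood between the nested models, and that ".E7" means exponent notation with seven digits after the decimal point; both are assumptions, and the function names are hypothetical. The significance itself would be the upper-tail chi-square probability, for example from `scipy.stats.chi2.sf(chi_sq, df)`, which is not called here so that the sketch needs only the standard library.

```python
def deviance_test(llk_reduced, df_reduced, llk_full, df_full):
    """Deviance chi-square statistic and its degrees of freedom.

    Assumption: the statistic is 2 * (full log-likelihood - reduced
    log-likelihood), with degrees of freedom equal to the difference
    in model degrees of freedom.
    """
    chi_sq = 2.0 * (llk_full - llk_reduced)
    df = df_full - df_reduced
    return chi_sq, df

def format_significance(p_value):
    """Exponent notation with seven digits after the decimal point."""
    return format(p_value, ".7E")
```

For example, `format_significance(0.03125)` returns `'3.1250000E-02'`.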
(d) (10 points) Assess the final model goodness-of-fit using (1) Root Mean Squared Error, (2) Relative Error, (3) Mean Absolute Proportion Error, and (4) Pearson Correlation. What are the values of these metrics?
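The four metrics can be computed as in the sketch below. The definitions are assumptions where the assignment does not spell them out: Root Mean Squared Error divides the squared-error sum by the number of observations, Relative Error is taken as the squared-error sum divided by the total sum of squares of the observed values, and Mean Absolute Proportion Error averages the absolute error divided by the observed value. The function name is hypothetical.

```python
import math

def goodness_of_fit(observed, predicted):
    """Return RMSE, Relative Error, MAPE and Pearson correlation."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n

    sse = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    sst = sum((y - mean_obs) ** 2 for y in observed)
    ss_pred = sum((p - mean_pred) ** 2 for p in predicted)
    cross = sum((y - mean_obs) * (p - mean_pred)
                for y, p in zip(observed, predicted))

    return {
        "rmse": math.sqrt(sse / n),
        "relative_error": sse / sst,  # assumed definition: SSE / SST
        "mape": sum(abs(y - p) / y for y, p in zip(observed, predicted)) / n,
        "pearson": cross / math.sqrt(sst * ss_pred),
    }
```

The Mean Absolute Proportion Error is well defined here because only positive Severity values are analyzed.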
(e) (10 points) Identify any poorly predicted observations. First, plot the predicted versus the observed Severity. Second, together in a single chart frame, plot the Simple Residuals, the Pearson Residuals, the Deviance Residuals, and the Absolute Proportion Errors versus the observed Severity. Label the axes of these two charts accordingly. To receive full credit, generate your charts with proper dimensions (e.g., length and width) and resolution (e.g., dpi).
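The residuals named above can be sketched for a single observation as follows. This assumes the usual Gamma variance function (variance proportional to the squared mean) and leaves the residuals unscaled by the Shape parameter; whether the course expects scaled residuals is not stated here, so treat the scaling as an assumption. The function name is hypothetical.

```python
import math

def gamma_residuals(y, mu):
    """Residuals for one observed Severity y and its prediction mu."""
    simple = y - mu
    # Pearson: simple residual over sqrt of the variance function mu**2
    pearson = simple / mu
    # Unit deviance for the Gamma family; clamp tiny negative round-off
    unit_deviance = max(0.0, 2.0 * ((y - mu) / mu - math.log(y / mu)))
    deviance = math.copysign(math.sqrt(unit_deviance), simple)
    abs_proportion_error = abs(simple) / y
    return {
        "simple": simple,
        "pearson": pearson,
        "deviance": deviance,
        "abs_proportion_error": abs_proportion_error,
    }
```

Applying the function to every observation gives the four series to plot against the observed Severity.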
Question 2 (50 points)
(a) (20 points) Train a Multi-Layer Perceptron neural network. The target variable is Severity (use only positive and non-missing values for analyses). The predictors are the seven categorical predictors. Perform a naïve grid search to select the best network structure. For each of the Hyperbolic Tangent and Rectified Linear Unit activation functions, try 1 to 10 layers and 1 to 5 neurons per layer, with the same number of neurons in every layer. Provide a table that shows your grid search results. The table should contain (1) the activation function type, (2) the number of layers, (3) the common number of neurons per layer, (4) the total number of neurons, and (5) the mean absolute proportion error.
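The grid described above has 2 x 10 x 5 = 100 combinations, and can be enumerated as in the sketch below. The callable `fit_and_score` is a hypothetical placeholder for whatever code trains the network and returns its mean absolute proportion error; the labels `"tanh"` and `"relu"` and the tuple form of the layer sizes follow the conventions of scikit-learn's `MLPRegressor`, which is an assumption about the tooling.

```python
from itertools import product

def grid_search(fit_and_score):
    """Enumerate the grid and collect one result row per structure.

    fit_and_score(activation, hidden_layer_sizes) is a hypothetical
    callable that trains a network and returns its MAPE.
    """
    results = []
    for activation, n_layers, n_neurons in product(
            ("tanh", "relu"), range(1, 11), range(1, 6)):
        # Same number of neurons in every hidden layer
        hidden_layer_sizes = (n_neurons,) * n_layers
        mape = fit_and_score(activation, hidden_layer_sizes)
        results.append({
            "activation": activation,
            "n_layers": n_layers,
            "neurons_per_layer": n_neurons,
            "total_neurons": n_layers * n_neurons,
            "mape": mape,
        })
    return results
```

Each returned row maps directly onto one line of the requested results table.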
(b) (10 points) Recommend the best network structure, that is, the one that yields the lowest Mean Absolute Proportion Error. In the case of ties, choose the network with the fewest total neurons.
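The selection rule, including the tie-break, amounts to a minimum over a two-part key. The sketch below assumes each grid search result is a dictionary with `mape` and `total_neurons` entries; those key names and the sample rows are hypothetical, not actual results.

```python
def best_structure(results):
    """Lowest MAPE first; among ties, the fewest total neurons."""
    return min(results, key=lambda row: (row["mape"], row["total_neurons"]))

# Hypothetical rows for illustration only, NOT real grid search output
sample = [
    {"activation": "relu", "total_neurons": 12, "mape": 0.40},
    {"activation": "tanh", "total_neurons": 6, "mape": 0.40},
    {"activation": "tanh", "total_neurons": 2, "mape": 0.55},
]
chosen = best_structure(sample)
```

The comparison of Mean Absolute Proportion Error values here is exact; comparing values rounded to a fixed number of decimals is an alternative if near-ties should count as ties.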
(c) (10 points) Assess the final model goodness-of-fit using (1) Root Mean Squared Error, (2) Relative Error, (3) Mean Absolute Proportion Error, and (4) Pearson Correlation. What are the values of these metrics?
(d) (10 points) Identify any poorly predicted observations. First, plot the predicted versus the observed Severity. Second, together in a single chart frame, plot the Simple Residuals, the Pearson Residuals, and the Absolute Proportion Errors versus the observed Severity. Label the axes of these two charts accordingly. To receive full credit, generate your charts with proper dimensions (e.g., length and width) and resolution (e.g., dpi).
2022-03-06