Week 3 Individual Assignment 3: k-NN Applied to Credit Extensions and a Bias Check

This assignment continues the analysis of the credit data, exploring whether we can improve on our earlier analysis that used linear and logistic regression. Please refer to the earlier assignments for the data description and, if needed, repeat the data preparation steps using the credit.csv data:

1. Create a new categorical variable that indicates whether or not a new credit extension will result in a positive NPV.

2. The knn function requires all variables to be numeric, so create dummy variables for all categorical variables. You may want to take advantage of the R script provided (or use Excel).

3. Split the data into two parts, setting the seed to 1, 70% training and 30% validation.
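
The three preparation steps above can be sketched as follows. This is a minimal illustration in plain Python, not the R script provided with the assignment. The column names (npv, housing, age) and the toy records are hypothetical stand-ins for whatever credit.csv actually contains. A seed of 1 in Python's random module will also not reproduce the partition that seed 1 gives in R.

```python
import random

# Hypothetical toy records; the real columns come from credit.csv.
rows = [
    {"npv": 120.0, "housing": "own", "age": 35},
    {"npv": -40.0, "housing": "rent", "age": 22},
    {"npv": 15.5, "housing": "free", "age": 48},
    {"npv": -300.0, "housing": "own", "age": 51},
    {"npv": 80.0, "housing": "rent", "age": 29},
    {"npv": -5.0, "housing": "own", "age": 41},
    {"npv": 60.0, "housing": "free", "age": 33},
    {"npv": -75.0, "housing": "rent", "age": 27},
    {"npv": 210.0, "housing": "own", "age": 60},
    {"npv": 5.0, "housing": "rent", "age": 38},
]


def add_profit_flag(row):
    """Step 1: flag the row as 1 if NPV is positive, else 0."""
    out = dict(row)
    out["profitable"] = 1 if row["npv"] > 0 else 0
    return out


def one_hot(records, column):
    """Step 2: replace a categorical column with one 0/1 dummy per level."""
    levels = sorted({r[column] for r in records})
    encoded = []
    for r in records:
        out = {key: value for key, value in r.items() if key != column}
        for level in levels:
            out[f"{column}_{level}"] = 1 if r[column] == level else 0
        encoded.append(out)
    return encoded


def split(records, train_share=0.7, seed=1):
    """Step 3: shuffle with a fixed seed, then cut into training/validation."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = round(train_share * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


prepared = one_hot([add_profit_flag(r) for r in rows], "housing")
train, valid = split(prepared)
print(len(train), len(valid))
```

Because the profitable flag is derived directly from NPV, the npv column itself should be left out of the predictors when the model is fitted.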

Please answer all questions. Supply supporting documentation and show calculations as needed. Please submit a single well-formatted PDF or Word file. The instructor should not need to go searching for your answers! In addition, feel free to include code and screenshots as an Appendix – this will not be graded, but will help the instructor give you feedback, if your models differ significantly from the solutions.

k-NN

Classify customers as profitable/not profitable with k-NN. As the second part of the assignment focuses on a bias check, you may want to think carefully about which variables to include in your model.

1. Run the k-NN algorithm for classification, testing multiple values of k (increase k until you no longer observe an improvement in the accuracy on the validation data).

a. Plot the accuracy on the validation sample against k. Include the plot as an Exhibit.

b. What is the best value of k?

c. Briefly explain why the % Error is zero for the training sample when k=1, but not for the validation sample.
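
The search over k described in question 1 can be sketched as below. This is a hand-written k-NN using Euclidean distance and majority vote, shown only to make the procedure concrete. It is not the knn function from the course's R script, and the feature values and labels are hypothetical toy data. Since k-NN is distance-based, the sketch assumes the features have already been put on comparable scales.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training points nearest to x (Euclidean)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def accuracy(train_X, train_y, X, y, k):
    """Share of points in (X, y) that k-NN classifies correctly."""
    hits = sum(knn_predict(train_X, train_y, x, k) == label
               for x, label in zip(X, y))
    return hits / len(y)


# Hypothetical, already-scaled toy data (two features per customer).
train_X = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.3), (0.8, 0.9),
           (0.9, 0.8), (0.85, 0.7), (0.5, 0.5)]
train_y = [0, 0, 0, 1, 1, 1, 0]
valid_X = [(0.2, 0.2), (0.9, 0.9), (0.6, 0.6), (0.3, 0.1)]
valid_y = [0, 1, 1, 0]

# Odd values of k avoid tied votes when there are two classes.
results = {k: accuracy(train_X, train_y, valid_X, valid_y, k)
           for k in (1, 3, 5, 7)}
for k, acc in results.items():
    print(f"k={k}: validation accuracy {acc:.2f}")

# max() returns the first (smallest) k among those with the highest accuracy.
best_k = max(results, key=results.get)
print("best k:", best_k)
```

The results dictionary holds the (k, accuracy) pairs that would be plotted for the Exhibit in part a.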

2. Obtain predictions for the best k on the validation data.

a. Include a confusion matrix as an Exhibit.

b. What is the sensitivity?

c. What is the specificity?

d. How do these values compare to your Logistic Regression model from Week 2 (or feel free to use the solutions)? Briefly comment.
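
For reference, sensitivity is TP / (TP + FN) and specificity is TN / (TN + FP), where "positive" here is taken to mean a customer classified as profitable. The sketch below computes both from a confusion matrix. The actual and predicted labels are hypothetical and do not come from the credit data.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count the four cells of a 2x2 confusion matrix."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1
        else:
            counts["FN" if a == positive else "TN"] += 1
    return counts


def sensitivity(c):
    """Share of actual positives that were predicted positive."""
    return c["TP"] / (c["TP"] + c["FN"])


def specificity(c):
    """Share of actual negatives that were predicted negative."""
    return c["TN"] / (c["TN"] + c["FP"])


# Hypothetical validation labels (1 = profitable, 0 = not profitable).
actual = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]

cells = confusion_counts(actual, predicted)
print(cells)
print("sensitivity:", sensitivity(cells))
print("specificity:", specificity(cells))
```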

Bias Check

You have now run k-NN for classification, and let’s assume you are happy with the performance. In fact, management suggests that we use the model for credit extension decisions, i.e. we will offer credit to those customers that we classify as profitable. Are there any bias considerations we need to be aware of?

1. What are the sensitive element(s) in the data that are available to us?

2. What are other potential sensitive elements that are not available, but that we should be mindful of?

3. What is a measure of a fair application of the model, given the business scenario?

4. Break down the performance of the model by some sensitive element, using your measure(s) in #3. Summarize the analysis in an Exhibit.

5. Comment on the Exhibit. Is there any cause for concern?
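
The breakdown asked for in question 4 amounts to computing the same measure separately for each group. The sketch below shows the mechanics under stated assumptions: the group labels "A" and "B" and the records are hypothetical. The two measures shown (share of customers offered credit, and sensitivity) are only examples. Choosing an appropriate measure for the business scenario remains part of question 3.

```python
from collections import defaultdict

# Hypothetical validation records: (group, actual, predicted),
# where 1 = profitable and a predicted 1 means credit is offered.
records = [
    ("A", 1, 1), ("A", 1, 1), ("A", 1, 0), ("A", 0, 0), ("A", 0, 1),
    ("B", 1, 1), ("B", 1, 0), ("B", 1, 0), ("B", 0, 0), ("B", 0, 0),
]


def by_group(rows):
    """Compute the offer rate and sensitivity separately for each group."""
    grouped = defaultdict(list)
    for group, actual, predicted in rows:
        grouped[group].append((actual, predicted))

    summary = {}
    for group, pairs in grouped.items():
        offered = sum(p for _, p in pairs)
        positives = [(a, p) for a, p in pairs if a == 1]
        true_pos = sum(p for _, p in positives)
        summary[group] = {
            "n": len(pairs),
            "offer_rate": offered / len(pairs),
            # None when a group has no actual positives to evaluate.
            "sensitivity": true_pos / len(positives) if positives else None,
        }
    return summary


summary = by_group(records)
for group, stats in sorted(summary.items()):
    print(group, stats)
```

Reporting the group size n alongside each rate helps the reader of the Exhibit judge whether a difference between groups rests on enough observations.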