
Assignment 2022/2023

This forms your assessment (100%) of this module.

There are two parts to this assessment.

Part A contains THREE short essay-based questions and counts for 50% of the final mark. Each essay should be around 1,000-1,500 words in length.

Part B contains FIVE tasks to establish a scorecard using the given dataset and counts for 50% of the final mark. You may use Excel, SAS, R or Python to assist in the scorecard preparation.

You must answer ALL questions.

Submission must be made by 3pm on Friday 24th April 2023 via Learning Central, and instructions will follow shortly on how to do this. You will need to submit a single file containing answers to all questions; any spreadsheet analysis, workings or coding necessary can be shown in an Appendix in that file. Only the submitted file will be marked.

PART A

1.   Critically examine what needs to be considered when developing a credit risk scoring model. [20 marks]

2.   Explain how, in theory, Cox’s proportional hazards (PH) model for survival analysis can be used to construct a scorecard. Comment on the relative popularity of Cox’s PH model versus logistic regression in scorecard construction. [15 marks]

3.   A lender would like to extend its ability to offer credit to those with lower credit scores and is considering doing this through a combination of risk-based pricing and the use of more and different data in its credit scoring model. Discuss the implications of these changes for the lender in terms of both credit scorecard development and the potential impact on existing customers. [15 marks]

PART B

The dataset underpinning the analysis here is that used in the lab sessions during lectures. It has been uploaded as a spreadsheet named ‘German’ together with the data dictionary ‘German data dictionary’ describing each attribute. You will recall that the dataset consists of data for 1000 applicants along with a variable that says whether they were subsequently Good or Bad from a credit perspective.

1.    Split the dataset into two subsets as follows:

Subset 1: the applicants with Checking = 1 or Checking = 2

Subset 2: the applicants where Checking = 3 or Checking = 4

Clean the subsets if necessary.   [5 marks]
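As an illustration only, the split described above might be sketched in Python as follows. The rows, the `id` and `Good` field names, and the helper `split_by_checking` are hypothetical; only the `Checking` attribute and its values 1 to 4 come from the task description. Rows with any other `Checking` value are set aside for inspection rather than silently dropped.

```python
# Hypothetical sample rows; the real records would be read from the 'German' spreadsheet.
applicants = [
    {"id": 1, "Checking": 1, "Good": 0},
    {"id": 2, "Checking": 2, "Good": 1},
    {"id": 3, "Checking": 3, "Good": 1},
    {"id": 4, "Checking": 4, "Good": 1},
    {"id": 5, "Checking": None, "Good": 1},  # missing value, a candidate for cleaning
]


def split_by_checking(rows):
    """Return (subset1, subset2, rejected) according to the Checking attribute."""
    subset1, subset2, rejected = [], [], []
    for row in rows:
        value = row.get("Checking")
        if value in (1, 2):
            subset1.append(row)
        elif value in (3, 4):
            subset2.append(row)
        else:
            # Anything outside 1..4 (including missing values) is kept for review.
            rejected.append(row)
    return subset1, subset2, rejected


subset1, subset2, rejected = split_by_checking(applicants)
print(len(subset1), len(subset2), len(rejected))
```

How the set-aside rows are treated (removed, corrected or imputed) is a cleaning decision that should be justified in the answer.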

2.    For each subset, establish a training set and validation set. Explain:

a.    what principle you have used to decide on these;

b.   why both training and validation sets are needed;

c.    any issues encountered during the splitting exercise.   [5 marks]
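One possible principle for the split is stratified random sampling, so that the training and validation sets have the same Good/Bad mix. The sketch below assumes a 70/30 split, a fixed random seed, and a label field called `Good`; all three are illustrative choices and not requirements of the task.

```python
import random


def stratified_split(rows, label_key, train_fraction=0.7, seed=42):
    """Split rows into (train, validation), preserving the class mix of label_key."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_key], []).append(row)

    train, validation = [], []
    for label in sorted(by_class):
        group = by_class[label][:]  # copy so the caller's list is not reordered
        rng.shuffle(group)
        cut = round(len(group) * train_fraction)
        train.extend(group[:cut])
        validation.extend(group[cut:])
    return train, validation


# Hypothetical data: 70 Goods and 30 Bads.
rows = [{"id": i, "Good": 1 if i % 10 < 7 else 0} for i in range(100)]
train, validation = stratified_split(rows, "Good")
print(len(train), len(validation))
```

With small subsets, rounding within each class can leave very few Bads in the validation set, which is one kind of issue that part (c) asks about.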

3.    For each training set, choose four variables which are suitable for building a scorecard. For each training set, the chosen variables must include (i) at least one variable that is continuous before binning; (ii) at least one categorical variable with more than two categories, so you can see whether categories can be combined.

Explain the rationale behind your choice of variables (using supporting statistics, e.g. chi-square). Should you be unable to choose variables satisfying the above criteria, explain the problem you have encountered and the compromise you have made in the variable selection. [10 marks]
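To show the kind of supporting statistic meant here, the following is a minimal sketch of the Pearson chi-square statistic for a table of Good/Bad counts by category. The counts are invented for illustration, and the function assumes every expected count is positive.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom.

    table is a list of rows, one per category, each row being [goods, bads].
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof


# Hypothetical counts for a three-category variable: [goods, bads] per category.
table = [[60, 40], [80, 20], [70, 30]]
stat, dof = chi_square(table)
print(round(stat, 4), dof)
```

A larger statistic for the same degrees of freedom suggests a stronger association with the Good/Bad outcome. The same function can be re-run after merging two categories to see how much discrimination is lost by combining them.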

4.    Use the binary variables obtained from the coarse classification in the above exercise to build two scorecards for each training set, one using linear regression and the other using logistic regression. Note that this means you should have four scorecards in total:

(i) using linear regression for Checking = 1 or 2;

(ii) using logistic regression for Checking = 1 or 2;

(iii) using linear regression for Checking = 3 or 4;

(iv) using logistic regression for Checking = 3 or 4.

Note that the file you submit should include, in the Appendix, a table that gives the binary variables you used, together with the coefficients for those variables calculated in each regression. [15 marks]
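To see how the two kinds of coefficient differ, consider the simplified case of a single binary variable, where both regressions have a closed-form solution. The sketch below is restricted to that case; it is not a substitute for fitting the full model with several binary variables in Excel, SAS, R or Python. The data and the function name `fit_single_dummy` are hypothetical, and both groups are assumed to contain Goods and Bads.

```python
import math


def fit_single_dummy(x, y):
    """Closed-form fits of y (1 = Good, 0 = Bad) on one binary variable x.

    Valid only for a single dummy plus an intercept, with 0 < good rate < 1
    in both groups.
    """
    n1 = sum(x)
    n0 = len(x) - n1
    p1 = sum(yi for xi, yi in zip(x, y) if xi == 1) / n1  # good rate when x = 1
    p0 = sum(yi for xi, yi in zip(x, y) if xi == 0) / n0  # good rate when x = 0

    def logit(p):
        return math.log(p / (1 - p))

    # Linear regression: the coefficient is the difference in good rates.
    linear = {"intercept": p0, "coef": p1 - p0}
    # Logistic regression: the coefficient is the log odds ratio.
    logistic = {"intercept": logit(p0), "coef": logit(p1) - logit(p0)}
    return linear, logistic


# Hypothetical training data: 10 applicants with x = 1 (8 Good), 10 with x = 0 (5 Good).
x = [1] * 10 + [0] * 10
y = [1] * 8 + [0] * 2 + [1] * 5 + [0] * 5
linear, logistic = fit_single_dummy(x, y)
print(linear, logistic)
```

The linear coefficient is on the probability scale and the logistic coefficient is on the log-odds scale, so the two columns of the coefficient table are not directly comparable in size.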

5.    Derive ROC curves for all scorecards using the validation set applicable to each, showing in detail how sensitivity and specificity have been calculated. Estimate the Gini coefficient and KS values for each. Explain and comment on your results. [15 marks]
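The calculations named in this task can be sketched as follows, under one common convention that is assumed here and not prescribed by the task: a higher score means more likely Good, an applicant is classed as Good when the score is at or above the cutoff, sensitivity is the share of actual Goods classed as Good, and specificity is the share of actual Bads classed as Bad. The Gini coefficient is taken as 2 × AUC − 1 with AUC from the trapezoidal rule, and KS as the largest gap between sensitivity and 1 − specificity. The scores and labels are invented.

```python
def roc_table(scores, labels):
    """Return (cutoff, sensitivity, specificity) rows; labels: 1 = Good, 0 = Bad."""
    n_good = sum(labels)
    n_bad = len(labels) - n_good
    # Start with a cutoff above every score: nobody is classed as Good.
    rows = [(float("inf"), 0.0, 1.0)]
    for cutoff in sorted(set(scores), reverse=True):
        true_pos = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        true_neg = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 0)
        rows.append((cutoff, true_pos / n_good, true_neg / n_bad))
    return rows


def gini_and_ks(rows):
    """Gini = 2 * AUC - 1 (trapezoidal rule); KS = max |sensitivity - (1 - specificity)|."""
    auc = 0.0
    ks = 0.0
    for (_, sens0, spec0), (_, sens1, spec1) in zip(rows, rows[1:]):
        fpr0, fpr1 = 1 - spec0, 1 - spec1
        auc += (fpr1 - fpr0) * (sens0 + sens1) / 2
        ks = max(ks, abs(sens1 - fpr1))
    return 2 * auc - 1, ks


# Hypothetical validation scores (higher = more likely Good) and outcomes.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 1, 0, 0, 0]
rows = roc_table(scores, labels)
gini, ks = gini_and_ks(rows)
print(gini, ks)
```

Plotting sensitivity against 1 − specificity from the rows of `roc_table` gives the ROC curve. If the opposite convention is used (Bad treated as the event of interest), the roles of sensitivity and specificity swap, so the convention should be stated alongside the results.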