ACTL90013 Actuarial Studies Projects Part 1
ACTL90024 Actuarial Studies Projects Part 2
Persona matching between life insurance agents and customers
2022
1. Background
Agency is one of the main insurance distribution channels for both life and general products. Traditionally, agents are tied to a particular insurer; they are responsible for sourcing sales leads from potential customers, analyzing insurance needs, converting into sales, and servicing the customer during the policy term. Recent worldwide economic and technological disruptions have created significant impacts on the insurance industry and its current agency distribution model, generating both challenges and opportunities in the business. Now more than ever, the role of an agent has shifted from being sales-focused to engagement-focused.
However, if the initial sales agent leaves the insurer, their customers become so-called “orphan customers”. The level of engagement with such customers is then impaired, which can result in policy lapses or a low repurchase rate. Historically, a servicing agent would be randomly assigned to these orphan customers. Given advances in data collection and storage technology, as well as various digital partnership initiatives, insurers today have a much better understanding of both customers’ and agents’ profiles. In addition, persona matching algorithms have been shown to increase sales as well as customer stickiness in the TMT sector (e.g. Netflix).
In this project, we would like to understand: 1) with limited data, what is the best way to match agents with orphan customers to achieve a sustainable repurchase experience over time, and 2) how to execute such a strategy given considerations in data, process, technology, and people.
2. Task
There are several options for developing a matching algorithm; a logistic regression model (supervised learning) is one of them. You are required to go through a model development and validation journey to justify the selected model form, model performance, and predicted results, using a logistic regression model as a minimum. Logistic regression is a statistical technique for modelling the relationship between a set of predictors and a binary outcome. The predicted variable in this project is binary (i.e. suitable match or not, where suitability can be defined as the customer making a repurchase).
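For reference, the standard form of a logistic regression model for a binary target can be written as below. The symbols are notation introduced here for illustration and do not come from the data sets: Y is the repurchase indicator, x is the vector of combined customer and agent features, and β₀, β are the coefficients to be calibrated.

```latex
\[
P(Y = 1 \mid x) = \frac{1}{1 + \exp\!\bigl(-(\beta_0 + \beta^{\top} x)\bigr)},
\qquad
\log\frac{P(Y = 1 \mid x)}{1 - P(Y = 1 \mid x)} = \beta_0 + \beta^{\top} x .
\]
```

The second expression shows that the model is linear in the log-odds, which is what makes the fitted coefficients interpretable as effects on the odds of repurchase.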
Please see below for the recommended key steps:
Step 1: Data cleansing / feature engineering
• Combine the customer and agent profile data sets (see Appendix)
• Missing data handling
• Outlier handling
• Categorical data handling (turning to ordinal scale or one-hot encoding)
• Other appropriate techniques
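The Step 1 items above can be sketched in pandas as follows. This is a minimal illustration under stated assumptions, not a prescribed approach: the two small data frames stand in for Customer_Profile.xlsx and Agent_Profile.xlsx, AGENT_AGE is a hypothetical agent column, and median imputation and capping at the 95th percentile are just one possible choice for each step.

```python
import pandas as pd

# Toy stand-ins for Customer_Profile.xlsx and Agent_Profile.xlsx.
# AGENT_AGE is a hypothetical agent column used only for illustration.
customers = pd.DataFrame({
    "CUSTOMER_ID": [1, 2, 3, 4],
    "AGE": [34, None, 51, 45],
    "GENDER": ["F", "M", "F", "M"],
    "FAMILY_INCOME": [50_000, 62_000, 1_000_000, 58_000],
    "AGENT_ID": ["A1", "A2", "A1", "A2"],
})
agents = pd.DataFrame({"AGENT_ID": ["A1", "A2"], "AGENT_AGE": [29, 47]})


def prepare(customers: pd.DataFrame, agents: pd.DataFrame) -> pd.DataFrame:
    # Combine the data sets: each customer row picks up its agent's profile.
    df = customers.merge(agents, on="AGENT_ID", how="left", validate="many_to_one")
    # Missing data: impute AGE with the median (one option among several).
    df["AGE"] = df["AGE"].fillna(df["AGE"].median())
    # Outliers: cap FAMILY_INCOME at its 95th percentile (assumed threshold).
    df["FAMILY_INCOME"] = df["FAMILY_INCOME"].clip(upper=df["FAMILY_INCOME"].quantile(0.95))
    # Categorical data: one-hot encode GENDER; drop_first removes the
    # redundant level, which avoids perfect collinearity in a regression.
    return pd.get_dummies(df, columns=["GENDER"], drop_first=True)


prepared = prepare(customers, agents)
print(prepared)
```

With the real files, the two data frames would instead be loaded with `pd.read_excel`, and each choice (imputation rule, outlier threshold, encoding) should be justified in the report.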
Step 2: Model development
• Create the target (dependent) variable, which is a binary variable indicating if there is any customer repurchase, using the variables “POLICY_REPURCHASED” and “RIDER_REPURCHASED” in the data set
• Separate historical data into model development and validation sets (including the handling of biased sampling and weighted regression)
• Calibrate logistic regression model(s) on the development sample (extra marks will be given for building another learning approach to contrast results)
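A minimal sketch of Step 2 with scikit-learn is given below. The data are synthetic, the two feature columns are chosen only for illustration, and the stratified split with `class_weight="balanced"` is an assumed way of dealing with an imbalanced target rather than a required one.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cleaned, combined historical data.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "AGE": rng.integers(20, 70, n),
    "INQUIRY": rng.poisson(2, n),
    "POLICY_REPURCHASED": rng.binomial(1, 0.15, n),
    "RIDER_REPURCHASED": rng.binomial(2, 0.05, n),
})

# Target: 1 if the customer repurchased any policy or rider, else 0.
df["REPURCHASE"] = ((df["POLICY_REPURCHASED"] + df["RIDER_REPURCHASED"]) > 0).astype(int)

X = df[["AGE", "INQUIRY"]]
y = df["REPURCHASE"]

# Stratify so both samples keep roughly the same repurchase rate.
X_dev, X_val, y_dev, y_val = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# class_weight="balanced" reweights observations inversely to class
# frequency, i.e. a weighted regression for an imbalanced target.
model = LogisticRegression(class_weight="balanced", max_iter=1000)
model.fit(X_dev, y_dev)
print(dict(zip(X.columns, model.coef_[0])))
```

The repurchase counts themselves must not be used as predictors, since the target is derived from them.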
Step 3: Model validation
• Select appropriate model performance metrics, with rationale; these should cover accuracy, the confusion matrix, AUC, and model stability
• Test the calibrated model on the development sample
• Validate the candidate models on the validation sample
• Fine-tune data cleansing/feature engineering/model construction/calibration technique (or other appropriate means) to optimize the model performance
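The metrics named in Step 3 can be computed as in the sketch below. The labels and predicted probabilities are made-up values, the 0.5 cut-off is an assumption, and the population stability index (PSI) is used here as one possible reading of "model stability"; the helper `psi` is written for this illustration and is not a library function.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score

# Made-up labels and predicted repurchase probabilities.
y_true = np.array([0, 0, 0, 0, 1, 1, 0, 1])
p_hat = np.array([0.1, 0.2, 0.3, 0.6, 0.7, 0.8, 0.4, 0.35])
y_pred = (p_hat >= 0.5).astype(int)  # assumed classification cut-off

accuracy = accuracy_score(y_true, y_pred)
cm = confusion_matrix(y_true, y_pred)  # rows: actual, columns: predicted
auc = roc_auc_score(y_true, p_hat)     # uses probabilities, not labels


def psi(expected, actual, bins=5):
    """Population stability index between two score samples."""
    # Interior cut points taken from the quantiles of the expected sample.
    cuts = np.quantile(expected, np.linspace(0, 1, bins + 1))[1:-1]
    e = np.bincount(np.searchsorted(cuts, expected), minlength=bins) / len(expected)
    a = np.bincount(np.searchsorted(cuts, actual), minlength=bins) / len(actual)
    # Floor the shares so empty bins do not produce log(0).
    e, a = np.clip(e, 1e-6, None), np.clip(a, 1e-6, None)
    return float(np.sum((a - e) * np.log(a / e)))


print(accuracy, cm.tolist(), round(auc, 4))
```

In practice, `psi` would compare the score distribution on the development sample with that on the validation sample; identical distributions give a value of zero, and larger values indicate a shift.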
Step 4: Application
• Apply the final validated model, with accepted performance metrics, to a dataset of unmapped customers and agents. For each new customer, recommend an agent that maximizes the repurchase probability
• Highlight which characteristics of customers and agents suit each other better from a repurchase experience perspective, and what advice we should give to mapped agents to enhance their repurchase rate
• Highlight potential limitations of data, final model and application
• Highlight practical implementation challenges for such an algorithm-based matching approach and what an insurer can do to address these challenges
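The recommendation step in Step 4 can be sketched as scoring every customer-agent pair and keeping the best agent per customer. Everything below is illustrative: the historical data are synthetic, AGENT_AGE and the interaction feature AGE_GAP are hypothetical, and agent capacity limits are ignored.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Synthetic history in which a smaller age gap makes repurchase more likely.
rng = np.random.default_rng(1)
hist = pd.DataFrame({
    "AGE": rng.integers(20, 70, 200),
    "AGENT_AGE": rng.integers(25, 60, 200),  # hypothetical agent feature
})
hist["AGE_GAP"] = (hist["AGE"] - hist["AGENT_AGE"]).abs()
p_true = 1 / (1 + np.exp(0.1 * hist["AGE_GAP"] - 1))
hist["REPURCHASE"] = (rng.random(200) < p_true).astype(int)

features = ["AGE", "AGENT_AGE", "AGE_GAP"]
model = LogisticRegression(max_iter=1000).fit(hist[features], hist["REPURCHASE"])

# Unmapped customers and available agents (toy values).
incoming = pd.DataFrame({"CUSTOMER_ID": [101, 102, 103], "AGE": [28, 45, 63]})
agents = pd.DataFrame({"AGENT_ID": ["A1", "A2", "A3"], "AGENT_AGE": [30, 44, 58]})

# Score every customer-agent pair, then keep the best agent per customer.
pairs = incoming.merge(agents, how="cross")
pairs["AGE_GAP"] = (pairs["AGE"] - pairs["AGENT_AGE"]).abs()
pairs["P_REPURCHASE"] = model.predict_proba(pairs[features])[:, 1]
best = pairs.loc[
    pairs.groupby("CUSTOMER_ID")["P_REPURCHASE"].idxmax(),
    ["CUSTOMER_ID", "AGENT_ID", "P_REPURCHASE"],
].reset_index(drop=True)
print(best)
```

One design point worth noting: because logistic regression is additive in the log-odds, a model with only separate customer and agent main effects ranks the agents identically for every customer, so the same agent would be recommended to everyone. Pair-level interaction features such as the hypothetical AGE_GAP are what allow the recommendation to differ by customer.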
3. Report
Requirements for the report are listed below:
• Complete the above task. The content of the report should include (at minimum): your understanding of the task as articulated, the rationale for your decisions, and the interim results and final outputs for each step in section 2
• Write an introduction section that includes a brief literature review on matching algorithms and explains why a logistic regression approach is relevant
• Formatting: 12pt Times New Roman font.
• The main report should be in pdf format and submitted electronically in LMS. It is due by 5pm on Friday 19 August.
• The report must be around 4000 words in length. The word count includes bibliography, footnotes, appendices and the number of words which would take up space used for tables, formulae and charts. Please note that a half page graph or table counts as 200 words; scale appropriately for other graphs or tables. The report should only contain important extracts from your outputs.
• Plagiarism is prohibited. No late submission will be accepted.
• You can choose any computer program to complete this project. Python or R would be preferred. You will need to submit your code separately in LMS on the due date. Name your report and supporting files using your student ID.
4. Assessment
Marks may be given for your demonstration of the following:
• Meeting the terms of the project
• Ability to write (grammatically correct English) clearly and to write mathematics (if any) correctly
• Ability to reference work appropriately
• Ability to explain the models and algorithms used clearly
• Ability to comment appropriately on numerical outputs you obtained and ability to summarize your findings
• How much initiative you have shown in presenting your numerical results. The overall structure of your submission, the overall look of your submission (formatting, fonts, etc.), and the clarity of your outputs
• This report contributes 35% to the overall assessment of ACTL90013, of which 5% will be given for the ability to construct an unsupervised learning approach to contrast with the logistic regression results
5. Reference
Hosmer, D. W., & Lemeshow, S. (2000). Applied logistic regression (2nd ed.). New York, NY: John Wiley & Sons Inc.
Park, H. A. (2013). An introduction to logistic regression: from basic concepts to interpretation with particular attention to nursing domain. J Korean Acad Nurs, Vol. 43 No. 2, 154-164.
Appendix: Data
There are 3 datasets provided in this exercise:
• Historical mapped (with agent) customer profile data: Customer_Profile.xlsx
• New Customer profile data with unmapped agent: Customer_Profile_Incoming.xlsx
• Historical mapped agent profile data: Agent_Profile.xlsx
File: Customer_Profile.xlsx / Customer_Profile_Incoming.xlsx
| Column Name | Description |
| --- | --- |
| CUSTOMER_ID | Customer ID |
| GENDER | Customer’s gender |
| AGE | Customer’s age |
| MARITAL_STATUS | Customer’s marital status |
| NATIONALITY | Customer’s nationality |
| FAMILY_INCOME | Customer’s family income |
| FOOT_STEP | Average daily footsteps the customer walks as recorded |
| INQUIRY | Number of policy-related inquiries made |
| WEB_VISIT | Number of health-related website visits in a year |
| HEALTH_RECORD | Health score based on predicted likelihood of having different types of disease |
| TOTAL_INF_POLICIES | Total number of in-force insurance policies |
| TOTAL_INF_POL_ACC | Total number of in-force accident insurance policies |
| TOTAL_INF_POL_CI | Total number of in-force critical illness insurance policies |
| TOTAL_INF_POL_LF | Total number of in-force life insurance policies |
| TOTAL_INF_POL_MED | Total number of in-force medical insurance policies |
| TOTAL_INF_POL_SAV | Total number of in-force saving insurance policies |
| TOTAL_INF_POL_OTH | Total number of in-force other insurance policies |
| TOTAL_PREM_INF_POL | Total premium of in-force insurance policies |
| TOTAL_PREMIUM_INF_POL_ACC | Total premium of in-force accident insurance policies |
| TOTAL_PREMIUM_INF_POL_CI | Total premium of in-force critical illness insurance policies |
| TOTAL_PREMIUM_INF_POL_LF | Total premium of in-force life insurance policies |
| TOTAL_PREMIUM_INF_POL_MED | Total premium of in-force medical insurance policies |
| TOTAL_PREMIUM_INF_POL_SAV | Total premium of in-force saving insurance policies |
| TOTAL_PREMIUM_INF_POL_OTH | Total premium of in-force other insurance policies |
| TOTAL_POLICIES | Historical total number of insurance policies |
| AGENT_ID* | Agent ID for the latest servicing agent |
| POLICY_REPURCHASED* | Number of policies the customer bought after agent reassignment |
| RIDER_REPURCHASED* | Number of riders (top-up coverage for an insurance policy) the customer bought after agent reassignment |
* Only for mapped customer data: Customer_Profile.xlsx
2022-07-22