ACTL90013 Actuarial Studies Projects Part 1

ACTL90024 Actuarial Studies Projects Part 2

Persona matching between life insurance agents and customers

2022

1. Background

Agency is one of the main insurance distribution channels for both life and general products. Traditionally, agents are tied to a particular insurer; they are responsible for sourcing sales leads from potential customers, analyzing insurance needs, converting leads into sales, and servicing the customer during the policy term. Recent worldwide economic and technological disruptions have significantly affected the insurance industry and its current agency distribution model, generating both challenges and opportunities in the business. Now more than ever, the role of an agent has shifted from being sales-focused to engagement-focused.

However, if the initial sales agent leaves the insurer, their customers become so-called “orphan customers”. The level of engagement with such customers is therefore impaired, which results in policy lapses or a low repurchase rate. Historically, a servicing agent would be randomly assigned to these orphan customers. Given the advancement in data collection and storage technology, as well as various digital partnership initiatives, insurers today have a much better understanding of both customers’ and agents’ profiles. In addition, persona matching algorithms have been shown to increase sales as well as customer stickiness in the TMT sector (e.g. Netflix).

In this project, we would like to understand: 1) with limited data, what is the best way to match agents with orphan customers to achieve a sustainable repurchase experience over time, and 2) how to execute such a strategy given considerations in data, process, technology, and people.

2. Task

There are several options for developing a matching algorithm; a logistic regression model (supervised learning) is one of them. You are required to go through a model development and validation journey to justify the selected model form, model performance, and predicted results, using a logistic regression model as a minimum. Logistic regression is a statistical technique that models the relationship between a set of predictors and a binary response variable. The response variable in this project is binary (i.e. suitable match or not, where suitability can be defined as the customer making a repurchase).
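To make the model form concrete: logistic regression assumes that the log-odds of a suitable match are linear in the predictors, so the predicted probability is the logistic (sigmoid) transform of a linear predictor. The sketch below is illustrative only; the function names are hypothetical and any coefficient values passed in are placeholders, not estimates from the project data.

```python
import math


def repurchase_probability(features, intercept, coefficients):
    """P(suitable match | features) under a logistic regression model."""
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    # Numerically stable sigmoid: avoids overflow for large negative z.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def odds_ratio(coefficient):
    """Multiplicative change in the odds of repurchase per one-unit increase in a predictor."""
    return math.exp(coefficient)
```

With all coefficients at zero the model returns a probability of 0.5, and a coefficient of zero corresponds to an odds ratio of 1 (no effect).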

Please see below for the recommended key steps:

Step 1: Data cleansing/feature engineering

•    Combine the customer and agent profile data sets (see Appendix)

•    Missing data handling

•    Outlier handling

•    Categorical data handling (conversion to an ordinal scale or one-hot encoding)

•    Other appropriate techniques
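The Step 1 items above can be sketched as follows. This is a dependency-free illustration on list-of-dict tables, assuming the two profile files have already been read into memory; in practice a data-frame library such as pandas would usually replace these hand-written helpers. The agent column `TENURE_YEARS`, the `GENDER` codes, the income values, and the clipping bounds are hypothetical, since the brief does not describe them.

```python
from statistics import median


def join_on_agent(customers, agents):
    """Inner-join customer rows to agent rows on AGENT_ID."""
    agents_by_id = {a["AGENT_ID"]: a for a in agents}
    joined = []
    for c in customers:
        agent = agents_by_id.get(c["AGENT_ID"])
        if agent is not None:
            row = dict(c)  # copy, so the input table is left untouched
            # Prefix agent fields so they cannot clash with customer fields.
            row.update({f"AGENT_{k}": v for k, v in agent.items() if k != "AGENT_ID"})
            joined.append(row)
    return joined


def impute_median(rows, column):
    """Replace None in a numeric column with the median of the observed values."""
    fill = median(r[column] for r in rows if r[column] is not None)
    for r in rows:
        if r[column] is None:
            r[column] = fill


def clip_outliers(rows, column, lower, upper):
    """Cap a numeric column to [lower, upper] (winsorizing with fixed bounds)."""
    for r in rows:
        r[column] = min(max(r[column], lower), upper)


def one_hot(rows, column):
    """Replace a categorical column with 0/1 indicator columns, one per level."""
    levels = sorted({r[column] for r in rows})
    for r in rows:
        value = r.pop(column)
        for level in levels:
            r[f"{column}_{level}"] = int(value == level)
    return levels


# Toy data (hypothetical values, for illustration only).
customers = [
    {"CUSTOMER_ID": 1, "AGENT_ID": "A1", "AGE": 35, "GENDER": "F", "FAMILY_INCOME": 50000},
    {"CUSTOMER_ID": 2, "AGENT_ID": "A2", "AGE": None, "GENDER": "M", "FAMILY_INCOME": 9000000},
    {"CUSTOMER_ID": 3, "AGENT_ID": "A1", "AGE": 45, "GENDER": "M", "FAMILY_INCOME": 70000},
]
agents = [{"AGENT_ID": "A1", "TENURE_YEARS": 5}, {"AGENT_ID": "A2", "TENURE_YEARS": 2}]

rows = join_on_agent(customers, agents)
impute_median(rows, "AGE")
clip_outliers(rows, "FAMILY_INCOME", 0, 200000)
one_hot(rows, "GENDER")
```

Median imputation and fixed-bound clipping are only two of several defensible choices; whichever is used, the report should state the rationale.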

Step 2: Model development

•    Create the target (dependent) variable, which is a binary variable indicating if there is any customer repurchase, using the variables “POLICY_REPURCHASED” and “RIDER_REPURCHASED” in the data set

•    Separate historical data into model development and validation sets (handling of biased sampling and weighted regression)

•    Calibrate logistic regression model(s) on the development sample (extra marks will be given for building another learning approach to contrast results)
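A minimal sketch of the Step 2 items, under stated assumptions: the target is 1 when either repurchase count is positive, the split is stratified by target, class imbalance is handled with inverse-frequency weights, and the model is fitted by plain batch gradient descent. All function names, the 30% validation share, the learning rate, and the epoch count are hypothetical choices; a statistics library would normally do the fitting, and predictors should be scaled before using gradient descent on real data.

```python
import math
import random


def make_target(row):
    """1 if the customer bought any policy or rider after reassignment, else 0."""
    return int(row["POLICY_REPURCHASED"] > 0 or row["RIDER_REPURCHASED"] > 0)


def stratified_split(targets, validation_share=0.3, seed=0):
    """Split row indices into development/validation sets, preserving the class mix."""
    rng = random.Random(seed)
    dev, val = [], []
    for label in (0, 1):
        idx = [i for i, y in enumerate(targets) if y == label]
        rng.shuffle(idx)
        n_val = round(len(idx) * validation_share)
        val.extend(idx[:n_val])
        dev.extend(idx[n_val:])
    return sorted(dev), sorted(val)


def balanced_weights(targets):
    """Weight each observation inversely to its class frequency (both classes must occur)."""
    n, n_pos = len(targets), sum(targets)
    w = {1: n / (2 * n_pos), 0: n / (2 * (n - n_pos))}
    return [w[y] for y in targets]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, weights, lr=0.1, epochs=2000):
    """Weighted logistic regression by batch gradient descent; returns [intercept, *slopes]."""
    beta = [0.0] * (len(X[0]) + 1)
    total = sum(weights)
    for _ in range(epochs):
        grad = [0.0] * len(beta)
        for xi, yi, wi in zip(X, y, weights):
            z = beta[0] + sum(b * v for b, v in zip(beta[1:], xi))
            err = wi * (sigmoid(z) - yi)  # gradient of the weighted log-loss w.r.t. z
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        beta = [b - lr * g / total for b, g in zip(beta, grad)]
    return beta
```

The split should be made once, before any fitting, so that the validation sample plays no part in calibration.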

Step 3: Model validation

•    Select the right model performance metrics, with rationale; these should cover accuracy, confusion matrix, AUC, and model stability

•    Test the calibrated model on the development sample

•    Validate the candidate models on the validation sample

•    Fine-tune data cleansing/feature engineering/model construction/calibration technique (or other appropriate means) to optimize the model performance
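The metrics named in Step 3 can be computed as in the sketch below. It is a plain-Python illustration with hypothetical function names and toy scores; a library such as scikit-learn provides equivalent, faster routines. The brief does not prescribe a stability measure, so the population stability index (PSI) shown here is an assumption: it is one commonly used way to compare the score distribution on the development sample with that on the validation sample.

```python
import math


def confusion_matrix(y_true, y_prob, threshold=0.5):
    """Return (tn, fp, fn, tp) for probabilities thresholded at `threshold`."""
    tn = fp = fn = tp = 0
    for y, p in zip(y_true, y_prob):
        pred = int(p >= threshold)
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif pred == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def accuracy(y_true, y_prob, threshold=0.5):
    """Share of observations classified correctly."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_prob, threshold)
    return (tn + tp) / len(y_true)


def auc(y_true, y_prob):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def psi(expected_scores, actual_scores, n_bins=10, eps=1e-6):
    """Population stability index between two sets of probabilities in [0, 1]."""
    def shares(scores):
        counts = [0] * n_bins
        for s in scores:
            counts[min(int(s * n_bins), n_bins - 1)] += 1
        # Floor empty bins at eps so the logarithm is defined.
        return [max(c / len(scores), eps) for c in counts]

    e, a = shares(expected_scores), shares(actual_scores)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


# Toy example (hypothetical scores).
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]
print(confusion_matrix(y_true, y_prob))  # (2, 0, 1, 1)
print(accuracy(y_true, y_prob))          # 0.75
print(auc(y_true, y_prob))               # 0.75
```

The pairwise AUC loop is quadratic in sample size, which is acceptable for a sketch but slow on large files. Comparing each metric between the development and validation samples is a further, simple check on stability.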

Step 4: Application

•    Apply the final validated model, with accepted performance metrics, to a dataset of unmapped customers and agents. For each new customer, recommend the agent that maximizes the repurchase probability

•    Highlight which characteristics of customers and agents suit each other better from a repurchase experience perspective, and what advice we should give to mapped agents to enhance their repurchase rate

•    Highlight potential limitations of data, final model and application

•    Highlight practical implementation challenges for such an algorithm-based matching approach and what an insurer can do to address these challenges
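The recommendation in the first Step 4 item amounts to scoring every customer-agent pair with the validated model and keeping the highest-scoring agent per customer. The sketch below shows that search; `recommend_agents` and `toy_score` are hypothetical names, and `toy_score` is a stand-in for the fitted model (it assumes, purely for illustration, an agent `AGE` field and that closer ages give a higher score).

```python
def recommend_agents(customers, agents, predict_proba):
    """Map each CUSTOMER_ID to (best AGENT_ID, predicted repurchase probability)."""
    recommendations = {}
    for c in customers:
        # Score this customer against every agent and keep the best pair.
        scored = [(predict_proba(c, a), a["AGENT_ID"]) for a in agents]
        best_prob, best_agent = max(scored)
        recommendations[c["CUSTOMER_ID"]] = (best_agent, best_prob)
    return recommendations


def toy_score(customer, agent):
    """Placeholder for the fitted model: higher when customer and agent ages are close."""
    return 1.0 / (1.0 + abs(customer["AGE"] - agent["AGE"]) / 10.0)


# Toy data (hypothetical values).
new_customers = [{"CUSTOMER_ID": 101, "AGE": 30}, {"CUSTOMER_ID": 102, "AGE": 58}]
agent_pool = [{"AGENT_ID": "A1", "AGE": 32}, {"AGENT_ID": "A2", "AGE": 55}]

recs = recommend_agents(new_customers, agent_pool, toy_score)
```

This unconstrained search can send many customers to the same agent; any limit on agent capacity would need separate handling.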

3. Report

Requirements for the report are listed below:

•    Complete the above task. The content of the report should include (at minimum): your understanding of the task as articulated, the rationale for your decisions, and the interim results and final outputs for each step in section 2

•    Write an introduction section that includes a brief literature review on matching algorithms and why using a logistic regression approach is relevant

•    Formatting: 12pt Times New Roman font.

•    The main report should be in pdf format and submitted electronically in LMS. It is due by 5pm on Friday 19 August.

•    The report must be around 4000 words in length. The word count includes bibliography, footnotes, appendices and the number of words which would take up space used for tables, formulae and charts. Please note that a half page graph or table counts as 200 words, scale appropriately for other graphs or tables. The report should only contain important extracts from your outputs.

•    Plagiarism is prohibited. No late submission will be accepted.

•    You can choose any computer program to complete this project. Python or R would be preferred. You will need to submit your code separately in LMS on the due date. Name your report and supporting files using your student id.

4. Assessment

Marks may be given for your demonstration of the following:

•    Meeting the terms of the project

•    Ability to write clearly (in grammatically correct English) and to write mathematics (if any) correctly

•    Ability to reference work appropriately

•    Ability to explain the models and algorithms used clearly

•    Ability to comment appropriately on numerical outputs you obtained and ability to summarize your findings

•    How much initiative you have shown in presenting your numerical results. The overall structure of your submission, the overall look of your submission (formatting, fonts, etc.), and the clarity of your outputs

•    This report contributes 35% to the overall assessment of ACTL90013, of which 5% will be given for the ability to construct an unsupervised learning approach to contrast with the logistic regression results

5. References

Hosmer, D. W., & Lemeshow, S. (2000). Applied logistic regression (2nd ed.). New York, NY: John Wiley & Sons Inc.

Park, H. A. (2013). An introduction to logistic regression: from basic concepts to interpretation with particular attention to nursing domain. J Korean Acad Nurs, 43(2), 154-164.

Appendix: Data

There are 3 datasets provided in this exercise:

•    Historical mapped (with agent) customer profile data: Customer_Profile.xlsx

•    New Customer profile data with unmapped agent: Customer_Profile_Incoming.xlsx

•    Historical mapped agent profile data: Agent_Profile.xlsx

File: Customer_Profile.xlsx / Customer_Profile_Incoming.xlsx

Column Names                Descriptions
CUSTOMER_ID                 Customer ID
GENDER                      Customer’s gender
AGE                         Customer’s age
MARITAL_STATUS              Customer’s marital status
NATIONALITY                 Customer’s nationality
FAMILY_INCOME               Customer’s family income
FOOT_STEP                   Average daily footsteps the customer walks as recorded
INQUIRY                     Number of policy-related inquiries made
WEB_VISIT                   Number of health-related website visits in a year
HEALTH_RECORD               Health score based on the predicted likelihood of having different types of disease
TOTAL_INF_POLICIES          Total number of in-force insurance policies
TOTAL_INF_POL_ACC           Total number of in-force accident insurance policies
TOTAL_INF_POL_CI            Total number of in-force critical illness insurance policies
TOTAL_INF_POL_LF            Total number of in-force life insurance policies
TOTAL_INF_POL_MED           Total number of in-force medical insurance policies
TOTAL_INF_POL_SAV           Total number of in-force savings insurance policies
TOTAL_INF_POL_OTH           Total number of in-force other insurance policies
TOTAL_PREM_INF_POL          Total premium of in-force insurance policies
TOTAL_PREMIUM_INF_POL_ACC   Total premium of in-force accident insurance policies
TOTAL_PREMIUM_INF_POL_CI    Total premium of in-force critical illness insurance policies
TOTAL_PREMIUM_INF_POL_LF    Total premium of in-force life insurance policies
TOTAL_PREMIUM_INF_POL_MED   Total premium of in-force medical insurance policies
TOTAL_PREMIUM_INF_POL_SAV   Total premium of in-force savings insurance policies
TOTAL_PREMIUM_INF_POL_OTH   Total premium of in-force other insurance policies
TOTAL_POLICIES              Historical total number of insurance policies
AGENT_ID*                   Agent ID for the latest servicing agent
POLICY_REPURCHASED*         Number of policies the customer bought after agent reassignment
RIDER_REPURCHASED*          Number of riders (top-up coverage for an insurance policy) the customer bought after agent reassignment

* Only for mapped customer data: Customer_Profile.xlsx
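Because the starred columns exist only in the mapped file, a quick header check before modelling can catch a mix-up between the two customer files. The sketch below is one way to do this; `check_header` is a hypothetical helper and assumes the header row has already been read into a list of column names.

```python
# Columns marked with * in the table: present only in Customer_Profile.xlsx.
MAPPED_ONLY_COLUMNS = {"AGENT_ID", "POLICY_REPURCHASED", "RIDER_REPURCHASED"}


def check_header(header, mapped):
    """Return the starred columns that are wrongly missing (mapped file)
    or wrongly present (incoming file); an empty set means the header is as expected."""
    present = MAPPED_ONLY_COLUMNS & set(header)
    if mapped:
        return MAPPED_ONLY_COLUMNS - present
    return present
```

For example, an incoming file whose header still contains `AGENT_ID` would be flagged, since new customers have no servicing agent yet.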