Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: daixieit

Midterm Exam: Web Scraping and FinTech Data Analysis

NOTE: Use all available resources. You can find a solution to most of the coding problems on the Internet. Google it if you are stuck in the middle.

NOTE2: Do your best to conduct the task!!!

PART I. (10 points)

Complete the task below. Copy your code onto the answer sheet. 

Visit www.freelancer.com and choose any category. Extract the titles, descriptions, tags, and prices from the first page of search results. Create four corresponding vectors, named ‘titles’, ‘descriptions’, ‘tags’, and ‘prices’, with 'prices' specifically formatted as a numeric vector. Finally, assemble these vectors into a single consolidated data frame, named freelancer’.

Please note, it might be required to clean the data and use NA values to ensure the vectors are of equal length prior to merging them into a single data frame.

PART II. (15 points)

Analyze the dataset named ‘fintech,’ obtained from a prominent FinTech company based in Hong Kong. This company specializes in lending money to loan applicants. Upon receiving an application, the company's decision engine automatically classifies each loan application into one of three categories: 'approve', 'reject', or 'manual review.' For applications marked as 'manual review,' reviewers undertake a subjective assessment to further reclassify each case as either 'approve' or 'reject.’

Field description:

Field

Description

id

Loan application id

loan_amount

Requested amount in HKD

tenor

Requested repayment periods in months

age

 

month_of_service

The employment period of current job

residential_status

Own, Rent, Others

monthly_repayment

Monthly repayment for other existing loans

monthly_income

Average monthly income for the last three months

self-employed

 

bankrupted

Whether the applicant has a record of bankruptcy

housewife

 

currently_employed

Whether the applicant is employed in a full-time job

channel

Loan application channel

language

tc (traditional Chinese), EN (English)

manual_review

t = manually reviewed, f = otherwise

approved

t = approved, f = rejected

manual_approved

t = manually approved, f = otherwise

credit score

Higher is better

friends_facebook

No. of Facebook friends (0 indicates either no friend or the account was not provided)

time_application

Time of the day when the application was submitted

location

The location where the application was submitted

default

Whether the repayment was overdue as of June 2017

Q1. What is the average monthly income of the whole sample? What is the average monthly income of the currently employed? (1 point)

Q2. Generate the histogram of “loan_amount.” Can you find any interesting patterns? Can you guess the reason why the graph has such a shape? (1 point)

Q3. Replace the value of “friends_facebook” with NA if the value is 0. What is the average number of Facebook friends of those who have provided their Facebook accounts? (2 points)

Q4. Generate the scatterplot of “month_of_service” and “credit_score”. Can you find any relationship between them? What about “monthly_income” and “credit_score”? Confirm the relationship with the correlation tests. (2 points)

Q5. Make a new variable, named “automatic_approved,” which has the value “t” if approved by the decision engine, “f” if rejected by the decision engine, and “NA” if reviewed manually. How many cases are approved or rejected by their decision engine? How many are classified as “manual review”? (2 points)

Hint:

fintech.df$automatic_approved=ifelse(fintech.df$approved=="t" & is.na(fintech.df$manual_approved)==TRUE, "t", fintech.df$automatic_approved)

fintech.df$automatic_approved=ifelse(fintech.df$approved=="f" & is.na(fintech.df$manual_approved)==TRUE, "f", fintech.df$automatic_approved)

Q6. Compare the automatically approved cases and the automatically rejected cases. Conduct statistical tests on variables available in the dataset to answer the following subquestions. (5 points)

1) Are they different in “loan_amount”?

2) Are they different in “tenor”?

3) Are they different in “age”?

4) Are they different in “month_of_service”?

5) Are they different in “residential_status”?

6) Are they different in “monthly_income”?

7) Are they different in “bankrupted”?

8) Are they different in “currently_employed”?

9) Are they different in “channel”?

10) Are they different in “language”?

11) Are they different in “credit_score”?

12) Are they different in “friends_facebook”?

13) Are they different in “location_application”?

Q7.  Based on the analysis results above, provide the logic behind the decision engine to judge “approve”. (2 points)

Guideline

Submit 1) your answer sheet, 2) R-code used for the analysis, and 3) the signed declaration as a zip file to the LeranUs System. Please include your student number and name in the header of the answer sheet. Make your answer sheet formatted as follows: Times New Roman, 12-point font, double-spaced only (not 1.5), 1-inch margins all around 8.5 x 11-inch paper (or A4).