ACF5320

Assignment 2

ASSESSMENT TASK: Assignment 2

WEIGHTING: 30%

COMPLETION: Individual

GENERATIVE AI: Generative AI tools can be used in this assessment task

In this assessment, you can use generative artificial intelligence (AI) to generate the specified content in relation to the assessment task. This material must be acknowledged and recorded in your declaration of AI use.

DUE DATE: 11:55pm, Monday 8 April, 2024

OVERVIEW

In this assignment, you are tasked with conducting regression analysis on multiple datasets provided in Excel format. The assignment is structured around four key cases, each requiring you to apply regression techniques to predict outcomes based on various independent variables. This exercise aims to assess your proficiency in predictive modelling, data analysis, and the interpretation of results within a business analytics context.

- In the Decision case, using the "Decision.xlsx" dataset, you will compare experienced participants (auditors) with inexperienced participants (students), analysing how experience relates to decision-making quality and whether the two groups differ in intelligence, thinking styles, and personality traits.

- The Haircut case requires you to explore the "Haircut.xlsx" database to determine the factors that significantly influence a company's revenue, using regression analysis to identify these key predictors.

- For the Audit case, with the "Audit.xlsx" dataset, you will investigate the relationship between audit delay and various descriptive variables, developing a regression model that can accurately predict delay durations.

- The Prescription Cost Analysis case uses the "Prescription.xlsx" dataset, in which you will model and predict drug costs from a set of independent variables, improving your model's accuracy through iterative refinement.

Your submission should demonstrate a thorough understanding of regression analysis as applied to predictive analytics. This includes not only the technical execution of statistical tests but also the ability to interpret and communicate the significance of your findings in a clear, concise manner.

Through this assignment, you will showcase your capability to leverage Excel for predictive modelling and to derive actionable insights from complex datasets.

OBJECTIVES

- Understand and apply regression analysis techniques.

- Analyse relationships between dependent and independent variables.

- Interpret and evaluate regression model outputs.

- Develop predictive models based on the analysis.

- Communicate analytical findings effectively.

SUBMISSION REQUIREMENTS

Type your responses in an MS Word document and submit it to Moodle. Cut and paste any relevant output from Excel into your Word document.

You do not need to clean the data, and you should not delete any data.

Case 1: Audit (5 marks)

A study investigated the relationship between audit delay (the length of time from a company’s fiscal year-end to the date of the auditor’s report) and variables that describe the client and the auditor. Some of the independent variables that were included in this study are presented below.

| Variable | Definition |
|---|---|
| Industry | A binary (also known as ‘dummy’) variable coded 1 if the firm was an industrial company, or 0 if the firm was a bank, savings and loan, or insurance company. |
| Public | A dummy variable coded 1 if the company was traded on an organized exchange or over the counter; otherwise coded 0. |
| Quality | A measure of the overall quality of internal controls, as judged by the auditor, on a five-point scale ranging from “virtually none” (1) to “excellent” (5). |
| Finished | A measure ranging from 1 to 4, as judged by the auditor, where 1 indicates “all work performed subsequent to year-end” and 4 indicates “most work performed prior to year-end”. |

Using data in “Audit.xlsx”, answer the following questions:

(1.1) Develop scatter charts of Delay against each of the independent variables included in the data. (0.5 mark)

(1.2) Develop the estimated regression equation using all of the independent variables included in the data. (0.5 mark)

(1.3) Test for an overall regression relationship at the 0.05 level of significance. Is there a significant regression relationship? (0.5 mark)

(1.4) How much of the variation in the sample values of Delay does this estimated regression equation explain? What other independent variables could you include in this regression model to improve the fit? (1 mark)

(1.5) On the basis of your observations about the relationships between the dependent variable Delay and the independent variables Quality and Finished, suggest an alternative to the regression equation developed in (1.2) that explains as much of the variability in Delay as possible. (2.5 marks)
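The statistics needed for (1.3) and (1.4) appear in the ANOVA table and the regression statistics of Excel's Regression output (Analysis ToolPak). As a reading aid only, the standard definitions are given below, where n is the number of observations, k is the number of independent variables in the model, SSR is the regression sum of squares (Excel's "Regression" SS), SSE is the error sum of squares (Excel's "Residual" SS) and SST is the total sum of squares.

```latex
% Overall F test for a model with k independent variables and n observations
H_0: B_1 = B_2 = \cdots = B_k = 0 \qquad H_a: \text{at least one } B_j \neq 0

F = \frac{\mathrm{SSR}/k}{\mathrm{SSE}/(n - k - 1)}

% Coefficient of determination
R^2 = \frac{\mathrm{SSR}}{\mathrm{SST}} = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}, \qquad \mathrm{SST} = \mathrm{SSR} + \mathrm{SSE}
```

A Significance F (the p-value of F) below 0.05 indicates a significant overall regression relationship at the 0.05 level, and R-squared ("R Square" in Excel) is the proportion of the sample variation in Delay explained by the estimated equation.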

Case 2: Decision (5 marks)

Using the “Decision.xlsx” dataset, analyse differences between experienced and inexperienced participants.

(2.1) Do the experienced and inexperienced participants differ in the quality of their decisions (i.e., the Decision variable)? Cut and paste the relevant statistics from Excel and explain them. (2 marks)

(2.2) Do the experienced and inexperienced participants differ on any of the intelligence, thinking style, or personality trait variables? Identify the variables that differ and, for those variables only, cut and paste the relevant statistics from Excel and explain them. (2 marks)

(2.3) Without using the language of statistics, what do you conclude about experienced versus inexperienced participants? (1 mark)
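For (2.1) and (2.2), one common option (an assumption here; the brief does not prescribe a particular test) is a two-sample t-test comparing group means, available in Excel's Analysis ToolPak as "t-Test: Two-Sample Assuming Unequal Variances". Writing group 1 for experienced participants (Exp dummy = 1) and group 0 for inexperienced participants (Exp dummy = 0), with sample mean, sample variance and sample size for each group, the statistic for a variable such as Decision is:

```latex
H_0: \mu_1 = \mu_0 \qquad H_a: \mu_1 \neq \mu_0

t = \frac{\bar{x}_1 - \bar{x}_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_0^2}{n_0}}}
```

A two-tail p-value below 0.05 suggests the group means differ. In regression terms, a simple regression of the variable on Exp dummy has a slope equal to the difference in group means, and its t-test matches the equal-variance version of this test.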

Decision data description

Participants consist of auditors and students. Auditors are considered experienced and students are inexperienced.

| Variable | Definition |
|---|---|
| ID | Participant identification number. |
| Decision | Higher values indicate better performance on a task requiring professional judgment. |
| WPT | Number of questions answered correctly on the Wonderlic Personnel Test, an IQ test; higher scores indicate higher IQ. |
| FFM_agree | Response to the measures of the agreeableness factor in the Five Factor Model. |
| FFM_cons | Response to the measures of the conscientiousness factor in the Five Factor Model. |
| FFM_ES | Response to the measures of the emotional stability factor in the Five Factor Model. |
| FFM_extra | Response to the measures of the extraversion factor in the Five Factor Model. |
| FFM_open | Response to the measures of the openness factor in the Five Factor Model. |
| Exp dummy | 0 = inexperienced, 1 = experienced |

Case 3: Haircut (5 marks)

Use the “Haircut.xlsx” database to run regression models that explain the factors that significantly influence revenue at this company.

(3.1) Report and interpret your best model’s technical details. Cut and paste the relevant statistics from Excel and explain the statistics. (2 marks)

(3.2) Do you believe that your model is effective for explaining changes in revenue? Explain and justify your response. (2 marks)

(3.3) Explain in plain language the meaning of your findings. (1 mark)

Haircut data description

You have been provided with an Excel file that contains the data items described below. Each row represents the data for one haircut at a business that operates in two countries. The business does not take appointments; customers walk in and wait for a haircut.

| Variable | Definition |
|---|---|
| Wait_time | The number of minutes the customer waited for the haircut |
| Chair_time | The number of minutes needed to complete the haircut |
| Revenue | Revenue generated from the haircut |
| Labour_cost | Cost of labour for the haircut |
| Country | A dummy variable distinguishing country 1 from country 2 |

Case 4: Prescription Cost Analysis (15 marks)

Assume that you are working for a government agency that is trying to determine the main causes of different drug costs for different patients. You have data (“Prescription.xlsx”) from six months of drug prescriptions. You need to model and predict drug costs. The appendix describes the data.

(4.1) Assume that we are using this model (Model 1): (3 marks)

GrossDrugCost = B0 + B1 * RiskScore + ε

i. Interpret the coefficient and the p-value for the RiskScore variable. Provide a practical explanation of the RiskScore variable for senior management. (1 mark)

ii. Explain what R-squared means statistically and provide a practical explanation of it for senior management. (1 mark)

iii. A coworker wants to know the predicted gross drug costs for a new member. The new member is a 73-year-old man who is classified by the government as frail and has a risk score of 510. Using the model above, what would you predict his gross drug costs to be? (1 mark)
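Under Model 1, the prediction in (iii) uses only RiskScore; the member's age, sex and frailty status do not enter the equation. Writing b_0 and b_1 for the estimated intercept and RiskScore coefficient reported by Excel (placeholders, not values), the point prediction is:

```latex
\widehat{\mathrm{GrossDrugCost}} = b_0 + b_1 \times \mathrm{RiskScore} = b_0 + 510\, b_1
```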

(4.2) Assume we are using this model (Model 2): (8 marks)

GrossDrugCost = B0 + B1 * RiskScore + B2 * Age + B3 * Gender + ε

iv. Provide a statistical interpretation of the coefficient and p-value for the Gender variable. Provide a practical explanation of the information for senior management. (1 mark)

v. Provide a statistical interpretation of the coefficient and p-value for the Age variable. Provide a practical explanation of the information for senior management. (1 mark)

vi. Provide a statistical interpretation of this model’s intercept. Provide a practical explanation of the information for senior management. (1 mark)

vii. Compare the adjusted R-squared values of Models 1 and 2. Are they the same or different? Why? What can you conclude from the difference (if any) in the adjusted R-squared values? (2 marks)

viii. Senior management wants to know the expected gross drug costs of the average customer. That is, using the median values of RiskScore, Age and Gender, what would you expect the gross drug costs to be? (2 marks)

ix. A coworker wants to know the predicted gross drug costs for a new member. The new member is a 73-year-old who is classified by the government as frail and has a risk score of 510. Using the model above, what would you predict the gross drug costs to be if the member were a man, and if the member were a woman? (1 mark)
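Two standard formulas may help with (vii) and (ix). Adjusted R-squared penalises R-squared for the number of independent variables k, given n observations (k = 1 for Model 1 and k = 3 for Model 2). Because Gender is coded 1 for female and 0 for male, the two predictions in (ix) differ only through b_3; frailty does not appear in Model 2. Here b_0 to b_3 are placeholders for the Model 2 estimates reported by Excel.

```latex
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}

\widehat{\mathrm{GrossDrugCost}}_{\text{male}} = b_0 + 510\, b_1 + 73\, b_2 + 0 \times b_3

\widehat{\mathrm{GrossDrugCost}}_{\text{female}} = b_0 + 510\, b_1 + 73\, b_2 + 1 \times b_3

\widehat{\mathrm{GrossDrugCost}}_{\text{female}} - \widehat{\mathrm{GrossDrugCost}}_{\text{male}} = b_3
```

Adding variables to a model fitted on the same data cannot lower R-squared, but adjusted R-squared can fall when the added variables do not improve the fit enough to offset the lost degrees of freedom.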

(4.3) Create a better model (4 marks)

x. Develop a better regression model to predict gross drug costs. (2 marks)

xi. What did you learn from this model that previous models did not tell you? (2 marks)

Appendix: Prescription data description

| Variable | Definition |
|---|---|
| RecordID | Primary key from the database; a unique number for each row |
| MemberID | A unique ID for each different member |
| Month | The month to which the data pertains, in numeric format (1 for January, 2 for February, etc.) |
| GrossDrugCost | The total drug costs incurred by a member during the corresponding month |
| NLISDummy | A dummy variable that takes the value of 1 if the member is listed as non-low income by the government and 0 otherwise |
| LISCHOSERDummy | A dummy variable that takes the value of 1 if the member chose a specific plan and 0 if the member was automatically assigned a plan, i.e., members automatically are assigned (thus, LISCHOSERDummy |
| RiskScore | A score assigned by the government, based on previous government data, indicating how sick someone is; higher scores indicate members are sicker |
| SpecialtyDummy | A dummy variable that takes the value of 1 if the member uses specialty drugs and 0 otherwise |
| AdjudicationDays | The number of non-holiday workdays in a month |
| Age | The member's age |
| Gender | A dummy variable that takes the value of 1 if the member is female and 0 if the member is male |
| FrailtyDummy | A dummy variable that takes the value of 1 if the government indicates the member is frail and 0 if the government indicates the member is not frail |
| HospiceDummy | A dummy variable that takes the value of 1 if the member is receiving hospice care and 0 if they are not |
| InstitutionDummy | A dummy variable that takes the value of 1 if the member is receiving institutionalized long-term care (e.g., hospital, nursing facility) and 0 if they are not |
| ESRDDummy | A dummy variable that takes the value of 1 if the member is receiving care for end-stage renal disease (i.e., end-stage kidney disease) and 0 if they are not |

SUBMISSION DOCUMENT

An MS Word file containing the answers to all assignment questions, supported by screenshots of Excel output (where relevant). The submitted file should include the student’s name, surname and student ID.