BU.450.740 Retail Analytics

Homework 2 (Individual): Linear regressions in R

Instructions

· Please write your name at the top of your report. The assignment is due at the beginning of session 3.

· Submissions by email or hard copy are not accepted. Please upload your answers to Canvas. The format can be either MS Word or PDF.

· You can collaborate and discuss with your colleagues within or outside your assigned group. However, you must submit your own write-up.

· Late submissions will receive a 20% deduction per day.

· Please attach the relevant R code and output in your report. There is no designated report format for homework 2.

In this exercise you will use market-level cross-sectional sales data, “minivan_hw2.csv”, from an actual car manufacturer and its retail dealers. The data are private; please do not share them with anyone outside the class. The data set is a CSV file containing a major car maker’s annual unit sales of a particular minivan model. In each zip code, this manufacturer’s dealers sell this model and report sales in units. Assume for now that this model is identical in year, specs, options, and MSRP for all cars sold.

Variable description:

· Location: City name of the geographic market

· Zip: US zip code for the geographic market

· q_sold: Quantity of minivans sold (number of cars)

· ave_p: Average selling price (in thousand USD)

· comp_p: Average price of Chrysler’s minivan (in thousand USD)

· adv: Advertising expenditure (in thousand USD)

Questions: 20 points in total

1. [4 points] Descriptive analysis

a. Describe the summary statistics of all variables except Zip.

b. Generate two-way plots (i.e., plot of two variables) for all variables except location and zip. Do you observe any patterns?
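
As a starting point for the descriptive analysis, the sketch below shows one possible workflow in base R. It runs on a simulated data frame that only borrows the column names from the variable description; the numbers are made up, the object names (`minivan`, `num_vars`, `cor_mat`) are illustrative, and the commented `read.csv()` line assumes the file sits in your working directory.

```r
# Simulated stand-in for minivan_hw2.csv (NOT the real data; values are invented).
set.seed(1)
n <- 50
minivan <- data.frame(
  Location = paste("City", seq_len(n)),
  Zip      = sprintf("%05d", sample(10000:99999, n)),
  ave_p    = round(runif(n, 25, 35), 1),   # thousand USD
  comp_p   = round(runif(n, 24, 36), 1),   # thousand USD
  adv      = round(runif(n, 5, 50), 1)     # thousand USD
)
minivan$q_sold <- round(pmax(0, 400 - 8 * minivan$ave_p + 4 * minivan$comp_p +
                               1.5 * minivan$adv + rnorm(n, sd = 20)))

# With the actual file (assumed to be in the working directory):
# minivan <- read.csv("minivan_hw2.csv")

num_vars <- c("q_sold", "ave_p", "comp_p", "adv")

# Summary statistics for every variable except Zip
print(summary(minivan[, c("Location", num_vars)]))

# Two-way plots for all variables except Location and Zip
pairs(minivan[, num_vars], main = "Two-way plots")

# Pairwise correlations as a numeric companion to the plots
cor_mat <- cor(minivan[, num_vars])
print(round(cor_mat, 2))
```

`pairs()` draws every two-variable scatterplot in one grid, which is a compact way to look for patterns; individual `plot(x, y)` calls work equally well.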

2. [4 points] Simple linear regression

a. Perform a regression of q_sold on ave_p. Is there any relationship between q_sold and ave_p?

b. Discuss the fit of the model using R_squared
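
The mechanics of a simple regression and its R-squared can be sketched in R as follows. The data are simulated with an assumed negative price effect, so the estimates say nothing about the actual minivan data; `sim`, `fit1`, and `s1` are illustrative names.

```r
# Simulated data (NOT the real data): quantity falls with price by assumption.
set.seed(42)
n <- 60
sim <- data.frame(ave_p = runif(n, 25, 35))
sim$q_sold <- 500 - 10 * sim$ave_p + rnorm(n, sd = 15)

# Regress q_sold on ave_p
fit1 <- lm(q_sold ~ ave_p, data = sim)
s1 <- summary(fit1)

# Coefficient table: estimate, standard error, t value, p-value
print(coef(s1))

# Share of the variation in q_sold explained by ave_p
print(s1$r.squared)
```

The sign and p-value of the `ave_p` coefficient speak to whether a relationship exists, while `r.squared` speaks to how much of the variation in sales the model accounts for.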

3. [6 points] Multiple linear regression

a. Perform a multiple regression of q_sold on ave_p, adv, and comp_p. Is there any relationship between q_sold and the explanatory variables ave_p, adv, and comp_p? Are the signs of the estimated parameters reasonable?

b. Discuss the model fit using R_squared
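
A multiple regression follows the same pattern. The sketch below again uses simulated data with assumed coefficient signs (negative own price, positive competitor price and advertising), so it illustrates the commands only; all object names are illustrative.

```r
# Simulated data (NOT the real data); the coefficient signs are assumptions.
set.seed(7)
n <- 80
sim <- data.frame(
  ave_p  = runif(n, 25, 35),
  comp_p = runif(n, 24, 36),
  adv    = runif(n, 5, 50)
)
sim$q_sold <- 300 - 10 * sim$ave_p + 6 * sim$comp_p + 2 * sim$adv +
  rnorm(n, sd = 15)

# Simple and multiple regressions on the same data
fit_simple <- lm(q_sold ~ ave_p, data = sim)
fit_multi  <- lm(q_sold ~ ave_p + adv + comp_p, data = sim)
s_multi <- summary(fit_multi)

# Coefficient table for the multiple regression
print(coef(s_multi))

# Compare fit across the two models
r2 <- c(simple       = summary(fit_simple)$r.squared,
        multiple     = s_multi$r.squared,
        multiple_adj = s_multi$adj.r.squared)
print(round(r2, 3))
```

R-squared never decreases when regressors are added to a model with an intercept, so the adjusted R-squared, which penalizes extra regressors, is a useful companion when comparing the two fits.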

4. [6 points] If you were a data analyst for this company, which variables would you suggest adding to the last estimating equation (i.e., the multiple regression) to make a more precise argument about the effect of price on units sold, and why? The variables you propose need not be in your current data. You can think of this question as proposing a “wish list.” We may not always have access to the variables we want, but we can always keep a lookout for them.