ECOM20001: Econometrics 1

Assignment 2

Getting Started

Please create an Assignment2 folder on your computer, then go to the Canvas site for ECOM20001 and download the following data file into the Assignment2 folder:

•  as2_voucher.csv

This dataset is the same as the one in the last assignment, except that we have some additional variables. All the variables are listed as follows:

•  id: anonymous identification number for an individual

•  newc: new consumption expenditure ($). That is, the voucher amount spent on goods that would not have been purchased by the individual if there were no vouchers.

•  age: age (years) of the individual.

•  gender: =1 if the individual is female, =2 if male.

•  income: the individual’s annual income (in thousands of $).

•  education: the individual’s years of education.

•  d1: =1 if the individual’s education is at the high school graduate level, =0 otherwise.

•  d2: =1 if the individual’s education is at the bachelor degree level, =0 otherwise.

•  d3: =1 if the individual’s education is at the postgraduate degree level, =0 otherwise.

Prior to answering the questions, do some exploratory work such as browsing the observations and looking at the summary statistics.

Note: Throughout the assignment, you must use heteroskedasticity-robust standard errors in all regressions unless otherwise stated. However, you may report homoskedasticity-only F-statistics unless otherwise stated.

Minor rounding errors (accuracy up to 3 significant figures) are allowed. Also, when you interpret the results, you must be explicit about the units of the variables.


Questions

1.   (2 marks) Run a single linear regression where the dependent variable is newc and the regressor is age. Report the regression output using stargazer or a similar format. Discuss your results by:

- Interpreting the slope estimate and discussing its statistical significance.

- Interpreting the R² of the regression.

(Hint: to report regression output with HC standard errors using stargazer, load the packages using the commands “library(estimatr)” and “library(stargazer)”. Then, use the command “reg1=lm(y~x,data=mydata)”, where mydata is your original dataset. Then, use the command “stargazer(reg1,type="text", se=starprep(reg1))”.)
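The workflow in the hint can be sketched as follows on simulated data. The data frame mydata and its columns y and x are placeholders invented for illustration, not the assignment variables. As far as we are aware, starprep() from estimatr uses HC2 robust standard errors by default for lm objects; check the package documentation if a different HC type is required.

```r
library(estimatr)   # provides starprep()
library(stargazer)  # provides stargazer()

# Simulated placeholder data (NOT the assignment data): y depends on x, and
# the error spread grows with x, so the errors are heteroskedastic.
set.seed(1)
mydata <- data.frame(x = runif(200, min = 20, max = 60))
mydata$y <- 50 - 0.5 * mydata$x + rnorm(200, sd = 0.2 * mydata$x)

# Fit the single linear regression by OLS.
reg1 <- lm(y ~ x, data = mydata)

# starprep() returns a list with one vector of robust standard errors per model.
robust_se <- starprep(reg1)

# Print the regression table, replacing the default standard errors
# with the heteroskedasticity-robust ones.
stargazer(reg1, type = "text", se = robust_se)
```

Passing several models, for example stargazer(reg1, reg2, type = "text", se = starprep(reg1, reg2)), should place them side by side in one table.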

2.   (2 marks) Using the regression results, compute the 95% confidence interval for the change in newc from increasing age from 20 years to 35 years. Suppose the null hypothesis is that the change in newc is -$15, versus the alternative hypothesis that it is not -$15. Test this hypothesis using the 95% CI and briefly explain your reasoning.

3.   (3 marks) Construct the following three scatter plots, where in each case the first variable (before the vs.) goes on the vertical axis and the second variable (after the vs.) goes on the horizontal axis. Ensure correct axes and graph titles. For each scatter plot, overlay a single linear regression line that helps visualise the relationship between the two variables.

•  newc vs. age

•  newc vs. income

•  age vs. income

Suppose higher income is associated with higher consumer confidence, which has a positive effect on new consumption. Combining this and the correlation patterns in the scatter plots, carefully explain what the direction of the omitted variable bias would be for the slope coefficient on age in the single linear regression of newc on age.

4.   (1 mark) Suppose someone in your analytics team tells you that the single linear regression line in the “age vs. income” scatter plot does not make sense from a causality perspective. The person proposes swapping the variables and re-estimating the model. Do you agree with the person’s approach? Briefly explain your reasoning.

5.   (2 marks) Run a multiple linear regression where the dependent variable is newc and the regressors are age and income. Then, run a multiple linear regression where the dependent variable is newc and the regressors are age, income and education. Report both regression outputs using stargazer or a similar format. Discuss your results from the second regression by:

- Interpreting the slope estimate on education and discussing its statistical significance.

- Computing the dollar change in income that can offset the estimated effect of a one-year increase in education on newc.

6.   (4 marks) Run a multiple linear regression where the dependent variable is newc and the regressors are age, income, d1, and d2. Report the regression output using stargazer or a similar format. Discuss your results by interpreting each of the coefficient estimates on d1 and d2 and discussing their statistical significance. What is the key advantage of this regression over the second regression in question 5?

(Hint: to make the output neater, you may consider using stargazer to report the regression outputs in questions 1, 5 and 6 in one single table. This is an optional exercise and does not carry marks.)

7.   (1 mark) Regarding the model in question 6, suppose someone claims that none of the regressors have an effect on newc. Formally write down the person’s null and alternative hypotheses and, based on the information provided in your stargazer regression output, explain whether you reject or fail to reject the null.

(Hint: you may use a homoskedasticity-only statistic to answer this question.)

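8.   (3 marks) Following question 6, suppose we are interested in testing a joint null hypothesis that education levels do not affect newc. Formally write down the null and alternative hypotheses. Then, use the homoskedasticity-only F-statistic formula in Lecture Note 7:

$$F = \frac{\left(R^2_{\text{unrestricted}} - R^2_{\text{restricted}}\right)/q}{\left(1 - R^2_{\text{unrestricted}}\right)/\left(n - k_{\text{unrestricted}} - 1\right)}$$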

to perform the hypothesis test. Show your working.

(Hint: you must use this formula to compute the F-statistic.)

(Hint: As a hypothetical example, use the command “pf(10,df1=5,df2=99)” to compute Pr(F<10) for a random variable F that follows an F-distribution with 5 numerator degrees of freedom and 99 denominator degrees of freedom. See Tutorial 3 for details.)
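The mechanics of such a test can be sketched in R with made-up numbers. The sketch assumes the standard textbook form of the homoskedasticity-only F-statistic, in which q is the number of restrictions, n the sample size and k_u the number of regressors in the unrestricted model; confirm it against Lecture Note 7 before relying on it. The function name f_stat_homosk and every input value below are hypothetical and are not taken from the assignment data.

```r
# Homoskedasticity-only F-statistic computed from the R-squared of the
# unrestricted (r2_u) and restricted (r2_r) regressions.
# Assumed form: F = ((r2_u - r2_r) / q) / ((1 - r2_u) / (n - k_u - 1)).
f_stat_homosk <- function(r2_u, r2_r, q, n, k_u) {
  ((r2_u - r2_r) / q) / ((1 - r2_u) / (n - k_u - 1))
}

# Hypothetical inputs, chosen only so the arithmetic is easy to follow.
r2_u <- 0.40   # R-squared of the unrestricted regression
r2_r <- 0.34   # R-squared of the restricted regression
q    <- 3      # number of restrictions under the null
n    <- 126    # number of observations
k_u  <- 5      # number of regressors in the unrestricted regression

f_val <- f_stat_homosk(r2_u, r2_r, q, n, k_u)  # (0.06/3) / (0.60/120) = 4

# Degrees of freedom of the F-distribution under the null.
df1 <- q
df2 <- n - k_u - 1

# pf() returns the lower-tail probability Pr(F < f_val), so the p-value
# of the test is the upper tail, 1 - pf().
p_value <- 1 - pf(f_val, df1 = df1, df2 = df2)

# 5% critical value: reject the null if f_val exceeds it.
crit_5 <- qf(0.95, df1 = df1, df2 = df2)

print(c(F = f_val, p_value = p_value, critical_5pct = crit_5))
```

Comparing f_val with crit_5 and comparing p_value with 0.05 always lead to the same decision; reporting both makes the working easier to check.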

9.   (2 marks) R-code: we will review and mark your R code according to the following scheme:

•  2/2 if R code is correct and organised and commented like the solution code for the assignment.

•  1/2 if R code is correct, but hard to follow or not well commented.

•  0/2 if R code is incorrect and/or a complete mess, or not submitted.