ECON6300/7320: Advanced Microeconometrics

Final Problem Set

June 5, 2023

Instructions

Answer all questions in a format similar to the answers to your tutorial questions. When you use R to conduct empirical analysis, show your R script(s) and outputs (e.g., screenshots of commands, tables, and figures). You will lose 2 points whenever you fail to provide R commands and outputs. When you are asked to explain or discuss something, keep your response brief and compact. To facilitate the tutors’ grading, please label all your answers clearly. Upload your answers (in PDF or Word format) via the “Turnitin” submission link (in the “Final Problem Set” folder under “Assessment”) by 11:59 AM on the due date, June 12, 2023. Do not hand in a hard copy. You are allowed to work on this assignment in groups; that is, you can discuss how to answer these questions with your group members. However, this is not a group assignment, which means that you must answer all the questions in your own words and submit your report separately. The marking system will check for similarity, and UQ’s student integrity and misconduct policies on plagiarism apply.

1. Sharp RDD (40 points)

The sharp regression discontinuity design (RDD) occurs when the treatment is determined by a threshold function of X, e.g., D = 1[X ≥ c]. In most applications, the threshold c is determined by policy or rule. The covariate X which determines the treatment is typically called the running variable. The threshold c is often called the cut-off.

Ludwig and Miller (2007) used a sharp RDD to evaluate a U.S. federal anti-poverty program called Head Start (HS). HS was established in 1965 to provide preschool, health, and other social services to poor children aged three to five and their families. HS funding was awarded to local municipalities through a competitive grant application. Because of a concern that poor regions might not apply at the same rate as well-funded regions, the federal government provided grant-writing assistance (GWA) to the 300 poorest counties in the United States during the spring of 1965. The 300 counties were selected based on the poverty rate as measured by the 1960 U.S. Census. The question addressed by Ludwig and Miller was whether GWA in 1965 to the 300 U.S. counties selected on a poverty index had a measurable effect (treatment effect) on childhood mortality eight to eighteen years later in the same counties, relative to counties that did not receive the GWA. In this application, the unit of measurement is a U.S. county. The outcome variable Y is the county mortality rate in 1973-1983. The running variable X is the county poverty rate (percentage of the population below the poverty line) in 1960. The cut-off is c = 59.1984.

Using the LM2007.dta dataset, a simple RDD estimation can be implemented by the following regression:

Y = β0 + β1D + β2(X − c) + e, (1)

where Y = mort_age59_related_postHS and X = povrate60. Note that we center X at c here to ensure that the treatment effect at X = c is still measured by β1 in all extended models discussed below. Throughout this question, use observations satisfying X ∈ [c − 13.8, c + 13.8].
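As a minimal sketch of how a regression shaped like model (1) can be set up in R, the block below uses simulated data rather than LM2007.dta. The data-generating process, the true jump `tau`, and the names `sim`, `win`, `xc`, and `fit1` are all assumptions made for illustration; only the cut-off value and the 13.8 window come from the text.

```r
# Simulated stand-in for the county data; nothing here reads LM2007.dta.
set.seed(1)
n      <- 2000
cutoff <- 59.1984                      # cut-off c from the text
x      <- runif(n, 15, 85)             # hypothetical running variable
d      <- as.numeric(x >= cutoff)      # sharp design: D = 1[X >= c]
tau    <- -1.5                         # assumed true jump at the cut-off
y      <- 3 + tau * d + 0.03 * (x - cutoff) + rnorm(n, sd = 1)
sim    <- data.frame(y, x, d, xc = x - cutoff)

# Keep only observations with X in [c - 13.8, c + 13.8].
win <- subset(sim, abs(xc) <= 13.8)

# Model (1): Y = b0 + b1 * D + b2 * (X - c) + e.
fit1 <- lm(y ~ d + xc, data = win)
print(summary(fit1)$coefficients)

# The coefficient on d estimates the discontinuity at X = c.
b1_hat <- unname(coef(fit1)["d"])
```

Because the running variable is centered before estimation, the coefficient on `d` keeps its interpretation as the jump at the cut-off when further terms in `xc` are added to the formula.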

(a) (8 points) Estimate the treatment effect of GWA using model (1) (3 points). Is the treatment effect statistically significant (2 points)? Interpret your result (3 points).

(b) (6 points) RDD estimation is sensitive to misspecification of the regression function. If the true regression function is nonlinear in X, then model (1) may mistake the nonlinearity at X = c for a “discontinuity” (i.e., a treatment effect) at X = c, leading to a biased RDD estimate of the treatment effect. Add (X − c)² to model (1) and estimate the treatment effect of GWA (3 points). Test the nonlinearity of the regression function (3 points).

(c) (10 points) Extend model (1) so that the new model allows the regression functions for treatment and control groups to have different slope coefficients on X − c (4 points). Estimate the treatment effect of GWA (3 points) and test if the slope coefficient varies across treatment and control groups (3 points).

(d) (8 points) The RDD estimate is sensitive to the (parametric) specification. A natural idea for obtaining more robust estimates is to go nonparametric. For example, use the Nadaraya-Watson method to estimate m0(X) := E[Y |X, D = 0] using the untreated observations X < c and m1(X) := E[Y |X, D = 1] using the treated observations X ≥ c. Then, the estimator for the treatment effect is the difference between the adjoining estimated endpoints; i.e., m̂1(c) − m̂0(c). Compute this estimator and report your estimate (5 points). What is the main problem with this approach (3 points)? To run the nonparametric regression, use the Gaussian kernel and select the bandwidth using least-squares cross-validation. Hint: Set exdat=c in npreg() and use $mean to extract m̂(c).

(e) (8 points) The core identification theorem assumes that E[Y |X, D = 0] and E[Y |X, D = 1] are both continuous at the cut-off, and hence the “jump” in the values of Y, if observed in the data, must be due to the treatment. These assumptions may be violated if the running variable X is manipulated by individuals seeking or avoiding treatment. Such manipulation is likely to lead to bunching of the running variable just above or below the cut-off. If there is no manipulation, we expect the probability density function (PDF) of X to be continuous at X = c, but if there is manipulation, we expect that there might be a discontinuity in the PDF of X at X = c. Estimate the PDF of X and plot the kernel density (3 points). Is there any evidence of manipulation of X (2 points)? What is the main problem with this visual check (3 points)? To implement the kernel density estimation, use the Gaussian kernel and the Sheather-Jones bandwidth.
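The endpoint estimator described in part (d) can be sketched in base R as follows. This is a hand-rolled illustration on simulated data, not the npreg() route that the hint suggests: the helper names `nw_at` and `cv_ls`, the data-generating process, the true jump `tau`, and the bandwidth search interval are all assumptions.

```r
# Nadaraya-Watson estimate of E[Y | X = x0] with a Gaussian kernel.
nw_at <- function(x0, x, y, h) {
  w <- dnorm((x - x0) / h)
  sum(w * y) / sum(w)
}

# Leave-one-out least-squares cross-validation criterion for bandwidth h.
cv_ls <- function(h, x, y) {
  K <- dnorm(outer(x, x, "-") / h)
  diag(K) <- 0                           # drop each point's own weight
  fit <- as.vector(K %*% y) / rowSums(K)
  mean((y - fit)^2)
}

# Simulated data with the running variable already centered at the cut-off.
set.seed(2)
n      <- 800
cutoff <- 0
x      <- runif(n, -13.8, 13.8)
d      <- x >= cutoff
tau    <- 2                              # assumed true jump
y      <- 1 + tau * d + sin(x / 5) + rnorm(n, sd = 0.5)

# Choose one bandwidth per side by minimizing the CV criterion
# (the search interval c(0.5, 10) is an arbitrary assumption).
h0 <- optimize(cv_ls, c(0.5, 10), x = x[!d], y = y[!d])$minimum
h1 <- optimize(cv_ls, c(0.5, 10), x = x[d],  y = y[d])$minimum

# Evaluate each side's fit at the cut-off and take the difference.
m0_c   <- nw_at(cutoff, x[!d], y[!d], h0)
m1_c   <- nw_at(cutoff, x[d],  y[d],  h1)
rdd_nw <- m1_c - m0_c
```

Each side is fitted separately, so every observation that enters `m0_c` lies to the left of the cut-off and every observation that enters `m1_c` lies on or to the right of it.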

2. IV Regression: Fuzzy RDD (25 points)

The sharp regression discontinuity requires that the cut-off perfectly separates treatment (D = 1) and control (D = 0) groups. An alternative context is where this separation is imperfect, but the conditional probability of treatment is discontinuous at the cut-off. This is called fuzzy regression discontinuity. This question asks you to estimate the following fuzzy RDD model using the simulated dataset regdisc.csv:

Y = β0 + β1D + β2(X − c) + β3(X − c)1[X ≥ c] + e, (2)

where X is the running variable and D is the treatment dummy variable (= 1 if the individual receives treatment, and 0 otherwise). Note that the main difference between model (2) and model (1) is that D is not a deterministic function of X. Assume D = 1 is more likely to occur when X ≥ 5.

(a) (8 points) D may be an endogenous regressor in (2) since individuals can select to receive or avoid the treatment. For example, individuals with high treatment effects are more likely to seek treatment than those with low treatment effects. If this is the case, the OLS estimate of the treatment effect β1 is biased. Propose a valid instrumental variable (IV) for D (4 points). Justify your answer (4 points).

(b) (4 points) Use the IV selected in (a) and observations with X ∈ [3, 7] to obtain a TSLS estimate of β1 (2 points). Is the treatment effect significant (2 points)?

(c) (8 points) Is model (2) exactly identified, overidentified, or underidentified (2 points)? Does your TSLS regression in (b) suffer from the weak IV problem (1 point)? Justify your answer (2 points). Is it possible to test the exogeneity of your IV proposed in (a) (1 point)? Explain your answer (2 points).

(d) (5 points) Test if D is an exogenous regressor.
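For readers who want to see the mechanics of TSLS for a model shaped like (2), here is a minimal base-R sketch on simulated data. It does not read regdisc.csv. The instrument used here, the threshold indicator `z`, is one common choice in fuzzy designs and is an assumption of this sketch, as are the data-generating process, the true effect `beta1`, and all object names.

```r
set.seed(3)
n      <- 4000
cutoff <- 5
x      <- runif(n, 3, 7)
z      <- as.numeric(x >= cutoff)        # assumed instrument: 1[X >= c]
u      <- rnorm(n)                       # unobservable that drives selection
# Fuzzy design: crossing the cut-off raises P(D = 1) but does not fix D.
d      <- as.numeric(-0.8 + 1.6 * z + u > 0)
beta1  <- 2                              # assumed true treatment effect
xc     <- x - cutoff
y      <- 1 + beta1 * d + 0.5 * xc + 0.3 * xc * z + 0.8 * u + rnorm(n)

# OLS ignores the correlation between D and the error (through u).
ols    <- lm(y ~ d + xc + I(xc * z))
b1_ols <- unname(coef(ols)["d"])

# Stage 1: project D on the instrument and the exogenous regressors.
stage1 <- lm(d ~ z + xc + I(xc * z))
d_hat  <- fitted(stage1)

# Stage 2: replace D by its first-stage fitted value.
stage2  <- lm(y ~ d_hat + xc + I(xc * z))
b1_tsls <- unname(coef(stage2)["d_hat"])

# Squared first-stage t statistic on the single excluded instrument.
f_first <- summary(stage1)$coefficients["z", "t value"]^2
```

The point estimate from the manual second stage matches TSLS, but the standard errors that `summary(stage2)` prints are not valid TSLS standard errors, because they are computed from residuals that use `d_hat` instead of `d`. A dedicated IV routine should be used for inference.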

3. Panel Data Logit Model with Fixed Effects (20 points)

In nonlinear panel data models, incorporating fixed effects can be challenging since a linear operation like first-differencing or demeaning cannot eliminate unobserved heterogeneity. However, the panel data logit model is an exception to this. Chamberlain (1980, 1984) proposed a conditional maximum likelihood estimator for this model, which takes advantage of the specific functional form of logistic odds ratios and only uses “switchers” in the data. Hansen’s (2022) Section 25.13 provides a gentle introduction to this method (see panel logit model.pdf). In this question, we apply Chamberlain’s estimator for the following model:

Yit* = β0 + β1Xi1t + β2Xi2t + αi + eit,

Yit = 1[Yit* > 0], i = 1, ..., n, t = 1, ..., T, (3)

where Yit* is individual i’s latent utility in period t, Yit is the observed choice, Xi1t and Xi2t are both time-varying covariates, and αi is the usual entity fixed effect. We simulate a dataset (simudata.dta) for model (3) with n = 250 and T = 2. The simudata.dta dataset contains variables (Yi1, Xi11, Xi21, Yi2, Xi12, Xi22) for i = 1, ..., n.

(a) (2 points) Is simudata.dta of long format or wide format?

(b) (4 points) Read panel logit model.pdf. Can Chamberlain’s method estimate β0? Explain your answer. Hint: See the log-likelihood function presented on page 825.

(c) (8 points) Use simudata.dta to estimate (β1, β2). First, you need to write the conditional log-likelihood function in R. Then you can use R’s optim() function (see Tutorial 1) to compute the maximum likelihood estimator. Here you can use the Nelder-Mead algorithm and initial values generated by runif(2,-1,3) with random seed set.seed(2023). As the last step, extract (β̂1, β̂2) from the optim() output using $par. Note that by default, optim() searches for the minimum of an objective function. So to obtain a maximum likelihood estimator, you should apply optim() to a negative log-likelihood function. Hint: The true values for (β1, β2) are (1, 1). If your algorithm computes the estimator correctly, you should have (β̂1, β̂2) close to these values.

(d) (6 points) Calculate the bootstrap SE for β̂1 with B = 250 bootstrap replications. Hint: You may need to review Tutorial 7.
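The optim() workflow described in part (c) can be sketched on freshly simulated data as follows. This block does not read simudata.dta. The data-generating process, the sample size, the seed, and the names `neg_cll`, `dx1`, `dx2`, and `y2s` are assumptions, and the likelihood is written in the standard T = 2 form, in which a switcher's probability of choosing in period 2 is a logistic function of the differenced covariates.

```r
set.seed(4)
n     <- 2000                        # larger than the assignment's n = 250
beta  <- c(1, 1)                     # assumed true (beta1, beta2)
alpha <- rnorm(n)                    # fixed effects, correlated with X1 below
x11 <- alpha + rnorm(n); x21 <- rnorm(n)    # period-1 covariates
x12 <- alpha + rnorm(n); x22 <- rnorm(n)    # period-2 covariates
y1 <- as.numeric(0.5 + beta[1] * x11 + beta[2] * x21 + alpha + rlogis(n) > 0)
y2 <- as.numeric(0.5 + beta[1] * x12 + beta[2] * x22 + alpha + rlogis(n) > 0)

# Only switchers (Y1 + Y2 = 1) contribute to the conditional likelihood.
sw  <- (y1 + y2) == 1
dx1 <- (x12 - x11)[sw]
dx2 <- (x22 - x21)[sw]
y2s <- y2[sw]

# Negative conditional log-likelihood for T = 2, using
# P(Y2 = 1 | Y1 + Y2 = 1) = plogis(b1 * dX1 + b2 * dX2).
neg_cll <- function(b) {
  idx <- b[1] * dx1 + b[2] * dx2
  -sum(y2s * plogis(idx, log.p = TRUE) +
       (1 - y2s) * plogis(-idx, log.p = TRUE))
}

# optim() minimizes, so it is applied to the negative log-likelihood.
start <- runif(2, -1, 3)
fit   <- optim(start, neg_cll, method = "Nelder-Mead")
b_hat <- fit$par
```

A bootstrap standard error can be built on top of this by resampling individuals (whole rows, both periods together) with replacement and repeating the `optim()` step on each resample.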

4. Hypothesis Test and Bootstrap (15 points)

Mankiw, Romer, and Weil (1992) investigate the implications of the Solow growth model using cross-country regressions. A key equation in their paper regresses the change between 1960 and 1985 in log GDP per capita on (1) log GDP in 1960, (2) the log of the ratio of aggregate investment to GDP (invest/100), (3) the log of the sum of the population growth rate n (pop_growth/100), the technological growth rate g, and the rate of depreciation δ, and (4) the log of the percentage of the working-age population that is in secondary school (school/100); the latter is a proxy for human-capital accumulation.

Use the sub-sample of the 98 non-oil-producing countries in the MRW1992.dta data to answer the following questions. As g and δ were unknown, assume g + δ = 0.05 as the authors did in their paper. For all bootstrap inference, use B = 1000 bootstrap replications.

(a) (3 points) Run the regression described in the question and report the results.

(b) (4 points) Let θ denote the sum of the coefficients on (2), (3), and (4). Conduct a Wald test for H0 : θ = 0 (3 points). Compute the t-statistic for H0 (1 point).

(c) (4 points) Compute the SE for θ̂ (2 points) and the 95% confidence interval for θ (2 points) using bootstrap without asymptotic refinement.

(d) (4 points) Compute the bootstrap p-value for H0 with asymptotic refinement. Hint: You may need to review Question 3 of Problem Set 2.
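The test and bootstrap steps in parts (b) to (d) can be sketched on simulated data as follows. The block does not read MRW1992.dta; the regressors `x1` to `x4`, their coefficients, and the helper `theta_se` are hypothetical. It also assumes a pairs bootstrap, the homoskedastic `vcov()` from `lm`, a normal-based interval for the unrefined case, and a symmetric percentile-t p-value for the refined case. Your tutorials may use different conventions.

```r
set.seed(5)
n  <- 98
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n); x4 <- rnorm(n)
y  <- 1 - 0.3 * x1 + 0.5 * x2 - 0.5 * x3 + 0.2 * x4 + rnorm(n, sd = 0.5)
dat <- data.frame(y, x1, x2, x3, x4)

r <- c(0, 0, 1, 1, 1)          # picks out the coefficients on x2, x3, x4

# theta-hat = r'b and its asymptotic SE sqrt(r' V r) from one fitted model.
theta_se <- function(df) {
  fit <- lm(y ~ x1 + x2 + x3 + x4, data = df)
  c(theta = sum(r * coef(fit)),
    se    = sqrt(drop(t(r) %*% vcov(fit) %*% r)))
}

est    <- theta_se(dat)
t_hat  <- unname(est["theta"] / est["se"])   # t statistic for H0: theta = 0
wald   <- t_hat^2                            # Wald statistic, one restriction
p_wald <- pchisq(wald, df = 1, lower.tail = FALSE)

# Pairs bootstrap: resample rows with replacement and re-estimate.
B    <- 1000
boot <- t(replicate(B, theta_se(dat[sample(n, replace = TRUE), ])))

# Without refinement: bootstrap SE and a normal-based 95% interval.
se_boot <- sd(boot[, "theta"])
ci_boot <- unname(est["theta"]) + c(-1, 1) * qnorm(0.975) * se_boot

# With refinement (percentile-t): recenter each bootstrap t statistic
# at theta-hat and compare it with the original t statistic.
t_star <- (boot[, "theta"] - est["theta"]) / boot[, "se"]
p_boot <- mean(abs(t_star) > abs(t_hat))
```

With a single linear restriction the Wald statistic is the square of the t statistic, which is why `wald` is computed directly from `t_hat`.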