
ECON 121, Applied Econometrics and Data Analysis

Summer 2022

PROBLEM SET 1: ESTIMATING THE RETURNS TO EDUCATION

Instructions:

• Use the provided “pset1_submission.R” template file to complete this assignment. DO NOT modify the file name for your submission. The autograder requires this filename to grade your assignment.

• Use the “setwd()” command to set your working directory so that you can read in the data files locally. COMMENT OUT the “setwd()” command before you submit to Gradescope. DO NOT modify the provided code in the template file that loads the data; doing so will cause an error with the autograder.

• Only use the packages loaded in “pset1_submission.R” when completing the tasks in this problem set. The autograder is configured to use only these packages and may not work if you use others.

Problem Set:

A surprisingly large share of the past half-century of research in labor economics has focused on the return to education: the added earnings power that an individual obtains by staying in school an extra year. In the 1970s, the late economist Jacob Mincer formulated what is now seen as the standard relation between human capital and wages:

$$\ln(w_i) = \beta_0 + \beta_1 \, ed_i + \beta_3 \, exper_i + \beta_4 \, exper_i^2 + \varepsilon_i$$

where $w_i$ is the hourly wage, $ed_i$ is years of education, and $exper_i$ is years of labor market experience. This equation is known as the Mincerian Wage Equation. In this problem set, we will explore the difficulties that arise in estimating the returns to education using OLS.
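Differentiating the equation term by term (a standard calculus step, added here only as a sketch) shows what each coefficient measures mechanically: $\beta_1$ is the change in log wages per additional year of education holding experience fixed (roughly a $100\beta_1$ percent change for small $\beta_1$), while the slope with respect to experience changes with experience itself. If $\beta_4 < 0$, the experience profile is concave and log wages peak at $exper^{*}$:

```latex
\frac{\partial \ln(w_i)}{\partial\, ed_i} = \beta_1,
\qquad
\frac{\partial \ln(w_i)}{\partial\, exper_i} = \beta_3 + 2\beta_4 \, exper_i,
\qquad
\beta_3 + 2\beta_4 \, exper^{*} = 0 \;\Rightarrow\; exper^{*} = -\frac{\beta_3}{2\beta_4}
```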

We will use two datasets, both containing data on labor earnings and education among US adults. One is a sample of working-age (25-64) adults in the Current Population Survey, a nationally representative monthly survey of the non-institutionalized population. This dataset is from March 2018, with data on labor market outcomes in 2017. The other dataset comes from the National Longitudinal Survey of Youth, a study that first surveyed a sample of 14- to 21-year-olds in 1979 and then re-surveyed them annually or biennially to the present. The labor market data are for 2007, when the cohort was aged 42-49.

(1) Interpret the Mincerian Wage Equation conceptually. If one assumes that education and experience are exogenous, how should one interpret $\beta_1$? Why do you think the equation has a squared term in experience? (Answer in words only.)

(2) Start with the CPS data (https://github.com/credpath/econ121/raw/main/cps_18.rda). Copy the data into a new data frame called cps18 and work with this new data frame. Keep anyone who worked at least 50 weeks, worked at least 35 hours in a typical week, and has strictly positive annual earnings.

Be on the lookout for missing values that are coded as large numbers. Generate a log hourly wage variable, where the hourly wage equals annual labor earnings divided by annual work hours. Call this variable log_wage. Generate race/ethnicity dummies for the categories “white,” “black,” “Asian,” and “other.” Name these variables race_white, race_black, race_asian, and race_other. Generate a new education variable to measure years of schooling (type sort(unique(cps_18_raw$educ)) in R to view the unique values and all labels for education; not all values appear in the data). Call this variable educ_years. For interval education categories, assign the midpoint of the category (e.g., “5th to 6th grade” becomes 5.5 years). Assume that some college and an associate’s degree are two years of schooling in addition to high school (14 years) and that a bachelor’s degree is four years of schooling in addition to high school (16 years); a master’s degree adds two, a professional degree two and a half, and a doctorate four years of schooling to a bachelor’s degree. (Using case_when inside of mutate can help.) Generate a “potential experience” variable as follows: $exper_i = age_i - educ\_years_i - 5$. Also generate $exper_i^2$. Call these variables exper and exper2, respectively. Create a variable called female that is one for women and zero for men. Create a summary of the data with the summary() command and call it cps_summary. Calculate the standard deviations of years of education and experience and call these educ_years_stddev and exper_stddev, respectively. (Answer with code only.)
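The cleaning steps in (2) can be sketched in base R on a tiny made-up data frame. Everything about the toy data is hypothetical: the column names (age, sex, race, wkswork, uhrswork, incwage, educ_label), the label strings, and the 999 missing-value code are placeholders, not the real CPS variables. It also assumes annual hours equal weeks worked times usual weekly hours, and it swaps in a named lookup vector where the problem text suggests case_when.

```r
# Toy stand-in for the CPS extract. Column names, labels, and the 999
# missing-value code are HYPOTHETICAL; check the real file before reusing them.
toy_cps <- data.frame(
  age        = c(30, 45, 52, 38),
  sex        = c("Female", "Male", "Male", "Female"),
  race       = c("white", "black", "asian", "other"),
  wkswork    = c(52, 50, 40, 52),
  uhrswork   = c(40, 45, 40, 999),
  incwage    = c(52000, 90000, 30000, 61000),
  educ_label = c("High school diploma or equivalent", "Bachelor's degree",
                 "5th to 6th grade", "Master's degree"),
  stringsAsFactors = FALSE
)

# Recode the (hypothetical) large missing-value code to NA.
toy_cps$uhrswork[toy_cps$uhrswork >= 999] <- NA

# Keep full-year, full-time workers with strictly positive earnings.
keep <- with(toy_cps, !is.na(uhrswork) & wkswork >= 50 & uhrswork >= 35 & incwage > 0)
toy_cps <- toy_cps[keep, ]

# Assumed: hourly wage = annual earnings / (weeks worked * usual weekly hours).
toy_cps$log_wage <- log(toy_cps$incwage / (toy_cps$wkswork * toy_cps$uhrswork))

# One 0/1 dummy per race/ethnicity category.
for (r in c("white", "black", "asian", "other")) {
  toy_cps[[paste0("race_", r)]] <- as.integer(toy_cps$race == r)
}

# Lookup table from (hypothetical) education labels to years of schooling.
educ_map <- c(
  "5th to 6th grade"                  = 5.5,
  "High school diploma or equivalent" = 12,
  "Some college but no degree"        = 14,
  "Associate's degree"                = 14,
  "Bachelor's degree"                 = 16,
  "Master's degree"                   = 18,
  "Professional school degree"        = 18.5,
  "Doctorate degree"                  = 20
)
toy_cps$educ_years <- unname(educ_map[toy_cps$educ_label])  # unmatched labels become NA

# Potential experience, its square, and the female indicator.
toy_cps$exper  <- toy_cps$age - toy_cps$educ_years - 5
toy_cps$exper2 <- toy_cps$exper^2
toy_cps$female <- as.integer(toy_cps$sex == "Female")
```

Here the third row is dropped for working 40 weeks and the fourth because its hours are missing, leaving two workers with hourly wages of 25 and 40.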

(3) Estimate the Mincerian Wage Equation. Assign the output to an object called mincer_reg. What is the estimated return to education? (Answer with code and words.)
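A minimal sketch of fitting the Mincerian equation with lm(), using simulated data so the "true" return (0.10 log points per year, a made-up value) is known. The coefficient on educ_years is in log points; exp(b) - 1 gives the exact proportional change it implies.

```r
set.seed(121)
n <- 5000

# Simulated data; all coefficients below are invented for illustration.
educ_years <- sample(8:20, n, replace = TRUE)
exper      <- runif(n, min = 0, max = 40)
log_wage   <- 1.0 + 0.10 * educ_years + 0.04 * exper - 0.0007 * exper^2 +
  rnorm(n, sd = 0.5)
sim <- data.frame(log_wage, educ_years, exper, exper2 = exper^2)

# OLS of log wages on education, experience, and experience squared.
mincer_sim <- lm(log_wage ~ educ_years + exper + exper2, data = sim)
b_educ   <- coef(mincer_sim)[["educ_years"]]   # estimated log-point return
pct_educ <- 100 * (exp(b_educ) - 1)            # exact implied percent change
```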

(4) Estimate an “extended” Mincerian Wage Equation that controls for race and sex. Use white and male as the base categories (race_white = 1 and female = 0). Assign the output to an object called mincer_reg2. Does the estimated return to education change after controlling for these covariates? (Answer with code and words.)
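Choosing base categories amounts to leaving their dummies out of the regression. The sketch below uses simulated data with invented coefficients. It shows that omitting race_white and using female (male = 0) gives the same estimates as a factor whose first level is "white", and that including all four race dummies alongside the intercept is perfectly collinear, so lm() reports NA for one of them.

```r
set.seed(7)
n <- 2000

# Simulated data; effect sizes are made up for illustration.
race       <- sample(c("white", "black", "asian", "other"), n, replace = TRUE)
female     <- rbinom(n, 1, 0.5)
educ_years <- sample(10:20, n, replace = TRUE)
log_wage   <- 1 + 0.1 * educ_years - 0.15 * (race == "black") +
  0.05 * (race == "asian") - 0.05 * (race == "other") - 0.2 * female +
  rnorm(n, sd = 0.5)

sim <- data.frame(log_wage, educ_years, female,
                  race_white = as.integer(race == "white"),
                  race_black = as.integer(race == "black"),
                  race_asian = as.integer(race == "asian"),
                  race_other = as.integer(race == "other"))

# Omit race_white (and use female, so male = 0): white men are the reference group.
fit_dummies <- lm(log_wage ~ educ_years + race_black + race_asian + race_other + female,
                  data = sim)

# Equivalent factor specification with "white" as the first (base) level.
sim$race <- factor(race, levels = c("white", "black", "asian", "other"))
fit_factor <- lm(log_wage ~ educ_years + race + female, data = sim)

# Dummy variable trap: all four dummies plus an intercept are collinear.
fit_trap <- lm(log_wage ~ educ_years + race_white + race_black + race_asian +
                 race_other + female, data = sim)
```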

(5) In the “extended” regression, is the black-white log wage gap statistically different from the female-male log wage gap? If you calculate this “by hand,” then assign the t-statistic of the test to an object called t_stat_by_hand. If you calculate this with a linear hypothesis testing command, then assign the t-statistic of the test to an object called t_stat. (Answer with code and words.)
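Testing whether two coefficients from the same regression are equal needs their covariance, not just their standard errors. A sketch on simulated data (made-up effects, default homoskedastic OLS standard errors): the "by hand" t-statistic uses vcov(), and the cross-check reparametrizes the model so the difference becomes a single coefficient. Packages such as car provide linearHypothesis() for the same test, but whether it is among the template's loaded packages is not stated here.

```r
set.seed(42)
n <- 3000

# Simulated data; the -0.15 and -0.20 gaps are invented for illustration.
race_black <- rbinom(n, 1, 0.15)
female     <- rbinom(n, 1, 0.5)
educ_years <- sample(10:20, n, replace = TRUE)
log_wage   <- 1 + 0.1 * educ_years - 0.15 * race_black - 0.20 * female +
  rnorm(n, sd = 0.5)

fit <- lm(log_wage ~ educ_years + race_black + female)

# By hand: t = (b_black - b_female) / sqrt(Var_bb + Var_ff - 2 Cov_bf)
b <- coef(fit)
V <- vcov(fit)
diff_hat  <- b[["race_black"]] - b[["female"]]
se_diff   <- sqrt(V["race_black", "race_black"] + V["female", "female"] -
                    2 * V["race_black", "female"])
t_by_hand <- diff_hat / se_diff

# Cross-check: with (female + race_black) as a regressor, the coefficient on
# race_black equals b_black - b_female, so its t value is the same statistic.
fit_reparam <- lm(log_wage ~ educ_years + race_black + I(female + race_black))
t_reparam   <- summary(fit_reparam)$coefficients["race_black", "t value"]
```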

(6) Run the “extended” regression separately for women and men. Assign the outputs to objects called mincer_reg_women and mincer_reg_men, respectively. Based on the two sets of regression results, assess whether the difference in the estimated returns to education between women and men is statistically significant. Assign the test statistic to an object called t_stat2 (or t_stat_by_hand2 if you calculate the statistic by hand). By how much do estimated returns differ by sex? (Answer with code and words.)
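Because the women's and men's regressions use non-overlapping samples, their estimates can be treated as independent, so the standard error of the difference is the square root of the sum of the squared standard errors. A sketch on simulated data, where the returns of 0.11 for women and 0.09 for men are invented values and the regressions omit the other "extended" controls for brevity:

```r
set.seed(2022)
n <- 4000

# Simulated data with different (made-up) returns to education by sex.
female     <- rbinom(n, 1, 0.5)
educ_years <- sample(10:20, n, replace = TRUE)
log_wage   <- 1 + ifelse(female == 1, 0.11, 0.09) * educ_years + rnorm(n, sd = 0.5)
sim <- data.frame(log_wage, educ_years, female)

fit_w <- lm(log_wage ~ educ_years, data = subset(sim, female == 1))
fit_m <- lm(log_wage ~ educ_years, data = subset(sim, female == 0))

# Each row holds Estimate, Std. Error, t value, Pr(>|t|) for educ_years.
est_w <- summary(fit_w)$coefficients["educ_years", ]
est_m <- summary(fit_m)$coefficients["educ_years", ]

# Independent samples: SE of the difference = sqrt(se_w^2 + se_m^2).
diff_hat <- est_w[["Estimate"]] - est_m[["Estimate"]]
se_diff  <- sqrt(est_w[["Std. Error"]]^2 + est_m[["Std. Error"]]^2)
t_diff   <- diff_hat / se_diff
```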

(7) Estimate the male-female difference in returns to education by adding interaction terms to the “extended” regression in the full sample. Assign the output to an object called mincer_reg3. Do you get the same answer? (You should, but be honest if you don’t.) (Answer with code and words.)
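When every regressor is interacted with female, the pooled regression reproduces the two separate regressions exactly, so the coefficient on educ_years:female equals the women-minus-men difference in slopes. A sketch on simulated data with invented coefficients and a single stand-in control (exper) instead of the full "extended" set. One caveat: the default standard error on the interaction can differ slightly from the two-sample formula, because pooled OLS assumes a single error variance for both groups.

```r
set.seed(2022)
n <- 4000

# Simulated data; coefficients are made up for illustration.
female     <- rbinom(n, 1, 0.5)
educ_years <- sample(10:20, n, replace = TRUE)
exper      <- runif(n, 0, 40)
log_wage   <- 1 + ifelse(female == 1, 0.11, 0.09) * educ_years + 0.02 * exper +
  rnorm(n, sd = 0.5)
sim <- data.frame(log_wage, educ_years, exper, female)

# Separate regressions by sex.
fit_w <- lm(log_wage ~ educ_years + exper, data = subset(sim, female == 1))
fit_m <- lm(log_wage ~ educ_years + exper, data = subset(sim, female == 0))

# (educ_years + exper) * female expands to the main effects, female,
# and the interactions educ_years:female and exper:female.
fit_int <- lm(log_wage ~ (educ_years + exper) * female, data = sim)

gap_int <- coef(fit_int)[["educ_years:female"]]
gap_sep <- coef(fit_w)[["educ_years"]] - coef(fit_m)[["educ_years"]]
```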

(8) Now move on to the NLSY data (download locally from https://github.com/credpath/econ121/raw/main/nlsy79.rda).

Assign nlsy79_raw to a new data frame called nlsy79 and use it for the rest of the problem set. The NLSY oversampled black and Hispanic/Latino respondents. The variable perweight is a sampling weight that can be used to obtain statistics that are representative of the population. Summarize the variables black and hisp both without and with sampling weights. Save the statistics (mean black unweighted, mean black weighted, mean hisp unweighted, mean hisp weighted) in a vector called means. Which summary statistics provide unbiased estimates of the racial/ethnic composition of US adults who were teenagers in 1979? Explain. (Answer with code and words.)
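Unweighted and weighted means differ when some groups are oversampled and therefore receive smaller weights. A sketch with five made-up respondents (the values and weights are hypothetical, not NLSY data), using base R's mean() and weighted.mean():

```r
# Hypothetical toy data: the oversampled black and Hispanic rows get weight 1.
toy_nlsy <- data.frame(
  black     = c(1, 0, 0, 1, 0),
  hisp      = c(0, 1, 0, 0, 0),
  perweight = c(1, 1, 4, 1, 3)
)

toy_means <- c(
  black_unweighted = mean(toy_nlsy$black),
  black_weighted   = weighted.mean(toy_nlsy$black, toy_nlsy$perweight),
  hisp_unweighted  = mean(toy_nlsy$hisp),
  hisp_weighted    = weighted.mean(toy_nlsy$hisp, toy_nlsy$perweight)
)
toy_means
```

With these numbers the black share falls from 0.4 unweighted to 0.2 weighted (2 of 10 total weight), and the Hispanic share from 0.2 to 0.1.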

(9) Generate a log hourly wage variable and a “potential experience” variable as above. Drop anyone who did not work full time (e.g., at least 35 hours/week for at least 50 weeks) or who has 0 dollars in labor income. Estimate an extended Mincerian Wage Equation (controlling for race/ethnicity and sex), with and without using sampling weights, and assign the output to objects called mincer_reg4 and mincer_reg5, respectively. How does the use of sampling weights change the results? (Answer with code and words.)
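In lm(), sampling weights enter through the weights argument, which gives weighted least squares point estimates. A sketch on simulated data (the coefficients and the perweight values are invented) that checks the weighted fit against the closed-form WLS solution (X'WX)^(-1) X'Wy. Note that lm() treats weights as precision weights, so its default standard errors are not design-based survey standard errors.

```r
set.seed(79)
n <- 3000

# Simulated data; the oversampled group (hisp = 1) gets smaller hypothetical weights.
hisp       <- rbinom(n, 1, 0.2)
female     <- rbinom(n, 1, 0.5)
educ_years <- sample(10:20, n, replace = TRUE)
perweight  <- ifelse(hisp == 1, 0.5, 1.5)
log_wage   <- 1 + 0.1 * educ_years - 0.1 * hisp - 0.2 * female + rnorm(n, sd = 0.5)
sim <- data.frame(log_wage, educ_years, hisp, female, perweight)

fit_unw <- lm(log_wage ~ educ_years + hisp + female, data = sim)
fit_wtd <- lm(log_wage ~ educ_years + hisp + female, data = sim, weights = perweight)

# Closed-form WLS: X * w scales each row of X by its weight.
X <- model.matrix(fit_unw)
w <- sim$perweight
b_manual <- solve(crossprod(X, X * w), crossprod(X, w * sim$log_wage))
```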