
Econometrics Homework 1

Q1. The following table gives the joint probability distribution between employment status and college graduation among those either employed or looking for work (unemployed) in the working-age U.S. population for 2008.

Joint Distribution of Employment Status and College Graduation in the U.S. Population Aged 25 and Greater, 2008

                           Unemployed (Y=0)   Employed (Y=1)   Total
Non-college grads (X=0)    0.037              0.622            0.659
College grads (X=1)        0.009              0.332            0.341
Total                      0.046              0.954            1.000

a. Compute E(Y).

b. The unemployment rate is the fraction of the labor force that is unemployed. Show that the unemployment rate is given by 1-E(Y).

c. Calculate E(Y|X = 1) and E(Y|X = 0).

d. Calculate the unemployment rate for (i) college graduates and (ii) non-college graduates.

* Parts e and f are optional.

e. A randomly selected member of this population reports being unemployed. What is the probability that this worker is a college graduate? A non-college graduate?

f. Are educational achievement and employment status independent? Explain.
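The quantities asked for in Q1 all follow mechanically from the joint table, so a short script is one way to check hand calculations. The sketch below is only an illustration: the dictionary layout and the helper names (`pr_x`, `pr_y`, `cond_mean_y`, `pr_x_given_y`) are choices made here, not part of the assignment.

```python
# Joint probabilities Pr(X = x, Y = y) copied from the Q1 table; keys are (x, y).
joint = {
    (0, 0): 0.037, (0, 1): 0.622,   # non-college grads
    (1, 0): 0.009, (1, 1): 0.332,   # college grads
}

def pr_x(x):
    """Marginal probability Pr(X = x)."""
    return sum(p for (xi, _), p in joint.items() if xi == x)

def pr_y(y):
    """Marginal probability Pr(Y = y)."""
    return sum(p for (_, yi), p in joint.items() if yi == y)

def cond_mean_y(x):
    """E(Y | X = x); for a 0/1 variable this equals Pr(Y = 1 | X = x)."""
    return joint[(x, 1)] / pr_x(x)

def pr_x_given_y(x, y):
    """Conditional probability Pr(X = x | Y = y)."""
    return joint[(x, y)] / pr_y(y)

e_y = pr_y(1)                  # E(Y) = 0 * Pr(Y = 0) + 1 * Pr(Y = 1)
unemployment_rate = 1 - e_y    # equals Pr(Y = 0)

print(f"E(Y) = {e_y:.3f}, unemployment rate = {unemployment_rate:.3f}")
for x in (0, 1):
    print(f"E(Y | X={x}) = {cond_mean_y(x):.4f}, "
          f"unemployment rate = {1 - cond_mean_y(x):.4f}")
print(f"Pr(X=1 | Y=0) = {pr_x_given_y(1, 0):.4f}")
```

Because Y is binary, every conditional mean here is also a conditional probability, which is why the group unemployment rates come out as 1 minus the conditional mean.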

Q2. To investigate possible gender discrimination in a firm, a sample of 100 men and 64 women with similar job descriptions are selected at random. A summary of the resulting monthly salaries follows:

         Average Salary (Ȳ)   Standard Deviation (s_Y)   n
Men      $3100                $200                       100
Women    $2900                $320                       64

a. What do these data suggest about wage differences in the firm? Do they represent statistically significant evidence that average wages of men and women are different? (To answer this question, first state the null and alternative hypotheses; second, compute the relevant t-statistic; third, compute the p-value associated with the t-statistic; and finally, use the p-value to answer the question.)

b. Do these data suggest that the firm is guilty of gender discrimination in its compensation policies? Explain.
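The steps listed in Q2(a) (t-statistic, then p-value) can be sketched as follows. This is a minimal illustration that assumes the large-sample normal approximation and unequal variances across the two groups; the function name `diff_in_means_test` is made up for this example.

```python
import math

def diff_in_means_test(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return (difference, standard error, t-statistic, two-sided p-value)."""
    diff = mean_a - mean_b
    # Standard error of the difference in means, allowing unequal variances.
    se = math.sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    t = diff / se
    # Two-sided p-value under the standard normal:
    # 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2)).
    p_two_sided = math.erfc(abs(t) / math.sqrt(2))
    return diff, se, t, p_two_sided

# Men: mean 3100, sd 200, n 100. Women: mean 2900, sd 320, n 64.
diff, se, t, p = diff_in_means_test(3100, 200, 100, 2900, 320, 64)
print(f"difference = {diff}, SE = {se:.2f}, t = {t:.2f}, p-value = {p:.2e}")
```

The p-value only speaks to whether the difference in means is statistically distinguishable from zero; it does not by itself settle the question in part (b).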

Q3. Data on fifth-grade test scores (reading and mathematics) for 420 school districts in California yield Ȳ = 646.2 and standard deviation s_Y = 19.5.

a. Construct a 95% confidence interval for the mean test score in the population.

b. When the districts were divided into districts with small classes (< 20 students per teacher) and large classes (≥ 20 students per teacher), the following results were found:

Class Size   Average Score (Ȳ)   Standard Deviation (s_Y)   n
Small        657.4               19.4                       238
Large        650.0               17.9                       182

Is there statistically significant evidence that the districts with smaller classes have higher average test scores? Explain.
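For Q3, both the confidence interval and the one-sided comparison can be checked numerically. The sketch below assumes the usual large-sample normal critical value of 1.96 and a one-sided alternative (small classes score higher); the helper names are illustrative only.

```python
import math

def confidence_interval(mean, sd, n, z=1.96):
    """Large-sample confidence interval for a population mean."""
    se = sd / math.sqrt(n)
    return mean - z * se, mean + z * se

def one_sided_p_value(t):
    """Pr(Z > t) for a standard normal Z."""
    return 0.5 * math.erfc(t / math.sqrt(2))

# Part a: 95% interval for the mean test score.
low, high = confidence_interval(646.2, 19.5, 420)
print(f"95% CI: ({low:.2f}, {high:.2f})")

# Part b: small classes minus large classes.
se_diff = math.sqrt(19.4**2 / 238 + 17.9**2 / 182)
t_stat = (657.4 - 650.0) / se_diff
p_value = one_sided_p_value(t_stat)
print(f"SE = {se_diff:.3f}, t = {t_stat:.2f}, one-sided p-value = {p_value:.2e}")
```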

Q4. Suppose that a researcher, using data on class size (CS) and average test scores from 100 third-grade classes, estimates the OLS regression

$\widehat{TestScore} = 520.4 - 5.82 \times CS$, $R^2 = 0.08$, $SER = 11.5$

a. A classroom has 22 students. What is the regression’s prediction for that classroom’s average test score?

b. Last year a classroom had 19 students, and this year it has 23 students. What is the regression’s prediction for the change in the classroom average test score?

c. The sample average class size across the 100 classrooms is 21.4. What is the sample average of the test scores across the 100 classrooms? (Hint: Review the formulas for the OLS estimators.)

d. What is the sample standard deviation of test scores across the 100 classrooms? (Hint: Review the formulas for the $R^2$ and SER.)
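The hints in parts c and d of Q4 point to two standard facts: the OLS line passes through the sample means, and SER and R² both derive from the sum of squared residuals. The sketch below works through them under the assumption that SER uses the n − 2 degrees-of-freedom correction; constant and function names are chosen here for illustration.

```python
import math

INTERCEPT, SLOPE = 520.4, -5.82   # estimated coefficients from the regression
R2, SER, N = 0.08, 11.5, 100

def predict(class_size):
    """Predicted average test score for a given class size."""
    return INTERCEPT + SLOPE * class_size

pred_22 = predict(22)
change_19_to_23 = predict(23) - predict(19)   # equals SLOPE * (23 - 19)
mean_score = predict(21.4)   # OLS line passes through (mean CS, mean score)

# SER^2 = SSR / (n - 2) and R^2 = 1 - SSR / TSS, so TSS = SSR / (1 - R^2).
ssr = SER**2 * (N - 2)
tss = ssr / (1 - R2)
sd_score = math.sqrt(tss / (N - 1))   # sample standard deviation of test scores

print(f"prediction at CS = 22: {pred_22:.2f}")
print(f"predicted change, 19 to 23 students: {change_19_to_23:.2f}")
print(f"sample mean score: {mean_score:.3f}")
print(f"sample sd of scores: {sd_score:.2f}")
```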

Q5 (Optional). A regression of average weekly earnings (AWE, measured in dollars) on age (measured in years) using a random sample of college-educated full-time workers aged 25-65 yields the following:

$\widehat{AWE} = 696.7 + 9.6 \times Age$, $R^2 = 0.023$, $SER = 624.1$

a. Explain what the coefficient values 696.7 and 9.6 mean.

b. The standard error of the regression (SER) is 624.1. What are the units of measurement for the SER? (Dollars? Years? Or is SER unit-free?)

c. The regression $R^2$ is 0.023. What are the units of measurement for the $R^2$? (Dollars? Years? Or is $R^2$ unit-free?)

d. What is the regression’s predicted earnings for a 25-year-old worker? A 45-year-old worker?

e. Will the regression give reliable predictions for a 99-year-old worker? Why or why not?

f. Given what you know about the distribution of earnings, do you think it is plausible that the distribution of errors in the regression is normal? (Hint: Do you think that the distribution is symmetric or skewed? What is the smallest value of earnings, and is it consistent with a normal distribution?)

g. The average age in this sample is 41.6 years. What is the average value of AWE in the sample?

Q6 (Optional). On the textbook Web site http://www.pearsonhighered.com/stock_watson/, you will find a data file CPS08 that contains an extended version of the data set used in Table 3.1 for 2008. It contains data for full-time, full-year workers, age 25-34, with a high school diploma or B.A./B.S. as their highest degree. A detailed description is given in CPS08_Description, also available on the Web site. (These are the same data as in CPS92_08 but are limited to the year 2008.) In this exercise, you will investigate the relationship between a worker’s age and earnings. (Generally, older workers have more job experience, leading to higher productivity and earnings.)

a. Run a regression of average hourly earnings (AHE) on age (Age). What is the estimated intercept? What is the estimated slope? Use the estimated regression to answer this question: How much do earnings increase as workers age by 1 year?

b. Bob is a 26-year-old worker. Predict Bob’s earnings using the estimated regression. Alexis is a 30-year-old worker. Predict Alexis’s earnings using the estimated regression.

c. Does age account for a large fraction of the variance in earnings across individuals? Explain.
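Any regression package will do for Q6. For readers who want to see what the single-regressor OLS formulas compute, here is a minimal pure-Python sketch. The function `ols_fit` and the six data points are made up for illustration; they are not taken from CPS08, whose file format is not described here.

```python
def ols_fit(x, y):
    """Single-regressor OLS: return (intercept, slope, r_squared)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - y_bar) ** 2 for yi in y)
    return intercept, slope, 1 - ssr / tss

# Made-up illustrative values, NOT taken from CPS08.
age = [25, 27, 29, 31, 33, 34]
ahe = [14.0, 16.5, 15.0, 19.0, 18.5, 21.0]

b0, b1, r2 = ols_fit(age, ahe)
print(f"intercept = {b0:.2f}, slope = {b1:.2f}, R^2 = {r2:.3f}")
print(f"predicted AHE at 26: {b0 + b1 * 26:.2f}; at 30: {b0 + b1 * 30:.2f}")
```

With the real data, the slope answers part a, the two predictions answer part b, and the R² answers part c.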

Q7-Q9. The following three exercises refer to the table of estimated regressions below, computed using data for 1998 from the CPS. The data set consists of information on 4000 full-time full-year workers. The highest educational achievement for each worker was either a high school diploma or a bachelor’s degree. The workers’ ages ranged from 25 to 34 years. The data set also contained information on the region of the country where the person lived, marital status, and number of children. For the purposes of these exercises, let

AHE = average hourly earnings (in 1998 dollars)

College = binary variable (1 if college, 0 if high school)

Female = binary variable (1 if female, 0 if male)

Age = age (in years)

Ntheast = binary variable (1 if Region = Northeast, 0 otherwise)

Midwest = binary variable (1 if Region = Midwest, 0 otherwise)

South = binary variable (1 if Region = South, 0 otherwise)

West = binary variable (1 if Region = West, 0 otherwise)

Q7. Using the regression results in column (1):

a. Do workers with college degrees earn more, on average, than workers with only high school degrees? How much more?

b. Do men earn more than women on average? How much more?

Q8. Using the regression results in column (2):

a. Is age an important determinant of earnings? Explain.

b. Sally is a 29-year-old female college graduate. Betsy is a 34-year-old female college graduate. Predict Sally’s and Betsy’s earnings.

Q9. Using the regression results in column (3):

a. Do there appear to be important regional differences?

b. Why is the regressor West omitted from the regression? What would happen if it was included?

c. Juanita is a 28-year-old female college graduate from the South. Jennifer is a 28-year-old female college graduate from the Midwest. Calculate the expected difference in earnings between Juanita and Jennifer.

Results of Regressions of Average Hourly Earnings on Gender and Education Binary Variables and Other Characteristics Using 1998 Data from the Current Population Survey

Dependent variable: average hourly earnings (AHE).

Regressor          (1)       (2)       (3)
College (X1)       5.46      5.48      5.44
Female (X2)       -2.64     -2.62     -2.62
Age (X3)                     0.29      0.29
Northeast (X4)                         0.69
Midwest (X5)                           0.60
South (X6)                            -0.27
Intercept         12.69      4.40      3.75

Summary Statistics
SER               650.0      17.9      182
R²                0.176      0.190     0.194
R̄²
n                 4000       4000      4000
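The prediction questions in Q8 and Q9 reduce to plugging regressor values into the relevant column of the table. The sketch below shows one way to organize that; the `COLUMNS` dictionary and `predict_ahe` helper are illustrative names, and a regressor absent from a column is treated as having a coefficient of zero.

```python
# Coefficients copied from the regression table, one dictionary per column.
COLUMNS = {
    1: {"Intercept": 12.69, "College": 5.46, "Female": -2.64},
    2: {"Intercept": 4.40, "College": 5.48, "Female": -2.62, "Age": 0.29},
    3: {"Intercept": 3.75, "College": 5.44, "Female": -2.62, "Age": 0.29,
        "Northeast": 0.69, "Midwest": 0.60, "South": -0.27},
}

def predict_ahe(column, **worker):
    """Predicted AHE; regressors not in the chosen column contribute 0."""
    coefs = COLUMNS[column]
    return coefs["Intercept"] + sum(
        coefs.get(name, 0.0) * value for name, value in worker.items()
    )

# Q8(b): column (2).
sally = predict_ahe(2, College=1, Female=1, Age=29)
betsy = predict_ahe(2, College=1, Female=1, Age=34)

# Q9(c): column (3); West is the omitted region, so no dummy is set for it.
juanita = predict_ahe(3, College=1, Female=1, Age=28, South=1)
jennifer = predict_ahe(3, College=1, Female=1, Age=28, Midwest=1)

print(f"Sally: {sally:.2f}, Betsy: {betsy:.2f}")
print(f"Juanita minus Jennifer: {juanita - jennifer:.2f}")
```

Since Juanita and Jennifer differ only in region, their expected difference is just the South coefficient minus the Midwest coefficient; all other terms cancel.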

Q10 (Optional) A researcher plans to study the causal effect of police on crime using data from a random sample of U.S. counties. He plans to regress the county’s crime rate on the (per capita) size of the county’s police force.

a. Explain why this regression is likely to suffer from omitted variable bias. Which variables would you add to the regression to control for important omitted variables?

b. Use your answer to (a) and the expression for omitted variable bias given in Equation (6.1) to determine whether the regression will likely over- or underestimate the effect of police on the crime rate. (That is, do you think that $\hat{\beta}_1 > \beta_1$ or $\hat{\beta}_1 < \beta_1$?)
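For reference, the omitted variable bias expression in Stock and Watson's textbook has the following form, as best recalled here (check the equation number and notation against your edition). The symbol $\rho_{Xu}$ denotes the correlation between the regressor and the regression error, and $\sigma_u$ and $\sigma_X$ are their standard deviations:

```latex
\hat{\beta}_1 \xrightarrow{p} \beta_1 + \rho_{Xu}\,\frac{\sigma_u}{\sigma_X}
```

Because both standard deviations are positive, the direction of the bias is determined by the sign of $\rho_{Xu}$, which is what part (b) asks you to reason about.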

Q11-Q14. These exercises refer to the table of estimated regressions below, computed using data for 1998 from the CPS. The data set consists of information on 4000 full-time full-year workers. The highest educational achievement for each worker was either a high school diploma or a bachelor’s degree. The workers’ ages ranged from 25 to 34 years. The data set also contained information on the region of the country where the person lived, marital status, and number of children. For the purposes of these exercises, let

AHE = average hourly earnings (in 1998 dollars)

College = binary variable (1 if college, 0 if high school)

Female = binary variable (1 if female, 0 if male)

Age = age (in years)

Ntheast = binary variable (1 if Region = Northeast, 0 otherwise)

Midwest = binary variable (1 if Region = Midwest, 0 otherwise)

South = binary variable (1 if Region = South, 0 otherwise)

West = binary variable (1 if Region = West, 0 otherwise)

Q11. Add “*” (5%) and “**” (1%) to the table to indicate the statistical significance of the coefficients.

Q12. Using the regression results in column (1):

a. Is the College-high school earnings difference estimated from this regression statistically significant at the 5% level? Construct a 95% confidence interval for the difference.

b. Is the male-female earnings difference estimated from this regression statistically significant at the 5% level? Construct a 95% confidence interval for the difference.

Q13. Using the regression results in column (2):

a. Is age an important determinant of earnings? Use an appropriate statistical test and/or confidence interval to explain your answer.

b. Sally is a 29-year-old female college graduate. Betsy is a 34-year-old female college graduate. Construct a 95% confidence interval for the expected difference between their earnings.
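Q11 through Q13 rely on the same two computations: a t-statistic (coefficient divided by its standard error) and a 95% confidence interval (coefficient ± 1.96 standard errors). The sketch below illustrates both using large-sample normal critical values. The standard error of 0.05 is a placeholder invented for this example, since the standard errors are not reproduced in this text; substitute the values from the table.

```python
def stars(coef, se):
    """Return '**' (1% level), '*' (5% level), or '' using normal critical values."""
    t = abs(coef / se)
    if t > 2.58:
        return "**"
    if t > 1.96:
        return "*"
    return ""

def ci_95(coef, se, scale=1.0):
    """95% confidence interval for scale * coefficient, returned as (low, high)."""
    half_width = 1.96 * se
    ends = (scale * (coef - half_width), scale * (coef + half_width))
    return min(ends), max(ends)

age_coef = 0.29   # column (2) coefficient on Age
age_se = 0.05     # HYPOTHETICAL standard error, not from the source table

print(f"Age: t = {age_coef / age_se:.2f} {stars(age_coef, age_se)}")

# Two workers who differ only by 5 years of age: the expected earnings
# difference is 5 * beta_Age, so the interval endpoints scale by 5.
low, high = ci_95(age_coef, age_se, scale=5)
print(f"95% CI for the 5-year difference: ({low:.2f}, {high:.2f})")
```

The same scaling idea applies whenever two individuals differ in a single regressor: the interval for the difference is the coefficient's interval multiplied by the size of the gap.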

Q14 (Optional). Using the regression results in column (3):

a. Do there appear to be important regional differences? Use an appropriate hypothesis test to explain your answer.

b. Juanita is a 28-year-old female college graduate from the South. Molly is a 28-year-old female college graduate from the West. Jennifer is a 28-year-old female college graduate from the Midwest.

   i. Construct a 95% confidence interval for the difference in expected earnings between Juanita and Molly.

   ii. Explain how you would construct a 95% confidence interval for the difference in expected earnings between Juanita and Jennifer. (Hint: What would happen if you included West and excluded Midwest from the regression?)

Results of Regressions of Average Hourly Earnings on Gender and Education Binary Variables and Other Characteristics Using 1998 Data from the Current Population Survey