
ECMT2150 INTERMEDIATE ECONOMETRICS

Week 6 Tutorial: Specification Issues II

Functional form misspecification, measurement error, sample selection problems, outliers

Stata 1 based on Wooldridge Chp 9 C4

Use the data for the year 1990 in infmrt.dta on state-level infant mortality rates for parts a) to d). The data set covers 1987 and 1990 and contains observations for all 50 US states plus the District of Columbia (DC, also known as Washington DC). We will use only the 1990 data, so you may like to simply drop the 1987 observations.

a) Estimate the following model using the data for 1990 (51 observations):

$\text{infmort} = \beta_0 + \beta_1 \log(\text{pcinc}) + \beta_2 \log(\text{physic}) + \beta_3 \log(\text{popul}) + u$

The variables in the dataset are clearly labelled.

Do the estimated coefficients have the expected signs, and do they otherwise conform with your expectations?
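Because infmort enters in levels while the regressors enter in logs, each slope has a level-log interpretation. As a reminder (a standard approximation, stated here for per capita income and holding the other regressors fixed; it is not part of the original exercise):

```latex
\Delta \text{infmort} \approx \frac{\beta_1}{100} \, \left( \%\Delta \text{pcinc} \right)
```

So a 1% increase in per capita income is associated with a change of roughly $\beta_1 / 100$ units in the infant mortality rate. The same reading applies to the coefficients on log(physic) and log(popul).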

b) The District of Columbia has historically had pockets of extreme poverty and extreme wealth, so we might expect it to be an outlier. Check by producing a scatter plot of infant mortality against per capita income (or its log) and/or against the number of doctors (physicians) per 100,000 population. Confirm that the outlier is indeed DC (the data set contains a dummy variable equal to one for the DC observations and zero otherwise).

c) Given that DC is such a striking outlier, we have two options for how to proceed. One is to exclude DC from the analysis and the other is to include a dummy variable for the observation on the District of Columbia (called DC).

Estimate both models.

d) Compare and contrast your findings for the three models you have now estimated:

(1) The model above, including the observation on DC

(2) The same model, excluding the observation on DC

(3) The model above, including the observation on DC and adding the dummy variable for DC

Consider both the estimates and their standard errors in your comparisons. Also interpret the coefficient on DC and comment on its size and significance. What do you conclude about including a dummy variable for a single observation?
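To explore the comparison between models (2) and (3) before running it on the real data, here is a minimal sketch on synthetic data. Everything in it is an assumption made for illustration: the data are simulated (they are not the infmrt.dta values), there is a single regressor, and the helper `ols` is a hypothetical pure-Python routine that solves the normal equations. It fits the model once with the outlier dropped and once with the outlier kept alongside its own dummy.

```python
import random

def ols(X, y):
    """Return OLS coefficients by solving the normal equations X'X b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Synthetic data (NOT the infmrt.dta values): 51 observations, one regressor
random.seed(0)
n = 51
x = [random.gauss(0, 1) for _ in range(n)]
y = [10 - 2 * xi + random.gauss(0, 1) for xi in x]
x[-1], y[-1] = 4.0, 25.0  # turn the last observation into an outlier

# Analogue of model (2): drop the outlier
b_drop = ols([[1.0, xi] for xi in x[:-1]], y[:-1])

# Analogue of model (3): keep the outlier but give it its own dummy
d = [0.0] * (n - 1) + [1.0]
b_dummy = ols([[1.0, xi, di] for xi, di in zip(x, d)], y)

# Prediction for the outlier from the model estimated without it
pred_outlier = b_drop[0] + b_drop[1] * x[-1]

print("drop outlier :", b_drop)
print("with dummy   :", b_dummy[:2], "dummy coef:", b_dummy[2])
print("y - prediction for outlier:", y[-1] - pred_outlier)
```

In this sketch the intercept and slope from the two fits coincide, and the dummy coefficient equals the gap between the outlier's actual value and the prediction from the model fitted without it. The exercise asks you to check whether the same pattern appears in the infant mortality data.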

Q1. Wooldridge Chp 9 Q1

In Problem 11 in Chapter 4, the R-squared from estimating the model

$\log(\text{salary}) = \beta_0 + \beta_1 \log(\text{sales}) + \beta_2 \log(\text{mktval}) + \beta_3 \text{profmarg} + \beta_4 \text{ceoten} + \beta_5 \text{comten} + u$

using the data in CEOSAL2 was $R^2 = 0.353$ ($n = 177$). When $\text{ceoten}^2$ and $\text{comten}^2$ are added, $R^2 = 0.375$. Is there evidence of functional form misspecification in this model?
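One way to approach this question is the R-squared form of the F statistic for the joint significance of the two added quadratic terms. The sketch below is only an illustration: the function name `f_stat_from_r2` and its argument names are hypothetical, and the inputs are the figures quoted in the question (two restrictions, seven regressors in the unrestricted model).

```python
def f_stat_from_r2(r2_ur, r2_r, q, n, k_ur):
    """R-squared form of the F statistic.

    r2_ur : R-squared of the unrestricted model (with the squared terms)
    r2_r  : R-squared of the restricted model (without them)
    q     : number of restrictions tested
    n     : sample size
    k_ur  : number of slope coefficients in the unrestricted model
    """
    numerator = (r2_ur - r2_r) / q
    denominator = (1 - r2_ur) / (n - k_ur - 1)
    return numerator / denominator

# Figures from the question: 5 original regressors + 2 squares = 7
f = f_stat_from_r2(r2_ur=0.375, r2_r=0.353, q=2, n=177, k_ur=7)
print(round(f, 2))  # prints 2.97
```

The resulting statistic is then compared with a critical value from the F distribution with 2 and 169 degrees of freedom at your chosen significance level.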

Q2. Wooldridge Chp 9 Q4 - amended

The following equation explains weekly hours of television viewing by a child in terms of the child’s age, mother’s education, father’s education, and number of siblings:

$\text{tvhours}^* = \beta_0 + \beta_1 \text{age} + \beta_2 \text{age}^2 + \beta_3 \text{motheduc} + \beta_4 \text{fatheduc} + \beta_5 \text{sibs} + u$

We are worried that tvhours* is measured with error in our survey. Let tvhours denote the reported hours of television viewing per week.

a) What assumptions do we need to place on the measurement error in tvhours, call it $e_0$, in order for our OLS estimators of $\beta_0, \beta_1, \ldots, \beta_5$ to be unbiased and consistent?
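A useful starting point is to write the estimable equation in terms of the reported variable. Assuming the usual additive form for the measurement error (an assumption about the set-up, with coefficients written as $\beta_j$), substituting into the model gives:

```latex
\text{tvhours} = \text{tvhours}^* + e_0
\quad \Longrightarrow \quad
\text{tvhours} = \beta_0 + \beta_1 \text{age} + \beta_2 \text{age}^2
  + \beta_3 \text{motheduc} + \beta_4 \text{fatheduc} + \beta_5 \text{sibs}
  + (u + e_0)
```

The question then becomes what must be true of the composite error $u + e_0$ in relation to the explanatory variables.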

b)         Now, we are interested in a different model:

$\text{uniwAM} = \beta_0 + \beta_1 \text{hswAM} + \beta_2 \text{ATAR} + \beta_3 \text{attend} + \beta_4 \text{tvhours}^* + u$

Here we expect that weekly hours of television viewing may have an impact on university academic outcomes. Do you think the classical errors-in-variables (CEV) assumptions are likely to hold? Explain.
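For context on why the CEV assumptions matter, the following sketch simulates what they imply in the simplest case of a single regressor. All numbers are made up for illustration (a true slope of 2, with unit variances for the true regressor, the measurement error and the regression error). The measurement error is drawn independently of the true regressor, which is the key CEV assumption. This does not reproduce the multi-regressor model in the question.

```python
import random

def slope(x, y):
    """OLS slope from a simple regression of y on x (with an intercept)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

random.seed(1)
n = 20000
beta = 2.0  # assumed true slope (illustrative only)

x_star = [random.gauss(0, 1) for _ in range(n)]   # true regressor
e1 = [random.gauss(0, 1) for _ in range(n)]       # CEV error, independent of x_star
x_obs = [xs + e for xs, e in zip(x_star, e1)]     # mismeasured regressor
y = [1.0 + beta * xs + random.gauss(0, 1) for xs in x_star]

b_true = slope(x_star, y)  # regression on the true regressor
b_obs = slope(x_obs, y)    # regression on the mismeasured regressor

# Under CEV the slope is attenuated by var(x*) / (var(x*) + var(e1)),
# which is 1 / (1 + 1) = 0.5 in this simulated set-up.
print("slope using true regressor       :", b_true)
print("slope using mismeasured regressor:", b_obs)
```

In this simulated set-up the slope on the mismeasured regressor is pulled toward zero by roughly the variance ratio. Whether reported tvhours behaves like this depends on whether reporting errors are plausibly unrelated to true viewing hours, which is what the question asks you to assess.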

Q3. Wooldridge Chp 9 Q5

In Example 4.4, we estimated a model relating number of campus crimes to student enrollment for a sample of colleges. The sample we used was not a random sample of colleges in the United States, because many schools in 1992 did not report campus crimes. Do you think that a college's failure to report crimes can be viewed as exogenous sample selection? Explain.

Extra question if you want more practice

Stata 2 Wooldridge Chp 9 C7

Use the data in LOANAPP for this exercise.

(i) How many observations have obrat > 40, that is, other debt obligations more than 40% of total income?

(ii) Re-estimate the model in part (iii) of Computer Exercise C8 in Chapter 7, excluding observations with obrat > 40. What happens to the estimate and t statistic on white?

(iii) Does it appear that the estimate of $\beta_{white}$ is overly sensitive to the sample used?