
ECMT6007  Midterm Solutions

Question 1

1.

[5 Marks] The model could be written in different forms but it’s important to make it clear whether variables vary over time and across different individuals.

•   Most explicit form:

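$$\text{Earn}_{it} = \beta_0 + \beta_1 \text{Male}_i + \beta_2 \text{Age}_{it} + \beta_3 \text{Educ}_{it} + \beta_4 \text{Occ}_{it} + \beta_5 \text{Exper}_{it} + \beta_6 \text{Children}_{it} + \beta_7 \text{Married}_{it} + (a_i + e_{it})$$

Or simply:

$$\text{Earn}_{it} = x_{it}'\beta + (a_i + e_{it})$$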

Notice that we are assuming a person’s gender is time-invariant and that education is allowed to change over time (no penalties if students assumed education was time-invariant). All other variables are assumed to vary over time and across individuals, except $a_i$, which varies across individuals only.

•   Per-individual matrix form:

$$\text{Earn}_i = X_i\beta + u_i, \quad \forall i = 1, \ldots, n$$

•   “Most condensed” matrix form:

$$\text{Earn} = X\beta + u$$
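To make the three forms easier to compare, the stacking can be written out with dimensions. This is only a sketch: it assumes a balanced panel with $T$ periods per individual and writes $k$ for the number of columns of $X_i$ (including the constant) and $\iota_T$ for a $T \times 1$ vector of ones, none of which is fixed by the question itself.

```latex
\text{Earn}_i =
\begin{bmatrix} \text{Earn}_{i1} \\ \vdots \\ \text{Earn}_{iT} \end{bmatrix}_{T \times 1},
\qquad
X_i =
\begin{bmatrix} x_{i1}' \\ \vdots \\ x_{iT}' \end{bmatrix}_{T \times k},
\qquad
u_i = a_i \iota_T + e_i

\text{Earn} =
\begin{bmatrix} \text{Earn}_1 \\ \vdots \\ \text{Earn}_n \end{bmatrix}_{nT \times 1},
\qquad
X =
\begin{bmatrix} X_1 \\ \vdots \\ X_n \end{bmatrix}_{nT \times k},
\qquad
u =
\begin{bmatrix} u_1 \\ \vdots \\ u_n \end{bmatrix}_{nT \times 1}
```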

[5 Marks] It is important to consider individual-specific heterogeneity in panel models because ignoring $a_i$ can violate the zero conditional mean assumption (exogeneity). For instance, one’s ability, a variable assumed not to change over time, is relevant in determining one’s earnings but cannot be accurately observed or measured (hence it sits in the error term as $a_i$).

If ignored, its potential correlation with education will lead to biased and inconsistent estimates when using POLS. And even under the (strong) assumption that there’s no correlation between $a_i$ and the regressors, POLS wouldn’t be efficient, because $a_i$ induces serial correlation in the composite error.
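The bias argument can be illustrated with a small simulation. This is a sketch under made-up assumptions, not part of the marking scheme: there is a single regressor, the true slope is 1, and the regressor is built as $a_i$ plus noise so that it is correlated with the unobserved effect. The function names and the data-generating process are hypothetical.

```python
import random

def slope(x, y):
    """OLS slope of y on x (with an intercept), single regressor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def simulate(n=2000, T=3, beta=1.0, seed=0):
    """Return (POLS slope, within slope) for one simulated panel."""
    rng = random.Random(seed)
    x_all, y_all, x_within, y_within = [], [], [], []
    for _ in range(n):
        a = rng.gauss(0, 1)                          # unobserved "ability" a_i
        x = [a + rng.gauss(0, 1) for _ in range(T)]  # regressor correlated with a_i
        y = [beta * xt + a + rng.gauss(0, 1) for xt in x]
        x_all += x
        y_all += y
        xbar, ybar = sum(x) / T, sum(y) / T          # individual time averages
        x_within += [xt - xbar for xt in x]
        y_within += [yt - ybar for yt in y]
    return slope(x_all, y_all), slope(x_within, y_within)

pols, fe = simulate()
print("POLS slope:", round(pols, 3))
print("Within (FE) slope:", round(fe, 3))
```

In this design the POLS slope converges to $1 + Cov(x, a)/Var(x) = 1.5$ rather than 1, while the within slope stays close to the true value because demeaning removes $a_i$.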

- Minus 1 mark if gender is allowed to change over time. Minus 1 mark if there is no clear indication of whether variables vary over time and across individuals.

2.

[5 Marks] The most important difference is the assumption made about $a_i$. For the FE model, this term is allowed to correlate with any of the regressors (i.e., $Cov(a_i, X) \neq 0$), while for the RE model the assumption is of zero correlation ($Cov(a_i, X) = 0$).

[5 Marks] From this assumption, the FE and RE transformations are proposed. For the FE, the fact that there is such a correlation leads to a transformation designed to eliminate the effects of $a_i$. But for the RE model, there is no need to eliminate $a_i$ as it doesn’t lead to a violation of zero conditional mean. However, to overcome the consequential problem of serial correlation caused by $a_i$, a Feasible GLS estimator is used. This makes the RE model more efficient (than the POLS).
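One way to see how the GLS step sits between POLS and FE is through quasi-demeaning. The sketch below uses the standard textbook weight $\lambda = 1 - \sqrt{\sigma_e^2 / (\sigma_e^2 + T\sigma_a^2)}$, which is not written out in these solutions. The variances are treated as known here, whereas a feasible GLS procedure would estimate them. The function names and the numbers are hypothetical.

```python
import math

def re_weight(sigma2_e, sigma2_a, T):
    """Quasi-demeaning weight: 1 - sqrt(sigma2_e / (sigma2_e + T * sigma2_a))."""
    return 1.0 - math.sqrt(sigma2_e / (sigma2_e + T * sigma2_a))

def quasi_demean(values, lam):
    """Subtract lam times the individual's time average from each observation."""
    mean = sum(values) / len(values)
    return [v - lam * mean for v in values]

earn_i = [10.0, 12.0, 14.0]  # hypothetical earnings for one individual, T = 3

lam_pols = re_weight(sigma2_e=1.0, sigma2_a=0.0, T=3)  # no heterogeneity
lam_re = re_weight(sigma2_e=1.0, sigma2_a=1.0, T=3)    # some heterogeneity

print(lam_pols, quasi_demean(earn_i, lam_pols))  # weight 0: data unchanged (POLS)
print(lam_re, quasi_demean(earn_i, lam_re))      # weight in (0, 1): RE
print(1.0, quasi_demean(earn_i, 1.0))            # weight 1: full demeaning (FE)
```

With a weight of 0 the data are left untouched (POLS), with a weight of 1 they are fully demeaned (FE), and the RE transformation lies in between.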

A clear indication of the two sub-parts of the question was expected.

3.

Three possible ways would be:

First: create a dummy variable for each individual in the sample (except one) in order to explicitly model $a_i$. This is called the Least Squares Dummy Variable (LSDV) model. The advantages of this method are (1) being able to observe the average difference in earnings for each individual and (2) being able to test whether these individual effects matter (i.e., we can test whether the POLS is the best option or not).

The disadvantages include: (1) the model will normally contain a large set of regressors, making it impractical to implement (computationally and analytically), particularly when considering that we are primarily interested in the role of Male on earnings (not in how wages vary across each individual); (2) the large number of regressors consumes a large number of degrees of freedom, decreasing the power of our tests; and (3) we won’t be able to observe the effects of time-invariant regressors because of collinearity with the dummy variables.

Second: use the OLS applied to the FE transformation (i.e., the FE or Within estimator). This method has the advantage of being much more practical and of preserving more degrees of freedom than the LSDV option, but we won’t be able to directly observe the estimated individual-specific effects. In relation to the FD estimator, the FE estimator tends to be more efficient. The biggest disadvantage, however, is that time-invariant regressors will be eliminated in the process, i.e., we won’t be able to observe an estimate of the parameter of interest ($\beta_1$).
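The loss of time-invariant regressors is easy to see on a toy panel. This is a minimal sketch with made-up data for two hypothetical individuals, A and B, observed for three periods.

```python
def within(values_by_person):
    """Subtract each person's time average from their own observations."""
    out = {}
    for person, values in values_by_person.items():
        mean = sum(values) / len(values)
        out[person] = [v - mean for v in values]
    return out

# Hypothetical toy panel: two people observed for three periods.
male = {"A": [1, 1, 1], "B": [0, 0, 0]}        # time-invariant regressor
educ = {"A": [12, 12, 13], "B": [10, 11, 12]}  # time-varying regressor

male_w = within(male)
educ_w = within(educ)

print(male_w)  # all zeros: no within variation is left to identify the Male effect
print(educ_w)  # still varies over time, so its coefficient can be estimated
```

After the within transformation the Male column is identically zero, so its coefficient cannot be estimated, while Educ keeps its within-person variation.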

Third: if $T = 2$, the FE estimator will be numerically the same as the First Difference (FD) estimator. The advantages are similar to the ones for the FE estimator. The disadvantage here is that this equivalence holds only when we have exactly two periods.
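The numerical equivalence at $T = 2$ can be checked directly. The sketch below assumes a single regressor and no period dummy in either estimator, and uses made-up numbers for three hypothetical individuals.

```python
def fd_slope(x, y):
    """First-difference estimator; x[i] and y[i] are (period 1, period 2) pairs."""
    dx = [x2 - x1 for x1, x2 in x]
    dy = [y2 - y1 for y1, y2 in y]
    return sum(a * b for a, b in zip(dx, dy)) / sum(a * a for a in dx)

def fe_slope(x, y):
    """Within estimator: OLS on deviations from each individual's time average."""
    num = den = 0.0
    for (x1, x2), (y1, y2) in zip(x, y):
        xbar, ybar = (x1 + x2) / 2, (y1 + y2) / 2
        for xt, yt in ((x1, y1), (x2, y2)):
            num += (xt - xbar) * (yt - ybar)
            den += (xt - xbar) ** 2
    return num / den

# Hypothetical data: three individuals, two periods each.
x = [(1.0, 3.0), (2.0, 2.5), (0.0, 4.0)]
y = [(2.0, 5.0), (1.0, 3.0), (0.5, 6.0)]

b_fd = fd_slope(x, y)
b_fe = fe_slope(x, y)
print(b_fd, b_fe)  # the two estimates coincide up to rounding error
```

With two periods, each demeaned observation is plus or minus half the first difference, so the numerator and the denominator of the within estimator are both exactly half of their FD counterparts and the ratio is unchanged.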

Notice that for all the models above, we are assuming that the truncation of age is unrelated to the regressors, i.e., no adjustments for this issue would be necessary.

4.


As the variable of interest is time-invariant, we cannot use the FE estimator (or the variants discussed above). So, the parameter won’t be identified.

This one was more of a hit-or-miss type of question but there were variations.

5.

[5 Marks] If the assumption of $Cov(a_i, X) \neq 0$ is dropped, then we can either use the RE model (if the $a_i$ are not constant across individuals) or the POLS if $a_i = a, \forall i$. In most cases, we would favour the RE estimator as the conditions for the POLS to be consistent and unbiased are even more restrictive than the ones for the RE.

With the RE (or POLS), we will be able to identify the effect of $\text{Male}_i$ on earnings.

[5 Marks] The main disadvantage of using the RE model is the strength of the assumption that $Cov(a_i, X) = 0$. In our context, if ability (assumed to be time-invariant, unobservable and individual-specific, i.e., part of $a_i$) is correlated with education (a reasonable assumption), then the RE estimator will be biased and inconsistent.

Notice that the Between estimator (BE) is also possible here, although not a common choice.

6.

There are different strategies for addressing this issue. In the two cases considered below (there might be more), we create a dummy variable for having children or not (let’s call this $\text{Child}_{it}$). It’s likely that the first strategy is superior to the second, as it’s likely less restrictive.

Strategy 1:

We could consider only women in our sample and run the following regression:

$$\text{Earn}_{it} = \theta_0 + \theta_1 \text{Child}_{it} + \theta_2 \text{Age}_{it} + \theta_3 \text{Educ}_{it} + \theta_4 \text{Occ}_{it} + \theta_5 \text{Exper}_{it} + \theta_6 \text{Children}_{it} + \theta_7 \text{Married}_{it} + (a_i + e_{it})$$

Notice I left both $\text{Children}_{it}$ and $\text{Child}_{it}$ in the regression. I’m also assuming away any issues with censoring and truncation (particularly of the dependent variable).

In this regression, we could use the FE model without the issues discussed above, and the motherhood penalty would be estimated by $\hat{\theta}_1$.


Strategy 2:

We use all observations (both men and women). Proposing an interaction term relating both variables (Male and Child) could potentially capture this effect.

The model could be:

$$\text{Earn}_{it} = \delta_0 + \delta_1 \text{Male}_i + \delta_2 \text{Age}_{it} + \delta_3 \text{Educ}_{it} + \delta_4 \text{Occ}_{it} + \delta_5 \text{Exper}_{it} + \delta_6 \text{Children}_{it} + \delta_7 \text{Married}_{it} + \delta_8 \text{Male}_i \times \text{Child}_{it} + (a_i + v_{it})$$

Notice I left both $\text{Children}_{it}$ and $\text{Child}_{it}$ in the regression.

The expected difference in earnings between males and females would therefore be:

$$E[\text{Earn}_{it} \mid \text{Male}_i = 1, X] - E[\text{Earn}_{it} \mid \text{Male}_i = 0, X] = \delta_1 + \delta_8 \text{Child}_{it}$$

So, the predicted difference between males and females will also depend on whether the individuals being compared have children or not.

We wouldn’t be able to estimate $\delta_1$ using the FE estimator (or equivalent) but we could either (1) use an alternative method (RE or POLS) if we believe $Cov(a_i, X) = 0$ or (2) not estimate this effect if we are primarily interested in testing the statistical significance of $\delta_8$. These are reasons why we would likely favour the first strategy.
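The gender-gap expression in Strategy 2 follows from subtracting the two conditional expectations term by term. In the sketch below, $\delta_1$ is the coefficient on Male, $\delta_8$ is the coefficient on the Male and Child interaction, and $w_{it}'\delta_w$ is shorthand (my own, not from the question) for all terms that do not involve Male. It also assumes that the conditional mean of the composite error $a_i + v_{it}$ is the same for both groups, so that it cancels.

```latex
E[\text{Earn}_{it} \mid \text{Male}_i = 1, X]
  = \delta_0 + \delta_1 + w_{it}'\delta_w + \delta_8 \text{Child}_{it}
    + E[a_i + v_{it} \mid X]

E[\text{Earn}_{it} \mid \text{Male}_i = 0, X]
  = \delta_0 + w_{it}'\delta_w + E[a_i + v_{it} \mid X]

E[\text{Earn}_{it} \mid \text{Male}_i = 1, X] - E[\text{Earn}_{it} \mid \text{Male}_i = 0, X]
  = \delta_1 + \delta_8 \text{Child}_{it}
```

For individuals without children the gap is $\delta_1$, and for individuals with children it is $\delta_1 + \delta_8$, which is why the significance of $\delta_8$ is the quantity of interest.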

Question 2

7.

We can use a Hausman test after estimating both models. The null hypothesis is:

$$H_0: \operatorname{plim}(\hat{\beta}_{FE} - \hat{\beta}_{RE}) = 0$$

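The test statistic is given by:

$$H = (\hat{\beta}_{FE} - \hat{\beta}_{RE})' \left[ Var(\hat{\beta}_{FE}) - Var(\hat{\beta}_{RE}) \right]^{-1} (\hat{\beta}_{FE} - \hat{\beta}_{RE})$$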

Under the null hypothesis, $H$ asymptotically follows a $\chi^2_q$ distribution, where $q$ is the number of time-varying variables that are included in both the RE and FE regressions. If the null hypothesis is rejected at the $\alpha\%$ significance level, we would favour the FE model. If we don’t reject it, the RE is preferred.

If the RE is not efficient, we would need to use the robust Hausman test.
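The (non-robust) statistic can be computed with a few lines of code once both sets of estimates are available. This is a sketch only: the coefficient vectors and variance matrices below are made-up numbers for $q = 2$ time-varying regressors, and the helper names are hypothetical.

```python
def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def hausman(b_fe, b_re, V_fe, V_re):
    """Return (H, q) with H = d' [V_fe - V_re]^{-1} d and d = b_fe - b_re."""
    q = len(b_fe)
    d = [f - r for f, r in zip(b_fe, b_re)]
    V = [[V_fe[i][j] - V_re[i][j] for j in range(q)] for i in range(q)]
    z = solve(V, d)  # z = [V_fe - V_re]^{-1} d
    return sum(di * zi for di, zi in zip(d, z)), q

# Hypothetical estimates for the time-varying regressors only.
b_fe = [0.50, 1.20]
b_re = [0.40, 1.00]
V_fe = [[0.02, 0.0], [0.0, 0.05]]
V_re = [[0.01, 0.0], [0.0, 0.01]]

H, q = hausman(b_fe, b_re, V_fe, V_re)
CRIT_5PCT_DF2 = 5.991  # 5% critical value of a chi-squared with 2 degrees of freedom
print(H, q, H > CRIT_5PCT_DF2)
```

With these numbers $H$ is about 2, which is below the 5% critical value, so the null would not be rejected and the RE model would be preferred. Only coefficients on time-varying regressors enter the vectors, in line with the definition of $q$.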

No full marks if students didn’t make it clear that only time-varying terms are considered in the test (definition of q).

8.

The assumptions for the FD model to be unbiased, consistent and efficient are:

FD.1 – Linearity of parameters: $y_i = X_i\beta + a_i\iota + e_i$, $\forall i = 1, \ldots, n$.

FD.2 – Random sampling from the cross section.

FD.3 – No perfect collinearity ($X'D'DX$ has full rank).

FD.4 – Exogeneity: $E(e_i \mid X_i) = 0$. (Or equivalent)

FD.5 – Homoskedasticity: $Var(e_{it} \mid X_i) = Var(e_{it}) = \sigma^2$, for all $t = 1, 2, \ldots, T$.

FD.6 – No serial correlation: $Cov(e_{it}, e_{is} \mid X_i) = 0$, for all $t \neq s$.

FD.7 – The errors are independently and identically distributed, and asymptotically normal.

Assumptions 5, 6 and 7 can be summarised by using:

$$\sqrt{n}(\hat{\beta}_{FD} - \beta) \xrightarrow{d} N\left(0, \sigma^2 \left[E(X_i'D'DX_i)\right]^{-1}\right)$$

No full marks if students didn’t use the mathematical symbols (within reason) in their answers.

9.

For the sake of notation, assume $D_2$ represents the second difference (e.g., $D_2 y_{it} = y_{it} - y_{i,t-2}$). Therefore, the model written in second differences would be:

$$D_2 y_i = D_2 X_i \beta + D_2 e_i, \quad \text{for } i = 1, 2, \ldots, n$$

For each i, the second difference matrices will be:

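$$D_2 = \begin{bmatrix} -1 & 0 & 1 & 0 & \cdots & 0 & 0 \\ 0 & -1 & 0 & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & & \ddots & & \vdots \\ 0 & 0 & \cdots & -1 & 0 & 1 & 0 \\ 0 & 0 & \cdots & 0 & -1 & 0 & 1 \end{bmatrix}_{(T-2) \times T}$$

$$D_2 y_i = \begin{bmatrix} y_{i3} - y_{i1} \\ \vdots \\ y_{iT} - y_{i,T-2} \end{bmatrix}_{(T-2) \times 1}$$

$$D_2 X_i = \begin{bmatrix} x_{1,i,3} - x_{1,i,1} & \cdots & x_{k,i,3} - x_{k,i,1} \\ \vdots & \ddots & \vdots \\ x_{1,i,T} - x_{1,i,T-2} & \cdots & x_{k,i,T} - x_{k,i,T-2} \end{bmatrix}_{(T-2) \times k}$$

$$\beta = \begin{bmatrix} \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}_{k \times 1}$$

$$D_2 e_i = \begin{bmatrix} e_{i3} - e_{i1} \\ \vdots \\ e_{iT} - e_{i,T-2} \end{bmatrix}_{(T-2) \times 1}$$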
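The structure of $D_2$ is easiest to check by building it for a small $T$. This is a minimal sketch with $T = 5$ and a made-up outcome vector. The function names are hypothetical.

```python
def second_diff_matrix(T):
    """(T-2) x T matrix whose row r maps a vector z to z[r+2] - z[r]."""
    rows = []
    for r in range(T - 2):
        row = [0] * T
        row[r] = -1
        row[r + 2] = 1
        rows.append(row)
    return rows

def matvec(M, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

D2 = second_diff_matrix(5)
for row in D2:
    print(row)

y_i = [3.0, 4.0, 7.0, 9.0, 12.0]  # hypothetical outcomes for one individual
d2_y = matvec(D2, y_i)            # [y3 - y1, y4 - y2, y5 - y3]
d2_ones = matvec(D2, [1] * 5)     # D2 applied to a vector of ones

print(d2_y)
print(d2_ones)  # all zeros, so the a_i term drops out of the differenced model
```

Each row has a $-1$ and a $1$ two columns apart, and every row sums to zero, which is what removes the individual effect.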

The estimator is:

$$\hat{\beta}_{FD2} = \left[ \sum_{i=1}^{n} (D_2 X_i)' D_2 X_i \right]^{-1} \left[ \sum_{i=1}^{n} (D_2 X_i)' D_2 y_i \right]$$

At most 5/10 if students didn’t get the matrix $D_2$ correct.
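For a single regressor the estimator reduces to a ratio of sums, which makes it simple to verify on made-up data. The sketch below assumes one regressor, no idiosyncratic error, a true slope of 2 and two hypothetical individuals with different values of $a_i$, so the estimator should recover the slope exactly.

```python
def fd2_estimate(x_panel, y_panel):
    """Pooled OLS on two-period differences (single regressor, no intercept)."""
    num = den = 0.0
    for x_i, y_i in zip(x_panel, y_panel):
        for t in range(2, len(x_i)):
            dx = x_i[t] - x_i[t - 2]  # x_{it} - x_{i,t-2}
            dy = y_i[t] - y_i[t - 2]  # y_{it} - y_{i,t-2}
            num += dx * dy
            den += dx * dx
    return num / den

beta = 2.0
a = [5.0, -3.0]  # hypothetical individual effects
x_panel = [[1.0, 2.0, 4.0, 3.0], [0.0, 1.0, 1.0, 5.0]]
y_panel = [[beta * x + a_i for x in x_i] for x_i, a_i in zip(x_panel, a)]

beta_hat = fd2_estimate(x_panel, y_panel)
print(beta_hat)  # equals the true slope because a_i is differenced away
```

The individual effects differ across the two people, yet they play no role in the estimate, since each one cancels in $y_{it} - y_{i,t-2}$.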

10.

There is nothing preventing us from getting to the same expressions derived in lectures. For instance, the FD-2 estimator (this is the name I’m giving to this estimator) would be:

$$\hat{\beta}_{FD2} = \beta + \left[ \sum_{i=1}^{n} (D_2 X_i)' D_2 X_i \right]^{-1} \left[ \sum_{i=1}^{n} (D_2 X_i)' D_2 e_i \right]$$

To get to this point, we assumed linearity of the parameters and random sampling (just like FD.1 and FD.2). For unbiasedness and consistency, we would need the inverse of the term above to exist (reflecting the need for no perfect collinearity – FD.3) and at the same time the last term in brackets to be zero in expectation. This requires exogeneity (FD.4).
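The decomposition of the FD-2 estimator into the true parameter plus a sampling-error term comes from substituting the differenced model into the estimator. The steps below are a sketch in the notation of Question 9, with $\iota$ denoting a vector of ones (so that $D_2 \iota = 0$ because each row of $D_2$ sums to zero) and the sums running over $i = 1, \ldots, n$.

```latex
D_2 y_i = D_2 (X_i \beta + a_i \iota + e_i) = D_2 X_i \beta + D_2 e_i
  \qquad \text{since } D_2 \iota = 0

\hat{\beta}_{FD2}
  = \Big[ \sum_i (D_2 X_i)' D_2 X_i \Big]^{-1}
    \Big[ \sum_i (D_2 X_i)' (D_2 X_i \beta + D_2 e_i) \Big]

  = \beta + \Big[ \sum_i (D_2 X_i)' D_2 X_i \Big]^{-1}
    \Big[ \sum_i (D_2 X_i)' D_2 e_i \Big]
```

The first bracket must be invertible for the second line to exist, and the second bracket must have zero expectation for the estimator to be centred on $\beta$.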

So, the assumptions to guarantee unbiasedness and consistency would be:

FD-2.1 – Linearity of parameters: $y_i = X_i\beta + a_i\iota + e_i$, $\forall i = 1, \ldots, n$.

FD-2.2 – Random sample from the cross section.

FD-2.3 – No perfect collinearity ($E[(D_2 X_i)' D_2 X_i \mid X_i]$ is invertible).

FD-2.4 – Exogeneity: $E[(D_2 X_i)' D_2 e_i \mid X_i] = 0$ or equivalently $E[D_2 e_i \mid D_2 X_i] = 0$. Strict exogeneity is also possible.

The assumptions for efficiency include homoskedasticity, no serial correlation and asymptotic normality of the error term. They could be summarised using:

$$\sqrt{n}(\hat{\beta}_{FD2} - \beta) \xrightarrow{d} N\left(0, \sigma_e^2 \left[E((D_2 X_i)' D_2 X_i)\right]^{-1}\right)$$

These would be assumptions FD-2.5, 6 and 7, respectively.