Main Examination period 2020 – January Semester A

MTH6134 / MTH6134P: Statistical Modelling II

Question 1  [20 marks]. Suppose that $Y_i \sim N(\mu_i, \sigma^2)$ for $i = 1, 2, \ldots, n$, all independent, where $\mu_i = \beta^\top x_i$, $\beta = (\beta_0, \ldots, \beta_{p-1})^\top$, $x_i = (1, x_{1i}, \ldots, x_{p-1,i})^\top$ and $\sigma$ is known.

(a) Write down the likelihood for the data $y_1, \ldots, y_n$. [6]

(b) Show that the maximum likelihood estimator of $\beta$ is $\hat{\beta} = (X^\top X)^{-1} X^\top Y$, where $X$ is the $n \times p$ design matrix with $i$th row $x_i^\top$. State any required assumptions on the design matrix. [6]

(c) Find the Fisher information matrix. [4]

(d) State the asymptotic distribution of $\hat{\beta}$. Explain why, here, the distribution is exact. [4]

Question 2  [18 marks]. The number of deaths due to AIDS in Australia (y) per three-month period from January 1983 to June 1986 was recorded. The time (x) is measured in multiples of three months after January 1983. The data are given below.

x    1    2    3    4    5    6    7    8    9   10   11   12   13   14
y    0    1    2    3    1    4    9   18   23   31   20   25   37   35

Let $Y_i$ denote the number of deaths due to AIDS in period $x_i$. Then it is assumed that $Y_i \sim \text{Poisson}(\mu_i)$ for $i = 1, 2, \ldots, 14$, all independent, where $\log(\mu_i) = \beta_0 + \beta_1 x_i$. This model was fitted to the data using R and the following output was obtained:

Call:
glm(formula = y ~ x, family = poisson(link=log))

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.2874  -1.1306  -0.6441   0.1341   2.8629

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.45622    0.24779   1.841   0.0656 .
x            0.24155    0.02197  10.997   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 188.084  on 13  degrees of freedom
Residual deviance:  33.627  on 12  degrees of freedom
AIC: 90.304

Number of Fisher Scoring iterations: 5

(a) Write down the fitted Poisson regression model, and the standard errors of the maximum likelihood estimates of $\beta_0$ and $\beta_1$. How are the standard errors calculated from the Fisher information matrix $V$? [6]

(b) Give the form of the test statistic for testing $H_0: \beta_1 = 0$ and state your conclusion. [4]

(c) Use the above output to assess the goodness of fit of the model. [4]

(d) Is there evidence that this model is an improvement over the null model with just an intercept? Justify your answer. [4]
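The R output above can be cross-checked by hand. The following is a minimal sketch, assuming the data table in this question and the log-link Poisson model stated above; it runs Fisher scoring directly and reads the standard errors off the inverse of the Fisher information. The function name `fit_poisson_loglinear`, the starting values and the stopping tolerance are illustrative assumptions, not part of the exam or of R's own implementation.

```python
import math

# Data from the table in Question 2.
x = list(range(1, 15))
y = [0, 1, 2, 3, 1, 4, 9, 18, 23, 31, 20, 25, 37, 35]


def information(b0, b1, x):
    """Fitted means and Fisher information X^T W X, with W = diag(mu)."""
    mu = [math.exp(b0 + b1 * xi) for xi in x]
    i00 = sum(mu)
    i01 = sum(xi * mi for xi, mi in zip(x, mu))
    i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
    return mu, i00, i01, i11


def fit_poisson_loglinear(x, y, tol=1e-10, max_iter=50):
    """Fisher scoring for log(mu_i) = b0 + b1 * x_i (hypothetical helper)."""
    # Assumed starting values: intercept-only fit with zero slope.
    b0, b1 = math.log(sum(y) / len(y)), 0.0
    for _ in range(max_iter):
        mu, i00, i01, i11 = information(b0, b1, x)
        # Score vector U = X^T (y - mu).
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Scoring step beta + I^{-1} U, with the 2x2 inverse written out.
        det = i00 * i11 - i01 * i01
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (i00 * u1 - i01 * u0) / det
        b0, b1 = b0 + d0, b1 + d1
        if max(abs(d0), abs(d1)) < tol:
            break
    # Standard errors: square roots of the diagonal of the inverse information.
    mu, i00, i01, i11 = information(b0, b1, x)
    det = i00 * i11 - i01 * i01
    se0, se1 = math.sqrt(i11 / det), math.sqrt(i00 / det)
    # Poisson deviance, using the convention 0 * log(0) = 0.
    deviance = 2.0 * sum(
        (yi * math.log(yi / mi) if yi > 0 else 0.0) - (yi - mi)
        for yi, mi in zip(y, mu)
    )
    return {"b0": b0, "b1": b1, "se0": se0, "se1": se1, "deviance": deviance}


fit = fit_poisson_loglinear(x, y)
print(fit)
print("z for slope:", fit["b1"] / fit["se1"])
```

Under these assumptions the estimates, standard errors and residual deviance should agree with the R output up to rounding, and the printed ratio is the z value reported for x.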

Question 3  [24 marks]. Suppose that $Y_i \sim \text{Bin}(1, \pi_i)$ for $i = 1, 2, \ldots, n$, all independent, where $\log\{\pi_i/(1 - \pi_i)\} = \beta x_i$ and $x_i$ is a known covariate.

(a) Write down the likelihood for the data $y_1, \ldots, y_n$. [6]

(b) Obtain the likelihood equation. [5]

(c) Find the Fisher information. [6]

(d) Explain how the likelihood equation can be solved iteratively to find the maximum likelihood estimate of β using Fisher’s method of scoring. [7]

Question 4  [26 marks]. Urine drug screening was performed on 2,537 applicants for positions in the U.S. Postal Service. The contingency table below shows the distribution of the results by drug present and gender. Those applicants who tested positive for more than one drug were classified under the more serious of the drugs, so that each individual only contributed to a single cell in the table.

                          Drug Present
Gender     None    Marijuana    Cocaine    Other Drugs    Total
Male      1,465          146         33             28    1,672
Female      764           52         22             27      865
Total     2,229          198         55             55    2,537

Let $Y_{jk}$ denote the number of individuals classified in row $j$ and column $k$. Then it is assumed that the $Y_{jk}$ have a multinomial distribution with parameters $n$ and $\theta_{jk}$ for $j = 1, 2$ and $k = 1, 2, 3, 4$, where $n = 2{,}537$ and $\theta_{jk}$ is the probability that an individual is classified in row $j$ and column $k$. The null hypothesis is that gender and drug present are independent.

(a) State the null hypothesis in terms of $E(Y_{jk})$. Express this as a log-linear model, explaining your notation and any additional constraints. [6]

(b) Write down the maximal model. [4]

(c) Given that the maximum likelihood estimate of $\theta_{jk}$ in the maximal model is $y_{jk}/n$ and that under the null hypothesis it is $e_{jk}/n$, where $e_{jk} = y_{j\cdot}\, y_{\cdot k}/n$, find the generalised likelihood ratio, $\Lambda(y)$, and hence obtain the deviance given by $D = -2\log\{\Lambda(y)\}$. [12]

(d) It was found that D = 11.737. What is your conclusion about the independence of gender and drug present? [4]
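The value of D quoted in part (d) can be checked numerically. The sketch below assumes the standard form of the deviance for a two-way table under independence, D = 2 Σ y_jk log(y_jk / e_jk), with e_jk the row total times the column total divided by n as in part (c). The function name `independence_deviance` is a hypothetical helper, not something defined in the exam.

```python
import math

# Observed counts from the contingency table (rows: Male, Female;
# columns: None, Marijuana, Cocaine, Other Drugs).
observed = [
    [1465, 146, 33, 28],
    [764, 52, 22, 27],
]


def independence_deviance(table):
    """Deviance 2 * sum(y * log(y / e)) with e = row total * column total / n."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = 0.0
    for j, row in enumerate(table):
        for k, y_jk in enumerate(row):
            e_jk = row_totals[j] * col_totals[k] / n
            if y_jk > 0:  # a zero cell contributes nothing (0 * log 0 = 0)
                total += y_jk * math.log(y_jk / e_jk)
    return 2.0 * total


D = independence_deviance(observed)
# Degrees of freedom for independence in a 2 x 4 table: (2 - 1) * (4 - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)
print(round(D, 3), df)
```

Assuming the usual large-sample approximation, D would be compared with a chi-squared distribution on `df` degrees of freedom.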

Question 5  [12 marks]. Suppose that the survival time T > 0 of a patient has probability density function f(t) and distribution function F(t).

(a) Define the survivor function S(t) and the hazard function h(t) in terms of f(t) and F(t). [4]

(b) Compute S(t) and h(t) when T ∼ Exp(λ). [4]

(c) Explain what is meant by saying that a survival time is censored. [2]

(d) Give two reasons why censoring might occur in practice. [2]