MATH5945 Categorical data analysis

Assignment 2


1. A certain game has three possible outcomes whose odds of winning follow the ratio (1 + θ) : (1 − 2θ) : (1 + θ), where θ ∈ ℝ. A gambler is convinced a casino has “rigged” this game and sets out to prove it by playing 200 games, which results in (55, 13, 132) observed occurrences for the three outcomes.

(a) Derive a formula for the MLE of θ, and calculate the estimate given the data.

(b) Test the gambler’s hypothesis that the casino has “rigged” this game.
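As a numerical cross-check for part (a), the sketch below maximises the multinomial log-likelihood directly. It assumes the ratio is normalised by its sum, 3, so that the cell probabilities are (1 + θ)/3, (1 − 2θ)/3 and (1 + θ)/3, and that θ is restricted to (−1, 1/2) so that all three are positive. The function names and the choice of golden-section search are illustrative, not part of the assignment.

```python
import math

def cell_probs(theta):
    """Cell probabilities implied by (1+theta):(1-2*theta):(1+theta); the weights sum to 3."""
    return [(1 + theta) / 3, (1 - 2 * theta) / 3, (1 + theta) / 3]

def log_lik(theta, counts):
    """Multinomial log-likelihood, dropping the constant multinomial coefficient."""
    return sum(n * math.log(p) for n, p in zip(counts, cell_probs(theta)))

def golden_max(f, lo, hi, tol=1e-10):
    """Golden-section search for the maximiser of a unimodal function on (lo, hi)."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) > f(d):
            b = d  # the maximiser lies in [a, d]
        else:
            a = c  # the maximiser lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2

counts = [55, 13, 132]  # observed occurrences from the question
eps = 1e-9              # keep the search strictly inside (-1, 1/2)
theta_hat = golden_max(lambda t: log_lik(t, counts), -1 + eps, 0.5 - eps)
print(round(theta_hat, 4))
```

The log-likelihood is a positively weighted sum of logarithms of linear functions of θ, hence concave on this interval, which is what justifies a unimodal search. The printed value should agree with whatever closed-form estimate is derived by hand.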


2. Let the random variable X follow the geometric distribution with probability mass function

$$P(X = x) = \pi (1 - \pi)^{x - 1}$$

for x = 1, 2, ... and π = P(success). A geometric distribution can be derived from independent Bernoulli trials where X is the number of trials until the first success.

(a) Show that the geometric distribution belongs to the one-parameter exponential family (m = 1). What is the natural parameter τ as a function of π?

(b) For n independent, identically distributed random variables X_1, X_2, ..., X_n from a geometric distribution, show that the MLE of π is

$$\hat{\pi} = \frac{n}{\sum_{i=1}^{n} X_i} = \frac{1}{\bar{X}}.$$

What is the MLE of τ?

(c) Replace π with an appropriate function of τ to get A(τ), and find the asymptotic distribution of $\hat{\tau}$ using methods discussed in the lectures.
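A quick simulation can serve as a sanity check on part (b). The sketch below assumes the standard form of the result, namely that the MLE of π is the reciprocal of the sample mean, n / Σ x_i, and that the log-likelihood is n log π + (Σ x_i − n) log(1 − π). The value π = 0.3, the seed and the sample size are arbitrary illustrative choices.

```python
import math
import random

def draw_geometric(pi, rng):
    """Number of Bernoulli(pi) trials up to and including the first success."""
    trials = 1
    while rng.random() >= pi:  # a draw >= pi counts as a failure
        trials += 1
    return trials

def mle_pi(sample):
    """Candidate MLE: reciprocal of the sample mean, n / sum(x_i)."""
    return len(sample) / sum(sample)

def log_lik(pi, sample):
    """Geometric log-likelihood: n*log(pi) + (sum(x_i) - n)*log(1 - pi)."""
    n = len(sample)
    return n * math.log(pi) + (sum(sample) - n) * math.log(1 - pi)

rng = random.Random(5945)  # arbitrary seed, for reproducibility only
true_pi = 0.3              # hypothetical value chosen for illustration
sample = [draw_geometric(true_pi, rng) for _ in range(20000)]
pi_hat = mle_pi(sample)

print("estimate:", pi_hat)
# The log-likelihood at pi_hat should be at least as large as at nearby values.
print(log_lik(pi_hat, sample) >= log_lik(pi_hat + 0.01, sample))
print(log_lik(pi_hat, sample) >= log_lik(pi_hat - 0.01, sample))
```

A simulation of this kind only supports the algebra; it does not replace the derivation the question asks for.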


3. The SAS datafile injury contains data from motor vehicle passengers injured in a crash. The dataset contains the variables:

We would like to fit a log-linear model to the four-way contingency table created from these variables. For ease of interpretation, denote these variables by S, L, B, I, respectively, in model shorthand and use numbers 1, 2 to identify the levels of a variable. For example, $\tau_1^S$ represents female sex.

Make this data available in SAS by creating a libname for its location on your computer and copy this file to your WORK folder using the same filename, i.e., injury.

(a) Check the goodness of fit of the following hierarchical models:

(M1) main effects only

(M2) all two-way interaction terms

(M3) all three-way interaction terms, and

(M4) all four-way interaction terms (saturated)

What is the lowest order model that reasonably fits the data? Give reasons.

(b) Based on the model chosen in part (a), perform backward selection using partitioned statistics to choose a “best model”. Justify your steps.

(c) Answer these questions regarding the model chosen in part (b).

i. Write out the log-linear model and its logit equivalent using Injury (I) as the response variable in symbolic form (i.e., τ notation).

ii. Using the symbolic log-linear model, what is the odds ratio of injury for an individual who wore a seatbelt compared to someone who did not?

iii. What is the estimate of this odds ratio and its 95% confidence interval using the estimated model?
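For the interval in part iii, one common approach is a Wald interval built on the log scale and then exponentiated. The sketch below assumes that an estimated log odds ratio and its standard error have already been read off the fitted model output; which parameter (or combination of parameters) equals the log odds ratio depends on the constraints used when fitting, which is not assumed here. The helper name `odds_ratio_ci` and the input numbers are hypothetical.

```python
import math
from statistics import NormalDist

def odds_ratio_ci(log_or, se, level=0.95):
    """Wald interval for an odds ratio: exponentiate log_or +/- z * se."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 when level = 0.95
    lower = math.exp(log_or - z * se)
    upper = math.exp(log_or + z * se)
    return math.exp(log_or), lower, upper

# Hypothetical inputs for illustration only; these are not results from the injury data.
estimate, lower, upper = odds_ratio_ci(log_or=-0.8, se=0.1)
print(f"OR = {estimate:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

Because the interval is symmetric on the log scale, it is asymmetric around the odds ratio itself.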