Categorical Data Analysis (STSCI 4110)

Fall 2022


ASSIGNMENT 1

Please show your work. It may be typed or handwritten; however, you must submit the homework electronically as a PDF. You may use R for the calculations, but you do need to know how to calculate binomial probabilities by hand for the first prelim.
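As a reminder of the by-hand calculation, the probability of k successes in n independent trials with success probability p is C(n, k) p^k (1 - p)^(n - k). The short sketch below evaluates that formula directly. It is written in Python rather than R purely for illustration; the function name binom_pmf and the coin-flip numbers are hypothetical and are not taken from any problem in this assignment.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), straight from the formula."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Hypothetical example: exactly 2 heads in 5 flips of a fair coin.
# By hand: C(5, 2) * 0.5^2 * 0.5^3 = 10 / 32.
print(binom_pmf(2, 5, 0.5))  # 0.3125

# Sanity check: the probabilities over k = 0..n should sum to 1.
print(sum(binom_pmf(k, 5, 0.5) for k in range(6)))
```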

Problem 1: Scale of Measurement [6]

Which scale of measurement, nominal or ordinal, is most appropriate for each of the following variables? You do not need to give an explanation.

a.    Type of Housing (single-family home, duplex, apartment building, trailer home)

b.    Level of NCAA basketball tournament that a team reaches (Round of 32, Round of 16, …, Semi-Finals, Finals)

c.     Favorite sport to watch (tennis, football, baseball, hockey, basketball, golf, track and field)

d.    Level of Bicycle Handling Skills (Beginner, Intermediate, Experienced, Expert)

e.    Frequency of anxiety and depression (never, occasionally, often, always)

f.     Favorite muffin type (pumpkin, blueberry, oatmeal raisin, lemon poppy seed)

Problem 2: Multiple Choice Random Guessing

Each of 12 multiple-choice questions on an exam has four possible answers but only one correct response. For each question, a student randomly selects one response as the answer.

a.    Name the distribution of the student’s number of correct answers on the exam. (Remember to specify the values of the parameters). [2]

b.    Generate a table of probabilities listing each of the 13 possible outcomes (0 to 12 answers correct). If any probability is less than 0.0001, you can just list it as “< 0.0001”. Do not use “E” notation in the table. [4]

c.    Name the assumptions that go with this distribution. Do you think that they are satisfied? [4]

d.   What is the probability that the student gets 5 or more correct responses? [4]

e.    What is the probability that the student gets one or none of the questions correct? [4]

f.    What is the expected number of correct answers? [2]

Problem 3: Postal Service

An enterprising student decides to investigate the claim made by the Ithaca Post Office that 97% of letters mailed from Ithaca to other Ithaca addresses should arrive in one business day. Every day for 20 days, she mails a letter from a randomly chosen Ithaca mailbox and tracks how many days it takes the letter to arrive at its destination. Out of 20 letters sent, 15 arrive in one day.

Let π = the true probability that a letter arrives in one day.

a.    Use R to calculate and graph the likelihood function for the observed data. Show a graph of the likelihood function for the given outcome of 16 one-day arrivals in 20 letters sent. [12]

b.    What is the MLE of π? [2]

c.    Does the evidence support the Post Office’s claim that 97% of these letters arrive in one day? Explain. [You should make an Agresti-Coull confidence interval for π as supporting evidence.] [6]
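For orientation, a binomial likelihood treats the observed count y as fixed and the success probability as the variable: L(π) = C(n, y) π^y (1 - π)^(n - y). The sketch below evaluates that function on a grid and locates its maximum. It is in Python rather than the R the problem asks for, it does not draw the graph, and the data (3 successes in 10 trials) and the name binom_likelihood are hypothetical choices made only for illustration.

```python
from math import comb

def binom_likelihood(pi, y, n):
    """Likelihood of success probability pi given y successes in n trials."""
    return comb(n, y) * pi**y * (1 - pi)**(n - y)

# Hypothetical data (not the assignment's): 3 successes in 10 trials.
y, n = 3, 10

# Evaluate the likelihood on an evenly spaced grid of candidate values in [0, 1].
grid = [i / 1000 for i in range(1001)]
lik = [binom_likelihood(p, y, n) for p in grid]

# The grid value with the largest likelihood approximates the MLE,
# which for a binomial sample is the sample proportion y / n.
pi_hat = grid[lik.index(max(lik))]
print(pi_hat, y / n)
```

Plotting lik against grid (for example with R's plot or curve) gives the likelihood graph; the peak of that curve is the maximum likelihood estimate.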

Problem 4: Sunblock Lotion

Each of 8 volunteers used a new skin product that is purported to improve dry skin. At the end of the study, 7 out of 8 volunteers showed improvement. Let π = the true probability of improvement. Our null hypothesis is H0: π = 0.60. Use a 5% significance level for all tests.

a.    Using the binomial test, do a hypothesis test at the 5% level (do all steps) and find the mid p-value for HA: π ≠ 0.6. What is your conclusion? (Write a statistical conclusion and a “plain English” conclusion.) [6]

b.    Using the binomial test, do a hypothesis test at the 5% level (do all steps) and find the mid p-value for HA: π > 0.6. What is your conclusion? (Write a statistical conclusion and a “plain English” conclusion.) [6]

c.    What is the Agresti-Coull 95% confidence interval for π? [4]

d.    What is the Score (inversion) 95% confidence interval for π? You should use R here or derive a closed-form equation for this interval. Show your work with either of these methods. [4]
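The quantities named in this problem can be sketched as follows, under stated assumptions. The one-sided mid p-value is taken as P(Y > y) + 0.5 P(Y = y) under H0. The Agresti-Coull interval is written in its general form (add z²/2 successes and z²/2 failures, which for 95% is close to the "add 2 and 2" rule; the course may use the rounded version). The score interval uses the standard closed form obtained by inverting the score test. The code is Python rather than R, and all function names and the data (12 successes in 20 trials, null value 0.5) are hypothetical.

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

def mid_p_upper(y, n, pi0):
    """One-sided mid p-value for HA: pi > pi0."""
    tail = sum(binom_pmf(k, n, pi0) for k in range(y + 1, n + 1))
    return tail + 0.5 * binom_pmf(y, n, pi0)

def agresti_coull(y, n, conf=0.95):
    """Wald-type interval computed after adding z^2/2 successes and failures."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    n_adj = n + z**2
    p_adj = (y + z**2 / 2) / n_adj
    half = z * sqrt(p_adj * (1 - p_adj) / n_adj)
    return p_adj - half, p_adj + half

def score_interval(y, n, conf=0.95):
    """Closed-form score (Wilson) interval: all pi0 not rejected by the score test."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    p_hat = y / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = (z / denom) * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return center - half, center + half

# Hypothetical data (not the assignment's): 12 successes in 20 trials, H0: pi = 0.5.
print(mid_p_upper(12, 20, 0.5))
print(agresti_coull(12, 20))
print(score_interval(12, 20))
```

A two-sided mid p-value is not shown because textbooks define it in more than one way; follow the definition given in lecture.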

Problem 5: Missing Sons?

Male radiologists have long suspected that they tend to have fewer sons than daughters. [One theory is that sperm carrying Y chromosomes are more fragile, and thus more susceptible to radiation effects.] In a random sample of “highly irradiated” male radiologists, researchers found that 31 of their 88 offspring were male.

a.    In industrialized countries, the percentage of males among all births is about 51.2 percent. Do a hypothesis test, using the Score statistic, to test the hypothesis that the percentage of male offspring in this population is 51.2%. The alternative hypothesis should be that the true percentage of male offspring is less than 51.2%. (Do all the steps of a hypothesis test.) Be sure to include a “plain English” conclusion. Use an α-level of 5%. [6]

b.    Calculate the Agresti-Coull confidence interval for the true proportion of male offspring among male radiologists. Does this interval support the conclusion in part (a)? [4]
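The score statistic for a single proportion standardizes the sample proportion using the null standard error: z = (p̂ - π0) / sqrt(π0 (1 - π0) / n). For a "less than" alternative the p-value is the lower-tail normal probability of z. A minimal sketch follows, in Python rather than R, with a hypothetical function name and hypothetical data (40 successes in 100 trials, null value 0.5) chosen so the arithmetic is easy to check by hand.

```python
from math import sqrt
from statistics import NormalDist

def score_test_lower(y, n, pi0):
    """Score z statistic and lower-tail p-value for HA: pi < pi0."""
    p_hat = y / n
    # The standard error uses the null value pi0, not p_hat (that would be a Wald test).
    se0 = sqrt(pi0 * (1 - pi0) / n)
    z = (p_hat - pi0) / se0
    return z, NormalDist().cdf(z)

# Hypothetical data: p_hat = 0.40, se0 = 0.05, so z = -0.10 / 0.05 = -2.
z, p_value = score_test_lower(40, 100, 0.5)
print(z, p_value)
```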

Problem 6: Coverage Probabilities

Your task is to use simulation to estimate the coverage probabilities of Wald and Agresti-Coull confidence intervals in four different situations:

1)    π = 0.04, n = 15

2)    π = 0.04, n = 75

3)    π = 0.6, n = 15

4)    π = 0.6, n = 75

Use 10,000 simulations of each of the above situations to estimate the coverage probabilities. Please include your R code. Write your answers in the table below.

a)   Fill in the table below. [20]

Coverage Probabilities for 4 situations

π       n     Agresti-Coull    Wald
0.04    15    ______           ______
0.04    75    ______           ______
0.6     15    ______           ______
0.6     75    ______           ______

b)  Describe your findings, qualitatively. [4]
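The general shape of a coverage simulation is: draw a binomial count, build the interval, record whether it contains the true π, and repeat. The proportion of repetitions that contain π estimates the coverage probability. The sketch below shows that loop in Python rather than the R code the assignment asks for. The setting (π = 0.3, n = 30, 2,000 repetitions), the seed, and all function names are hypothetical, and the Agresti-Coull interval is the general z²/2 form, which may differ slightly from the "add 2 and 2" version used in class.

```python
import random
from math import sqrt
from statistics import NormalDist

Z = NormalDist().inv_cdf(0.975)  # critical value for a 95% interval

def wald(y, n):
    p = y / n
    half = Z * sqrt(p * (1 - p) / n)
    return p - half, p + half

def agresti_coull(y, n):
    n_adj = n + Z**2
    p = (y + Z**2 / 2) / n_adj
    half = Z * sqrt(p * (1 - p) / n_adj)
    return p - half, p + half

def coverage(interval, pi, n, reps, rng):
    """Fraction of simulated samples whose interval contains the true pi."""
    hits = 0
    for _ in range(reps):
        # One binomial draw, built as a sum of n Bernoulli(pi) trials.
        y = sum(rng.random() < pi for _ in range(n))
        lo, hi = interval(y, n)
        hits += lo <= pi <= hi
    return hits / reps

rng = random.Random(4110)  # fixed seed so reruns give the same estimates
for name, f in (("Wald", wald), ("Agresti-Coull", agresti_coull)):
    print(name, coverage(f, pi=0.3, n=30, reps=2000, rng=rng))
```

Note that when a sample has zero successes the Wald interval collapses to the single point 0, so it cannot cover any π > 0; this matters most when π is small and n is small.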