
STAT 4620/5620 WINTER 2024

Assignment 3: Due Thursday, March 7, 2024

1. Suppose that you are interested in studying intravenous drug use among high school students in Canada. Drug use is characterized as a binary random variable, where 1 indicates that an individual has injected drugs within the past year and 0 that he/she has not. Covariate information related to drug use includes: information about drug use provided in school (y/n), age of student (years), employed part-time (y/n), school connectedness (Likert scale), and gender (m/f).

(a) [3pts] Propose and defend a suitable model for the aforementioned data. Be sure to write down the model equation.

(b) [2pts] Discuss any potential interactions that might be worthwhile including in your model and provide justification as to why (or why not).

(c) [1pt] Which R package(s) would you use to fit the above model?

(d) [2pts] What tools would you use to assess model fit and proceed with variable selection?

2. [10pts] Install the R Package faraway. Consider the esdcomp data that were recorded on 44 doctors working in an emergency service at a hospital to study the factors affecting the number of complaints received. Build a model for the number of complaints received, justify your choices, and report your conclusions. (250 words).

3. [10pts] The bootstrap is a general tool for assessing uncertainty. Describe the bootstrap in general and then use it to investigate a statistic of relevance to the dataset you have selected for your project. Take advantage of the functions available in the R Package bootstrap and be sure to include your references. (500 words).
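
As a point of orientation only, the resampling idea behind the bootstrap can be sketched in base R. Everything below is an assumption made for illustration: the data are simulated, the statistic (the median) and the number of resamples B are arbitrary choices, and the sketch deliberately uses base R rather than the bootstrap package that the question asks you to use.

```r
# Illustrative nonparametric bootstrap of the sample median (base R only).
# The data, the statistic, and B are hypothetical choices for this sketch.
set.seed(2024)                      # arbitrary seed, for reproducibility only
x <- rexp(50, rate = 1)             # simulated sample standing in for real data

B <- 2000                           # number of bootstrap resamples
theta_hat <- median(x)              # statistic computed on the observed sample

# Resample the data with replacement B times and recompute the statistic.
theta_star <- replicate(B, median(sample(x, replace = TRUE)))

se_boot <- sd(theta_star)                         # bootstrap standard error
ci_pct  <- quantile(theta_star, c(0.025, 0.975))  # percentile 95% interval

theta_hat
se_boot
ci_pct
```

The spread of the resampled statistics stands in for the unknown sampling distribution of the median, which is what makes the standard error and the percentile interval available without a formula.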

4. [5pts] Cross-validation is probably the simplest and most widely used method for estimating prediction error. Ideally, if we had enough data, we would set aside a validation set and use it to assess the performance of our model. Since data are sometimes scarce, this may not always be possible. We finesse this problem by using K-fold cross-validation. Explain. (150 words).
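
The mechanics of K-fold cross-validation can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not a model answer: the data are simulated, the linear model is a placeholder, and K = 5 and squared-error loss are arbitrary choices.

```r
# Illustrative K-fold cross-validation in base R on simulated data.
# The data-generating model, K, and the loss function are hypothetical.
set.seed(4620)                       # arbitrary seed, for reproducibility only
n <- 100
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
dat <- data.frame(x = x, y = y)

K <- 5
# Randomly assign each observation to one of K equal-sized folds.
fold <- sample(rep(seq_len(K), length.out = n))

fold_mse <- numeric(K)
for (k in seq_len(K)) {
  train <- dat[fold != k, ]          # fit on the other K - 1 folds
  test  <- dat[fold == k, ]          # hold out fold k for assessment
  fit   <- lm(y ~ x, data = train)
  pred  <- predict(fit, newdata = test)
  fold_mse[k] <- mean((test$y - pred)^2)
}

cv_error <- mean(fold_mse)           # K-fold estimate of prediction error
cv_error
```

Each observation is held out exactly once, so every data point contributes both to fitting and to assessment without a separate validation set being reserved.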

5. For the analysis of count (or semicontinuous) data there are models available to deal with the common situation where there is an excessive number of zeros.

(a) [5pts] Discuss the various potential sources of zeros. (150 words).

(b) [8pts] Describe mixture and two-part models and show how their formulations handle different types of zeros. (250 words).

GUIDELINES FOR SUBMISSION:

Submit the R markdown file (.RMD), the .csv file containing your datasets, AND the resulting knitted .PDF file to BrightSpace Assignments under Assignment 3.